Skip to content

Add recursive descent XQuery parser (opt-in via feature flag)#6220

Draft
joewiz wants to merge 4 commits intoeXist-db:developfrom
joewiz:v2/new-parser
Draft

Add recursive descent XQuery parser (opt-in via feature flag)#6220
joewiz wants to merge 4 commits intoeXist-db:developfrom
joewiz:v2/new-parser

Conversation

@joewiz
Copy link
Copy Markdown
Member

@joewiz joewiz commented Apr 6, 2026

Summary

Adds a recursive descent (rd) parser as an alternative to the ANTLR 2 generated parser. Opt-in via -Dexist.parser=rd (default remains ANTLR 2). Zero impact on existing behavior.

Note: This branch depends on v2/w3c-xquery-update-3.0, v2/xqft-phase2, and v2/xquery-4.0-parser. Must merge after those PRs.

What Changed

  • 7 source files in org.exist.xquery.parser.next (~5,700 lines)
  • 5 test files (335/339 tests pass)
  • Feature flag: System.getProperty("exist.parser", "antlr2") — set to "rd" to activate

Benchmark Results

  • 15-82x faster than ANTLR 2 across all query patterns
  • Within 0.5-1.1x of BaseX (reference XQ4 implementation)
  • 2.6-3.5x slower than REx pure recognizer on complex queries (remarkable given rd builds full Expression trees)
  • Parse-only compliance: 80.6% of QT4 prod-* tests (vs ANTLR 2's 84.9%, BaseX's 83.0%)

Tests

  • Parser tests: 335/339 pass (3 version-gating, 1 missing function registration)
  • exist-core: no regressions (parser defaults to ANTLR 2)

Test plan

  • Default (ANTLR 2) tests unaffected
  • -Dexist.parser=rd activates the rd parser
  • rd parser handles XQuery 3.1, 4.0, XQUF 3.0, XQFT 3.0 syntax (335/339 parser tests)

🤖 Generated with Claude Code

@joewiz joewiz changed the title Add hand-written recursive descent XQuery parser (opt-in via feature flag) Add recursive descent XQuery parser (opt-in via feature flag) Apr 7, 2026
@joewiz joewiz force-pushed the v2/new-parser branch 7 times, most recently from e352bc1 to b53f2ee Compare April 13, 2026 13:26
joewiz and others added 3 commits April 14, 2026 10:23
Add a recursive descent parser as an alternative to the ANTLR 2
generated parser. Enable with -Dexist.parser=rd (default remains antlr2).

The parser supports XQuery 3.1, XQuery 4.0, XQUF 3.0, and XQFT 3.0 in
5,500 lines (6x smaller than the 914KB ANTLR 2 generated code) at 3-5x
faster parse times with zero keyword overhead for XQUF/XQFT.

Parser (exist-core/src/main/java/org/exist/xquery/parser/next/):
- XQueryParser.java: recursive descent, builds Expression tree directly
- XQueryLexer.java: tokenizer with character-level XML scanning
- Token.java, Keywords.java, ParseError.java: supporting infrastructure
- FTExpressions.java, XQUFExpressions.java: stub expression classes
  (to be replaced with real org.exist.xquery.ft/xquf imports)

Feature flag (XQuery.java):
- exist.parser system property: "antlr2" (default) or "rd"
- compileWithNativeParser() bypasses ANTLR 2 pipeline entirely

Tests (247 parser tests + 23 integration tests):
- XQueryParserTest: 170 tests covering expressions, FLWOR, constructors,
  type expressions, XQ4 syntax, XQUF, XQFT
- NativeParserIntegrationTest: 23 tests via eXist's XQuery.execute() path
- XQueryLexerTest: 54 lexer unit tests
- LexerBenchmark, ParserBenchmark: performance validation

Validation: 93.1% pass rate on exist-core's full test suite (3,942 tests)
with -Dexist.parser=rd. Remaining 7% is long-tail edge cases and
infrastructure test failures unrelated to parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XQ4 syntax is now only available when the query declares
xquery version "4.0". Default (no declaration) is 3.1 behavior.

Gated features:
- Pipeline operator (->)
- Mapping arrow (=!>)
- Otherwise expression
- Braced if (no else clause)
- Keyword arguments (name := value)
- Focus functions (fn { })
- String templates (``[...]``)
- QName literals (#name)
- Default parameter values
- for member clause
- while clause
- try/finally

Not gated (available in 3.1):
- Arrow operator (=>)
- Simple map (!)
- Arrays, maps, lookups
- Inline functions, function references
- try/catch (without finally)
- XQUF, XQFT, eXist legacy update

Error messages are helpful:
  Pipeline operator '->' requires xquery version "4.0".
  Add 'xquery version "4.0";' to enable XQuery 4.0 features.

Responds to community call feedback (line-o, 2026-03-23).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidates several test improvements and fixes accumulated during
rd parser validation against ANTLR 2 and the XQTS:

- Add regression tests confirming array:get#3 and parse-xml behave
  identically in both parsers (the 32 apparent XQTS regressions were
  build-version differences, not parser bugs)
- Update XQueryParserTest assertions to use real XQUF/FT class names
  (XQUFTransformExpr, FTContainsExpr) instead of stub names; fixes 14 tests
- Add 18 FunctX-pattern integration tests exercising real-world XQuery
  patterns through both parsers; all 18 pass, confirming rd handles
  higher-order functions, FLWOR, typeswitch, namespaces, etc.
- Add grammar-dispatch-audit.py: cross-references 367 EBNF productions
  against 113 rd parse methods, flags Expr/ExprSingle mismatches;
  result is ALL CLEAR after the FLWOR return fix
- Fix XQUF keyword conflicts in test modules: add missing closing brace
  in flwor.xql and rename $copy → $expanded-set in test.xq to avoid
  conflict with the XQUF `copy` keyword

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The rebase onto develop picked up the three orderby-empty-ordering-spec
tests added by PR eXist-db#6073. The v2/new-parser test commit had independently
added the same three tests, causing XQST0034 duplicate function errors.
Remove the duplicates introduced by this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@duncdrum duncdrum added enhancement new features, suggestions, etc. java Issues or pull requests that change Java code or are related to the JVM labels Apr 14, 2026
@duncdrum duncdrum added this to v7.0.0 Apr 14, 2026
@duncdrum duncdrum added the blocked blocked by a 3rd party label Apr 14, 2026
@duncdrum duncdrum added this to the eXist-7.0.0 milestone Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocked blocked by a 3rd party enhancement new features, suggestions, etc. java Issues or pull requests that change Java code or are related to the JVM

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants