Implement W3C XQuery Update Facility 3.0 alongside deprecated legacy syntax #6111
Closed
joewiz wants to merge 7 commits into eXist-db:develop
Add ANTLR 2 grammar rules for the W3C XQuery Update Facility 3.0
(https://www.w3.org/TR/xquery-update-30/) syntax:
- InsertExpr: insert node(s) into/as first/as last/before/after
- DeleteExpr: delete node(s)
- ReplaceExpr: replace (value of) node
- RenameExpr: rename node as
- TransformExpr: copy $var := expr modify expr return expr
- FunctionDecl: updating keyword support
XQuery.g changes add lexer tokens and parser rules that produce
XQUF-specific AST nodes. XQueryTree.g changes walk those AST nodes to
instantiate the corresponding Java expression classes.
Mutual exclusion: the tree grammar detects conflicting use of W3C XQUF
syntax and eXist-db's legacy update extensions in the same query,
reporting XUST0002 at compile time.
Ref: W3C XQuery Update Facility 3.0, Sections 2.1-2.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the static typing rules required by the W3C XQuery Update
Facility 3.0 specification:
XUST0001 — updating expressions in non-updating context:
- Add Expression.isUpdating() method with overrides in all
expression subclasses to propagate the updating flag
- XQuery.execute() checks that the top-level expression is not
updating unless in an updating context
XUST0002 — non-updating expressions in updating context:
- Add Expression.isVacuous() with recursive detection through
TypeswitchExpression, SwitchExpression, SequenceConstructor,
and PathExpr to identify vacuous expressions (empty sequence,
error, ()) that are valid in both contexts
Expression infrastructure:
- XQueryContext: add copy-namespaces preserve/inherit accessors,
xqufEnabled flag, and legacy-vs-XQUF mutual exclusion tracking
- FunctionSignature: add isUpdating flag for declaring updating
functions
- FunctionCall/UserDefinedFunction: propagate updating status
- ErrorCodes: add XUST0001, XUST0002, XUDY0009, XUDY0014-0016,
XUDY0021, XUDY0023-0024, XUDY0027, XUTY0004-0008, XUTY0010,
XUTY0012-0013, XUTY0022, XQTY0153
Ref: W3C XQuery Update Facility 3.0, Sections 2.2, 3.1, 3.2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
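The updating/vacuous classification described in this commit message can be illustrated with a small Java sketch. It is not the eXist-db code: `Expr`, `Leaf`, and `Seq` are hypothetical stand-ins, and the only rule modeled is the one for a comma expression, which may not mix updating operands with operands that are neither updating nor vacuous.

```java
import java.util.List;

// Hypothetical stand-ins; these are NOT the eXist-db Expression classes.
interface Expr {
    boolean isUpdating();
    boolean isVacuous();
}

// A leaf expression whose classification is fixed when it is created.
final class Leaf implements Expr {
    private final boolean updating;
    private final boolean vacuous;

    Leaf(boolean updating, boolean vacuous) {
        this.updating = updating;
        this.vacuous = vacuous;
    }

    public boolean isUpdating() { return updating; }
    public boolean isVacuous() { return vacuous; }

    static Leaf updating() { return new Leaf(true, false); }   // e.g. delete node $x
    static Leaf simple()   { return new Leaf(false, false); }  // e.g. doc('/db/test.xml')
    static Leaf vacuous()  { return new Leaf(false, true); }   // e.g. ()
}

// A comma expression: operands must be all updating or all non-updating,
// except that vacuous operands are accepted on either side.
final class Seq implements Expr {
    private final List<Expr> operands;

    Seq(List<Expr> operands) { this.operands = operands; }

    public boolean isUpdating() {
        return operands.stream().anyMatch(Expr::isUpdating);
    }

    public boolean isVacuous() {
        return operands.stream().allMatch(Expr::isVacuous);
    }

    // Returns the error code a static check would report, or null if the mix is allowed.
    String check() {
        if (!isUpdating()) {
            return null;
        }
        for (Expr e : operands) {
            if (!e.isUpdating() && !e.isVacuous()) {
                return "XUST0001";
            }
        }
        return null;
    }
}
```

In this model, a sequence of an updating leaf and a simple leaf is rejected, while an updating leaf next to a vacuous leaf is accepted and the whole sequence counts as updating.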
Add the core expression classes for the W3C XQuery Update Facility 3.0
(https://www.w3.org/TR/xquery-update-30/):
Expression classes (org.exist.xquery.xquf):
- XQUFInsertExpr: insert into/as first/as last/before/after with
target type validation (XUTY0004-0008) and compatibility checks
- XQUFDeleteExpr: delete with node-type validation
- XQUFReplaceNodeExpr: replace node with content type constraints
(XUTY0010, XUTY0012-0013)
- XQUFReplaceValueExpr: replace value of node
- XQUFRenameExpr: rename with QName computation and validation
- XQUFTransformExpr: copy-modify-return with deep copy that preserves
complete document structure (all document-level children),
copy-namespaces semantics (preserve/no-preserve, inherit/no-inherit),
and namespace scope materialization
- XQUFFnPut: fn:put() stub (XQUF Section 2.6)
- UpdatePrimitive: typed container for pending update primitives
Pending Update List (PUL):
- Five-phase application per W3C spec Section 3.3.2:
Phase 1: inserts, renames, replaceValue (non-element)
Phase 3: replaceNode
Phase 4: replaceElementContent
Phase 5: deletes (reverse document order)
- In-memory application via memtree mutation methods
- Persistent (stored) document application via existing eXist-db
Modification/transaction infrastructure with broker locking
- Conflict detection: XUDY0009 (duplicate targets), XUDY0014-0016
(conflicting inserts/renames/replaces), XUDY0021/0023/0024
(namespace conflict detection using NamePool approach, credit BaseX
for the pattern)
- Attribute replace: pre-captured original indices processed in
descending order to handle rename+replace interactions and
array-shift invalidation
Ref: W3C XQuery Update Facility 3.0, Sections 2.1-2.6, 3.3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
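As a rough illustration of phased application, the Java sketch below sorts queued primitives by phase before applying them. The `Kind`, `Primitive`, and `Pul` types are hypothetical and do not mirror the PR's PendingUpdateList or UpdatePrimitive classes. The phase numbers follow the commit message above; phase 2 is not spelled out there, so it is left out, and the reverse-document-order rule for deletes is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a pending update list; not the eXist-db API.
enum Kind {
    INSERT(1), RENAME(1), REPLACE_VALUE(1),
    REPLACE_NODE(3), REPLACE_ELEMENT_CONTENT(4), DELETE(5);

    final int phase;

    Kind(int phase) { this.phase = phase; }
}

// One queued update: what to do and which node (by an illustrative id) it targets.
final class Primitive {
    final Kind kind;
    final String target;

    Primitive(Kind kind, String target) {
        this.kind = kind;
        this.target = target;
    }
}

final class Pul {
    private final List<Primitive> primitives = new ArrayList<>();

    // Called during evaluation: the primitive is recorded, nothing is applied yet.
    void add(Kind kind, String target) {
        primitives.add(new Primitive(kind, target));
    }

    // Called once at the end: returns the primitives in application order.
    // List.sort is stable, so primitives within a phase keep evaluation order.
    List<Primitive> applicationOrder() {
        List<Primitive> ordered = new ArrayList<>(primitives);
        ordered.sort(Comparator.comparingInt((Primitive p) -> p.kind.phase));
        return ordered;
    }
}
```

Sorting a copy keeps the evaluation-order list intact, which makes it easy to inspect what a query queued independently of how it will be applied.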
Extend the eXist-db in-memory (memtree) document model to support
the mutation operations required by the W3C XQuery Update Facility.
The memtree uses a flat-array architecture where nodes are stored in
parallel arrays (nodeKind[], nodeName[], alpha[], alphaLen[], etc.)
with a next[] chain for sibling navigation.
DocumentImpl mutation methods:
- insertChildren: insert nodes before/after/into elements using
MemTreeBuilder serialization with next[] chain restitching
- insertAttributes: add/replace attributes on elements
- removeNode: soft-delete (nodeKind=-1) with subtree marking and
predecessor-based next[] chain restitching
- removeAttribute: shift attr arrays with alpha[] index fixup
- replaceNode: remove target + insert replacement content
- replaceValue: update text/comment/PI/attribute values in-place
- renameNode/renameAttribute: change element/attribute QNames
- replaceElementContent: clear children + insert text
- mergeAdjacentTextNodes: W3C spec post-update normalization
- compact: full tree rebuild via serialization/deserialization
- copyNodeIntoDocument: namespace-aware deep copy with
scope map for no-inherit materialization and no-preserve
filtering
- stripUnusedNamespacesInSubtree: no-preserve implementation
- findAttribute: QName-based attribute lookup for PUL application
NodeImpl axis navigation fixes:
- selectFollowing: fix document element to find document-level
siblings (comments, PIs) after the element
- selectPreceding/selectFollowing: skip soft-deleted nodes
(nodeKind=-1) left by mergeAdjacentTextNodes
ElementImpl: add setNodeName for in-place rename support.
FunInScopePrefixes: respect inheritNamespaces context for in-memory
nodes (self-only when no-inherit).
Ref: W3C XQuery Update Facility 3.0, Section 3.3 (Update Routines)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
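The soft-delete idea from this commit message (mark the slot with nodeKind=-1, then restitch the predecessor's next[] pointer) can be sketched in Java as follows. This is a deliberately simplified, hypothetical model: `FlatTree` holds a single chain of siblings, uses -1 as the end-of-chain marker, and omits the other parallel arrays, none of which is claimed to match the real memtree layout.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical flat-array tree: one parent with a chain of siblings.
final class FlatTree {
    static final short DELETED = -1;

    final short[] nodeKind;
    final int[] next;   // next sibling index, or -1 at the end of the chain
    int firstChild;     // index of the first live child, or -1 if none

    FlatTree(int size) {
        nodeKind = new short[size];
        next = new int[size];
        for (int i = 0; i < size; i++) {
            nodeKind[i] = 1;                        // 1 = element node in this sketch
            next[i] = (i + 1 < size) ? i + 1 : -1;  // chain each slot to the following one
        }
        firstChild = size > 0 ? 0 : -1;
    }

    // Soft-delete: mark the slot, then restitch the predecessor's next pointer.
    void removeNode(int n) {
        if (nodeKind[n] == DELETED) {
            return;
        }
        nodeKind[n] = DELETED;
        if (firstChild == n) {
            firstChild = next[n];
            return;
        }
        for (int p = firstChild; p != -1; p = next[p]) {
            if (next[p] == n) {
                next[p] = next[n];
                return;
            }
        }
    }

    // Walks the live sibling chain; deleted slots are no longer reachable.
    List<Integer> children() {
        List<Integer> result = new ArrayList<>();
        for (int i = firstChild; i != -1; i = next[i]) {
            result.add(i);
        }
        return result;
    }
}
```

Because the array slot is only marked rather than removed, indices held elsewhere stay valid; a later full rebuild (the commit's compact step) would be the place to reclaim the space.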
Add comprehensive test coverage for the W3C XQuery Update Facility
3.0 implementation:
XQUFBasicTest (94 tests):
- Insert expressions: into, as first, as last, before, after,
with attributes, multiple inserts, insert into empty elements
- Delete expressions: elements, attributes, text, comments, PIs
- Replace node: elements, attributes, swap, text nodes
- Replace value: elements, attributes, text, comments, PIs
- Rename: elements, attributes, with namespace handling
- Copy-modify (transform): basic, nested, multiple variables,
independent copies, persistent document sources
- Static type checking: XUST0001 (updating in non-updating
context), XUST0002 (legacy/XQUF mutual exclusion)
- Dynamic errors: XUDY0009 (duplicate targets), XUDY0015
(duplicate rename), XUDY0027 (target type), XUTY0004-0008
(insert type constraints), XUTY0010/0012-0013 (replace type
constraints)
- Copy-namespaces: preserve/no-preserve, inherit/no-inherit
with namespace propagation tests
- Attribute replacement interactions: swap, rename+replace
- Following/preceding axis after document-level mutations
- Persistent document operations via XMLDB API
XQUFBenchmark: performance comparison of W3C XQUF vs eXist-db
legacy update extensions for insert, delete, replace, and rename.
XQTS QT4 results (non-schema): 684/684 (100%)
XQTS QT4 results (overall): 691/788 (87.7%)
Remaining 97 failures are XML Schema revalidation (out of scope).
Ref: W3C XQuery Update Facility 3.0
Closes eXist-db#3634
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register XQUFFnPut.SIGNATURE in the fn namespace function registry so
that fn:put() is recognized as a built-in function. Without this
registration, queries using fn:put() fail with XPST0017 (function not
defined).
fn:put() adds a put primitive to the Pending Update List for deferred
document persistence per W3C XQUF Section 2.6.
Ref: W3C XQuery Update Facility 3.0, Section 2.6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the applyPersistentPut method in PendingUpdateList, which was
previously a TODO stub (LOG.warn only). fn:put now stores the target
node as an XML document at the specified URI in the database.
Implementation:
- Parses target URI to collection path + document name
- Gets or creates the target collection
- Serializes the node to XML text
- Stores via broker.storeDocument()
- Error handling: FODC0002 for storage failures, FODC0005 for invalid URI
The fn:put PUL primitive was already correctly constructed by
XQUFFnPut.java and added to the PUL. Only the persistent application
step was missing.
XQUFBasicTest: 94/94 pass (no regressions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
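The first step listed above, splitting the target URI into a collection path and a document name, might look roughly like the Java sketch below. `PutTarget` is a hypothetical helper, not code from the PR; treating a URI without a collection segment as invalid is an assumption of this sketch, and the FODC0005 text is only placed in the exception message for illustration.

```java
// Hypothetical helper; the PR's applyPersistentPut is not shown on this page.
final class PutTarget {
    final String collectionPath;
    final String documentName;

    private PutTarget(String collectionPath, String documentName) {
        this.collectionPath = collectionPath;
        this.documentName = documentName;
    }

    // Splits "/db/test/doc.xml" into "/db/test" and "doc.xml".
    // Assumption: a URI with no collection segment or no document name is rejected.
    static PutTarget parse(String uri) {
        int slash = (uri == null) ? -1 : uri.lastIndexOf('/');
        if (slash <= 0 || slash == uri.length() - 1) {
            throw new IllegalArgumentException("FODC0005: invalid URI for fn:put: " + uri);
        }
        return new PutTarget(uri.substring(0, slash), uri.substring(slash + 1));
    }
}
```

For example, `PutTarget.parse("/db/test/doc.xml")` yields the collection path `/db/test` and the document name `doc.xml`, while `/db/` and `doc.xml` are both rejected.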
[This comment was co-authored with Claude Code. -Joe]
Closing: superseded by #6214 (v2/w3c-xquery-update-3.0). This work has been consolidated into a clean v2/ branch as part of the eXist-db 7.0 PR reorganization. The new PR includes all commits from this PR plus additional related work, with reviewer feedback incorporated where applicable. See the reviewer guide for the full context.
Summary
Implements the W3C XQuery Update Facility 3.0 specification alongside eXist-db's existing proprietary update syntax, which is retained as deprecated. The new standard insert node, delete node, replace node, replace value of node, rename node, and copy ... modify ... return expressions bring eXist-db in line with other XQuery processors (BaseX, Saxon) and enable in-memory node updates via the copy-modify (transform) expression.

The legacy update insert/delete/replace/rename/value syntax continues to work but is deprecated. Applications should migrate to the W3C standard syntax. See the Migration Guide below.

Closes #3634
Closes #628
Supersedes #6109.
What Changed
New package: org.exist.xquery.xquf
- PendingUpdateList.java: checkFragmentation() utility (moved from old Modification class).
- UpdatePrimitive.java
- XQUFInsertExpr.java: insert node(s) ... (into|as first into|as last into|before|after) ...
- XQUFDeleteExpr.java: delete node(s) ...
- XQUFReplaceNodeExpr.java: replace node ... with ...
- XQUFReplaceValueExpr.java: replace value of node ... with ...
- XQUFRenameExpr.java: rename node ... as ...
- XQUFTransformExpr.java: copy $var := expr modify expr return expr; deep-copies nodes, creates nested PUL scope, applies updates to copy
- XQUFFnPut.java: fn:put($node, $uri); adds PUT primitive to PUL

Retained (deprecated): org.exist.xquery.update package
The legacy proprietary update implementation classes (Modification, Insert, Delete, Replace, Rename, Update) are retained and continue to work. The checkFragmentation() utility methods were copied to PendingUpdateList for use by the new XQUF system.

Grammar changes
- XQuery.g: xqufInsertExpr, xqufDeleteExpr, xqufReplaceExpr, xqufRenameExpr, xqufTransformExpr productions alongside the existing updateExpr rule. Both syntaxes are available in exprSingle. Legacy and XQUF sections are clearly delineated with comments and removal instructions.
- XQueryTree.g: %updating annotation support. Legacy and XQUF sections are clearly delineated with comments and removal instructions.

In-memory node mutation support
- memtree/DocumentImpl.java: insertBefore, insertAfter, insertInto, insertIntoAsFirst, insertIntoAsLast, removeChild, replaceChild, replaceElementContent, renameNode, deepCopy, compact. The memtree flat-array structure required careful index management for structural modifications.
- memtree/ElementImpl.java: getFirstChildFor/getNextSiblingFor that skip deleted nodes; added renameNode
- memtree/NodeImpl.java: isDeleted/markDeleted, renameNode, getNextSiblingFor

Static type checking (XUST0001/XUST0002)
- Expression.java: isUpdating() and isVacuous() methods for W3C updating/vacuous expression classification
- ConditionalExpression.java: isUpdating()/isVacuous() for if/then/else branches
- TypeswitchExpression.java
- SwitchExpression.java
- PathExpr.java
- SequenceConstructor.java: isUpdating() overrides where needed

XQueryContext / XQuery integration
- XQueryContext.java: PendingUpdateList field; PUL accumulates during query execution; updated checkFragmentation import to PendingUpdateList
- XQuery.java

Error codes
- ErrorCodes.java

Tests
- XQUFBasicTest.java
- XQUFBenchmark.java
- bindingConflictXQUF.xqm
- bindingConflict.xqm
All existing legacy update tests (UpdateInsertTest, UpdateReplaceTest, UpdateValueTest, etc.) are unchanged and continue to pass.

Spec Reference
XQTS Results
QT4 XQuery Update test sets (upd-*), run against the consolidated branch (2026-03-13).

All 97 failures are XML Schema revalidation tests (out of scope: eXist-db does not support XSD revalidation strict or revalidation lax). The 77 revalidation tests span 4 Revalidation* test sets plus setToUntyped and NilUpdates. This matches BaseX's scope, which also only supports revalidation skip.

Non-schema compliance: 100%. Every test that does not require XML Schema validation passes.
Test suite results
Benchmark Results
Run with
mvn test -pl exist-core -Dtest=XQUFBenchmark -Dexist.run.benchmarks=true -Ddependency-check.skip=true

Persistent update operations: W3C XQUF syntax
All persistent operations use the PUL deferred execution model: updates accumulate during query evaluation, then are applied atomically in a single transaction at the snapshot boundary.
Operations measured: insert node ... into, delete node, replace value of node, replace node, rename node

Persistent update operations: Legacy syntax (deprecated)
Legacy operations apply immediately during query evaluation; each update is its own mini-transaction.
Operations measured: update insert ... into, update delete, update value ... with, update replace ... with, update rename ... as

Comparison notes: XQUF insert/delete/replace node are faster than their legacy equivalents at all sizes, likely due to the PUL batching updates into a single transaction commit. XQUF rename has higher base cost (~14ms at N=10) due to namespace handling/NamePool overhead but scales sub-linearly. Legacy rename scales linearly but starts lower. replace value is comparable between the two systems.

In-memory copy-modify operations (XQUF-only)
These are entirely new: the legacy update syntax does not support in-memory node updates.
Known Limitations
- Batch replaceNode on many siblings in same document: When replacing hundreds of sibling elements in the same persistent document within a single PUL, stale B-tree node references can occur. The XQueryUpdateTest.replace test is @Ignored with explanation. This is a deep B-tree storage issue, not a PUL logic issue.
- XML Schema revalidation: The W3C spec's revalidation mode (revalidation strict/lax/skip) is not implemented. eXist-db does not currently support XSD-based revalidation after updates.
- fn:put(): Implemented as a PUL primitive but limited to eXist-db's storage model.
- Legacy and XQUF syntax cannot be mixed in the same module: Both syntaxes are available, but using both in the same module raises XPST0003 at compile time. The legacy system applies updates immediately during evaluation while XQUF defers them to a snapshot boundary, so mixing the two would produce undefined behavior. The mutual exclusion check is enforced during tree walking via XQueryContext.markLegacyUpdate()/markXQUFUpdate().

Credits
The namespace conflict detection approach (XUDY0021/0023/0024 via NamePool) was informed by BaseX's XQUF implementation.
Test Plan
- XQUFBasicTest covering all W3C update expressions + mutual exclusion checks

CI Status
All checks green (unit tests, integration tests on macOS/ubuntu/windows, XQTS, container image, license check).
Migration Guide
This guide covers how to migrate XQuery code from eXist-db's deprecated proprietary update syntax to the W3C standard syntax. Both syntaxes currently work, but the legacy syntax will be removed in a future release.

Conceptual Changes
Pending Update List (PUL) and Deferred Execution
The most significant architectural difference between the two syntaxes is how updates are executed:
- Legacy: each update expression executes immediately, modifying the database as it is encountered during query evaluation. This allows freely mixing updates with reads in the same expression.
- W3C: updating expressions add primitives to a Pending Update List (PUL) during query evaluation; the PUL is applied atomically when the query completes.

Practical impact: With W3C syntax, you can no longer observe the effect of an update within the same query that performs it. Updates are only visible after the query completes.
Separation of Updating and Non-Updating Expressions (XUST0001)
The W3C spec introduces a strict static type system that classifies expressions as either updating or non-updating. These cannot be mixed in certain contexts:
- insert node <x/> into /root, doc('/db/test.xml') is a static error (XUST0001) because it mixes an updating expression (insert) with a non-updating one (doc()).
- A return clause must be entirely updating or entirely non-updating.
- An updating expression cannot appear inside an element constructor such as <result>{ insert node ... }</result>.

Updating Functions
Functions containing W3C updating expressions must be declared with the %updating annotation (W3C 3.0 syntax), as in declare %updating function local:foo() { ... }, or the updating keyword (W3C 1.0 backward-compatible syntax), as in declare updating function local:foo() { ... }. Both forms are equivalent per the W3C XQuery Update Facility 3.0 spec.
Key rules:
- The body of a declare %updating function must actually be an updating expression (or vacuous/empty).
- An updating function must not declare a return type. (declare %updating function local:foo() as item()* { ... } is an error.)

Note: Functions using the legacy update syntax do not need the %updating annotation, since legacy expressions are not classified as "updating" in the W3C sense.

Copy-Modify Expressions (NEW)
The W3C spec adds copy ... modify ... return (also called "transform" expressions) for functional, non-destructive updates on in-memory copies of nodes, for example copy $c := $node modify (insert node <x/> into $c) return $c. This deep-copies $node, applies updates to the copy, and returns the modified copy. The original node is unchanged. This works on both persistent and in-memory nodes and is the primary way to do "read-and-modify" in a single expression. There is no legacy equivalent.

Syntax Migration Reference
Legacy syntax -> W3C syntax
- update insert <x/> into /root -> insert node <x/> into /root
- update insert <x/> preceding /node -> insert node <x/> before /node
- update insert <x/> following /node -> insert node <x/> after /node
- update delete /node -> delete node /node
- update replace /node with <x/> -> replace node /node with <x/>
- update value /node with 'text' -> replace value of node /node with 'text'
- update rename /node as 'newname' -> rename node /node as 'newname'
W3C forms listed without a legacy counterpart:
- insert node <x/> as first into /root
- insert node <x/> as last into /root

Insert attributes
Attributes can only be inserted into an element with W3C syntax, not before/after another attribute. For example, insert node attribute status { 'done' } into /root/item.
Replace value semantics
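W3C replace value of node atomizes the replacement content to a string. If you pass element nodes, they are atomized and joined with spaces. For example, replace value of node /a with (<b>1</b>, <c>2</c>) sets the content of /a to the text "1 2".

Common Migration Patterns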
Pattern 1: Split update + read into separate queries
The most common migration pattern. Legacy code that combined updates with reads in a single expression must be split when using W3C syntax.
Legacy:
W3C, Option A (separate queries):
W3C, Option B (copy-modify for in-memory result):

Pattern 2: Wrap updates in util:eval() as a last resort
util:eval() is an escape hatch that lets you encapsulate a W3C updating expression and execute it from a non-updating context. It works by running the update in a separate, independent query context, so the W3C static typing rules don't apply across the boundary. This is a last resort when the alternatives (separate queries, copy-modify, or %updating functions) aren't practical.
Common situations where util:eval() is needed:
- XQSuite test functions (which cannot be declared %updating because they need to return assertion values)
- let bindings where you want to sequence an update before a read within a single FLWOR expression
Legacy:
W3C (using util:eval()):
Note: Since util:eval() runs the string in a fresh query context, namespace declarations from the outer module are not inherited. You must declare namespaces inline (e.g., xmlns:mods='...' on element constructors).

Pattern 3: Updating functions in triggers/modules
Functions that perform W3C updates must be declared with %updating. If that's not practical (e.g., XQSuite test functions or trigger callbacks that can't change their signature), use util:eval().
W3C, Option A: util:eval():
W3C, Option B: %updating annotation (if you control the function signature):
Note: declare %updating function must not declare a return type (XUST0028).

Pattern 4: Batch updates in FLWOR expressions
FLWOR expressions that return W3C updating expressions work naturally — each iteration adds primitives to the PUL, and they're all applied atomically at the end:
However, you cannot observe intermediate results within the same query. All replacements happen after the query completes. (With legacy syntax, each iteration's update was visible to subsequent iterations.)
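The visibility rule above can be modeled outside XQuery with a short Java sketch. It assumes a hypothetical `SnapshotDemo` class in which a map stands in for stored nodes and a list of Runnables stands in for the PUL; it illustrates the deferred-apply idea only and is not how eXist-db is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a map stands in for stored nodes, a list of Runnables for the PUL.
final class SnapshotDemo {
    final Map<String, Integer> store = new LinkedHashMap<>();
    private final List<Runnable> pending = new ArrayList<>();

    // Analogue of "replace value of node": queued, not applied.
    void replaceValue(String key, int newValue) {
        pending.add(() -> store.put(key, newValue));
    }

    // Analogue of the snapshot boundary at the end of the query.
    void applyUpdates() {
        pending.forEach(Runnable::run);
        pending.clear();
    }

    // Queues a doubling of every value and returns the sum observed while iterating.
    int doubleAll() {
        int observed = 0;
        for (String key : new ArrayList<>(store.keySet())) {
            replaceValue(key, store.get(key) * 2);
            observed += store.get(key);   // still the pre-update value
        }
        return observed;
    }
}
```

With values 1 and 2 in the store, `doubleAll()` observes a sum of 3 because nothing has been applied yet; only after `applyUpdates()` do the stored values become 2 and 4. Under the legacy immediate model, the read inside the loop would already have seen the doubled value.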
Pattern 5: Multiple updates on the same node
The W3C spec has conflict detection rules. Some operations that the legacy syntax silently allowed are now errors:
- More than one replace value of node on the same target node in the same PUL.
- More than one rename node on the same target node.
- More than one replace node on the same target node.

If your legacy code performed multiple updates on the same node in a loop, you may need to restructure to update each node at most once per query.
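One way to picture these conflict rules is a check that remembers which (operation, target) pairs have already been queued. The Java sketch below is hypothetical: `ConflictChecker` and its string target ids are illustrative only, the exception type is a placeholder for the W3C error codes, and the PR's actual detection logic is not shown on this page.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical conflict check; not the eXist-db PendingUpdateList implementation.
final class ConflictChecker {
    enum Op { RENAME, REPLACE_NODE, REPLACE_VALUE, INSERT_INTO }

    private final Set<String> seen = new HashSet<>();

    // Registers an update on a target. A second rename, replace node, or
    // replace value on the same target is rejected; inserts may repeat freely,
    // and different operations on the same target do not conflict here.
    void register(Op op, String targetId) {
        if (op == Op.INSERT_INTO) {
            return;
        }
        if (!seen.add(op + "@" + targetId)) {
            throw new IllegalStateException("conflicting " + op + " on " + targetId);
        }
    }
}
```

Keying the set on both the operation and the target means a rename and a replace value on the same node are accepted together, while two renames of that node are not.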
Quick Checklist
- Replace update insert with insert node
- Replace update delete with delete node
- Replace update replace with replace node ... with
- Replace update value with replace value of node ... with
- Replace update rename with rename node ... as
- Replace preceding/following with before/after
- Change attribute inserts from before/after @attr to into element
- Add declare %updating function to functions containing updates, or wrap with util:eval()
- Check replace value of node for atomization behavior changes