Fix duplicate node elimination for function calls in path expressions #6110
joewiz wants to merge 3 commits into eXist-db:develop
I think this is a valuable addition to the codebase, but the two failures in XQTS are alarming:

K2-Axes-48
declare variable $myVar := <e/>;
$myVar/(<a/>, <b/>, <?d ?>, <!-- e-->, attribute name {}, document {()})/3
should return

K2-Axes-49
declare variable $myVar := <e/>;
$myVar/(<a/>, <b/>, <?d ?>, <!-- e-->, attribute name {}, document {()})/number()
should return

I do know that exist-db's processor struggles with static items as the last path step. But introducing a new NPE is not acceptable.
|
[This response was co-authored with Claude Code. -Joe]

Thanks for catching this, @line-o. You're absolutely right that introducing an NPE is not acceptable. I've pushed a fix in f342031.

Root cause: The
Two fixes:

Both K2-Axes-48 and K2-Axes-49 now pass, along with all existing dedup tests and the full XQueryTest suite.
|
I can squash if the fix looks good.
duncdrum left a comment
looks good overall, one comment
f342031 to 97e4a7c
…pressions

Per XPath 3.1 §3.3.1.1, the path operator '/' must eliminate duplicate nodes by identity and return results in document order. This was not happening when the last step in a path expression was a function call (or other non-Step PostfixExpr), because the dedup guard only checked `getLastExpression() instanceof Step`.

Replace the heuristic with a `hasSlash` flag set by the grammar tree walker when SLASH, DSLASH, ABSOLUTE_SLASH, or ABSOLUTE_DSLASH tokens are encountered. This correctly identifies genuine path expressions without the performance cost of iterating the steps list, and without risking false positives on PathExprs used as generic expression containers (e.g., variable declarations followed by FLWOR expressions).

Closes https://github.com/eXist-db/exist/issues/TBD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hasSlash-based duplicate elimination introduced in 3b47500 correctly extends dedup to function calls in path expressions, but triggers a NullPointerException when the path contains constructed nodes from independent document contexts (e.g., standalone attribute or document constructors).

Two fixes:

1. PathExpr.eval: Guard removeDuplicates() with a node-type check so it is only called when the result actually contains nodes, not atomic values (which cannot have duplicates in the XPath sense).

2. NodeImpl.compareTo: Handle null document references defensively. DocumentImpl nodes always have document=null (by design in the constructor), so comparing two DocumentImpl instances from different contexts would NPE at document.docId. This is a latent bug exposed by calling removeDuplicates on sequences containing nodes from independent constructor documents.

Adds test cases for XQTS K2-Axes-48 and K2-Axes-49, which exercise path expressions ending with integer literals and number() calls over sequences of diverse constructed node types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
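The second fix above (null-safe comparison) can be illustrated with a minimal, self-contained sketch. It does not reproduce eXist-db's `NodeImpl`; the `Node` class, its fields, and the `unsafeCompareTo` method are hypothetical stand-ins chosen only to show how an unconditional `document.docId` dereference fails for document nodes and how a defensive comparison might avoid it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NullSafeCompareSketch {

    // Hypothetical, simplified node; NOT eXist-db's NodeImpl.
    static class Node implements Comparable<Node> {
        final Node document;  // owning document; null when this node is itself a document node
        final long docId;     // id of the document this node is, or belongs to
        final int nodeNumber; // position inside the owning document (0 for the document node)

        Node(Node document, long docId, int nodeNumber) {
            this.document = document;
            this.docId = docId;
            this.nodeNumber = nodeNumber;
        }

        static Node documentNode(long docId) {
            return new Node(null, docId, 0);
        }

        static Node child(Node document, int nodeNumber) {
            return new Node(document, document.docId, nodeNumber);
        }

        // Models the failure mode: dereferences `document` unconditionally,
        // so it throws NullPointerException when either side is a document node.
        int unsafeCompareTo(Node other) {
            return Long.compare(document.docId, other.document.docId);
        }

        // Defensive variant (an assumption, not the PR's actual code):
        // fall back to the node's own id when `document` is null.
        @Override
        public int compareTo(Node other) {
            long mine = (document == null) ? docId : document.docId;
            long theirs = (other.document == null) ? other.docId : other.document.docId;
            if (mine != theirs) {
                return Long.compare(mine, theirs);
            }
            return Integer.compare(nodeNumber, other.nodeNumber);
        }
    }

    // Returns a sorted copy, as a duplicate-removal pass would need.
    static List<Node> sorted(List<Node> nodes) {
        List<Node> copy = new ArrayList<>(nodes);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Node docA = Node.documentNode(2);
        Node docB = Node.documentNode(1);
        Node elem = Node.child(docA, 1);
        for (Node n : sorted(Arrays.asList(elem, docA, docB))) {
            System.out.println("doc " + n.docId + ", node " + n.nodeNumber);
        }
    }
}
```

In this sketch, sorting a list that mixes document nodes from different contexts completes without an exception, whereas `unsafeCompareTo` on two document nodes throws.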
97e4a7c to 01085a0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Fix the path operator `/` to correctly eliminate duplicate nodes when the right-hand step is a function call or other non-`Step` expression
- Per XPath 3.1 §3.3.1.1, `/` must combine results, eliminate duplicates by node identity, and return in document order, regardless of the expression type of each step
- Fix an NPE in `NodeImpl.compareTo` when sorting in-memory nodes from independent document contexts (e.g., standalone attribute or document constructors)

Root Cause
In `PathExpr.eval()`, duplicate elimination was guarded by a `getLastExpression() instanceof Step` check. This only triggered when the last expression was a `Step` (LocationStep, RootNode, SimpleStep). Function calls (`Function extends PathExpr`), filtered expressions, and other PostfixExpr types were excluded, so paths like `$root/*/local:getroot(.)` could return duplicate nodes.

This guard was added in 2009 (SF #2880394) to prevent over-aggressive dedup on FLWOR results in PathExprs used as generic expression containers.
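The gap in the old guard can be shown with a small model. This is a sketch under stated assumptions, not eXist-db code: `Step`, `FunctionCall`, `Flwor`, and this `PathExpr` are hypothetical stand-ins, `oldGuard()` mimics the `instanceof Step` check, and `newGuard()` mimics a slash-flag check combined with a node-type check like the one described in the commit messages above.

```java
import java.util.ArrayList;
import java.util.List;

class DedupGuardSketch {

    // Hypothetical expression types; only their class identity matters here.
    interface Expression {}
    static class Step implements Expression {}
    static class FunctionCall implements Expression {}
    static class Flwor implements Expression {}

    // Hypothetical, simplified PathExpr; NOT eXist-db's class.
    static class PathExpr {
        private final List<Expression> steps = new ArrayList<>();
        private boolean hasSlash; // would be set by the parser on a slash token

        PathExpr add(Expression e) {
            steps.add(e);
            return this;
        }

        void setHasSlash() {
            hasSlash = true;
        }

        Expression getLastExpression() {
            return steps.isEmpty() ? null : steps.get(steps.size() - 1);
        }

        // Old heuristic: dedup only when the last expression is a Step.
        boolean oldGuard() {
            return getLastExpression() instanceof Step;
        }

        // Flag-based guard: dedup when the path really used a slash
        // and the result holds nodes rather than atomic values.
        boolean newGuard(boolean resultIsNodes) {
            return hasSlash && resultIsNodes;
        }
    }

    public static void main(String[] args) {
        // Models $root/*/local:getroot(.): slashes present, last step is a function call.
        PathExpr path = new PathExpr().add(new Step()).add(new Step()).add(new FunctionCall());
        path.setHasSlash();
        System.out.println("function-call path, old guard: " + path.oldGuard());
        System.out.println("function-call path, new guard: " + path.newGuard(true));

        // Models a PathExpr used as a plain container for a FLWOR expression.
        PathExpr container = new PathExpr().add(new Flwor());
        System.out.println("container, old guard: " + container.oldGuard());
        System.out.println("container, new guard: " + container.newGuard(true));
    }
}
```

In this model the old guard skips dedup for the function-call path, while the flag-based guard enables it and still leaves the container expression alone.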
Fix
Replace the `instanceof Step` heuristic with a `hasSlash` boolean flag that the grammar tree walker sets when it encounters `SLASH`, `DSLASH`, `ABSOLUTE_SLASH`, or `ABSOLUTE_DSLASH` tokens. This correctly distinguishes genuine path expressions from PathExprs that are merely expression containers.

Additionally:
- PathExpr.eval: Guard `removeDuplicates()` with `!result.isEmpty() && Type.subTypeOf(result.getItemType(), Type.NODE)` so it is only called when the result actually contains nodes, not atomic values. This prevents unnecessary processing and avoids issues with paths like `$x/(...)/3` where the final step returns atomic values.
- NodeImpl.compareTo: Handle null `document` references defensively. `DocumentImpl` nodes always have `document=null` (by design: `super(expression, null, 0)` in the constructor), so comparing two `DocumentImpl` instances from different constructor contexts would NPE at `document.docId`. This is a latent bug exposed when `removeDuplicates` sorts sequences containing nodes from independent constructor documents.

`getLastExpression() instanceof Step` → `hasSlash` flag + node-type guard

What Changed
- `exist-core/.../xquery/PathExpr.java`: Add `hasSlash` flag with setter; replace `instanceof Step` guard with flag check + node-type guard
- `exist-core/.../parser/XQueryTree.g`: Call `path.setHasSlash()` at `SLASH`, `DSLASH`, `ABSOLUTE_SLASH`, `ABSOLUTE_DSLASH`
- `exist-core/.../dom/memtree/NodeImpl.java`: Fix `compareTo` to handle null `document` field (DocumentImpl nodes)
- `exist-core/.../xquery/PathExprDedupTest.java`

Spec Reference
XPath 3.1 §3.3.1.1 Path operator (/):
Related
`//` with non-LocationStep expressions: complements this fix; together they fully resolve XQTS K2-Steps-31

XQTS Results (W3C XQTS 3.1, prod-AxisStep)
K2-Axes-48 and K2-Axes-49 both pass (previously NPE).
Test Plan
- `PathExprDedupTest` (7 tests, including K2-Axes-48/49)
- `XQueryTest` suite: 99 tests, 0 failures, 0 errors

🤖 Generated with Claude Code