Skip to content

Fix duplicate node elimination for function calls in path expressions#6110

Open
joewiz wants to merge 3 commits intoeXist-db:developfrom
joewiz:fix/path-expr-dedup-function-calls
Open

Fix duplicate node elimination for function calls in path expressions#6110
joewiz wants to merge 3 commits intoeXist-db:developfrom
joewiz:fix/path-expr-dedup-function-calls

Conversation

@joewiz
Copy link
Copy Markdown
Member

@joewiz joewiz commented Mar 7, 2026

Summary

  • Fix the path operator / to correctly eliminate duplicate nodes when the right-hand step is a function call or other non-Step expression
  • Per XPath 3.1 §3.3.1.1, / must combine results, eliminate duplicates by node identity, and return in document order — regardless of the expression type of each step
  • Fix NPE in NodeImpl.compareTo when sorting in-memory nodes from independent document contexts (e.g., standalone attribute or document constructors)

Root Cause

In PathExpr.eval(), duplicate elimination was guarded by:

if (steps.size() > 1 && getLastExpression() instanceof Step)
    result.removeDuplicates();

This only triggered when the last expression was a Step (LocationStep, RootNode, SimpleStep). Function calls (Function extends PathExpr), filtered expressions, and other PostfixExpr types were excluded, so paths like $root/*/local:getroot(.) could return duplicate nodes.

This guard was added in 2009 (SF #2880394) to prevent over-aggressive dedup on FLWOR results in PathExprs used as generic expression containers.

Fix

Replace the instanceof Step heuristic with a hasSlash boolean flag that the grammar tree walker sets when it encounters SLASH, DSLASH, ABSOLUTE_SLASH, or ABSOLUTE_DSLASH tokens. This correctly distinguishes genuine path expressions from PathExprs that are merely expression containers.

Additionally:

  1. PathExpr.eval: Guard removeDuplicates() with !result.isEmpty() && Type.subTypeOf(result.getItemType(), Type.NODE) so it is only called when the result actually contains nodes, not atomic values. This prevents unnecessary processing and avoids issues with paths like $x/(...)/3 where the final step returns atomic values.

  2. NodeImpl.compareTo: Handle null document references defensively. DocumentImpl nodes always have document=null (by design — super(expression, null, 0) in the constructor), so comparing two DocumentImpl instances from different constructor contexts would NPE at document.docId. This is a latent bug exposed when removeDuplicates sorts sequences containing nodes from independent constructor documents.

Approach Correctness Performance 2009 regression risk
Old: getLastExpression() instanceof Step Misses function calls O(1) None (too conservative)
New: hasSlash flag + node-type guard Correct by construction O(1) None (flag not set for FLWOR containers)

What Changed

File Change
exist-core/.../xquery/PathExpr.java Add hasSlash flag with setter; replace instanceof Step guard with flag check + node-type guard
exist-core/.../parser/XQueryTree.g Call path.setHasSlash() at SLASH, DSLASH, ABSOLUTE_SLASH, ABSOLUTE_DSLASH
exist-core/.../dom/memtree/NodeImpl.java Fix compareTo to handle null document field (DocumentImpl nodes)
exist-core/.../xquery/PathExprDedupTest.java New test class: 7 tests covering dedup for function calls in paths, preservation of FLWOR duplicates, and K2-Axes-48/49 regression tests

Spec Reference

XPath 3.1 §3.3.1.1 Path operator (/):

If every evaluation of E2 returns a (possibly empty) sequence of nodes, these sequences are combined, and duplicate nodes are eliminated based on node identity. The resulting node sequence is returned in document order.

Related

XQTS Results (W3C XQTS 3.1, prod-AxisStep)

K2-Axes-48 and K2-Axes-49 both pass (previously NPE).

Test Plan

  • New JUnit tests pass: PathExprDedupTest (7 tests, including K2-Axes-48/49)
  • XQueryTest suite: 99 tests, 0 failures, 0 errors
  • XQTS prod-AxisStep: K2-Axes-48 ✅, K2-Axes-49 ✅

🤖 Generated with Claude Code

@joewiz joewiz requested a review from a team as a code owner March 7, 2026 01:26
@line-o
Copy link
Copy Markdown
Member

line-o commented Mar 7, 2026

I think this is a valuable addition to the codebase but the to failures in XQTS are alarming (K2-Axes-48, K2-Axes-49).
They both fail with a null pointer exception even though they are valid queries and quite simple ones, too.

K2-Axes-48

declare variable $myVar := <e/>;
$myVar/(<a/>, <b/>, <?d ?>, <!-- e-->, attribute name {}, document {()})/3

should return 3 3 3 3 3 3

K2-Axes-49

declare variable $myVar := <e/>;
$myVar/(<a/>, <b/>, <?d ?>, <!-- e-->, attribute name {}, document {()})/number()

should return NaN NaN NaN NaN NaN

I do know that exist-db's processor does struggle to with static items as the last path step. But introduction a new NPE is not acceptable.

@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Mar 7, 2026

[This response was co-authored with Claude Code. -Joe]

Thanks for catching this, @line-o — you're absolutely right that introducing an NPE is not acceptable. I've pushed a fix in f342031.

Root cause: The hasSlash-based removeDuplicates() call was being applied to intermediate step results containing in-memory nodes from independent constructor documents. Specifically, document {()} creates a DocumentImpl whose document field is null (by design — super(expression, null, 0) in the constructor). When sortInDocumentOrder() tried to compare these nodes via NodeImpl.compareTo(), it hit document.docId without a null check.

Two fixes:

  1. PathExpr: Guard removeDuplicates() with a node-type check so it skips atomic results entirely (paths ending in /3 or /number() don't need node deduplication)
  2. NodeImpl.compareTo: Handle null document defensively — this was a latent bug that could affect any code path sorting in-memory nodes from different constructor contexts

Both K2-Axes-48 and K2-Axes-49 now pass, along with all existing dedup tests and the full XQueryTest suite.

@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Mar 8, 2026

I can squash if the fix looks good.

Comment thread exist-core/src/test/java/org/exist/xquery/PathExprDedupTest.java
Copy link
Copy Markdown
Contributor

@duncdrum duncdrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall, one comment

@joewiz joewiz force-pushed the fix/path-expr-dedup-function-calls branch from f342031 to 97e4a7c Compare April 13, 2026 13:28
joewiz and others added 2 commits April 13, 2026 15:03
…pressions

Per XPath 3.1 §3.3.1.1, the path operator '/' must eliminate duplicate
nodes by identity and return results in document order. This was not
happening when the last step in a path expression was a function call
(or other non-Step PostfixExpr), because the dedup guard only checked
`getLastExpression() instanceof Step`.

Replace the heuristic with a `hasSlash` flag set by the grammar tree
walker when SLASH, DSLASH, ABSOLUTE_SLASH, or ABSOLUTE_DSLASH tokens
are encountered. This correctly identifies genuine path expressions
without the performance cost of iterating the steps list, and without
risking false positives on PathExprs used as generic expression
containers (e.g., variable declarations followed by FLWOR expressions).

Closes https://github.com/eXist-db/exist/issues/TBD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hasSlash-based duplicate elimination introduced in 3b47500
correctly extends dedup to function calls in path expressions,
but triggers a NullPointerException when the path contains
constructed nodes from independent document contexts (e.g.,
standalone attribute or document constructors).

Two fixes:

1. PathExpr.eval: Guard removeDuplicates() with a node-type check
   so it is only called when the result actually contains nodes,
   not atomic values (which cannot have duplicates in the XPath
   sense).

2. NodeImpl.compareTo: Handle null document references defensively.
   DocumentImpl nodes always have document=null (by design in the
   constructor), so comparing two DocumentImpl instances from
   different contexts would NPE at document.docId. This is a
   latent bug exposed by calling removeDuplicates on sequences
   containing nodes from independent constructor documents.

Adds test cases for XQTS K2-Axes-48 and K2-Axes-49 which exercise
path expressions ending with integer literals and number() calls
over sequences of diverse constructed node types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@joewiz joewiz force-pushed the fix/path-expr-dedup-function-calls branch from 97e4a7c to 01085a0 Compare April 13, 2026 19:03
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Apr 19, 2026

[This response was co-authored with Claude Code. -Joe]

Good call, @duncdrum — converted all XQuery test strings to Java 15 text blocks in 0ae5b26.

@line-o line-o self-requested a review April 23, 2026 15:18
@line-o line-o requested review from a team and duncdrum April 23, 2026 15:19
@line-o line-o added this to the eXist-7.0.0 milestone Apr 24, 2026
@line-o line-o added this to v7.0.0 Apr 24, 2026
@line-o line-o moved this to In review in v7.0.0 Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants