Fix fn:not() predicate crashes on empty context and atomic sequences#6083
Closed
joewiz wants to merge 3 commits intoeXist-db:developfrom
Closed
Fix fn:not() predicate crashes on empty context and atomic sequences#6083joewiz wants to merge 3 commits intoeXist-db:developfrom
joewiz wants to merge 3 commits intoeXist-db:developfrom
Conversation
c996bd5 to
e5a0638
Compare
bcfa6fd to
aced6d7
Compare
…Xist-db#2159) When fn:not() is used inside a predicate (e.g. $doc/*[not(self::abc)]) and the context sequence is empty, the set-difference optimization in eval() case 1 fell through to evalBoolean(), which returns a BooleanValue. However, Predicate.selectByNodeSet() expects a node set and throws "cannot convert xs:boolean('true') to a node set". The fix returns EMPTY_SEQUENCE when inside a predicate with an empty context — filtering an empty set always yields an empty set regardless of the predicate. Outside predicates (e.g. standalone not(())), the boolean evaluation path is preserved. Tests cover: - not(self::x) on empty derived path (the error scenario) - not(self::x) on non-empty path (correct filtering) - not(*) on empty path (empty result) - Standalone not(()) (boolean path unaffected by the fix) - not(child) on persistent nodes (set-difference optimization works) - not(@type = 'a') on persistent nodes (general predicate path) Closes eXist-db#2159 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(true(), false())[not(.)] throws err:FORG0006 instead of returning (false()). The root cause is that LocationStep.getDependencies() suppresses CONTEXT_ITEM dependency for the self axis when inPredicate=true (to enable the set-difference optimization). This causes Predicate.recomputeExecutionMode() to pre-evaluate fn:not(.) against the full sequence instead of item-by-item. The fix adds CONTEXT_ITEM to FunNot.getDependencies() when the argument is "." (self::node() LocationStep with node() type test) and the function is inside a predicate. This is targeted and narrow: it only affects the context item expression ".", not typed self-axis steps like self::element, preserving the set-difference optimization for all node-set predicates (not(child), not(@attr), not(descendant::x), not(self::element)). Tests cover fn:not(.) on integers, strings, booleans, and nodes. Includes FunNotBenchmark.java (100 warmup + 500 measured iterations, 5 query patterns on 200-item XML) to verify the set-difference optimization is preserved with no regression vs develop. Closes eXist-db#2308 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The expression @*[name() ! contains(., 'DateTime')] on persistent (stored) documents throws "Type error: the sequence cannot be converted into a node set. Item type is xs:boolean". This is also fixed by the FunNot.getDependencies() change in this PR. Tests cover: - The exact error pattern from the bug report (persistent doc) - Workaround patterns that already worked: contains(name(.), ...) and @* ! name()[contains(., ...)] - The same pattern on in-memory nodes (already worked, regression guard) Closes eXist-db#3289 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aced6d7 to
fd02482
Compare
2 tasks
Open
3 tasks
Member
Author
|
[This comment was co-authored with Claude Code. -Joe] Closing — superseded by #6207 (v2/xq31-compliance-fixes). This work has been consolidated into a clean v2/ branch as part of the eXist-db 7.0 PR reorganization. The new PR includes all commits from this PR plus additional related work, with reviewer feedback incorporated where applicable. See the reviewer guide for the full context. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two long-standing bugs in
fn:not()that cause unexpected errors when used in predicates:$doc/*[not(self::abc)]throws "cannot convert xs:boolean('true') to a node set" when$docresolves to empty(true(), false())[not(.)]throwserr:FORG0006instead of returning(false())Both stem from optimizations in
FunNotthat assume node-set contexts but encounter empty sets or atomic values. These bugs have been open since 2018.The
FunNot.getDependencies()fix for #2308 also fixes #3289:@*[name() ! contains(., 'DateTime')]throws "the sequence cannot be converted into a node set. Item type is xs:boolean" on persistent (stored) documents. The fix ensuresnot(.)reportsCONTEXT_ITEMdependency inside predicates, which preventsPredicate.recomputeExecutionMode()from pre-evaluatingnot(.)against the full context sequence.Approach
Per @wolfgangmm's concern: "eXist handles fn:not in a particular way by trying to evaluate it as a set operation. A naive fix would likely result in many expressions becoming slower by an order of magnitude."
These fixes preserve the set-difference optimization for all performance-critical patterns (
[not(child)],[not(@attr)],[not(descendant::x)],[not(self::element)]).Changes
Fix #2159 —
FunNot.eval()empty-context guardWhen
fn:not()is used inside a predicate and the context sequence is empty, the set-difference path returned aBooleanValuewhichPredicate.selectByNodeSet()cannot consume. Fix: returnEMPTY_SEQUENCEwhen in a predicate with empty context (filtering an empty set always yields empty). Outside predicates, the boolean path is preserved. This matches the intent of the original commented-out TODO code.Fix #2308 —
FunNot.getDependencies()targetedCONTEXT_ITEMLocationStep.getDependencies()suppressesCONTEXT_ITEMfor the self axis inside predicates to enable the set-difference optimization. This is correct for node-set predicates but causesnot(.)on atomic sequences to be pre-evaluated against the full sequence instead of item-by-item, throwingFORG0006.Fix:
FunNot.getDependencies()addsCONTEXT_ITEMto the dependency flags when the argument is the context item expression.— specifically, aLocationStepwithSELF_AXISand anode()type test — and the function is inside a predicate. This is targeted and narrow: it only affects.(self::node()), not typed self-axis steps likeself::element, forcingPredicateto use per-item boolean evaluation for that specific case. All other predicates (not(child::bar),not(@attr),not(descendant::x),not(self::element)) are unaffected because their arguments use different axes, different type tests, or are notLocationStepinstances.FunNot.returnsType()andLocationStep.getDependencies()are unchanged from develop.Why the set-difference optimization is preserved
The optimization in
FunNot.eval()runs when the argument returns NODE, context is a persistent set, and argument has noCONTEXT_ITEMdependency. After these changes:[not(child::bar)][not(@attr)][not(descendant::x)][not(self::abc)][not(.)]on atomics[not(.)]on nodesThe promotion path via
Predicate.recomputeExecutionMode()is unchanged.Performance benchmark
FunNotBenchmark.javameasures 5 query patterns (100 warmup + 500 measured iterations each) against a 200-item generated XML document with no indexes (structural index only).Run with:
Results
//item[not(child)]//item[not(@attr)]//item[not(descendant::x)]//item[not(.)]//item[not(@id > 100)]Takeaway: All queries are within noise margin (~2%). The set-difference optimization is fully preserved for child, attribute, and descendant axis patterns. The
not(.)pattern (boolean fallback path) also shows no regression. Environment: macOS, Zulu JDK 21, single-threaded, in-process embedded server.Test plan
fn-not.xqXQSuite tests covering empty paths, persistent node set-difference,not(.)on integers/strings/booleans/nodes%test:pendingfromboolseq:countNegativesContextIteminboolean-sequences.xqsimple-map-predicate.xqtests for Type error: eXist misreads an XPath node sequence with a predicate as xs:boolean [BUG] #3289:@*[name() ! contains(., 'DateTime')]on persistent and in-memory documentsFunNotBenchmark.java): 5 queries × 500 iterations, no regression vs developCoreTestssuite: 992 tests, 0 failures, 0 errorsOptimizerTest: 6 tests, 0 failuresDateTest: 46 tests, 0 failures🤖 Generated with Claude Code