Fix XQuery 3.1 compliance: casting, error codes, fn:not, path dedup, format-date, and more#6207
Open
joewiz wants to merge 21 commits into eXist-db:develop from
This was referenced Apr 6, 2026
0021f11 to
fd29848
Compare
…d-text, fn:json-doc

Resolve relative URIs against file: base URI with direct file: handling. Only allow direct file: access for URIs resolved from relative paths (absolute file: URIs go through SourceFactory security checks). Separate FOJS0001 from FOUT1170 in fn:json-doc. Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.

XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
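The charset fallback mentioned in this commit could look roughly like the following. This is a minimal sketch, not the code from the PR: the class `CharsetFallback` and the method `resolve` are hypothetical names, and only the `iso-8859` → `iso-8859-1` mapping comes from the commit message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the fallback; not eXist-db's actual code.
final class CharsetFallback {

    // Maps the incomplete label "iso-8859" to ISO-8859-1 and defers
    // every other name to the JDK's normal charset lookup.
    static Charset resolve(String name) {
        String trimmed = name.trim();
        if ("iso-8859".equalsIgnoreCase(trimmed)) {
            return StandardCharsets.ISO_8859_1;
        }
        // Throws UnsupportedCharsetException for names the JDK does not know.
        return Charset.forName(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(resolve("iso-8859").name());
        System.out.println(resolve("utf-8").name());
    }
}
```

Checking the special label before calling `Charset.forName` keeps the behaviour independent of which aliases a given JDK happens to register.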
Per XQuery spec section 14.2, the xmlns prefix must not be included in the result of fn:in-scope-prefixes(). eXist was including it because collectPrefixes() adds all namespace declarations from the node tree, including the xmlns pseudo-prefix.

XQTS: fixes 8 fn-in-scope-prefixes tests that expected xmlns to be absent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…esults

Per XQuery spec: "If the text resource has a BOM (byte-order-mark), the BOM is excluded from the result." fn:unparsed-text was returning the BOM character (U+FEFF) at the beginning of the string.

Adds stripBOM() helper that removes U+FEFF from the start of the result string. Applied to both unparsed-text and unparsed-text-lines (first line only for lines).

XQTS: expected to fix ~5 fn-unparsed-text BOM tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
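A helper of the kind described here might be as small as the sketch below. The method name `stripBOM` comes from the commit message; the signature, the enclosing class `BomSketch`, and the null handling are assumptions made for illustration.

```java
// Illustrative sketch only; the real stripBOM() in the PR may differ.
final class BomSketch {

    // Removes a single leading U+FEFF, if present, and leaves the rest untouched.
    static String stripBOM(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(stripBOM("\uFEFFhello").length());
    }
}
```

For fn:unparsed-text-lines, the same helper would only need to be applied to the first line, as the commit notes.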
…space collapse

gDay/gMonth/gYear comparison: fill in missing year/month/day reference fields in getImplicitCalendar() per XPath spec §10.4. Without these, XMLGregorianCalendar cannot properly normalize timezone offsets for comparison. gDay uses 1972-12, gMonth uses 1972-xx-01, etc.

xs:token whitespace collapse: rewrite collapseWhitespace() to properly implement the XML Schema "collapse" facet: replace whitespace chars with space, collapse consecutive spaces, strip leading/trailing spaces. Previously the method only collapsed consecutive whitespace but didn't strip leading/trailing or normalize single tab/newline chars.

XQTS: op-gDay-equal +1, op-gMonth-equal 40/40, op-gYear-equal 45/45, xs-token +2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
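The three steps of the XML Schema "collapse" facet can be done in a single pass. The following is a sketch under the assumption that the method takes and returns plain strings; the class name `WhitespaceSketch` is hypothetical and the code is not taken from the PR.

```java
// Single-pass sketch of the XML Schema "collapse" whitespace facet.
final class WhitespaceSketch {

    static String collapseWhitespace(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean pendingSpace = false;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember the gap, but only once real content has been emitted
                // (this drops leading whitespace).
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' '); // any run of whitespace becomes one space
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is never written, which strips trailing whitespace.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapseWhitespace("  a \t\n b  ") + "]");
    }
}
```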
Many XPathExceptions were thrown without error codes, defaulting to exerr:ERROR. The XQTS runner checks for specific error codes, so these all appeared as wrong-error-code failures.

Fixes:
- DocumentConstructor: attribute in document → XPTY0004 (14 tests)
- AttributeConstructor: enclosed expr in namespace → XQST0022 (16 tests)
- AtomicValue.toNodeSet/toMemNodeSet → XPTY0019 (21 tests)
- AbstractDateTimeValue mult/div → XPTY0004 (2 tests)
- DayTimeDurationValue plus type error → XPTY0004 (6 tests)
- DynamicAttributeConstructor name expr → XPTY0004 (5 tests)

XQTS: estimated +64 tests from error code alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CastExpression/CastableExpression:
- xs:anySimpleType now throws XPST0080 (was XPST0051), matching xs:anyAtomicType and xs:NOTATION per the XQuery specification

FunctionFactory:
- Unknown type names in xs: namespace (like xs:name, xs:anyAtomic) now throw XPST0017 (no such function) instead of generic ERROR

StringValue:
- String-to-restricted-type validation (xs:language, xs:Name, xs:NCName, xs:NMTOKEN, xs:ID, xs:IDREF, xs:ENTITY) now throws FORG0001 instead of generic ERROR

Base64BinaryValueType:
- Invalid base64 data now throws FORG0001 with proper ErrorCode instead of embedding the code in the message string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When formatting negative exponents, the minus sign was counted in the string length for padding calculation. For pattern "0.0000e00" with exponent -2: "-2" has length 2, minimumExponentSize 2, padLen=0, so no padding → "-2" instead of "-02".

Fix: use Math.abs(exp) for the digit string, pad the absolute value, then prepend the minus sign. Now "-2" becomes "02" → "-02".

Fixes: numberformat117, numberformat141, numberformat142 (+3 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
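The padding fix can be illustrated in isolation. In this sketch the names `ExponentPadding` and `formatExponent` are hypothetical, and `minimumExponentSize` is treated as a plain parameter; only the approach (pad the absolute value, then prepend the sign) is taken from the commit message.

```java
// Sketch of sign-aware zero padding for an exponent; not the PR's actual code.
final class ExponentPadding {

    static String formatExponent(int exp, int minimumExponentSize) {
        // Widen to long first so Math.abs is safe even for Integer.MIN_VALUE.
        String digits = Long.toString(Math.abs((long) exp));
        StringBuilder sb = new StringBuilder();
        if (exp < 0) {
            sb.append('-'); // the sign is not counted towards the padding width
        }
        for (int i = digits.length(); i < minimumExponentSize; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(formatExponent(-2, 2)); // prints -02
    }
}
```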
…tection

Two JSON serialization fixes:

1. XDM serialization bypass (XQuerySerializer): When XDM_SERIALIZATION=yes (from fn:serialize or declare option), always use JSONSerializer directly instead of the backwards-compat path that converts single elements via the old JSON writer. The backwards-compat path produces wrong output (null) for element nodes, which should be serialized as their XML string value per the W3C JSON output method spec.

2. Fix allow-duplicate-names logic (JSONSerializer): The STRICT_DUPLICATE_DETECTION flag was inverted: it was enabled when allow-duplicate-names=yes (should be disabled) and disabled when allow-duplicate-names=no (should be enabled). Fix the boolean logic and change default to "no" per W3C spec.

Target: +5 method-json tests (null output + duplicate keys)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:json-doc now checks context.getDynamicallyAvailableTextResource() before attempting URL resolution. This allows the XQTS runner to provide JSON test data as dynamically registered text resources (mapped from http://www.w3.org/qt3/json/* URIs to local files).

Previously these tests all returned empty sequence because json-doc couldn't resolve the HTTP URIs to local files.

XQTS: expected to fix ~34 fn-json-doc wrong-result tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:parse-json($text, ()) should be valid (treating () as no options), but the 2-arg signature required Type.MAP_ITEM with EXACTLY_ONE cardinality. Changed to optParam (ZERO_OR_ONE) so empty sequence is accepted.

Investigation summary for remaining failures across 5 function groups:

- fn:xml-to-json (80 wrong results): FOJS0006 for D-series tests that extract elements from XSLT data files. The element namespace handling in FunXmlToJson needs investigation; complex fix.
- fn:replace (11 fixable): 4 FORX0003 validation gaps (replacement string $ and \ validation), 4 regex \# and \b escape handling; requires Saxon regex translator changes.
- fn:parse-json (37 fixable): Surrogate pair handling (\udead → U+FFFD), \r escape, number precision (double vs decimal), key ordering; deep Jackson parsing issues.

All remaining fixable failures across these 5 sets total ~128 tests. ~240 of the total failures are XPST0003 parser-blocked (map literals, arrow operators) and will be unlocked by the parser-next branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…parison

Three XQuery 3.1 compliance fixes:

1. fn:deep-equal: Replace RuntimeException with break in default case for unexpected node types during compareContents(). Comments and PIs are correctly skipped, so encountering them is not an error. XQTS: +2 tests (K2-SeqDeepEqualFunc-20, K2-SeqDeepEqualFunc-22)

2. fn:round: DecimalValue.convertTo(INTEGER) used BigDecimal.longValue() which silently truncates values > Long.MAX_VALUE. Now uses toBigInteger() with IntegerValue's BigInteger constructor. XQTS: +7 tests (fn-round with 23-digit integers)

3. Collations.compare: Unicode codepoint comparison for supplementary characters. String.compareTo() compares UTF-16 code units, giving wrong ordering for characters above U+FFFF (encoded as surrogate pairs). Now compares actual Unicode codepoints. Spec: W3C XQuery 3.1 §7.3.1 (Unicode Codepoint Collation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
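The difference between code-unit and codepoint ordering described in the third fix can be reproduced with plain JDK calls. This is a sketch, assuming a standalone comparator; `CodepointCompare` and `compareByCodepoint` are hypothetical names and not the actual change to Collations.compare.

```java
// Sketch: compare two strings by Unicode codepoint rather than by UTF-16 code unit.
final class CodepointCompare {

    static int compareByCodepoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by one or two chars depending on whether this was a surrogate pair.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Equal up to the shorter length: the string with characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFF";                 // U+FFFF, one code unit
        String supplementary = "\uD800\uDC00"; // U+10000, a surrogate pair
        System.out.println(bmp.compareTo(supplementary) > 0);          // code units: U+FFFF sorts after
        System.out.println(compareByCodepoint(bmp, supplementary) < 0); // codepoints: U+FFFF sorts before
    }
}
```

The two strings in `main` show the failure mode: the high surrogate 0xD800 is numerically smaller than 0xFFFF, so `String.compareTo` orders U+10000 before U+FFFF.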
Per XPath F&O 3.1 Section 19, casting between types that have no valid conversion path (e.g., xs:time to xs:date, xs:anyURI to xs:hexBinary) should raise XPTY0004, not FORG0001. FORG0001 is reserved for when the cast IS allowed but the specific value is invalid.

Add Type.isCastable() implementing the XQuery 3.1 casting table to pre-validate cast operations before attempting them. CastExpression now checks castability first and raises XPTY0004 for impossible casts. CastableExpression returns false immediately for impossible casts.

Fixes ~580 XQTS 3.1 test failures in prod-CastExpr and related sets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
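The split between the two error codes can be sketched with a lookup table. The example below covers only six types and is deliberately incomplete; the class `CastCheck`, the enum `T`, and the method `errorCode` are hypothetical and do not reflect the real signature of Type.isCastable().

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Partial, illustrative casting table; the real table covers all primitive types.
final class CastCheck {

    enum T { STRING, DATE, TIME, DATE_TIME, ANY_URI, HEX_BINARY }

    private static final Map<T, EnumSet<T>> TABLE = new EnumMap<>(T.class);
    static {
        TABLE.put(T.STRING, EnumSet.allOf(T.class));
        TABLE.put(T.DATE, EnumSet.of(T.STRING, T.DATE, T.DATE_TIME));
        TABLE.put(T.TIME, EnumSet.of(T.STRING, T.TIME));
        TABLE.put(T.DATE_TIME, EnumSet.of(T.STRING, T.DATE, T.TIME, T.DATE_TIME));
        TABLE.put(T.ANY_URI, EnumSet.of(T.STRING, T.ANY_URI));
        TABLE.put(T.HEX_BINARY, EnumSet.of(T.STRING, T.HEX_BINARY));
    }

    // True when the casting table has any conversion path from 'from' to 'to'.
    static boolean isCastable(T from, T to) {
        return TABLE.get(from).contains(to);
    }

    // Returns the error code a cast would raise, or null when it succeeds.
    static String errorCode(T from, T to, boolean valueIsValid) {
        if (!isCastable(from, to)) {
            return "XPTY0004"; // no conversion path at all: a type error
        }
        return valueIsValid ? null : "FORG0001"; // path exists, but this value is invalid
    }

    public static void main(String[] args) {
        System.out.println(errorCode(T.TIME, T.DATE, true));    // XPTY0004
        System.out.println(errorCode(T.STRING, T.DATE, false)); // FORG0001
    }
}
```

Checking the table first means the value is never inspected for an impossible cast, which matches the described behaviour of `castable as` returning false immediately.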
fn:min and fn:max threw FORG0006 when given mixed numeric types (e.g., xs:integer and xs:double) because getCommonSuperType() returned ANY_ATOMIC_TYPE for cross-numeric-family types. Per XQuery 3.1, mixed numeric types should be promoted to a common type before comparison.

Add a check: if both types are members of the xs:numeric union, allow the comparison to proceed to the existing numeric promotion code.

Fixes ~60 XQTS 3.1 test failures in fn-min and fn-max.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When casting xs:double NaN, INF, or -INF to xs:integer or xs:decimal, eXist incorrectly raised FORG0001. Per XPath F&O 3.1 Section 4.1.16, FOCA0002 should be raised when the cast value is outside the target type's value space (NaN and infinities have no integer/decimal representation).

Fixes ~44 XQTS 3.1 test failures in prod-CastExpr.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
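The check for non-finite doubles can be shown without any eXist classes. In this sketch `DoubleToInteger` and its nested `CastError` exception are hypothetical stand-ins for eXist's value and exception types; only the FOCA0002 code for NaN and the infinities comes from the commit message.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of an xs:double to xs:integer cast that rejects NaN and +/-INF.
final class DoubleToInteger {

    // Hypothetical exception type carrying an XPath error code.
    static final class CastError extends RuntimeException {
        final String code;

        CastError(String code, String message) {
            super(code + ": " + message);
            this.code = code;
        }
    }

    static BigInteger cast(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            // These values have no integer or decimal representation.
            throw new CastError("FOCA0002", "cannot cast " + d + " to xs:integer");
        }
        // BigDecimal holds the exact double value; toBigInteger() drops the fraction.
        return new BigDecimal(d).toBigInteger();
    }

    public static void main(String[] args) {
        System.out.println(cast(3.9));
        try {
            cast(Double.NaN);
        } catch (CastError e) {
            System.out.println(e.code);
        }
    }
}
```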
xs:boolean cast to integer subtypes like xs:nonPositiveInteger or xs:negativeInteger hit the default case and threw an incorrect error. Now routes through IntegerValue conversion which properly validates the value against the subtype's range (e.g., true=1 is invalid for xs:nonPositiveInteger, producing the correct FORG0001).

Also fixes the default error code from XPTY0004 to FORG0001 for BooleanValue, since any cast that reaches convertTo() has already passed the casting table validation.

Fixes ~5 XQTS 3.1 test failures in prod-CastExpr.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Several IETF date parsing issues:
- Day and month names were case-sensitive (rejected "SAT", "aug"). Now uses case-insensitive matching.
- Hours required exactly 2 digits but the IETF grammar allows 1-2 (digit digit?). Changed parseInt minimum from 2 to 1.
- Whitespace between time and timezone was mandatory but should be optional per the grammar.
- Seconds detection relied on whitespace check instead of colon check, failing when timezone immediately followed time.
- 24:00:00 was rejected; now normalized to 00:00:00 of the next day.
- Timezone is now optional (grammar: (S? timezone)?).

Fixes ~8 XQTS HEAD test failures in fn-parse-ietf-date.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
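Three of the points above (case-insensitive month names, one- or two-digit hours with colon-detected seconds, and 24:00:00 rollover) can be sketched with java.time. This is an illustration only: `IetfDateSketch`, `parseMonth`, and `combine` are hypothetical names, and the PR's parser is hand-written rather than regex-based as far as the commit message indicates.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Month;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a few lenient parsing rules from the fn:parse-ietf-date grammar.
final class IetfDateSketch {

    // Hours take one or two digits; seconds are optional and introduced by a colon.
    private static final Pattern TIME = Pattern.compile("(\\d\\d?):(\\d\\d)(?::(\\d\\d))?");

    // Matches three-letter month abbreviations regardless of case ("aug", "AUG", "Aug").
    static Month parseMonth(String abbrev) {
        String key = abbrev.toLowerCase(Locale.ROOT);
        if (key.length() == 3) {
            for (Month m : Month.values()) {
                if (m.name().toLowerCase(Locale.ROOT).startsWith(key)) {
                    return m;
                }
            }
        }
        throw new IllegalArgumentException("unknown month: " + abbrev);
    }

    // Combines a date with a time string, rolling 24:00:00 over to the next day.
    static LocalDateTime combine(LocalDate date, String time) {
        Matcher m = TIME.matcher(time);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid time: " + time);
        }
        int hour = Integer.parseInt(m.group(1));
        int minute = Integer.parseInt(m.group(2));
        int second = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        if (hour == 24 && minute == 0 && second == 0) {
            return date.plusDays(1).atStartOfDay();
        }
        return date.atTime(hour, minute, second); // rejects out-of-range fields
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(1994, parseMonth("JUN"), 6);
        System.out.println(combine(day, "24:00:00"));
    }
}
```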
…Xist-db#2159)

When fn:not() is used inside a predicate (e.g. $doc/*[not(self::abc)]) and the context sequence is empty, the set-difference optimization in eval() case 1 fell through to evalBoolean(), which returns a BooleanValue. However, Predicate.selectByNodeSet() expects a node set and throws "cannot convert xs:boolean('true') to a node set".

The fix returns EMPTY_SEQUENCE when inside a predicate with an empty context, because filtering an empty set always yields an empty set regardless of the predicate. Outside predicates (e.g. standalone not(())), the boolean evaluation path is preserved.

Tests cover:
- not(self::x) on empty derived path (the error scenario)
- not(self::x) on non-empty path (correct filtering)
- not(*) on empty path (empty result)
- Standalone not(()) (boolean path unaffected by the fix)
- not(child) on persistent nodes (set-difference optimization works)
- not(@type = 'a') on persistent nodes (general predicate path)

Closes eXist-db#2159

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(true(), false())[not(.)] throws err:FORG0006 instead of returning (false()).

The root cause is that LocationStep.getDependencies() suppresses CONTEXT_ITEM dependency for the self axis when inPredicate=true (to enable the set-difference optimization). This causes Predicate.recomputeExecutionMode() to pre-evaluate fn:not(.) against the full sequence instead of item-by-item.

The fix adds CONTEXT_ITEM to FunNot.getDependencies() when the argument is "." (self::node() LocationStep with node() type test) and the function is inside a predicate. This is targeted and narrow: it only affects the context item expression ".", not typed self-axis steps like self::element, preserving the set-difference optimization for all node-set predicates (not(child), not(@attr), not(descendant::x), not(self::element)).

Tests cover fn:not(.) on integers, strings, booleans, and nodes. Includes FunNotBenchmark.java (100 warmup + 500 measured iterations, 5 query patterns on 200-item XML) to verify the set-difference optimization is preserved with no regression vs develop.

Closes eXist-db#2308

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The expression @*[name() ! contains(., 'DateTime')] on persistent (stored) documents throws "Type error: the sequence cannot be converted into a node set. Item type is xs:boolean". This is also fixed by the FunNot.getDependencies() change in this PR.

Tests cover:
- The exact error pattern from the bug report (persistent doc)
- Workaround patterns that already worked: contains(name(.), ...) and @* ! name()[contains(., ...)]
- The same pattern on in-memory nodes (already worked, regression guard)

Closes eXist-db#3289

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…comparison (eXist-db#4327)

When comparing XML nodes in test:assertEquals, the actual result was normalized (whitespace stripped) but the expected value from the annotation was not. This caused assertions with whitespace around XML in %test:assertEquals annotations to fail unexpectedly.

Now both values are normalized before comparison via deep-equal.

Closes eXist-db#4327

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The // abbreviation in XPath expands to /descendant-or-self::node()/. When followed by a reverse axis (preceding, ancestor, etc.), the tree walker incorrectly overwrote the reverse axis with DESCENDANT_SELF_AXIS, destroying the original semantics.

For example, $node//preceding::node() behaved like $node/descendant-or-self::node() instead of the correct $node/descendant-or-self::node()/preceding::node().

Fix both the DSLASH and ABSOLUTE_DSLASH handlers to detect reverse axes (constants 0-4) and insert an explicit descendant-or-self::node() step before the reverse axis step, rather than merging them.

Closes eXist-db#691

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
29592c0 to
4e543e4
Compare
[This response was co-authored with Claude Code. -Joe]

CI state: 7/9 checks pass. The 2 remaining failures (ubuntu and macOS integration) are pre-existing test hangs unrelated to this PR: the surefire fork timeout fires and the CI job reports FAILURE, but no tests in this PR are failing. See the CI Health Note in the reviewer guide for details.

Dependencies: None. This PR is Wave 1; it can be reviewed and merged independently. For full context on all 7.0 PRs and the merge order, see the Reviewer Guide.
Summary
23 bugfixes improving XQuery 3.1 spec compliance. Consolidates work from PRs #6165 (approved by @duncdrum), #6083, #6085, and #6080 (approved by @duncdrum).
What Changed
%test:assertEquals to allow for easier comparison of XML trees #4327)
Spec References
Tests
Supersedes
Test plan
🤖 Generated with Claude Code