Fix XQuery 3.1 compliance: casting, error codes, fn:not, path dedup, format-date, and more #6207

Open
joewiz wants to merge 21 commits into eXist-db:develop from joewiz:v2/xq31-compliance-fixes

Conversation

@joewiz
Member

@joewiz joewiz commented Apr 6, 2026

Summary

23 bugfixes improving XQuery 3.1 spec compliance. Consolidates work from PRs #6165 (approved by @duncdrum), #6083, #6085, and #6080 (approved by @duncdrum).

What Changed

Spec References

Tests

Supersedes

Test plan

  • exist-core unit tests pass (6,562 run, 0 failures)
  • XQTS 3.1 score stable or improved (92.8%, up from 89.7%)
  • No regressions in casting, fn:not, path expressions

🤖 Generated with Claude Code

joewiz and others added 21 commits April 13, 2026 09:26
…d-text, fn:json-doc

Resolve relative URIs against file: base URI with direct file: handling.
Only allow direct file: access for URIs resolved from relative paths
(absolute file: URIs go through SourceFactory security checks).
Separate FOJS0001 from FOUT1170 in fn:json-doc.
Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.

XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per XQuery spec section 14.2, the xmlns prefix must not be included
in the result of fn:in-scope-prefixes(). eXist was including it because
collectPrefixes() adds all namespace declarations from the node tree,
including the xmlns pseudo-prefix.
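
The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual eXist-db change: the class and method names are hypothetical, and it assumes the collected prefixes are available as plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class InScopePrefixesSketch {
    // Hypothetical helper: copies the collected prefixes, skipping the
    // "xmlns" pseudo-prefix so it never reaches the function result.
    static Set<String> withoutXmlnsPseudoPrefix(Iterable<String> collected) {
        Set<String> result = new LinkedHashSet<>();
        for (String prefix : collected) {
            if (!"xmlns".equals(prefix)) {
                result.add(prefix);
            }
        }
        return result;
    }
}
```

Note that the `xml` prefix is kept; only `xmlns` is dropped.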

XQTS: fixes 8 fn-in-scope-prefixes tests that expected xmlns to be absent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…esults

Per XQuery spec: "If the text resource has a BOM (byte-order-mark),
the BOM is excluded from the result." fn:unparsed-text was returning
the BOM character (U+FEFF) at the beginning of the string.

Adds stripBOM() helper that removes U+FEFF from the start of the
result string. Applied to both unparsed-text and unparsed-text-lines
(first line only for lines).
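
A helper along these lines would cover both cases. The name stripBOM() comes from the commit message; the signatures, the second helper, and the class name are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class BomSketch {
    private static final char BOM = '\uFEFF';

    // Removes a single leading U+FEFF, if present; otherwise returns the input unchanged.
    static String stripBOM(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Hypothetical variant for unparsed-text-lines: only the first line can carry the BOM.
    static List<String> stripBOMFromFirstLine(List<String> lines) {
        List<String> out = new ArrayList<>(lines);
        if (!out.isEmpty()) {
            out.set(0, stripBOM(out.get(0)));
        }
        return out;
    }
}
```

A U+FEFF that appears anywhere other than the very start of the text is left alone in this sketch.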

XQTS: expected to fix ~5 fn-unparsed-text BOM tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…space collapse

gDay/gMonth/gYear comparison: fill in missing year/month/day reference
fields in getImplicitCalendar() per XPath spec §10.4. Without these,
XMLGregorianCalendar cannot properly normalize timezone offsets for
comparison. gDay uses 1972-12, gMonth uses 1972-xx-01, etc.

xs:token whitespace collapse: rewrite collapseWhitespace() to properly
implement the XML Schema "collapse" facet — replace whitespace chars
with space, collapse consecutive spaces, strip leading/trailing spaces.
Previously the method only collapsed consecutive whitespace but didn't
strip leading/trailing or normalize single tab/newline chars.
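
The three steps of the "collapse" facet can be done in a single pass. The sketch below is an assumption about shape only (the real collapseWhitespace() signature is not shown in this PR text); it treats space, tab, LF, and CR as whitespace, as XML Schema does.

```java
final class CollapseSketch {
    // Replaces each whitespace char with a space, collapses runs to one space,
    // and drops leading and trailing whitespace, all in one pass.
    static String collapseWhitespace(CharSequence in) {
        StringBuilder sb = new StringBuilder(in.length());
        boolean pendingSpace = false;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember a separator only if some content was already emitted,
                // so leading whitespace is never written.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A pending space at the end is trailing whitespace and is discarded.
        return sb.toString();
    }
}
```

Deferring the separator until the next non-whitespace character is what makes the trailing strip free: a run at the end of the input is simply never emitted.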

XQTS: op-gDay-equal +1, op-gMonth-equal 40/40, op-gYear-equal 45/45,
  xs-token +2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Many XPathExceptions were thrown without error codes, defaulting to
exerr:ERROR. The XQTS runner checks for specific error codes, so
these all appeared as wrong-error-code failures.

Fixes:
- DocumentConstructor: attribute in document → XPTY0004 (14 tests)
- AttributeConstructor: enclosed expr in namespace → XQST0022 (16 tests)
- AtomicValue.toNodeSet/toMemNodeSet → XPTY0019 (21 tests)
- AbstractDateTimeValue mult/div → XPTY0004 (2 tests)
- DayTimeDurationValue plus type error → XPTY0004 (6 tests)
- DynamicAttributeConstructor name expr → XPTY0004 (5 tests)

XQTS: estimated +64 tests from error code alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CastExpression/CastableExpression:
- xs:anySimpleType now throws XPST0080 (was XPST0051), matching
  xs:anyAtomicType and xs:NOTATION per the XQuery specification

FunctionFactory:
- Unknown type names in xs: namespace (like xs:name, xs:anyAtomic)
  now throw XPST0017 (no such function) instead of generic ERROR

StringValue:
- String-to-restricted-type validation (xs:language, xs:Name,
  xs:NCName, xs:NMTOKEN, xs:ID, xs:IDREF, xs:ENTITY) now throws
  FORG0001 instead of generic ERROR

Base64BinaryValueType:
- Invalid base64 data now throws FORG0001 with proper ErrorCode
  instead of embedding the code in the message string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When formatting negative exponents, the minus sign was counted in the
string length for padding calculation. For pattern "0.0000e00" with
exponent -2: "-2" has length 2, minimumExponentSize 2, padLen=0, so
no padding → "-2" instead of "-02".

Fix: use Math.abs(exp) for the digit string, pad the absolute value,
then prepend the minus sign. Exponent -2 is now padded to "02" first and
then signed, giving "-02".
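
The corrected order of operations (pad the magnitude, then add the sign) can be sketched like this. The method and class names are hypothetical; only minimumExponentSize is taken from the commit message.

```java
final class ExponentSketch {
    // Pads the absolute value of the exponent to the minimum width,
    // then prepends the minus sign so the sign never consumes a digit slot.
    static String formatExponent(int exp, int minimumExponentSize) {
        // Widen to long so Math.abs is safe even for Integer.MIN_VALUE.
        StringBuilder digits = new StringBuilder(Long.toString(Math.abs((long) exp)));
        while (digits.length() < minimumExponentSize) {
            digits.insert(0, '0');
        }
        return exp < 0 ? "-" + digits : digits.toString();
    }
}
```

With a minimum size of 2, an exponent of -2 yields "-02" and an exponent of 2 yields "02"; exponents that already have enough digits are not truncated.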

Fixes: numberformat117, numberformat141, numberformat142 (+3 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tection

Two JSON serialization fixes:

1. XDM serialization bypass (XQuerySerializer):
   When XDM_SERIALIZATION=yes (from fn:serialize or declare option),
   always use JSONSerializer directly instead of the backwards-compat
   path that converts single elements via the old JSON writer. The
   backwards-compat path produces wrong output (null) for element
   nodes, which should be serialized as their XML string value per
   the W3C JSON output method spec.

2. Fix allow-duplicate-names logic (JSONSerializer):
   The STRICT_DUPLICATE_DETECTION flag was inverted — it was enabled
   when allow-duplicate-names=yes (should be disabled) and disabled
   when allow-duplicate-names=no (should be enabled). Fix the boolean
   logic and change default to "no" per W3C spec.
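
The corrected relationship between the parameter and the flag in item 2 is simply a negation. A minimal sketch, assuming the parameter arrives as the string "yes" or "no" (other spellings are not handled here) and using a hypothetical helper name:

```java
final class DuplicateNamesSketch {
    // Returns true when duplicate keys must be detected and rejected.
    // allow-duplicate-names=yes -> detection off; no (or absent) -> detection on.
    static boolean strictDuplicateDetection(String allowDuplicateNames) {
        // Assumption for this sketch: a missing parameter means the default, "no".
        String value = allowDuplicateNames == null ? "no" : allowDuplicateNames.trim();
        return !"yes".equals(value);
    }
}
```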

Target: +5 method-json tests (null output + duplicate keys)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:json-doc now checks context.getDynamicallyAvailableTextResource()
before attempting URL resolution. This allows the XQTS runner to
provide JSON test data as dynamically registered text resources
(mapped from http://www.w3.org/qt3/json/* URIs to local files).

Previously these tests all returned empty sequence because json-doc
couldn't resolve the HTTP URIs to local files.

XQTS: expected to fix ~34 fn-json-doc wrong-result tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:parse-json($text, ()) should be valid (treating () as no options),
but the 2-arg signature required Type.MAP_ITEM with EXACTLY_ONE
cardinality. Changed to optParam (ZERO_OR_ONE) so empty sequence
is accepted.

Investigation summary for remaining failures across 5 function groups:

fn:xml-to-json (80 wrong results): FOJS0006 for D-series tests that
extract elements from XSLT data files. The element namespace handling
in FunXmlToJson needs investigation — complex fix.

fn:replace (11 fixable): 4 FORX0003 validation gaps (replacement
string $ and \ validation), 4 regex \# and \b escape handling —
requires Saxon regex translator changes.

fn:parse-json (37 fixable): Surrogate pair handling (\udead → U+FFFD),
\r escape, number precision (double vs decimal), key ordering —
deep Jackson parsing issues.

All remaining fixable failures across these 5 sets total ~128 tests.
~240 of the total failures are XPST0003 parser-blocked (map literals,
arrow operators) — will be unlocked by parser-next branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…parison

Three XQuery 3.1 compliance fixes:

1. fn:deep-equal: Replace RuntimeException with break in default case
   for unexpected node types during compareContents(). Comments and PIs
   are correctly skipped, so encountering them is not an error.
   XQTS: +2 tests (K2-SeqDeepEqualFunc-20, K2-SeqDeepEqualFunc-22)

2. fn:round: DecimalValue.convertTo(INTEGER) used BigDecimal.longValue()
   which silently truncates values > Long.MAX_VALUE. Now uses
   toBigInteger() with IntegerValue's BigInteger constructor.
   XQTS: +7 tests (fn-round with 23-digit integers)

3. Collations.compare: Unicode codepoint comparison for supplementary
   characters. String.compareTo() compares UTF-16 code units, giving
   wrong ordering for characters above U+FFFF (encoded as surrogate
   pairs). Now compares actual Unicode codepoints.
   Spec: W3C XQuery 3.1 §7.3.1 (Unicode Codepoint Collation)
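
Items 2 and 3 both come down to standard JDK behavior, which the following sketch illustrates. The class and method names are hypothetical and do not mirror DecimalValue or Collations.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class RoundAndCompareSketch {
    // Item 2: toBigInteger() keeps every digit, whereas longValue() keeps
    // only the low-order 64 bits of values beyond the long range.
    static BigInteger toIntegerLossless(BigDecimal rounded) {
        return rounded.toBigInteger();
    }

    // Item 3: compare by Unicode code point rather than by UTF-16 code unit.
    static int compareByCodepoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Common prefix exhausted: the string with characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }
}
```

The difference in item 3 shows up when comparing U+FFFF with U+10000: String.compareTo() sees the high surrogate 0xD800 and orders U+10000 first, while the code point comparison orders U+FFFF first.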

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per XPath F&O 3.1 Section 19, casting between types that have no valid
conversion path (e.g., xs:time to xs:date, xs:anyURI to xs:hexBinary)
should raise XPTY0004, not FORG0001. FORG0001 is reserved for when the
cast IS allowed but the specific value is invalid.

Add Type.isCastable() implementing the XQuery 3.1 casting table to
pre-validate cast operations before attempting them. CastExpression
now checks castability first and raises XPTY0004 for impossible casts.
CastableExpression returns false immediately for impossible casts.
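
The idea of consulting a casting table before attempting a conversion can be sketched with a lookup of allowed targets per source type. This is not the real Type.isCastable(): the enum, the class name, and the table below are assumptions that cover only a handful of types, chosen to include the two examples from this commit message.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

final class CastTableSketch {
    // Tiny, illustrative subset of the atomic types in the casting table.
    enum Kind { STRING, DATE_TIME, DATE, TIME, ANY_URI, HEX_BINARY }

    private static final Map<Kind, EnumSet<Kind>> TARGETS = new EnumMap<>(Kind.class);
    static {
        TARGETS.put(Kind.STRING, EnumSet.allOf(Kind.class));
        TARGETS.put(Kind.DATE_TIME, EnumSet.of(Kind.STRING, Kind.DATE_TIME, Kind.DATE, Kind.TIME));
        TARGETS.put(Kind.DATE, EnumSet.of(Kind.STRING, Kind.DATE_TIME, Kind.DATE));
        TARGETS.put(Kind.TIME, EnumSet.of(Kind.STRING, Kind.TIME));
        TARGETS.put(Kind.ANY_URI, EnumSet.of(Kind.STRING, Kind.ANY_URI));
        TARGETS.put(Kind.HEX_BINARY, EnumSet.of(Kind.STRING, Kind.HEX_BINARY));
    }

    // false means no conversion path exists at all: the XPTY0004 case for
    // "cast as", and an immediate false for "castable as".
    static boolean isCastable(Kind from, Kind to) {
        return TARGETS.get(from).contains(to);
    }
}
```

When the lookup returns true, the cast is still attempted and may fail with FORG0001 if the specific value is invalid for the target type.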

Fixes ~580 XQTS 3.1 test failures in prod-CastExpr and related sets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:min and fn:max threw FORG0006 when given mixed numeric types (e.g.,
xs:integer and xs:double) because getCommonSuperType() returned
ANY_ATOMIC_TYPE for cross-numeric-family types. Per XQuery 3.1, mixed
numeric types should be promoted to a common type before comparison.

Add a check: if both types are members of the xs:numeric union, allow
the comparison to proceed to the existing numeric promotion code.

Fixes ~60 XQTS 3.1 test failures in fn-min and fn-max.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When casting xs:double NaN, INF, or -INF to xs:integer or xs:decimal,
eXist incorrectly raised FORG0001. Per XPath F&O 3.1 Section 4.1.16,
FOCA0002 should be raised when the cast value is outside the target
type's value space (NaN and infinities have no integer/decimal
representation).
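
A guard of the following shape captures the rule. The exception type and names are hypothetical stand-ins (eXist-db uses XPathException with an ErrorCode); the sketch only shows where the FOCA0002 check sits relative to the conversion.

```java
import java.math.BigDecimal;

final class DoubleCastSketch {
    // Hypothetical stand-in for an exception that carries an XQuery error code.
    static final class CastException extends RuntimeException {
        final String code;

        CastException(String code, String message) {
            super(code + ": " + message);
            this.code = code;
        }
    }

    // NaN and the infinities lie outside the xs:decimal value space,
    // so they are rejected with FOCA0002 before any conversion is tried.
    static BigDecimal doubleToDecimal(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new CastException("FOCA0002", "cannot cast " + value + " to xs:decimal");
        }
        return BigDecimal.valueOf(value);
    }
}
```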

Fixes ~44 XQTS 3.1 test failures in prod-CastExpr.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xs:boolean cast to integer subtypes like xs:nonPositiveInteger or
xs:negativeInteger hit the default case and threw an incorrect error.
Now routes through IntegerValue conversion which properly validates
the value against the subtype's range (e.g., true=1 is invalid for
xs:nonPositiveInteger, producing the correct FORG0001).

Also fixes the default error code from XPTY0004 to FORG0001 for
BooleanValue, since any cast that reaches convertTo() has already
passed the casting table validation.

Fixes ~5 XQTS 3.1 test failures in prod-CastExpr.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Several IETF date parsing issues:

- Day and month names were case-sensitive (rejected "SAT", "aug").
  Now uses case-insensitive matching.
- Hours required exactly 2 digits but the IETF grammar allows 1-2
  (digit digit?). Changed parseInt minimum from 2 to 1.
- Whitespace between time and timezone was mandatory but should be
  optional per the grammar.
- Seconds detection relied on whitespace check instead of colon
  check, failing when timezone immediately followed time.
- 24:00:00 was rejected; now normalized to 00:00:00 of the next day.
- Timezone is now optional (grammar: (S? timezone)?).
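
The 24:00:00 normalization from the list above can be sketched with java.time. The class and method names are hypothetical, and rejecting hour 24 with non-zero minutes or seconds is an assumption of this sketch rather than something stated in the commit message.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

final class IetfMidnightSketch {
    // Maps an hour value of 24 to 00:00:00 on the following day;
    // all other times are combined with the date unchanged.
    static LocalDateTime resolve(LocalDate date, int hour, int minute, int second) {
        if (hour == 24) {
            if (minute != 0 || second != 0) {
                // Assumption: only exactly 24:00:00 is meaningful.
                throw new IllegalArgumentException("hour 24 requires 00 minutes and 00 seconds");
            }
            return date.plusDays(1).atStartOfDay();
        }
        return date.atTime(hour, minute, second);
    }
}
```

Using plusDays(1) lets the date library handle month and year rollover, so 24:00:00 on 31 December lands on 1 January of the next year.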

Fixes ~8 XQTS HEAD test failures in fn-parse-ietf-date.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Xist-db#2159)

When fn:not() is used inside a predicate (e.g. $doc/*[not(self::abc)])
and the context sequence is empty, the set-difference optimization in
eval() case 1 fell through to evalBoolean(), which returns a BooleanValue.
However, Predicate.selectByNodeSet() expects a node set and throws
"cannot convert xs:boolean('true') to a node set".

The fix returns EMPTY_SEQUENCE when inside a predicate with an empty
context — filtering an empty set always yields an empty set regardless
of the predicate. Outside predicates (e.g. standalone not(())), the
boolean evaluation path is preserved.

Tests cover:
- not(self::x) on empty derived path (the error scenario)
- not(self::x) on non-empty path (correct filtering)
- not(*) on empty path (empty result)
- Standalone not(()) (boolean path unaffected by the fix)
- not(child) on persistent nodes (set-difference optimization works)
- not(@type = 'a') on persistent nodes (general predicate path)

Closes eXist-db#2159

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(true(), false())[not(.)] throws err:FORG0006 instead of returning
(false()). The root cause is that LocationStep.getDependencies()
suppresses CONTEXT_ITEM dependency for the self axis when
inPredicate=true (to enable the set-difference optimization).
This causes Predicate.recomputeExecutionMode() to pre-evaluate
fn:not(.) against the full sequence instead of item-by-item.

The fix adds CONTEXT_ITEM to FunNot.getDependencies() when the
argument is "." (self::node() LocationStep with node() type test)
and the function is inside a predicate. This is targeted and narrow:
it only affects the context item expression ".", not typed self-axis
steps like self::element, preserving the set-difference optimization
for all node-set predicates (not(child), not(@attr), not(descendant::x),
not(self::element)).

Tests cover fn:not(.) on integers, strings, booleans, and nodes.
Includes FunNotBenchmark.java (100 warmup + 500 measured iterations,
5 query patterns on 200-item XML) to verify the set-difference
optimization is preserved with no regression vs develop.

Closes eXist-db#2308

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The expression @*[name() ! contains(., 'DateTime')] on persistent
(stored) documents throws "Type error: the sequence cannot be
converted into a node set. Item type is xs:boolean". This is also
fixed by the FunNot.getDependencies() change in this PR.

Tests cover:
- The exact error pattern from the bug report (persistent doc)
- Workaround patterns that already worked: contains(name(.), ...)
  and @* ! name()[contains(., ...)]
- The same pattern on in-memory nodes (already worked, regression guard)

Closes eXist-db#3289

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…comparison (eXist-db#4327)

When comparing XML nodes in test:assertEquals, the actual result was
normalized (whitespace stripped) but the expected value from the
annotation was not. This caused assertions with whitespace around XML
in %test:assertEquals annotations to fail unexpectedly.

Now both values are normalized before comparison via deep-equal.

Closes eXist-db#4327

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The // abbreviation in XPath expands to /descendant-or-self::node()/.
When followed by a reverse axis (preceding, ancestor, etc.), the tree
walker incorrectly overwrote the reverse axis with DESCENDANT_SELF_AXIS,
destroying the original semantics. For example, $node//preceding::node()
behaved like $node/descendant-or-self::node() instead of the correct
$node/descendant-or-self::node()/preceding::node().

Fix both the DSLASH and ABSOLUTE_DSLASH handlers to detect reverse axes
(constants 0-4) and insert an explicit descendant-or-self::node() step
before the reverse axis step, rather than merging them.

Closes eXist-db#691

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@joewiz joewiz force-pushed the v2/xq31-compliance-fixes branch from 29592c0 to 4e543e4 April 13, 2026 13:26
@joewiz joewiz marked this pull request as ready for review April 14, 2026 13:43
@joewiz joewiz requested a review from a team as a code owner April 14, 2026 13:43
@joewiz
Member Author

joewiz commented Apr 14, 2026

[This response was co-authored with Claude Code. -Joe]

CI state: 7/9 checks pass. The 2 remaining failures (ubuntu and macOS integration) are pre-existing test hangs unrelated to this PR — the surefire fork timeout fires and the CI job reports FAILURE, but no tests in this PR are failing. See the CI Health Note in the reviewer guide for details.

Dependencies: None. This PR is Wave 1 — it can be reviewed and merged independently.

For full context on all 7.0 PRs and the merge order, see the Reviewer Guide.

@duncdrum duncdrum added the xquery (issue is related to xquery implementation) and bug (issue confirmed as bug) labels Apr 14, 2026
@duncdrum duncdrum added this to v7.0.0 Apr 14, 2026
@duncdrum duncdrum moved this to Ready in v7.0.0 Apr 14, 2026
@duncdrum duncdrum added this to the eXist-7.0.0 milestone Apr 14, 2026