Improve W3C serialization compliance across all output methods #6138
Closed
joewiz wants to merge 84 commits into eXist-db:develop
Contributor
needs a rebase
7f478be to 331d112
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal.
fn:min/fn:max: fn:compare-based mutual comparability.
fn:round 3-arg.
fn:deep-equal: full XQ4 options engine, text node merging.
fn:every/fn:some, fn:all-equal/different, fn:atomic-equal, fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right, fn:contains/starts-with/ends-with-subsequence.
Fix: SequenceComparator o2Count typo, AtomicValueComparator cause preservation, Collations instanceof for non-RuleBasedCollator, BigInteger comparison via string (not truncating getLong()).
XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri, fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName, fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of, fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings, fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op, fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime, fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest
XQTS: fn-graphemes 1086/1189, fn-characters 45/45, misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
array:slice (4 overloads), array:index-where, array:sort-with, array:sort-by, array:empty, array:foot, array:trunk, array:items, array:members, array:build, array:index-of, array:of-members, array:split. Fix array:sort ClassCastException unwrap, ArraySortBy key validation, ArraySortWith RuntimeException unwrap. XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6, array-items 8/8, math-cosh/sinh/tanh 27/27 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh. Euler's number constant via Math.E. XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.
FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.
XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
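The block-name fallback described in this commit can be illustrated with a minimal sketch. It uses java.util.regex rather than eXist-db's own regex engine, and the class and method names are hypothetical. The idea is to retry a pattern with `\p{In<Block>}` when `\p{Is<Block>}` is rejected as an unknown property.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not eXist-db code: retries a regex with the
// \p{In<Block>} spelling when \p{Is<Block>} is not recognized.
final class BlockNameFallbackSketch {

    static Pattern compileWithBlockFallback(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Rewrite every \p{Is...} to \p{In...} and try once more.
            String rewritten = regex.replace("\\p{Is", "\\p{In");
            if (rewritten.equals(regex)) {
                throw e; // nothing to rewrite, so the original error stands
            }
            return Pattern.compile(rewritten);
        }
    }
}
```

In java.util.regex a name such as `IsGreek` already resolves as a script, so the fallback in this sketch only triggers for names that are valid as blocks but not as scripts or categories, for example `IsBasicLatin`.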
Fractional seconds: left-aligned digit semantics. Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman. Timezone: picture-driven rewrite with digit family support. Era [E]/[C], calendar validation, grouping separators, optional digit validation, ordinal suffix teens fix, whitespace stripping, military TZ "J", name width truncation (max not min). XQTS: format-time 46→77/92, format-date 79→111/133 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d-text, fn:json-doc Resolve relative URIs against file: base URI with direct file: handling. Only allow direct file: access for URIs resolved from relative paths (absolute file: URIs go through SourceFactory security checks). Separate FOJS0001 from FOUT1170 in fn:json-doc. Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text. XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json. Custom streaming CSV parser with configurable delimiter, quote char, header handling, and column naming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- fnXQuery40.xql: tests for 50+ new XQ4 functions - deep-equal-options-test.xq: deep-equal options engine tests - Re-enable arr:get-invalid-type (XPTY0004 now works) - Update json-to-xml pending comments - fn:replace test updates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parser and tree walker extensions for XQ4: focus functions, keyword args, string templates, pipeline, mapping arrow, for member, otherwise, braced if, while, try/finally, ternary, QName/hex/binary literals, array/map filter, choice/union/enum types, method call, let destructure, fn() shorthand, record types, gnode(), 4 new axes, reservedKeywords sub-rules, expr split for code-too-large fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType. Modified: Function (keyword arg resolution), FunctionFactory (XQ4 no-namespace override, unknown type XPST0017), FunctionSignature (default params), UserDefinedFunction (default param binding), TryCatchExpression (finally), SwitchExpression (XQ4 version gating), StringConstructor (atomization fixes), XQueryContext (version 4.0, XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes), LocationStep (or-self axis evaluation with document node guard). Type infrastructure: Type.RECORD constant, SequenceType.RecordField, record type structural checking, record(*) and record() support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor
XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compile modules from provided source strings instead of loading from URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed version compatibility check for content-loaded modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse invisible XML grammars using the Markup Blitz iXML library. Two signatures: fn:invisible-xml(grammar) returns a parsing function, and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml with Markup Blitz dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Primitive long start/end instead of IntegerValue objects. Pre-computed size with overflow protection. O(1) count/isEmpty/contains. Prevents OOM on large ranges like 1 to 10000000000. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
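A minimal sketch of the range representation this commit describes, assuming a closed integer range stored as two primitive longs. The class name and method names are illustrative and do not come from eXist-db.

```java
// Hypothetical sketch: a lazy integer range that never materializes its items.
final class LongRangeSketch {
    private final long start;
    private final long end;
    private final long size; // pre-computed once, with overflow checks

    LongRangeSketch(long start, long end) {
        this.start = start;
        this.end = end;
        // An inverted range is empty; otherwise size = end - start + 1,
        // computed with exact arithmetic so overflow raises ArithmeticException.
        this.size = end < start ? 0 : Math.addExact(Math.subtractExact(end, start), 1);
    }

    long count() { return size; }            // O(1)

    boolean isEmpty() { return size == 0; }  // O(1)

    boolean contains(long value) {           // O(1), no iteration
        return value >= start && value <= end;
    }
}
```

With this shape, `new LongRangeSketch(1, 10000000000L).count()` answers immediately without allocating one object per item.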
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max (comparison function), fn:deep-equal (options map), fn:matches/ fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace (function replacement, ! flag), fn:round (3-arg mode). Collations: supplementary codepoint fix, ASCII case-insensitive collator. InspectModule: keyword arg introspection. DocUtils: URI resolution. Parameter name alignment across 59 fn: module files to match W3C XQuery 4.0 Functions and Operators catalog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fnXQuery40.xql with tests for all XQ4 features. Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm, InspectModuleTest.java. New deep-equal-options-test.xq and fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql. Updated map ordering test assertions for LinkedHashMap insertion order. XQSuite: 1341 tests, 0 failures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix multiple issues in the JSON output method (method="json") and JSON function option validation:
JSONSerializer:
- Enable forward slash escaping (ESCAPE_FORWARD_SLASHES) per JSON spec
- Handle INF/NaN/negative-zero per QT4 spec (1e9999, -1e9999, null)
- Fix inverted allow-duplicate-names logic: "yes" now correctly allows duplicates (was enabling STRICT_DUPLICATE_DETECTION)
- Add manual duplicate key detection in serializeMap for SERE0022 errors when allow-duplicate-names="no"
- Extract numeric serialization into dedicated serializeAtomicValue method
XQuerySerializer:
- Remove backwards-compatibility check in serializeJSON() that routed single element/document nodes to XML serialization instead of JSON
JSON.java (fn:parse-json, fn:json-to-xml, fn:json-doc):
- Validate option types: 'liberal' must be boolean, 'duplicates' must be string (XPTY0004)
- Check that options parameter is a map before casting
XQTS QT4 results: method-json 8/81 → 46/81 (+38)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove 'map' prefix from map serialization: output '{...}' not
'map{...}' per W3C Serialization 3.1 Section 11 (Adaptive Output
Method)
- Fix double INF/NaN serialization: use 'INF'/'-INF'/'NaN' string
representations instead of Unicode symbols that DecimalFormat produces
XQTS QT4 results: method-adaptive 23/101 → 85/102 (+62)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XQuerySerializer:
- Add item-separator support: when item-separator is set and the
sequence has multiple items, serialize each item individually with the
separator between them (the internal Serializer doesn't handle
item-separator)
XMLWriter:
- Output XML declaration when standalone parameter is set, even if
omit-xml-declaration is not explicitly "no" (per W3C Serialization 3.1)
- Add CDATA section output for cdata-section-elements: when
xdmSerialization is active and the current element is in the
cdata-section-elements set, wrap text content in CDATA sections
instead of character-escaping it
IndentingXMLWriter:
- Implement suppress-indentation parameter: parse space-separated
element names and skip indentation inside those elements and their
descendants
Option.java:
- Allow URIQualifiedName (Q{namespace}local) in declare option
statements; was rejecting them because it required a prefix
XQTS QT4 results: method-xml 11/47 → 20/47 (+9),
method-text 1/20 → 17/20 (+16)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
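The suppress-indentation behaviour described for IndentingXMLWriter can be sketched as a depth counter: once a listed element opens, indentation stays off until that element closes. This is an illustration only. The class below is hypothetical and matches plain element names, whereas the real parameter holds QNames.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of suppress-indentation tracking, not eXist-db code.
final class SuppressIndentSketch {
    private final Set<String> suppressed;
    private int suppressDepth = 0; // > 0 while inside a suppressed subtree

    SuppressIndentSketch(String spaceSeparatedNames) {
        // Parse the space-separated parameter value into a lookup set.
        this.suppressed = new HashSet<>(Arrays.asList(spaceSeparatedNames.trim().split("\\s+")));
    }

    void startElement(String name) {
        // Descendants of a suppressed element inherit the suppression.
        if (suppressDepth > 0 || suppressed.contains(name)) {
            suppressDepth++;
        }
    }

    void endElement() {
        if (suppressDepth > 0) {
            suppressDepth--;
        }
    }

    boolean indentAllowed() {
        return suppressDepth == 0;
    }
}
```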
AbstractSerializer:
- Default html-version to 5.0 per W3C Serialization 3.1 spec (was 1.0, causing method="html" to use XHTML 1.0 writer instead of HTML5)
- Map output:version to html-version for html/xhtml methods per W3C spec (version controls HTML version, not XML version, for these methods)
HTML5Writer:
- Add include-content-type support: inject <meta> content-type tag in <head> when include-content-type=yes (the default)
- Add HTML5 processing instruction format: output <?pi data> instead of <?pi data?> per HTML5 spec
XHTMLWriter:
- Add 'embed' to void elements set (was missing, causing <embed></embed> instead of <embed />)
XQTS QT4 results: method-html 31/69 → 34/69 (+3), method-xhtml 20/53 → 25/53 (+5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite fn:xml-to-json to use DOM traversal instead of XMLStreamReader. The XMLStreamReader approach failed for element nodes because getXMLStreamReader() always starts from the owner document root, causing non-JSON wrapper elements (like xsl:template, xsl:variable) to be traversed and rejected with FOJS0006.
The new DOM-based approach:
- Directly navigates the element's DOM tree
- Handles map, array, string, number, boolean, null elements
- Supports key/escaped/escaped-key attributes
- Works correctly for both document and element node inputs
- Keeps the old XMLStreamReader-based method for reference
XQTS QT4 results: fn-xml-to-json 82/166 → 97/166 (+15)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
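The DOM-based traversal can be sketched with the JDK's own org.w3c.dom API. This is a simplified illustration under stated assumptions, not the FunXmlToJson implementation: the class name is hypothetical, the escaped and escaped-key attributes are ignored, namespace checks are omitted, and string escaping covers only backslash and double quote. The point it shows is that recursion starts at the element passed in, so wrapper ancestors are never visited.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical, simplified sketch of element-rooted xml-to-json conversion.
final class XmlToJsonSketch {

    // Test helper: parse a string into a namespace-aware DOM element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Recurse from the given element only; ancestors are never traversed.
    static String toJson(Element e) {
        String text = e.getTextContent();
        switch (e.getLocalName()) {
            case "map":
            case "array": {
                boolean isMap = e.getLocalName().equals("map");
                StringBuilder sb = new StringBuilder(isMap ? "{" : "[");
                boolean first = true;
                for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (!(n instanceof Element)) continue; // skip whitespace text
                    Element child = (Element) n;
                    if (!first) sb.append(',');
                    if (isMap) sb.append(quote(child.getAttribute("key"))).append(':');
                    sb.append(toJson(child));
                    first = false;
                }
                return sb.append(isMap ? '}' : ']').toString();
            }
            case "string":  return quote(text);
            case "number":
            case "boolean": return text.trim();
            case "null":    return "null";
            default:
                throw new IllegalArgumentException("FOJS0006: unexpected element " + e.getNodeName());
        }
    }

    // Minimal escaping only; a full serializer must also escape control characters.
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }
}
```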
331d112 to 443870e
Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters: fn($name as xs:string, $age as xs:integer) as xs:boolean. The names are parsed and discarded; only the sequence types matter for type checking. This matches the XQ4 spec.
CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the abstract type check or using XPST0051)
XQTS: misc-BuiltInKeywords 227→234 (+7 tests)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the backwards-compatibility check in XQuerySerializer.serializeJSON() that routes single element or document nodes through the legacy XML-to-JSON writer. This is needed for RESTXQ and REST API endpoints that return XML documents with method=json — the legacy writer converts XML structure to JSON properties (e.g., <firstName>Adam</firstName> → "firstName":"Adam"). Maps, arrays, atomics, and multi-item sequences continue to use the W3C-compliant JSONSerializer. Fixes MediaTypeIntegrationTest.mediaTypeJson1 and mediaTypeJson2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a11919c to e1eb77d
JSONSerializer:
- Fix json-lines output adding extra whitespace between values. Jackson adds separator whitespace between root-level values, so each json-line is now serialized via a separate generator to a string buffer, then written as raw content.
XQuerySerializer:
- Flatten arrays before XML/text serialization: ArrayType items can't be serialized as SAX events, so [1,2,3,4,5] is flattened to the sequence (1,2,3,4,5) before passing to the SAX serializer.
- For text method with flattened arrays, set default item-separator to space (per W3C spec) when not explicitly provided.
Fixes: serialize-json-201, -203 (json-lines whitespace), Serialization-text-19 (array serialization in text method).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…feature/serialization-compliance
Two new validation checks in the XML element form of fn:serialize options:
1. Reject unrecognized attributes on the <serialization-parameters> root element (e.g., value2="no" is not a valid attribute)
2. Reject child elements with no namespace: serialization parameter elements must be in the output: or exist: namespace (e.g., <indent value="yes"/> without output: prefix is invalid)
Both raise SEPM0017 per W3C Serialization 3.1.
Fixes serialize-xml-015 and serialize-xml-020.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The W3C Serialization 3.1 spec requires whitespace normalization for parameter values. The 'value' attribute on parameter elements like <output:standalone value=" no "/> should be trimmed to "no" before use. Add value.trim() in readSerializationProperty after reading the 'value' attribute from the XML element form. Fixes serialize-xml-029 (standalone=" no ") and serialize-xml-030 (standalone=" yes "). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nt-spaces
Three issues from the restxq-impl serialization audit:
1. Canonical parameter: reject with SEPM0016 instead of silently accepting. Full canonical XML (C14N) requires sorted attributes, sorted namespace declarations, and expanded empty elements, which is complex to implement correctly. The parameter was accepted but had no effect; now throws SEPM0016 "not supported" to be honest about the limitation.
2. CSV serializer test coverage: add 13 XQSuite tests covering array-of-arrays, sequence-of-maps, XML table, empty input, single item, custom delimiters, quoting (always/minimal), escaped quotes, header row, values with commas/newlines.
3. JSON indent-spaces: wire the exist:indent-spaces parameter to Jackson's JsonGenerator pretty printer. Previously JSON always used Jackson's default 2-space indent; now respects the configured indent-spaces value (default 4, matching XML/HTML).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
XMLWriter:
- Mark DEL character (0x7F) as special in both text and attribute specialChars arrays, so it's escaped as &#x7F; per XML spec. Previously DEL passed through unescaped because it fell below the 0x80 threshold for the 0x7F-0x9F check.
IndentingXMLWriter:
- Accept "true" and "1" as boolean true for the indent parameter, not just "yes". Per W3C Serialization 3.1, boolean parameters accept all three forms.
Fixes: K2-Serialization-9 (DEL in attributes), K2-Serialization-10 (DEL in text), K2-Serialization-36 and -37 (indent="true" with suppress-indentation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
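Both fixes in this commit are small enough to sketch directly. The helper below is hypothetical and much simpler than XMLWriter's specialChars tables: it escapes the whole 0x7F-0x9F range, DEL included, as numeric character references, and it accepts the three spellings of boolean true after trimming whitespace.

```java
// Hypothetical sketch, not eXist-db's XMLWriter.
final class XmlEscapeSketch {

    // Serialization boolean parameters accept "yes", "true" and "1".
    static boolean isTrue(String value) {
        String v = value.trim();
        return v.equals("yes") || v.equals("true") || v.equals("1");
    }

    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                default:
                    if (c >= 0x7F && c <= 0x9F) {
                        // Range starts at 0x7F so DEL is escaped too.
                        sb.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```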
…NR0001) The W3C Serialization spec requires that maps and function items cannot be serialized with the XML or text output methods (error SENR0001). Previously these items would silently produce incorrect output or errors deep in the SAX pipeline. Now XQuerySerializer validates the sequence upfront and throws a clear SENR0001 error. Also flatten arrays in sequences before XML/text serialization, since the SAX serializer cannot handle ArrayType items directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ements, suppress-indentation) The cardinality check in setPropertyForMap was inverted: it used Cardinality._MANY.isSuperCardinalityOrEqualOf(convention.getCardinality()) which always returned false because _MANY (bit 4) cannot contain ZERO_OR_MORE (bits 1+2+4). This meant that when passing multiple QName values for cdata-section-elements (e.g., both an unprefixed and a namespaced element), only the first QName was kept. Fix: reverse the check to convention.getCardinality().isSuperCardinalityOrEqualOf(_MANY), which correctly tests whether the parameter accepts multiple values. Also adds XQSuite tests for cdata-section-elements with namespaced elements, both individually and combined. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
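The inverted check is easiest to see with the bit values the commit message gives (_MANY as bit 4, ZERO_OR_MORE as bits 1+2+4). The enum below is a hypothetical stand-in for eXist-db's Cardinality class, and it assumes that "super-cardinality or equal" means the receiver's bits include all of the argument's bits.

```java
// Hypothetical stand-in for the Cardinality bitmask described above.
enum CardinalitySketch {
    ZERO(1), ONE(2), MANY(4), ZERO_OR_MORE(1 | 2 | 4);

    private final int bits;

    CardinalitySketch(int bits) {
        this.bits = bits;
    }

    // Assumed semantics: true when this cardinality permits everything 'other' permits.
    boolean isSuperCardinalityOrEqualOf(CardinalitySketch other) {
        return (bits & other.bits) == other.bits;
    }
}
```

Under these assumptions `MANY.isSuperCardinalityOrEqualOf(ZERO_OR_MORE)` is always false, while `ZERO_OR_MORE.isSuperCardinalityOrEqualOf(MANY)` is true, which matches the direction of the fix.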
…path When serializing sequences with item-separator using the XML method, the serializeXMLWithItemSeparator path now respects the omit-xml-declaration parameter. If omit-xml-declaration is explicitly set to no/false/0, the declaration is output before the first item. Also ensures that individual node items within the sequence don't each get their own declaration. Note: eXist-db defaults omit-xml-declaration to yes (omit). The W3C Serialization spec leaves the default to the host language, so this is a valid choice. Adds test for item-separator with atomic values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1cc2446 to 301bb0e
… DOCTYPE
Three serialization fixes:
1. standalone="omit" was written literally into the XML declaration instead of being treated as absent. Now "omit" suppresses the standalone attribute. Also normalizes true/false/1/0 to yes/no in the declaration output.
2. xs:untypedAtomic values in serialization parameter maps caused XPTY0004 errors. Now untypedAtomic is accepted and coerced to the expected type (boolean, string, etc.), matching the W3C spec requirement that untypedAtomic be castable to the target type.
3. HTML5 DOCTYPE was always emitted even for non-html root elements (e.g., serializing a bare <body> or <br> fragment). Now DOCTYPE is only output when the root element is <html>, matching W3C Serialization behavior for HTML fragments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
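The standalone handling in the first fix can be sketched as a small normalization step in front of the declaration writer. The class and method names are hypothetical, and the choice to throw IllegalArgumentException for unknown values is an assumption made for this sketch.

```java
import java.util.Optional;

// Hypothetical sketch of standalone normalization, not eXist-db code.
final class StandaloneSketch {

    // "omit" (or no value) means: write no standalone attribute at all.
    static Optional<String> normalizeStandalone(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        switch (raw.trim()) {
            case "yes": case "true": case "1":
                return Optional.of("yes");
            case "no": case "false": case "0":
                return Optional.of("no");
            case "omit":
                return Optional.empty();
            default:
                throw new IllegalArgumentException("invalid standalone value: " + raw);
        }
    }

    static String xmlDeclaration(String version, String encoding, String standalone) {
        StringBuilder sb = new StringBuilder("<?xml version=\"").append(version)
                .append("\" encoding=\"").append(encoding).append('"');
        normalizeStandalone(standalone)
                .ifPresent(v -> sb.append(" standalone=\"").append(v).append('"'));
        return sb.append("?>").toString();
    }
}
```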
1. Adaptive serialization: add "map" keyword prefix to map output
per W3C Serialization 3.1, section 10. Maps are now serialized as
map{key:value,...} instead of {key:value,...}.
2. Character map validation: use SEPM0016 (invalid parameter value)
instead of SEPM0017 (unrecognized parameter) when a character map
key is not a single character. SEPM0016 is the correct error code
per the W3C spec for invalid parameter values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. Maps and function items in the normalize() step of fn:serialize
now correctly raise SENR0001 instead of being silently converted
to strings. Previously, map{} with method=xml/html/xhtml would
produce an empty string instead of an error.
2. XHTML5Writer now always outputs simple <!DOCTYPE html> for HTML5,
ignoring doctype-public and doctype-system parameters per W3C
Serialization spec. Previously it would pass through any
doctype-public value, producing non-HTML5 DOCTYPEs like
<!DOCTYPE html PUBLIC "...">.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three serialization fixes:
1. SERE0020: INF, -INF, and NaN values now raise SERE0020 when serialized with the JSON output method, per W3C Serialization 3.1. Previously these were silently converted to null/1e9999.
2. SERE0023: Multi-item sequences within array members now raise SERE0023. Previously [(1, 2)] would silently nest the items into a JSON sub-array. The spec requires an error for sequences with more than one item at any level.
3. CDATA encoding split: When serializing CDATA sections with a restricted encoding (e.g., us-ascii), characters that cannot be represented in the encoding now cause the CDATA section to be split, with the unencodable character written as a numeric character reference between CDATA segments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
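The CDATA split in the third fix can be sketched with java.nio's CharsetEncoder.canEncode. The class is hypothetical and deliberately incomplete: it does not handle a literal "]]>" inside the text, which a real writer must also split.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch of encoding-aware CDATA output, not eXist-db's XMLWriter.
final class CdataSplitSketch {

    static String writeCdata(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        boolean open = false; // true while a CDATA section is open

        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                if (!open) {
                    out.append("<![CDATA[");
                    open = true;
                }
                out.append(ch);
            } else {
                // Close the section, emit a character reference, reopen later if needed.
                if (open) {
                    out.append("]]>");
                    open = false;
                }
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += Character.charCount(cp);
        }
        if (open) {
            out.append("]]>");
        }
        return out.toString();
    }
}
```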
… C0 escaping)
Two changes enable XML 1.1 serialization output:
1. ElementConstructor: Remove unconditional XQST0085 throw for namespace undeclaration (xmlns:prefix=""). Per the spec, XQST0085 only applies when the implementation does NOT support XML Names 1.1. Since eXist now supports XML 1.1 serialization, namespace undeclaration is allowed.
2. XMLWriter: Version-aware C0 control character escaping. When version="1.1" is set, characters 0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F are serialized as numeric character references (e.g., &#x1;). These characters are valid in XML 1.1 but must be escaped.
Note: codepoints-to-string() still uses XML 1.0 validation (XMLChar.isValid) since eXist does not yet have a context-level XML version setting. K2-7/K2-8 (XML 1.1 control char tests) remain blocked on this; they need the XDM to allow C0 controls, which requires a version-aware codepoints-to-string.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng, validation)
Implements the W3C QT4 canonical serialization parameter for XML and XHTML output methods:
1. XMLWriter: when canonical=true, buffers namespace and attribute events. On closeStartTag(), sorts namespaces by prefix (default first) and attributes by namespace URI then local name. Emits namespaces before attributes per C14N spec. Rejects relative namespace URIs with SERE0024 during namespace buffering.
2. Empty element expansion: canonical mode writes <elem></elem> instead of <elem/> for empty elements.
3. SERE0024 validation: rejects relative namespace URIs (checked both in XQuerySerializer pre-validation and XMLWriter namespace buffering) and multi-root documents when canonical=true.
4. FunSerialize: removes SEPM0016 rejection of canonical parameter. When canonical=true, forces omit-xml-declaration=yes, encoding=UTF-8, include-content-type=no, and removes CDATA sections.
5. XHTML5Writer: suppresses DOCTYPE output when canonical=true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
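The buffering and sorting in step 1 can be sketched as follows. The class, the nested Attr type, and the startTag method are hypothetical names for this illustration. Attribute value escaping and the SERE0024 checks are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of canonical start-tag ordering, not eXist-db's XMLWriter.
final class CanonicalSortSketch {

    static final class Attr {
        final String nsUri, qname, localName, value;

        Attr(String nsUri, String qname, String localName, String value) {
            this.nsUri = nsUri;
            this.qname = qname;
            this.localName = localName;
            this.value = value;
        }
    }

    // namespaces maps prefix -> URI; the default namespace uses the empty prefix.
    static String startTag(String qname, Map<String, String> namespaces, List<Attr> attrs) {
        StringBuilder sb = new StringBuilder("<").append(qname);

        // Namespaces first, sorted by prefix; "" sorts first, so the default leads.
        for (Map.Entry<String, String> ns : new TreeMap<>(namespaces).entrySet()) {
            sb.append(ns.getKey().isEmpty() ? " xmlns" : " xmlns:" + ns.getKey())
              .append("=\"").append(ns.getValue()).append('"');
        }

        // Then attributes, sorted by namespace URI and then by local name.
        List<Attr> sorted = new ArrayList<>(attrs);
        sorted.sort(Comparator.comparing((Attr a) -> a.nsUri).thenComparing(a -> a.localName));
        for (Attr a : sorted) {
            sb.append(' ').append(a.qname).append("=\"").append(a.value).append('"');
        }
        return sb.append('>').toString();
    }
}
```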
4f75f1f to 6d9c4e9
Implements the W3C QT4 canonical serialization parameter for the JSON output method per RFC 8785 (JSON Canonicalization Scheme):
1. Map key sorting: when canonical=true, map entries are sorted by key using String.compareTo() (UTF-16 code unit order per RFC 8785).
2. Number formatting: all numeric values are cast to double and formatted using ECMAScript shortest representation via BigDecimal. Plain notation for [1e-6, 1e21), exponential otherwise. NaN and Infinity raise SERE0020.
3. Solidus escaping: disabled in canonical mode (RFC 8785 does not escape forward slashes).
4. Duplicate key rejection: canonical mode always rejects duplicate keys with SERE0022, regardless of allow-duplicate-names setting.
5. FunSerialize: canonical JSON forces indent=no, escape-solidus=no.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
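The key ordering in step 1 relies on String.compareTo comparing UTF-16 code units rather than code points. A minimal sketch with a hypothetical helper shows the effect: a supplementary character such as U+1F600, stored as the surrogate pair D83D DE00, sorts before U+FF21 even though its code point is larger.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of canonical JSON key ordering.
final class CanonicalKeyOrderSketch {

    // String's natural order compares char by char, i.e. UTF-16 code units.
    static List<String> canonicalKeyOrder(Collection<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }
}
```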
The XML/HTML serializers already supported use-character-maps via the CharacterMappingWriter decorator, but the JSON serializer (which uses Jackson's JsonGenerator directly) bypassed it entirely.
Add character map support to JSONSerializer:
- Parse use-character-maps from output properties in constructor
- applyCharacterMap() substitutes mapped codepoints with replacement strings
- writeStringWithCharMap() applies character map before passing to generator.writeString(), preserving Jackson's structural state
Character map replacements in JSON string values go through writeString so Jackson handles structural separators (colons, commas) correctly. Replacement strings are included as-is in the JSON string (e.g., "<b>" stays literal since < is valid in JSON strings).
5 new XQSuite tests: JSON string mapping, special characters, raw output bypass, copyright symbol mapping, XML element text mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Unicode normalization support during XML serialization:
- Reads the normalization-form parameter (NFC, NFD, NFKC, NFKD, none)
- Normalizes text content and attribute values via java.text.Normalizer
- Applied in writeChars() (text and attributes) and writeCdataContent()
- Skips normalization when form is "none" (default) or text is already in the target form (checked via Normalizer.isNormalized())
- "fully-normalized" treated as "none" (optional per spec)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
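A minimal sketch of the normalization step, using the java.text.Normalizer calls named in the commit message. The wrapper class and its handling of unknown form names (an IllegalArgumentException from Form.valueOf) are assumptions made for this illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of normalization-form handling.
final class NormalizationSketch {

    static String normalize(String text, String form) {
        if (form == null) {
            return text;
        }
        String f = form.trim();
        // "none" is the default; "fully-normalized" is treated the same way here.
        if (f.isEmpty() || f.equalsIgnoreCase("none") || f.equalsIgnoreCase("fully-normalized")) {
            return text;
        }
        Normalizer.Form target = Normalizer.Form.valueOf(f.toUpperCase(Locale.ROOT));
        // Skip the copy when the text is already in the target form.
        return Normalizer.isNormalized(text, target) ? text : Normalizer.normalize(text, target);
    }
}
```

For example, `normalize("e\u0301", "NFC")` composes the letter and the combining acute accent into the single character U+00E9.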
XHTML5 serialization now normalizes prefixed SVG and MathML elements to use default namespace bindings instead of prefixed forms:
- <svg:svg xmlns:svg="...svg"> → <svg xmlns="...svg">
- <m:math xmlns:m="...MathML"> → <math xmlns="...MathML">
This applies only to html-version >= 5.0 and only to the two specific namespace URIs (http://www.w3.org/2000/svg and http://www.w3.org/1998/Math/MathML). General namespace handling is unchanged.
The implementation extends the existing XHTML prefix collapsing mechanism in XHTMLWriter to also handle these foreign namespaces, converting prefixed namespace declarations to default ones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loads serialization parameters from an external XML file referenced by the parameter-document option: declare option output:parameter-document "path/to/params.xml"; The document is resolved relative to the query's static base URI and parsed as a W3C serialization parameters element. Parameters from the document provide base settings; inline declare option statements override them (per W3C spec). Supports all parameter types including use-character-maps, which enables character map expansion from external parameter documents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ments
Two fixes that resolve eXide and other apps failing through the URL rewrite view pipeline:
1. XMLWriter.namespace(): Skip empty default namespace undeclarations (prefix='' nsURI='') that caused "namespace declaration outside an element" error. Also skip the implicit xml namespace prefix.
2. XHTMLWriter.writeContentTypeMeta(): Use self-closing <meta .../> tags in XHTML mode. The URL rewrite pipeline serializes source documents as XHTML (RESTServer forces method=xhtml for text/html), then the view re-parses the serialized output as XML. Non-self-closing <meta> tags made the XHTML output not well-formed XML, causing parseAsXml() to fail and request:get-data() to return a string instead of XML nodes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that HTML documents with <head> elements can be served through the URL rewrite view pipeline without being returned as strings. Background: The W3C Serialization 3.1 spec requires that when include-content-type is "yes" (the default), the XHTML/HTML serializer should include a <meta> content-type declaration as the first child of <head>. Commit e6e395f added writeContentTypeMeta() to XHTMLWriter to implement this requirement. However, the injected <meta> tag used HTML-style non-self-closing format (<meta ...> instead of <meta .../>) even in XHTML mode. When the URL rewrite pipeline serialized a text/html document as XHTML (RESTServer forces method=xhtml for text/html), the non-self-closing <meta> made the output not well-formed XML. The view's request:get-data() then failed to parse it as XML and returned a string, causing XPTY0019. The test stores an HTML document with a <head> element, serves it through a controller.xq + view.xq dispatch, and verifies: - HTTP 200 (not 400 or 500) - Source page content preserved - View wrapper content applied - No raw XML entities in output (indicating string instead of nodes) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ceed804 to 6ec8727
joewiz added a commit to joewiz/exist that referenced this pull request Apr 6, 2026
Three targeted fixes prevent the forked JVM from hanging after BrokerPool.shutdown() completes:
1. StatusReporter threads are now daemon threads. The startup and shutdown status reporter threads are monitoring-only and must not prevent JVM exit. Added newInstanceDaemonThread() to ThreadUtils.
2. Four wait loops in BrokerPool that swallowed InterruptedException and used unbounded wait() now have 1-second poll timeouts, isShuttingDown() checks, and proper interrupt handling:
- get() service mode wait: breaks on shutdown or interrupt
- get() broker availability wait: throws EXistException on shutdown
- enterServiceMode() wait: breaks on shutdown or interrupt
- shutdown() active brokers wait: re-sets interrupt flag and breaks
3. At end of shutdown, instanceThreadGroup.interrupt() wakes any lingering threads in the instance's thread group.
Previously, 4 test classes required exclusion or timeout workarounds (DeadlockIT, RemoveCollectionIT, CollectionLocksTest, MoveResourceTest). Now all complete cleanly: 6533 unit tests + 9 integration tests, 0 failures, clean JVM exit.
Affects PRs with CI timeout workarounds: eXist-db#6112, eXist-db#6139, eXist-db#6138
Related: eXist-db#3685 (FragmentsTest deadlock)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
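The wait-loop pattern from fix 2 can be sketched in isolation. The class below is hypothetical and is not BrokerPool; it only shows the three ingredients the commit names: a bounded wait, a shutdown check on every wake-up, and restoring the interrupt flag instead of swallowing InterruptedException.

```java
// Hypothetical sketch of a bounded, shutdown-aware wait loop.
final class BoundedWaitSketch {
    private final Object lock = new Object();
    private boolean available = false;
    private volatile boolean shuttingDown = false;

    // Returns true once the resource is available, false on shutdown or interrupt.
    boolean awaitAvailable() {
        synchronized (lock) {
            while (!available) {
                if (shuttingDown) {
                    return false;
                }
                try {
                    lock.wait(1000); // poll at most every second, never wait unbounded
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return false;
                }
            }
            return true;
        }
    }

    void release() {
        synchronized (lock) {
            available = true;
            lock.notifyAll();
        }
    }

    void shutdown() {
        shuttingDown = true;
        synchronized (lock) {
            lock.notifyAll();
        }
    }
}
```

Even if a notification is missed, a waiter in this sketch re-checks the shutdown flag within one second, which is the property that keeps a JVM from hanging on exit.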
Member
Author
[This comment was co-authored with Claude Code. -Joe]
Closing: superseded by #6219 (v2/serialization-compliance). This work has been consolidated into a clean v2/ branch as part of the eXist-db 7.0 PR reorganization. The new PR includes all commits from this PR plus additional related work, with reviewer feedback incorporated where applicable. See the reviewer guide for the full context.
Summary
Improve eXist-db's compliance with the W3C XSLT and XQuery Serialization 3.1 specification across all output methods (JSON, adaptive, XML, text, HTML, XHTML) and fix fn:xml-to-json for element node inputs.
Depends on: #6139 (XQuery 4.0 parser). Rebased on Parser; merge after it lands.
What Changed
- JSONSerializer.java: … allow-duplicate-names; add SERE0022 duplicate-key detection
- XQuerySerializer.java
- JSON.java: … liberal (boolean) and duplicates (string)
- AdaptiveWriter.java: … map prefix ({...} not map{...}); fix double INF/NaN to use text not Unicode symbols
- XMLWriter.java: … cdata-section-elements; CR and LINE SEPARATOR character reference escaping; &{ attribute escaping hook
- IndentingXMLWriter.java: … suppress-indentation parameter
- Option.java: … Q{namespace}local URIQualifiedName in declare option
- AbstractSerializer.java: … html-version to 5.0; output:version → html-version mapping
- XHTMLWriter.java: … include-content-type meta tag (first child of <head>); boolean attribute minimization; XHTML content-type uses http-equiv form
- HTML5Writer.java
- XHTML5Writer.java
- FunSerialize.java
- FunXmlToJson.java
- SerializerUtils.java: … Q{ns}local URIQualifiedName in QName-type properties; subtype checking for parameter validation
XQTS Results (QT4)
Spec References
Test Plan
serialize-node)
🤖 Generated with Claude Code