Improve W3C serialization compliance across all output methods #6219
Open
joewiz wants to merge 11 commits into eXist-db:develop from
Force-pushed: dc13ff7 to 8d4400d
Force-pushed: 11890ec to 66c36c5
joewiz added a commit to joewiz/exist that referenced this pull request on Apr 8, 2026:
…tructors
With `declare copy-namespaces preserve, no-inherit`, eXist was incorrectly applying the no-inherit flag during element constructor evaluation, causing two distinct bugs:
1. Name resolution failure: `getURIForPrefix()` skips `inheritedInScopeNamespaces` when `inheritNamespaces=false`, so a child constructor `<b/>` inside `<e xmlns="http://example.com/">` could not see the inherited default namespace, and a prefixed name like `<appearsUnused:c/>` whose prefix was declared on an enclosing element threw XPTY0004.
2. `in-scope-prefixes()` returning incomplete results: `FunInScopePrefixes` does not traverse ancestor nodes when `inheritNamespaces=false`, so nested direct constructors like `<e3>{<e2>{<e1/>}</e2>}</e3>` only saw `<e1>`'s own namespace prefix, missing `namespace2` and `namespace3` from enclosing elements.
Fix A (ElementConstructor, name resolution): temporarily restore `inheritNamespaces` to `true` while resolving the element's QName, so that `QName.parse()` can look up prefix-to-URI mappings in the inherited context. The no-inherit spec says only namespace propagation from *copied source nodes* is suppressed, not how constructor names are resolved (XQuery 3.1 §3.9.3.4).
Fix B (ElementConstructor, namespace nodes): when no-inherit is active, explicitly add all inherited namespace bindings as namespace nodes on the newly constructed element. Since `FunInScopePrefixes` skips ancestor traversal in no-inherit mode, each constructor element must carry its full in-scope namespace context directly. Variable/copy content is unaffected because those nodes are built in a standalone context with an empty inherited namespace stack.
Fixes the following pre-existing XQTS failures in prod-CopyNamespacesDecl:
- K2-CopyNamespacesProlog-4
- K2-CopyNamespacesProlog-5
- K2-CopyNamespacesProlog-9 (second half)
- copynamespace-2
Also fixes the real-world regression reported in eXist-db#2182 where `declare copy-namespaces preserve, no-inherit` caused XPath steps over constructed elements to return empty sequences.
Companion to PR eXist-db#6219 (serializer xmlns="" fix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
joewiz added a commit to joewiz/exist that referenced this pull request on Apr 8, 2026:
…tructors
With `declare copy-namespaces preserve, no-inherit`, eXist was incorrectly applying the no-inherit flag during element constructor evaluation, causing two distinct bugs:
1. Name resolution failure: `getURIForPrefix()` skips `inheritedInScopeNamespaces` when `inheritNamespaces=false`, so a child constructor `<b/>` inside `<e xmlns="http://example.com/">` could not see the inherited default namespace, and a prefixed name like `<ns:c/>` whose prefix was declared on an enclosing element threw XPTY0004.
2. `in-scope-prefixes()` returning incomplete results for nested direct constructors: in `<e3>{<e2>{<e1/>}</e2>}</e3>`, `<e1>` only returned its own prefix, missing `namespace2` and `namespace3` from the enclosing constructor elements.
Per XQuery 3.1 §3.9.3.4, `no-inherit` governs how namespaces propagate from *copied source nodes* (existing XDM nodes placed into constructors via `{$var}`) into the result. It must not affect namespace resolution for element constructors themselves, nor prevent ancestor traversal when collecting in-scope prefixes of directly constructed elements.
Fix A (ElementConstructor, name resolution): temporarily restore `inheritNamespaces=true` while resolving the element's QName so that `QName.parse()` can look up prefix-to-URI mappings in the inherited context. Restored to `false` in a `finally` block.
Fix B (FunInScopePrefixes): always traverse the in-memory node's ancestor chain when collecting namespace prefixes. The previous coarse `inheritNamespaces()` switch prevented ancestor traversal for all no-inherit queries, including direct constructors where traversal is correct. Updated the cleanup pass to remove all entries with an empty URI (namespace undeclarations), not just empty-key+empty-value pairs.
Fix C (EnclosedExpr + NoInheritCopyReceiver): when `no-inherit` is active and an enclosed expression places a pre-existing element node (a variable reference, not a direct constructor) into the outer element, inject namespace undeclarations (xmlns:prefix="") onto the root of the copy for every ancestor namespace binding not already declared by that root. This neutralizes ancestor traversal for those nodes so that `in-scope-prefixes()` returns only the node's own namespace context. Pre-existing nodes are distinguished from direct constructors by capturing the MemTreeBuilder allocated during the enclosed expression's evaluation (via new `peekDocumentBuilder()`) and checking whether each result node belongs to that builder's document.
Fixes the following pre-existing XQTS failures in prod-CopyNamespacesDecl:
- K2-CopyNamespacesProlog-4
- K2-CopyNamespacesProlog-5
- K2-CopyNamespacesProlog-9 (second half: direct constructors)
- copynamespace-2
Also fixes the real-world regression reported in eXist-db#2182 where `declare copy-namespaces preserve, no-inherit` caused XPath steps over constructed elements to return empty sequences.
Companion to PR eXist-db#6219 (serializer xmlns="" fix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
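The save-and-restore pattern described for Fix A could be sketched in Java roughly as follows. This is an illustration only, not eXist's code: `SketchContext`, `declareInherited`, and `resolveConstructorPrefix` are hypothetical names, and the sketch restores the previously saved flag value rather than a hard-coded `false`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the query context; not eXist's real XQueryContext. */
final class SketchContext {
    private final Map<String, String> inherited = new HashMap<>();
    private boolean inheritNamespaces = true;

    void declareInherited(String prefix, String uri) { inherited.put(prefix, uri); }
    boolean inheritNamespaces() { return inheritNamespaces; }
    void setInheritNamespaces(boolean value) { inheritNamespaces = value; }

    /** Mirrors the described behaviour: inherited bindings are skipped when the flag is off. */
    String getURIForPrefix(String prefix) {
        return inheritNamespaces ? inherited.get(prefix) : null;
    }

    /** Fix A pattern: force the flag on for name resolution only, then restore it. */
    String resolveConstructorPrefix(String prefix) {
        final boolean saved = inheritNamespaces;
        inheritNamespaces = true;
        try {
            return getURIForPrefix(prefix);
        } finally {
            // Runs even if resolution throws, so no-inherit stays in effect afterwards.
            inheritNamespaces = saved;
        }
    }
}
```

The `finally` block is what keeps the temporary override from leaking into the handling of copied nodes, which is where no-inherit is supposed to apply.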
Force-pushed: fab7e4b to 1e8941a
joewiz added a commit to joewiz/exist that referenced this pull request on Apr 13, 2026
Force-pushed: 1e8941a to 41fce20
Member, Author:
[This response was co-authored with Claude Code. -Joe]
CI state: 5/9 checks pass. Of the 4 failures:
Dependencies: Wave 5: independent, can merge in any order after Wave 4. For full context on all 7.0 PRs and the merge order, see the Reviewer Guide.
Contributor:
needs a rebase
Corrects multiple issues in how serialization parameters are parsed and validated:
- Fix type checking to allow subtypes (e.g., xs:string subtype of xs:anyAtomicType) and coerce xs:untypedAtomic to target type
- Accept "false", "0" as boolean false (not just "no")
- Trim whitespace in XML serialization parameter values
- Fix multi-value QName parameter cardinality check (was backwards)
- Fix standalone=omit handling, normalize boolean true/false/1/0 to yes/no
- Add SEPM0009 validation for contradictory use-character-maps
- Add SEPM0016 error for character map key length validation
- Add SEPM0017 validation for serialization-parameters XML element form
- Add SERE0023 validation for multi-item sequences in JSON serialization
- Accept eXist-specific parameters in XML serialization element form (fixes regression from eXist-db#3446)
- Fix fn:json-to-xml option validation for liberal/duplicates params
- Register QT4 serialization parameters: escape-solidus, json-lines, canonical, CSV field/row/quote params
Spec: W3C Serialization 3.1 §5 (XML Output Method), QT4 Serialization 4.0 §3.1.1 (Serialization Parameters)
XQTS: Fixes serialize-xml-*, serialize-json-* parameter validation tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
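As a rough illustration of the `standalone` handling listed above (trim the value, map true/false/1/0 to yes/no, treat `omit` as "no attribute"), a minimal Java sketch might look like this. `StandaloneParam` and `normalize` are hypothetical names, and the `IllegalArgumentException` is a placeholder for whatever serialization error the real code raises.

```java
import java.util.Optional;

/** Illustrative only; not the eXist-db parameter parser. */
final class StandaloneParam {

    /** Returns "yes" or "no", or an empty Optional when the attribute should be omitted. */
    static Optional<String> normalize(String raw) {
        switch (raw.trim()) {
            case "yes":
            case "true":
            case "1":
                return Optional.of("yes");
            case "no":
            case "false":
            case "0":
                return Optional.of("no");
            case "omit":
                return Optional.empty();
            default:
                // Placeholder error; the real serializer would report a W3C error code.
                throw new IllegalArgumentException("invalid standalone value: " + raw);
        }
    }
}
```

Returning an empty `Optional` for `omit` lets the caller decide separately whether an XML declaration is written at all.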
Comprehensive improvements to the core XML serializer (XMLWriter) and
indentation handling (IndentingXMLWriter):
Character escaping:
- Escape CR (U+000D), DEL (U+007F), and LINE SEPARATOR (U+2028)
- Escape C0 control characters (U+0001-U+001F) in XML 1.1 mode
- Fix character reference escaping in CDATA sections
CDATA sections:
- Encoding-aware CDATA split: break on ]]> and on characters not
representable in the output encoding
- Use cdata-section-elements with namespace-aware element matching
- Add shouldUseCdataSections() hook for subclass override
XML declaration and standalone:
- Normalize standalone="omit" to omit the attribute entirely
- Normalize boolean true/false/1/0 to yes/no for standalone
- Emit XML declaration when standalone is explicitly set
Canonical XML (C14N):
- Buffer namespace and attribute events for sorted emission
- Sort namespaces by prefix (default first), attributes by namespace
URI then local name
- Expand empty elements: <foo/> becomes <foo></foo>
- Validate relative namespace URIs (SERE0024)
Normalization form:
- Support NFC, NFD, NFKC, NFKD normalization forms
- Apply normalization during character output
XML 1.1:
- C0 control character escaping (U+0001-U+001F except tab/newline/CR)
Indentation:
- Support suppress-indentation with URI-qualified element names
- Accept boolean true/1 alongside yes for indent parameter
Spec: W3C Serialization 3.1 §5 (XML Output Method),
Canonical XML 1.1 (https://www.w3.org/TR/xml-c14n11/) §2.3,
XML 1.1 §2.2 (Characters)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
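The encoding-aware CDATA split mentioned in this commit could work along the following lines. This Java sketch is an assumption about the approach, not the XMLWriter code: `CdataSketch` and `write` are hypothetical names, and it assumes unrepresentable characters are written as numeric character references between two CDATA sections.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Illustrative only; not the XMLWriter implementation. */
final class CdataSketch {

    static String write(String text, Charset charset) {
        final CharsetEncoder encoder = charset.newEncoder();
        final StringBuilder out = new StringBuilder();
        boolean open = false;
        int i = 0;
        while (i < text.length()) {
            final int cp = text.codePointAt(i);
            final int len = Character.charCount(cp);
            if (encoder.canEncode(text.subSequence(i, i + len))) {
                if (!open) {
                    out.append("<![CDATA[");
                    open = true;
                } else if (cp == '>' && endsWith(out, "]]")) {
                    // Break the three-character terminator across two sections.
                    out.append("]]><![CDATA[");
                }
                out.appendCodePoint(cp);
            } else {
                // Not representable: close the section and emit a character reference.
                if (open) {
                    out.append("]]>");
                    open = false;
                }
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += len;
        }
        if (open) {
            out.append("]]>");
        }
        return out.toString();
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        final int start = sb.length() - suffix.length();
        return start >= 0 && sb.indexOf(suffix, start) == start;
    }
}
```

With a US-ASCII target, for example, a non-ASCII character forces the current section to close, is written as a reference, and a new section opens for the text that follows.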
Major improvements to XHTMLWriter for correct HTML/XHTML output:
Content-type meta injection:
- Write <meta http-equiv="Content-Type" ...> or <meta charset="...">
as first child of <head> when include-content-type=yes (default)
- HTML5 uses <meta charset="UTF-8"> shorthand
- XHTML uses self-closing <meta .../> for valid XML output
- Track head element state, reset between serializations
HTML method support:
- Boolean attribute minimization (checked, disabled, selected, etc.)
- Raw text elements (script, style) — no escaping in element content
- Suppress cdata-section-elements for HTML method
- Don't escape & before { in HTML attribute values (template syntax)
- Add embed to void/empty elements list
SVG/MathML namespace normalization:
- Collapse SVG and MathML namespace prefixes to default namespace
in XHTML5 serialization (e.g., svg:rect → rect within SVG)
Canonical XML support in XHTML close tag.
HTML version detection: default changed from 1.0 to 5.0.
Spec: W3C Serialization 3.1 §7 (XHTML Output Method),
W3C Serialization 3.1 §8 (HTML Output Method)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
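Two of the HTML-method rules above, boolean attribute minimization and leaving `&` unescaped before `{`, can be illustrated with a short Java sketch. `HtmlAttrSketch` is a hypothetical class, its attribute list is a deliberately small subset, and the escaping is simplified to `&` and `"` only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only; not the XHTMLWriter implementation. */
final class HtmlAttrSketch {

    // Small illustrative subset; a real list of boolean attributes is longer.
    private static final Set<String> BOOLEAN_ATTRS =
            new HashSet<>(Arrays.asList("checked", "disabled", "selected"));

    static String attribute(String name, String value) {
        // Minimize only when the value repeats the attribute name.
        if (BOOLEAN_ATTRS.contains(name.toLowerCase(Locale.ROOT)) && name.equalsIgnoreCase(value)) {
            return " " + name;
        }
        final StringBuilder sb = new StringBuilder(" ").append(name).append("=\"");
        for (int i = 0; i < value.length(); i++) {
            final char c = value.charAt(i);
            if (c == '&') {
                // An ampersand directly followed by '{' is passed through unescaped.
                final boolean beforeBrace = i + 1 < value.length() && value.charAt(i + 1) == '{';
                sb.append(beforeBrace ? "&" : "&amp;");
            } else if (c == '"') {
                sb.append("&quot;");
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```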
XHTML5Writer:
- Suppress DOCTYPE for non-<html> root elements (fragment serialization)
- Support doctype-public and doctype-system for XHTML mode
- Suppress DOCTYPE entirely in canonical mode
HTML5Writer:
- Processing instructions use > not ?> for HTML method
- Override needsEscape(char, boolean) for raw text elements
Test: HTML5FragmentTest — 12 new tests for fragment DOCTYPE suppression,
suppress-indentation, CDATA suppression in HTML, script escaping.
Spec: W3C Serialization 3.1 §7.3 (XHTML DOCTYPE),
HTML5 §12.1.3 (Serialization of script/style)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSONSerializer:
- SERE0020: Reject INF/NaN in JSON serialization
- SERE0021: Reject function items
- SERE0022: Detect duplicate map keys
- SERE0023: Reject multi-item sequences
- escape-solidus parameter, json-lines parameter
- Canonical JSON (RFC 8785): sorted keys, canonical double format
- Character maps: apply use-character-maps to JSON string output
- Respect indent-spaces for JSON indentation
AdaptiveWriter:
- Fix map output: map{ not map { (spec compliance)
- Fix INF/NaN handling in adaptive double output
FunXmlToJson:
- Rewrite to DOM-based element conversion
- Better handling of element vs document nodes
Spec: W3C Serialization 3.1 §9 (JSON Output Method),
RFC 8785 (JSON Canonicalization Scheme)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
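To make the SERE0020 check and the `escape-solidus` parameter concrete, here is a small Java sketch. `JsonSketch`, `checkNumber`, and `quote` are hypothetical names, and the exception type is a placeholder; the actual JSONSerializer may structure this differently.

```java
/** Illustrative only; not the JSONSerializer implementation. */
final class JsonSketch {

    /** Rejects values that JSON cannot represent (reported as SERE0020 in the commit). */
    static void checkNumber(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new IllegalArgumentException("SERE0020: " + d + " has no JSON representation");
        }
    }

    /** Quotes a string, escaping the solidus only when the parameter asks for it. */
    static String quote(String s, boolean escapeSolidus) {
        final StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c == '/' && escapeSolidus) {
                sb.append("\\/");
            } else if (c < 0x20) {
                // Control characters are written as four-digit hex escapes.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```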
SENR0001 validation:
- Reject maps and function items in XML/text sequence normalization
Text serialization:
- Flatten arrays recursively before text serialization
- Default item-separator to space for text method
XML serialization with item-separator:
- Support XML declaration in item-separator path
CSV serialization dispatch:
- Route method="csv" to CSVSerializer
Canonical XML validation:
- Validate canonical constraints before output
Spec: W3C Serialization 3.1 §2 (Sequence Normalization),
Canonical XML 1.1 §2 (Conformance)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
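The text-method behaviour described in this commit (flatten arrays recursively, reject maps, join with the item separator) might be sketched as below. This is a simplified model under stated assumptions: `java.util.List` stands in for an XDM array, `java.util.Map` for an XDM map, and `TextSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only; Lists model XDM arrays and Maps model XDM maps. */
final class TextSketch {

    static String serialize(List<?> sequence, String itemSeparator) {
        final List<String> atoms = new ArrayList<>();
        for (Object item : sequence) {
            flatten(item, atoms);
        }
        return String.join(itemSeparator, atoms);
    }

    private static void flatten(Object item, List<String> out) {
        if (item instanceof Map) {
            // Placeholder for the SENR0001 error named in the commit.
            throw new IllegalArgumentException("SENR0001: maps cannot be serialized as text");
        }
        if (item instanceof List) {
            // Nested arrays are expanded member by member, depth first.
            for (Object member : (List<?>) item) {
                flatten(member, out);
            }
        } else {
            out.add(String.valueOf(item));
        }
    }
}
```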
…tors
Remove XQST0085 error for namespace undeclaration (xmlns:prefix="") in element constructors. XML 1.1 allows namespace undeclaration.
Spec: XML 1.1 §4 (Namespace Undeclaration)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Support loading serialization parameters from an external XML document via declare option output:parameter-document. Parameters from the document are applied first, then inline options override them.
Spec: W3C Serialization 3.1 §3.1 (parameter-document)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
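The precedence rule stated here (document parameters first, inline options on top) amounts to an ordered merge. A minimal Java sketch, using `java.util.Properties` as an assumed container and a hypothetical `ParamMergeSketch` class:

```java
import java.util.Properties;

/** Illustrative only; shows the override order, not eXist's option handling. */
final class ParamMergeSketch {

    static Properties merge(Properties fromDocument, Properties inline) {
        final Properties merged = new Properties();
        merged.putAll(fromDocument); // applied first
        merged.putAll(inline);       // inline options win on conflict
        return merged;
    }
}
```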
…ments
Two fixes that resolve eXide and other apps failing through the URL rewrite view pipeline:
1. XMLWriter.namespace(): Skip empty default namespace undeclarations (prefix='' nsURI='') that caused "namespace declaration outside an element" error. Also skip the implicit xml namespace prefix.
2. XHTMLWriter.writeContentTypeMeta(): Use self-closing <meta .../> tags in XHTML mode. The URL rewrite pipeline serializes source documents as XHTML (RESTServer forces method=xhtml for text/html), then the view re-parses the serialized output as XML. Non-self-closing <meta> tags made the XHTML output not well-formed XML, causing parseAsXml() to fail and request:get-data() to return a string instead of XML nodes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that HTML documents with <head> elements can be served through the URL rewrite view pipeline without being returned as strings.
Background: The W3C Serialization 3.1 spec requires that when include-content-type is "yes" (the default), the XHTML/HTML serializer should include a <meta> content-type declaration as the first child of <head>. Commit e6e395f added writeContentTypeMeta() to XHTMLWriter to implement this requirement.
However, the injected <meta> tag used HTML-style non-self-closing format (<meta ...> instead of <meta .../>) even in XHTML mode. When the URL rewrite pipeline serialized a text/html document as XHTML (RESTServer forces method=xhtml for text/html), the non-self-closing <meta> made the output not well-formed XML. The view's request:get-data() then failed to parse it as XML and returned a string, causing XPTY0019.
The test stores an HTML document with a <head> element, serves it through a controller.xq + view.xq dispatch, and verifies:
- HTTP 200 (not 400 or 500)
- Source page content preserved
- View wrapper content applied
- No raw XML entities in output (indicating string instead of nodes)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Writer
XMLWriter.namespace() was dropping all xmlns="" undeclarations at the top-level guard (prefix="" + URI="" → unconditional early return), so elements with no default namespace inside a default-namespace context were silently missing the required xmlns="" attribute, causing downstream parsers to assign the wrong namespace.
Root cause: the single defaultNamespace field approach only checked whether the current value equaled the new value, but never reached that check when both were empty, even when the parent had declared a non-empty default namespace.
Fix: adopt a BaseX-style namespace stack (nspaces / nstack). The flat nspaces list records (prefix, uri) pairs for all in-scope declarations; nstack records the list size at each startElement so endElement can roll back to the parent scope. namespace() now calls nsLookup() to find the currently in-scope URI for a prefix and only writes a declaration when the binding changes. This naturally handles xmlns="": if the ancestor has xmlns="http://foo.com" in scope, nsLookup("") returns that URI, which differs from "", so xmlns="" is emitted.
As a side effect this also prevents redundant namespace re-declarations when the same prefix→URI binding is already in scope from an ancestor, laying the groundwork for fixing eXist-db#5790.
Fixes 7 pre-existing test failures:
- SerializationTest#xqueryUpdateNsTest (×2, local + remote)
- ExpandTest#expandWithDefaultNS
- XQueryTest#namespaceHandlingSameModule_1846228
- XQueryTest#doubleDefaultNamespace_1806901
- XQueryTest#wrongAddNamespace_1807014
- XQueryTest#modulesAndNS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
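The nspaces / nstack bookkeeping described in this commit can be sketched in Java as follows. The field and method names follow the commit message, but the class `NamespaceScopes` and its details are a hypothetical reconstruction; it assumes `namespace()` is called after `startElement()` for the same element and returns whether a declaration needs to be written.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative scope tracker; a sketch, not the XMLWriter code. */
final class NamespaceScopes {
    private final List<String[]> nspaces = new ArrayList<>(); // (prefix, uri) pairs in scope
    private final List<Integer> nstack = new ArrayList<>();   // nspaces size at each startElement

    void startElement() {
        nstack.add(nspaces.size());
    }

    void endElement() {
        // Roll back every binding declared since the matching startElement.
        final int size = nstack.remove(nstack.size() - 1);
        while (nspaces.size() > size) {
            nspaces.remove(nspaces.size() - 1);
        }
    }

    /** Innermost in-scope URI for the prefix, or "" when nothing is bound. */
    String nsLookup(String prefix) {
        for (int i = nspaces.size() - 1; i >= 0; i--) {
            if (nspaces.get(i)[0].equals(prefix)) {
                return nspaces.get(i)[1];
            }
        }
        return "";
    }

    /** Returns true when a declaration must be written, i.e. the binding changes. */
    boolean namespace(String prefix, String uri) {
        if (nsLookup(prefix).equals(uri)) {
            return false;
        }
        nspaces.add(new String[] {prefix, uri});
        return true;
    }
}
```

In this model, an `xmlns=""` on a child is written only when an ancestor has bound the default namespace to something non-empty, and a repeated identical binding is skipped.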
Force-pushed: 41fce20 to d9cdb2d
Member, Author:
[This response was co-authored with Claude Code. -Joe]
Rebased onto develop (force-pushed). The only conflict was in
Contributor:
there is a test failure: the 15 codacy issues all seem actionable to me. could you take another look please
Summary
Fixes and improvements to XML, HTML, XHTML, JSON, text, and adaptive serialization per W3C Serialization 3.1.
What Changed
Tests
develop (not regressions from this branch)
Supersedes
Test plan
🤖 Generated with Claude Code