[feature] Support file: URIs in fn:collection() for filesystem directory querying #6192
joewiz wants to merge 2 commits into eXist-db:develop
[This response was co-authored with Claude Code. -Joe] Good catch, @reinhapa, simplified to
A few thoughts:
Aligns fn:collection("file://...") with fn:uri-collection's existing
Saxon-style query string parameters. Adam Retter requested this in his
review of PR eXist-db#6192.
Supported parameters (file: URIs):
- select=glob — filename glob filter (default *.xml)
- match=regex — additional filename regex filter
- content-type — MIME filter (binary returns empty since fn:collection
only returns XML documents)
- stable=yes|no — alphabetical ordering (default yes)
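The parameter semantics listed above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's CollectionQueryParameters class: the class name CollectionParamsSketch, the choice of `&` or `;` as separators, full-match semantics for `match`, and the use of IllegalArgumentException for invalid input are all assumptions. The content-type filter is left out.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
public class CollectionParamsSketch {

    // Keys taken from the list above; anything else is rejected.
    static final Set<String> KNOWN_KEYS =
            new HashSet<>(Arrays.asList("select", "match", "content-type", "stable"));

    /** Parses a raw query string such as "select=*.xml&stable=no". */
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        // Assumption: parameters are separated by '&' or ';'.
        for (String pair : rawQuery.split("[&;]")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unknown parameter: " + key);
            }
            params.put(key, value);
        }
        String stable = params.getOrDefault("stable", "yes");
        if (!stable.equals("yes") && !stable.equals("no")) {
            throw new IllegalArgumentException("Invalid value for stable: " + stable);
        }
        return params;
    }

    /** Applies select (glob), match (regex) and stable (sorting) to file names. */
    static List<String> filter(List<String> fileNames, Map<String, String> params) {
        PathMatcher glob = FileSystems.getDefault()
                .getPathMatcher("glob:" + params.getOrDefault("select", "*.xml"));
        Pattern regex = params.containsKey("match")
                ? Pattern.compile(params.get("match"))
                : null;
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!glob.matches(Paths.get(name))) {
                continue;
            }
            // Assumption: the regex must match the whole file name.
            if (regex != null && !regex.matcher(name).matches()) {
                continue;
            }
            kept.add(name);
        }
        if ("yes".equals(params.getOrDefault("stable", "yes"))) {
            Collections.sort(kept); // alphabetical ordering
        }
        return kept;
    }
}
```

Under these assumptions, a regex value containing `&` or `;` would be split incorrectly, so a real implementation would need a stricter tokenizer.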
Refactoring:
- New CollectionQueryParameters helper centralises parameter parsing,
validation, and known-key sets for both fn:uri-collection and the
file: URI path of fn:collection.
- FunUriCollection refactored to use the helper (no behaviour change).
- ExtCollection.eval() splits the URI string at '?' before constructing
the URI, so query strings can contain regex characters (^, [, ], +,
$, etc.) that Java's URI class would otherwise reject.
- The raw query string is threaded through to getFileCollectionItems
and parsed by CollectionQueryParameters.
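The split-before-parse step described for ExtCollection.eval() can be illustrated with a minimal sketch. The class and method names below (UriSplitSketch, split) are hypothetical and not taken from the PR; the point is only that the base part is handed to java.net.URI while the query stays a raw string.

```java
import java.net.URI;

// Hypothetical names, for illustration only.
public class UriSplitSketch {

    /** The base URI and the untouched query string (null if absent). */
    static final class SplitUri {
        final URI base;
        final String rawQuery;

        SplitUri(URI base, String rawQuery) {
            this.base = base;
            this.rawQuery = rawQuery;
        }
    }

    /**
     * Splits at the first '?' so that only the part before it is parsed
     * by java.net.URI. The query may then contain characters such as '^'
     * that URI would reject as illegal.
     */
    static SplitUri split(String uriString) {
        int q = uriString.indexOf('?');
        if (q < 0) {
            return new SplitUri(URI.create(uriString), null);
        }
        URI base = URI.create(uriString.substring(0, q));
        String rawQuery = uriString.substring(q + 1);
        return new SplitUri(base, rawQuery);
    }
}
```

For example, `split("file:///path/to/dir?match=^a.+$")` yields the base `file:///path/to/dir` and the raw query `match=^a.+$`, whereas passing the whole string to `URI.create` fails with an IllegalArgumentException because of the `^`.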
Tests:
- 12 JUnit tests in CollectionFileUriTest exercising each parameter
individually and in combination, plus error cases for unknown keys
and invalid values.
- All existing fn:uri-collection XQSuite tests still pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fn:collection() now supports file: URIs to scan a directory for XML
files and return them as in-memory documents. This matches the behavior
of BaseX and Saxon.
Usage:
collection("file:///path/to/dir") (: all *.xml files :)
collection("file:///path/to/dir?select=*.xhtml") (: glob filter :)
Security: only DBA users can access file: URIs (same restriction as
fn:doc for file system access). Non-parseable files are silently
skipped. Non-existent directories throw FODC0002.
Implementation: in getCollectionItems(), checks if the URI scheme is
"file", scans the directory with DirectoryStream + glob pattern, and
parses each matching file via DocUtils.parse() into a memtree document.
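As a rough illustration of the scan-and-parse loop described above, the following sketch uses only JDK classes. It is not the PR's code: the JDK DocumentBuilder stands in for eXist's DocUtils.parse(), a plain IOException stands in for FODC0002, the DBA check is omitted, and the class name FileCollectionSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class, for illustration only.
public class FileCollectionSketch {

    /** Returns one parsed document per file in dir whose name matches glob. */
    static List<Document> loadXml(Path dir, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            // Stand-in for the FODC0002 error raised for non-existent directories.
            throw new IOException("Not a directory: " + dir);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);

        List<Document> docs = new ArrayList<>();
        // DirectoryStream applies the glob to each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try {
                    docs.add(factory.newDocumentBuilder().parse(file.toFile()));
                } catch (SAXException | IOException e) {
                    // Non-parseable files are skipped rather than failing the call.
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return docs;
    }
}
```

Note that DirectoryStream makes no ordering guarantee, so a caller that needs a stable order would have to sort the matching paths before parsing.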
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I would want to avoid putting resources into supporting non-standard extensions from Saxon. I would be in favour of proposing that XQuery 4.0 will get
Summary
- Extends fn:collection() to support file: URIs, enabling queries over filesystem directories
- fn:collection("file:///path/to/dir") scans for *.xml files by default
- fn:collection("file:///path/to/dir?select=*.xhtml") supports glob filtering (Saxon convention)
- Only DBA users can access file: URIs (security boundary, consistent with fn:doc())
Motivation
BaseX and Saxon both support collection("file:/path/to/dir") for querying filesystem directories. eXist's fn:doc() already supports file: URIs, but fn:collection() was limited to database collections only. The W3C spec says fn:collection() is implementation-defined, making this a conformant extension.
Test plan
- collection("file:///tmp/test-xml/") returns XML documents
- ?select=*.xhtml glob filtering works
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>