[feature] Support file: URIs in fn:collection() for filesystem directory querying #6192

Open
joewiz wants to merge 2 commits into eXist-db:develop from joewiz:feature/collection-file-uris

@joewiz
Member

@joewiz joewiz commented Mar 28, 2026

Summary

  • Extend fn:collection() to support file: URIs, enabling queries over filesystem directories
  • fn:collection("file:///path/to/dir") scans for *.xml files by default
  • fn:collection("file:///path/to/dir?select=*.xhtml") supports glob filtering (Saxon convention)
  • DBA-only access for file: URIs (security boundary, consistent with fn:doc())
  • Non-parseable files silently skipped; non-existent directories throw FODC0002

Motivation

BaseX and Saxon both support collection("file:/path/to/dir") for querying filesystem directories. eXist's fn:doc() already supports file: URIs, but fn:collection() was limited to database collections. The W3C specification leaves the mapping from a collection URI to its contents implementation-defined, so this is a conformant extension.

Test plan

  • XQSuite test: non-existent directory throws FODC0002
  • Manual: collection("file:///tmp/test-xml/") returns XML documents
  • Manual: ?select=*.xhtml glob filtering works
  • Manual: non-DBA user gets permission denied

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@joewiz joewiz requested a review from a team as a code owner March 28, 2026 04:35
@joewiz joewiz added the enhancement new features, suggestions, etc. label Mar 28, 2026
Comment thread exist-core/src/main/java/org/exist/xquery/functions/fn/ExtCollection.java Outdated
@joewiz joewiz force-pushed the feature/collection-file-uris branch from a0dec57 to f5b2336 Compare March 28, 2026 16:17
@joewiz
Member Author

joewiz commented Mar 28, 2026

[This response was co-authored with Claude Code. -Joe]

Good catch, @reinhapa — simplified to "file".equals(collectionUri.getScheme()) which handles the null case cleanly. Pushed.
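
For readers following the thread, the null-safe scheme check mentioned above can be illustrated in isolation. This is only a sketch: the class and method names (FileSchemeCheck, isFileUri) are hypothetical and are not taken from ExtCollection.java. It relies on the fact that java.net.URI.getScheme() returns null for a URI with no scheme, and that calling equals on the string literal tolerates a null argument.

```java
import java.net.URI;

final class FileSchemeCheck {

    // Putting the literal first makes the comparison null-safe:
    // getScheme() returns null for scheme-less URIs such as "/db/apps/data",
    // and "file".equals(null) is false instead of throwing a NullPointerException.
    static boolean isFileUri(final URI collectionUri) {
        return "file".equals(collectionUri.getScheme());
    }

    public static void main(final String[] args) {
        System.out.println(isFileUri(URI.create("file:///tmp/test-xml/")));
        System.out.println(isFileUri(URI.create("/db/apps/data")));
        System.out.println(isFileUri(URI.create("xmldb:exist:///db/apps")));
    }
}
```

Only the first URI has the scheme "file"; the other two fall through to whatever handles database collections.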

@adamretter
Contributor

adamretter commented Mar 28, 2026

A few thoughts:

  1. This is really just a shortcut for doing the exact equivalent using the File extension module.

  2. fn:uri-collection already supports the Saxon query string syntax that I added for match, content-type, and stable - it would be nice to see this unified with fn:collection, i.e. both should support those options, plus select if that's one you would like to add.

  3. There are serious security implications around using file://. Can I suggest that you add some sensible controls around that, in the same manner as (or better than) we did for the File extension module, please? Any time you provide a route to the filesystem, you open up a security hole. I have already demonstrated that several of the online XQuery and XSLT fiddles can be used to read /etc/passwd (and worse) from the hosted systems. eXide, which is included by default, is no different from a fiddle - not to mention the REST API etc., which allows remote query execution by guest and/or non-DBA users.
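
One generic control of the kind point 3 asks for is to confine file: access to a configured base directory. The sketch below is an assumption for illustration only: it does not describe how the File extension module actually restricts access, and the names BaseDirGuard and isWithin are hypothetical. It normalizes both paths before comparing, so that ".." segments cannot climb out of the allowed directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class BaseDirGuard {

    // Returns true only if 'requested' resolves to a location at or below 'allowedBase'.
    // normalize() removes "." and ".." segments; it does NOT resolve symbolic links,
    // so a stricter check would use Path.toRealPath() on paths that exist.
    static boolean isWithin(final Path allowedBase, final Path requested) {
        final Path base = allowedBase.toAbsolutePath().normalize();
        final Path target = requested.toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so /srv/xmlx is not under /srv/xml.
        return target.startsWith(base);
    }

    public static void main(final String[] args) {
        final Path base = Paths.get("/srv/xml");
        System.out.println(isWithin(base, Paths.get("/srv/xml/a/../b")));
        System.out.println(isWithin(base, Paths.get("/srv/xml/../../etc/passwd")));
        System.out.println(isWithin(base, Paths.get("/srv/xmlx/data")));
    }
}
```

A check like this would complement, not replace, the DBA-only restriction described in the PR summary.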

joewiz added a commit to joewiz/exist that referenced this pull request Apr 6, 2026
Aligns fn:collection("file://...") with fn:uri-collection's existing
Saxon-style query string parameters. Adam Retter requested this in his
review of PR eXist-db#6192.

Supported parameters (file: URIs):
- select=glob   — filename glob filter (default *.xml)
- match=regex   — additional filename regex filter
- content-type  — MIME filter (binary returns empty since fn:collection
                  only returns XML documents)
- stable=yes|no — alphabetical ordering (default yes)

Refactoring:
- New CollectionQueryParameters helper centralises parameter parsing,
  validation, and known-key sets for both fn:uri-collection and the
  file: URI path of fn:collection.
- FunUriCollection refactored to use the helper (no behaviour change).
- ExtCollection.eval() splits the URI string at '?' before constructing
  the URI, so query strings can contain regex characters (^, [, ], +,
  $, etc.) that Java's URI class would otherwise reject.
- The raw query string is threaded through to getFileCollectionItems
  and parsed by CollectionQueryParameters.

Tests:
- 12 JUnit tests in CollectionFileUriTest exercising each parameter
  individually and in combination, plus error cases for unknown keys
  and invalid values.
- All existing fn:uri-collection XQSuite tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
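
The reason for splitting at '?' before building the URI, as the commit message above describes, can be seen in a small standalone sketch. The names here (CollectionUriSplit, split, parse, javaUriAccepts) are hypothetical and are not the PR's CollectionQueryParameters helper. The ';' separator between parameters follows Saxon's examples and is an assumption about this PR.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

final class CollectionUriSplit {

    /** Splits "base?query" at the first '?'; element 1 is null when there is no query. */
    static String[] split(final String uriString) {
        final int q = uriString.indexOf('?');
        return q < 0
                ? new String[] {uriString, null}
                : new String[] {uriString.substring(0, q), uriString.substring(q + 1)};
    }

    /** Parses "k1=v1;k2=v2" into an ordered map (the ';' separator is an assumption). */
    static Map<String, String> parse(final String rawQuery) {
        final Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (final String pair : rawQuery.split(";")) {
            // Split on the first '=' only, so values may themselves contain '='.
            final int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** True if java.net.URI can parse the whole string unchanged. */
    static boolean javaUriAccepts(final String uriString) {
        try {
            new URI(uriString);
            return true;
        } catch (final URISyntaxException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        final String input = "file:///tmp/test-xml?match=^ch[0-9]+\\.xml$;stable=no";
        // The unescaped '^' (among others) makes java.net.URI reject the full string.
        System.out.println(javaUriAccepts(input));
        final String[] parts = split(input);
        // The base part alone is a valid URI; the raw query is parsed separately.
        System.out.println(URI.create(parts[0]).getPath());
        System.out.println(parse(parts[1]));
    }
}
```

Keeping the query as a raw string means a regex passed to match never has to be percent-encoded by the caller.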
joewiz and others added 2 commits April 13, 2026 09:25
fn:collection() now supports file: URIs to scan a directory for XML
files and return them as in-memory documents. This matches the behavior
of BaseX and Saxon.

Usage:
  collection("file:///path/to/dir")           (: all *.xml files :)
  collection("file:///path/to/dir?select=*.xhtml")  (: glob filter :)

Security: only DBA users can access file: URIs (same restriction as
fn:doc for file system access). Non-parseable files are silently
skipped. Non-existent directories throw FODC0002.

Implementation: in getCollectionItems(), checks if the URI scheme is
"file", scans the directory with DirectoryStream + glob pattern, and
parses each matching file via DocUtils.parse() into a memtree document.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
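
The scan-and-parse flow described in the commit message above can be approximated outside eXist as follows. This is a sketch under stated assumptions, not the PR's code: it substitutes the JDK's JAXP DocumentBuilder for DocUtils.parse() and memtree documents, signals a missing directory with a plain IOException whose message mentions FODC0002, and uses hypothetical names (DirectoryCollectionSketch, load, demo).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DirectoryCollectionSketch {

    /** Parses every regular file in 'dir' whose name matches 'glob'; skips malformed XML. */
    static List<Document> load(final Path dir, final String glob)
            throws IOException, ParserConfigurationException {
        if (!Files.isDirectory(dir)) {
            // Stand-in for the FODC0002 error the PR raises for non-existent directories.
            throw new IOException("FODC0002: not a directory: " + dir);
        }
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        final DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());

        final List<Document> docs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (final Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try {
                    docs.add(builder.parse(file.toFile()));
                } catch (final SAXException e) {
                    // Not well-formed XML: skip silently, as the PR describes.
                }
            }
        }
        return docs;
    }

    /** Builds a temporary directory with one good, one malformed, and one non-matching file. */
    static int demo() {
        try {
            final Path dir = Files.createTempDirectory("collection-sketch");
            Files.writeString(dir.resolve("a.xml"), "<doc>ok</doc>");
            Files.writeString(dir.resolve("b.xml"), "<doc>unclosed");
            Files.writeString(dir.resolve("c.txt"), "<doc>wrong extension</doc>");
            return load(dir, "*.xml").size();
        } catch (final IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, only a.xml both matches the glob and parses, so a single document is returned. DirectoryStream makes no ordering guarantee, which is presumably why the later commit adds a stable option for alphabetical ordering.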
@joewiz joewiz force-pushed the feature/collection-file-uris branch from 9ed1af7 to b69e803 Compare April 13, 2026 13:26
@line-o
Member

line-o commented Apr 14, 2026

> fn:uri-collection already supports the Saxon query string syntax that I added for match, content-type, and stable - it would be nice to see this unified with fn:collection, i.e. both should support those options, plus select if that's one you would like to add.

I would want to avoid putting resources into supporting non-standard extensions from Saxon.

I would be in favour of proposing that XQuery 4.0 get fn:collection#2 with an options map, similar to fn:doc#2 and fn:doc-available#2; see https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-doc

@line-o line-o requested review from a team and reinhapa April 14, 2026 10:09
@line-o line-o added this to v7.0.0 Apr 14, 2026
@line-o line-o added the xquery issue is related to xquery implementation label Apr 14, 2026