Add AllowedGlobResolver for fine-grained path sandboxing with glob patterns#89
Add AllowedGlobResolver for fine-grained path sandboxing with glob patterns#89
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #89 +/- ##
==========================================
+ Coverage 81.19% 81.26% +0.06%
==========================================
Files 99 102 +3
Lines 4009 4109 +100
==========================================
+ Hits 3255 3339 +84
- Misses 754 770 +16
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
WalkthroughAdds a new AllowedGlobResolver and a glob-policy subsystem (GlobPolicy, GlobPolicyBuilder, RuleAction) plus path normalization and shell-expansion utilities (using shellexpand). The resolver canonicalizes configured base directories, enforces base-directory bounds for existing and non-existent targets, normalizes relative paths, and optionally allows/denies resolved paths via glob rules. New modules: path/allowed_glob (policy.rs, normalize.rs) and helpers in path/mod.rs; core and serdesai crates re-export the new types; README and examples updated. Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs (1)
24-31: Consider surfacing expansion failures explicitly.
unwrap_or_else(|_| path.into())makes bad$VAR/~expansion look like an ordinary missing-directory error later inAllowedGlobResolver::new(). Returning aToolResult<PathBuf>here would preserve the real configuration failure instead of collapsing it into a genericcanonicalize()error.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs` around lines 24 - 31, The current expand_shell function swallows shellexpand errors via unwrap_or_else and hides real expansion failures from AllowedGlobResolver::new(); change expand_shell to return a ToolResult<PathBuf> (or appropriate error type) instead of PathBuf, propagate the shellexpand::FullError by mapping the Err(e) into a descriptive ToolError/ToolResult with context (include the original error and the input path), and update callers such as AllowedGlobResolver::new() to handle/propagate the ToolResult so canonicalize() failures no longer mask expansion problems.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/policy.rs`:
- Around line 113-123: The documentation in allowed_glob::policy (the doc
comment above the pattern syntax) contradicts the module tests and README which
assume patterns like `*.rs` match `src/lib.rs`; update that doc comment to
reflect the actual/expected semantics: explicitly state whether `*` and `?` may
match path separators (and under what conditions) or that matching is performed
against the entire relative path so `*.rs` can match `src/lib.rs`; also adjust
the example wording so it aligns with the test table in this module (the table
at lines ~195-211) and the README example to eliminate the ambiguity.
- Around line 89-92: The loop in the policy evaluation currently iterates rules
in reverse (self.rules.iter().rev()), implementing last-match-wins; to enforce
the PR's first-match-wins contract change the iteration to forward order
(self.rules.iter()) so the first matcher that returns true for
matcher.is_match(normalized_path) determines the result (return matches!(action,
RuleAction::Allow)); also scan and update any tests or docs that assumed reverse
behavior to reflect first-match-wins semantics and ensure RuleAction::Allow
remains the returned check for the matching rule.
---
Nitpick comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs`:
- Around line 24-31: The current expand_shell function swallows shellexpand
errors via unwrap_or_else and hides real expansion failures from
AllowedGlobResolver::new(); change expand_shell to return a ToolResult<PathBuf>
(or appropriate error type) instead of PathBuf, propagate the
shellexpand::FullError by mapping the Err(e) into a descriptive
ToolError/ToolResult with context (include the original error and the input
path), and update callers such as AllowedGlobResolver::new() to handle/propagate
the ToolResult so canonicalize() failures no longer mask expansion problems.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a6b7f89-644f-459c-bd0d-d830550c50db
⛔ Files ignored due to path filters (1)
src/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
src/llm-coding-tools-core/Cargo.tomlsrc/llm-coding-tools-core/README.mdsrc/llm-coding-tools-core/src/lib.rssrc/llm-coding-tools-core/src/path/allowed.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/src/path/allowed_glob/normalize.rssrc/llm-coding-tools-core/src/path/allowed_glob/policy.rssrc/llm-coding-tools-core/src/path/mod.rssrc/llm-coding-tools-serdesai/src/lib.rssrc/llm-coding-tools-serdesai/src/tools/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Semver Checks (Serdesai Full+Linux)
- GitHub Check: Blocking Windows
- GitHub Check: Async Linux
- GitHub Check: Async Windows
- GitHub Check: Blocking Linux
- GitHub Check: Async macOS
- GitHub Check: Blocking macOS
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-serdesai/src/tools/mod.rssrc/llm-coding-tools-core/src/lib.rssrc/llm-coding-tools-serdesai/src/lib.rssrc/llm-coding-tools-core/src/path/allowed.rssrc/llm-coding-tools-core/src/path/mod.rssrc/llm-coding-tools-core/src/path/allowed_glob/normalize.rssrc/llm-coding-tools-core/README.mdsrc/llm-coding-tools-core/src/path/allowed_glob/policy.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rs
🔇 Additional comments (3)
src/llm-coding-tools-serdesai/src/tools/mod.rs (2)
11-15: Documentation update correctly reflects resolver extensibility.The wording now accurately communicates that tools accept any
PathResolverimplementation, and the addedAllowedGlobResolverbullet improves discoverability.
48-48: Good API-doc link addition forAllowedGlobResolver.Adding the explicit rustdoc link target here keeps intra-doc references complete and clear.
src/llm-coding-tools-serdesai/src/lib.rs (1)
28-30: Re-export surface is consistent with the new resolver feature.Including
AllowedGlobResolveralongside existing resolvers is a clean and expected API update.
… patterns Implements ordered allow/deny glob policy for directory-restricted path resolution. - Add AllowedGlobResolver with glob pattern matching support - Enforces root containment (all resolved paths stay within allowed roots) - Rules evaluated in order: first match wins (allow or deny) - GlobPolicyBuilder for ergonomic rule construction - Export from path module and crate root (llm-coding-tools-core) - Re-export from serdesai for convenience - Comprehensive tests covering policy evaluation, edge cases, and error handling - Update README with usage examples
The doc comment incorrectly stated that `*` and `?` do not match path separators (`/`), which contradicted the actual behavior and test expectations (e.g., `*.rs` matching `src/lib.rs`). Updated to clarify that patterns are matched against the entire relative path string, so `*` and `?` can match `/`. Also improved the `**` description and added explicit examples that align with the test table and README.
e4d0660 to
6a8e0a3
Compare
Changed expand_shell to return ToolResult<PathBuf> instead of silently falling back to the original path when expansion fails. This ensures users see clear error messages when environment variables are missing, rather than confusing "directory not found" errors from canonicalize(). EOF'
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (1)
163-186: Document or handle the non-existent parent directory limitation.When resolving a path like
new_dir/file.rswherenew_dirdoesn't exist, this branch fails becauseparent.canonicalize()returnsErr. The caller (write_file) creates parent directories after resolution succeeds, creating a chicken-and-egg scenario.Consider:
- Document this as an expected limitation (users must create directories first, or write to existing directories)
- Or walk up the path to find the first existing ancestor, verify it's within bounds, then reconstruct
The current behavior is safe but may surprise users expecting to write to new subdirectories.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs` around lines 163 - 186, The code in AllowedGlob::... that handles non-existent paths currently aborts when parent.canonicalize() fails (e.g., for "new_dir/file.rs"), causing a chicken-and-egg with write_file; update the logic in the branch that uses candidate.parent() and parent.canonicalize() to walk up the candidate's ancestors until you find the first existing ancestor, canonicalize that ancestor and ensure it starts_with(base_dir), then reconstruct the intended target_path by joining the remaining non-existent segments to the resolved ancestor, compute relative_path via strip_prefix(base_dir) and normalize::normalize_path, and apply self.policy.is_allowed(&normalized_relative) as before (or alternatively document the limitation if you choose not to implement the walk-up behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/mod.rs`:
- Around line 163-186: The code in AllowedGlob::... that handles non-existent
paths currently aborts when parent.canonicalize() fails (e.g., for
"new_dir/file.rs"), causing a chicken-and-egg with write_file; update the logic
in the branch that uses candidate.parent() and parent.canonicalize() to walk up
the candidate's ancestors until you find the first existing ancestor,
canonicalize that ancestor and ensure it starts_with(base_dir), then reconstruct
the intended target_path by joining the remaining non-existent segments to the
resolved ancestor, compute relative_path via strip_prefix(base_dir) and
normalize::normalize_path, and apply
self.policy.is_allowed(&normalized_relative) as before (or alternatively
document the limitation if you choose not to implement the walk-up behavior).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 88821c54-8d77-465e-803e-d8273b7edede
📒 Files selected for processing (2)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/src/path/allowed_glob/normalize.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Blocking Windows
- GitHub Check: Async macOS
- GitHub Check: Async Windows
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs
🔇 Additional comments (6)
src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (6)
31-32: Documentation conflicts with PR objectives on rule matching semantics.The doc comment states "Last-match-wins" with reverse iteration, but the PR objectives describe "first-match-wins allow/deny rules." This contradiction will confuse users writing policies.
Please verify the actual implementation in
policy.rsand align the documentation accordingly.
[raise_major_issue, request_verification]#!/bin/bash # Description: Check the policy module implementation to verify rule matching semantics # Search for the is_allowed implementation and iteration logic ast-grep --pattern $'impl GlobPolicy { $$$ fn is_allowed($_, $_) $_ { $$$ } $$$ }' # Also search for any reverse iteration patterns rg -n -C5 'rev\(\)|\.rev\b|last.*match|first.*match' --type rust
70-93: LGTM!The constructor properly canonicalizes base directories after shell expansion, with clear error messages identifying which directory failed. The use of
to_string_lossy()for shell expansion is acceptable since shell patterns are typically ASCII.
95-112: LGTM!The
from_canonicalmethod correctly delegates validation responsibility to the caller, with appropriate documentation warning about potential path traversal attacks. This is consistent with the pattern where trusted application code provides base directories. Based on learnings: "Path normalization and..-component hardening in validators... is therefore NOT required to defend against traversal attacks" when paths come from trusted application code.
323-343: LGTM!Critical security test that validates symlink escape detection. The test correctly verifies that following a symlink outside the base directory is blocked.
308-321: LGTM!Good use of parameterized tests covering various path traversal attack vectors. The test cases effectively validate the security boundary.
409-430: The test correctly expectssrc/lib.rsto matchsrc/**/*.rs. The globset crate's**pattern matches zero or more directory segments, which is explicitly documented in the codebase as "matches any number of path components (including zero)".
- Strip verbatim path prefix from canonicalized temp paths before comparison - On Windows, test tilde expansion against actual home directory from dirs::home_dir() (uses Windows Known Folder API, not env vars) - On Unix, test with temp directory and HOME env var override - Add dirs crate to dev-dependencies for Windows home directory detection
2126921 to
c6a7990
Compare
Previously, AllowedGlobResolver and AllowedPathResolver failed when trying to write to paths like "new_dir/file.rs" where intermediate directories did not exist. The canonicalize() call on the parent would fail, causing a chicken-and-egg problem for write operations. Now both resolvers walk up ancestors to find the first existing one, canonicalize that, and safely reconstruct the target path while still blocking ".." escapes. A shared helper was added to avoid duplication. - Added resolve_nonexistent_candidate() and join_remaining_suffix() helpers with path traversal protection for non-existent targets - Added regression tests for missing intermediate directories - Added traversal tests for paths with missing directory escapes
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs`:
- Around line 102-110: The TempDir is dropped immediately because it is created
in the expression and goes out of scope before the closure runs; fix by binding
the TempDir to a local variable so it lives across the closure (e.g., create let
temp_dir = TempDir::new().unwrap(); then compute let temp_home_path =
temp_dir.path().canonicalize().unwrap(); and call temp_env::with_var("HOME",
Some(&temp_home_path), || { let result = expand_shell("~/project").unwrap(); ...
});), ensuring TempDir, temp_home_path, and the closure reference keep the
temp_dir alive for the duration of the test.
- Around line 115-116: The temporary directory created with TempDir::new() is
being dropped immediately because you assign its path to temp_home_path and lose
the TempDir handle; keep the TempDir alive until after you use strip_verbatim by
binding the TempDir to a variable (e.g., temp_home_dir) and then derive
temp_home_path via temp_home_dir.path().canonicalize().unwrap() before passing
that path into strip_verbatim; ensure you reference TempDir::new(), the
temp_home_dir binding, temp_home_path, and strip_verbatim so the TempDir
lifetime is preserved while the path is used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ee7d42c4-d3fb-41ca-8feb-16c34ec88251
⛔ Files ignored due to path filters (1)
src/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
src/llm-coding-tools-core/Cargo.tomlsrc/llm-coding-tools-core/src/path/allowed.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rssrc/llm-coding-tools-core/src/path/allowed_glob/normalize.rssrc/llm-coding-tools-core/src/path/mod.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/llm-coding-tools-core/Cargo.toml
- src/llm-coding-tools-core/src/path/allowed.rs
- src/llm-coding-tools-core/src/path/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Async macOS
- GitHub Check: Blocking Windows
- GitHub Check: Async Windows
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
📚 Learning: 2026-03-28T02:14:04.465Z
Learnt from: Sewer56
Repo: Sewer56/llm-coding-tools PR: 69
File: src/llm-coding-tools-bubblewrap/src/profile/validation.rs:57-67
Timestamp: 2026-03-28T02:14:04.465Z
Learning: In `src/llm-coding-tools-bubblewrap/src/profile/` (Rust, llm-coding-tools-bubblewrap crate), the `Builder` API paths (workspace, synthetic_home, cache_root, mount lists, overlays, etc.) are always set by trusted application/operator code — the library consumer is the trusted party. Path normalization and `..`-component hardening in validators like `validate_absolute_path` is therefore NOT required to defend against traversal attacks. Untrusted input (LLM-generated shell commands) only enters through `wrap_command`/`execute_command_with_mode`, not through the `Builder`.
Applied to files:
src/llm-coding-tools-core/src/path/allowed_glob/normalize.rssrc/llm-coding-tools-core/src/path/allowed_glob/mod.rs
🔇 Additional comments (9)
src/llm-coding-tools-core/src/path/allowed_glob/normalize.rs (3)
7-21: LGTM!The
normalize_pathfunction correctly handles cross-platform path separator normalization. Usingto_string_lossy()is acceptable here since glob patterns are string-based and non-UTF8 paths are rare edge cases in practice.
23-39: LGTM!The
expand_shellfunction properly wrapsshellexpand::fullwith appropriate error mapping. The error message includes both the original path and the underlying error, which aids debugging.
47-59: LGTM!The
strip_verbatimhelper correctly handles Windows verbatim path prefixes returned bycanonicalize(), enabling consistent test assertions across platforms.src/llm-coding-tools-core/src/path/allowed_glob/mod.rs (6)
40-46: LGTM!The struct design with
Arc<[Arc<Path>]>for base directories enables efficient cloning while maintaining immutability of the configuration. The optional policy pattern is clean.
70-93: LGTM!The constructor properly expands shell patterns and canonicalizes paths, with clear error messages that include the original path for debugging. Using
collect()for fail-fast error handling is appropriate.
95-131: LGTM!The
from_canonicalconstructor appropriately skips validation for performance when the caller guarantees pre-canonicalized paths. The security warning in the documentation is clear. Based on learnings, this aligns with the codebase's trust model where the library consumer is the trusted party.
136-184: LGTM!The resolver implementation correctly:
- Handles absolute paths by checking bounds after canonicalization (line 146)
- Prevents symlink escapes via
starts_withcheck on canonicalized paths- Applies glob policy to normalized relative paths
- Falls back to
resolve_nonexistent_candidatefor write operations with non-existent targetsThe defensive
unwrap_or(Path::new(""))at lines 151 and 167 is acceptable since thestarts_withcheck guaranteesstrip_prefixsuccess in normal operation.
194-227: LGTM!The test helper functions are well-structured and create realistic directory structures for testing different scenarios (basic structure, policy application, and globstar matching).
320-356: Excellent test coverage for security-critical paths.The parameterized traversal rejection tests (lines 320-334) and the symlink escape test (lines 336-356) provide solid coverage for the security boundaries. Using
rstestfor parameterization keeps the tests readable while covering multiple attack vectors.
Bind TempDir to a local variable so it lives across the closure execution, preventing the temporary directory from being deleted before tests run.
10de92b to
f272b0e
Compare
Implements ordered allow/deny glob policy for directory-restricted path resolution, providing more granular control than the existing
AllowedPathResolver.Key Features
GlobPolicyBuilderfor fluent rule constructionImplementation Details
AllowedGlobResolvertype extending path sandboxing capabilitiesGlobPolicyandGlobPolicyBuilderfor rule configuration/separator support~/,$HOME/)llm-coding-tools-coreandllm-coding-tools-serdesaiExample Usage