feat(staged): cache interactive-login-shell env per project for pipelines and git ops#744
feat(staged): cache interactive-login-shell env per project for pipelines and git ops#744matt2e wants to merge 7 commits into
Conversation
…ine steps Spawn `$SHELL -ils` once per working directory, capture its environment via `env -0` to a tempfile, and cache the snapshot for an hour. `run_pipeline_command` now applies the cached env to a native `sh -c` subprocess, so Hermit-managed PATH and other `.zshrc`-driven env vars are visible to pipeline steps (and any git hooks they trigger) without paying the per-call shell-init cost. Concurrent first-callers for the same working directory share a single capture via a `watch` channel. Falls back to the old `sh -lc` behaviour if capture fails. Signed-off-by: Matt Toohey <contact@matttoohey.com>
Per-cwd interactive-login-shell env capture is general-purpose pipeline plumbing, not ACP-shaped: `apps/staged/src-tauri/src/session_runner.rs` is the sole consumer and `acp-client` itself never calls `ShellEnvCache`. Colocate the module with its only caller and drop the cross-crate re-export. No behaviour change. Tempfile prefix renamed `acp-shell-env-*` → `staged-shell-env-*`. Adds the `fs` feature to `src-tauri`'s tokio features for `tokio::fs::read`/`remove_file`. Signed-off-by: Matt Toohey <contact@matttoohey.com>
`shell_env` moved to `src-tauri` in 68f3fef, taking the only consumers of `tokio::fs::read`/`remove_file` with it. Drop the now-orphaned `fs` feature so `acp-client`'s tokio features match what it actually uses. No behaviour change. Signed-off-by: Matt Toohey <contact@matttoohey.com>
`ShellEnvCache::get` inserts a `CachedEntry::InFlight(rx)` into the map before awaiting `capture_shell_env`, then promotes it to `Ready` (or removes it) in the `Ok`/`Err` arms. Neither arm runs when the future is dropped mid-await — and `run_pipeline_command` awaits `get` inside a cancellable pipeline step, so cancellation is a real path. When that happened, `tx` dropped without sending, but the `InFlight(rx)` entry stayed in the map. Subsequent callers cloned the dead receiver, saw `rx.changed()` return `Err`, fell through with `rx.borrow()` still `None`, and retried the outer loop — finding the same dead entry. Hot spin until the cache was dropped. Bind the `InFlight` lifetime to a stack-allocated `InFlightGuard` whose `Drop` removes the entry unless it has been promoted to `Ready`. Declared inside the `Capture` arm body after the `tx` match binding, so on cancellation the guard drops first (evict) and `tx` drops second (signal waiters `Err`). Waiters wake, see no entry on retry, and become the next Capturer. Also handles a panic inside `capture_shell_env` for free, and lets the explicit `map.remove` in the `Err` arm go away. Regression test: spawn a capturer, `yield_now` so it inserts `InFlight` and parks inside `capture_shell_env`, abort it, then assert (a) the map has no `InFlight` for the dir and (b) a subsequent `get` completes within a bounded timeout instead of spinning. Signed-off-by: Matt Toohey <contact@matttoohey.com>
`git/cli::run` is the central choke point for every `cli::run` callsite in `apps/staged/src-tauri/src/git/*` — commits, worktree adds/removes, checkouts, fetches, fast-forward merges, cherry-picks, clones via the `git clone` fallback path in `github.rs`. All of those previously inherited the Tauri process's PATH, which on macOS launched from Finder/Dock is `/usr/bin:/bin:/usr/sbin:/sbin` — so Hermit-shimmed git, LFS filters, credential helpers, and any binary invoked by a fired hook (pre-commit, commit-msg, post-commit, post-checkout, post-merge) would silently miss anything `~/.zshrc` would have put on PATH. Add `ShellEnvCache::get_blocking`, a synchronous mirror of `get` that reuses the same per-cwd `Ready` snapshot map. On a fresh miss it spawns `$SHELL -ils` synchronously and stores the result; it does not engage the `InFlight` watch-channel coalescing — a sync caller that races an async caller will do its own capture and overwrite, which is harmless because both shells produce equivalent env. `cli::run` now applies the snapshot via the new `ShellEnv::apply_to_std`, falling back to the prior `strip_git_env`-on-inherited-env behaviour if capture fails. The integration is gated `cfg(not(test))` because each unit test runs in a fresh tempdir, so the cache always misses and would add hundreds of ms per test for zero test value. Production builds always take the cache path. Factor `dump_path`/`dump_script`/`parse_env_dump` out of `capture_shell_env` so the new `capture_shell_env_blocking` shares the same tempfile naming, env-dump script, and parser — keeping the two capture paths in lockstep. Tempfile prefix unchanged. Also lands two testability seams that were already staged in the working tree from the test plan: `ShellEnvCache::with_shell_and_ttl` (point the cache at a hermetic fake shell) and `run_pipeline_command_with_cache` (inject a pre-seeded cache instead of the global singleton). No production behaviour change from either. Signed-off-by: Matt Toohey <contact@matttoohey.com>
Implements the four-wave test plan for the per-cwd `ShellEnvCache` and the `run_pipeline_command_with_cache` integration seam landed in 0e9ad29. The existing tests covered the happy path and the cancellation regression from 9e6e4c8, but left concurrency, TTL expiry, failure modes, fallback behavior, and tempfile cleanup unverified. Wave 1 (pure unit, no fake shell): - `ttl_expiry_triggers_recapture` — exercises the stale-`Ready` branch - `invalidate_all_clears_every_entry` / `invalidate_only_affects_target_dir` - `different_dirs_cached_independently` - `apply_to_clears_existing_env` — locks in `env_clear` against a future drop, complementing the existing PATH-is-present assertion - `apply_to_then_env_lets_caller_override` — docstring contract on post-`apply_to` overrides - `single_quote_empty_input` / `single_quote_backslash_unchanged` / `single_quote_pure_quote` Wave 2 (single capture): - `working_dir_argument_is_honoured` — `PWD` matches `working_dir` - `home_and_user_passed_through` — guards the `env_clear` whitelist - `promoted_guard_keeps_ready_entry` — locks `promoted = true` semantics Wave 3 (concurrency + failure modes; uses the `with_shell_and_ttl` seam): - `concurrent_first_callers_share_one_capture` — all callers see the same `captured_at` (watch-channel coalescing) - `waiter_wakes_when_capturer_cancels` — bounded-timeout assertion that a Waiter doesn't stall when the Capturer is aborted - `capture_err_propagates_and_is_not_cached` — both concurrent waiters see `Err` and a third call recaptures (no negative caching) - `tempfile_cleaned_up_on_success` / `tempfile_cleaned_up_on_failure` — set-difference snapshot of `staged-shell-env-*` so parallel tests don't interfere - `values_with_newlines_round_trip` — locks in NUL-framing over line-based parsing Wave 4 (`run_pipeline_command_with_cache` integration): - `snapshot_path_cached_env_reaches_child` — child sees a token baked into the cached snapshot - `fallback_path_when_cache_returns_err` — falling back to `sh -lc` when capture fails still runs the command - `cancellation_under_snapshot_branch` — `kill_on_drop` + select arm still cancel under the snapshot path (the existing test mostly hits the fallback branch) - `current_dir_survives_apply_to` — `pwd` reports the requested dir Pre-warm the global cache in `pipeline_command_cancellation_stops_current_step` so its 4-second deadline measures pure cancellation latency rather than the first-time shell-init cost, which can blow the budget under the heavier parallel-test load this PR adds. All wave-3/4 fake-shell tests are `#[cfg(unix)]` (PermissionsExt) and use a tempfile-backed script so they don't depend on the developer's `$SHELL` or `.zshrc`. Signed-off-by: Matt Toohey <contact@matttoohey.com>
The `with_shell_and_ttl` seam was introduced in 0e9ad29 specifically so concurrency tests could be deterministic, but `concurrent_first_callers_share_one_capture` and `waiter_wakes_when_capturer_cancels` still used `ShellEnvCache::new()` and the real `$SHELL`. On a fast capture (the common case on dev machines with a warm cache), call 0 finished before calls 1–7 arrived, so B5's "all `captured_at` equal" assertion passed via the cache-hit path without ever exercising the watch-channel coalescing branch. Point both tests at a fake shell whose `sleep 0.5` guarantees the Capturer parks inside `capture_shell_env` while the other 7 callers (B5) or the parallel Waiter (B6) arrive on the same `InFlight(rx)`. B6's recapture path becomes fast and deterministic too, so its post-abort timeout shrinks from 30s to 3s — wide enough for the expected ~600ms path (abort → guard evicts InFlight → waiter retries → new capture's `sleep 0.5` + parse) but tight enough to catch slower-but-not-infinite regressions, not just the hard hangs the cancellation fix originally targeted. Also drop `single_quote_pure_quote` (F19): the existing `single_quote_escapes_quotes` already covers the close/escape/open dance via the mid-string `'quote'` case, so F19 just costs a line. Signed-off-by: Matt Toohey <contact@matttoohey.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d643078c87
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| .env_clear() | ||
| .env("HOME", std::env::var("HOME").unwrap_or_default()) | ||
| .env("USER", std::env::var("USER").unwrap_or_default()) | ||
| .env("SHELL", shell) |
There was a problem hiding this comment.
Preserve inherited env vars when capturing shell snapshots
The capture process starts from env_clear() and re-adds only HOME, USER, and SHELL, so any session-scoped variables from the parent process (for example SSH_AUTH_SOCK, proxy settings, or token vars injected when launching the app from a terminal) are dropped from the snapshot. Because apply_to/apply_to_std later replace child environments entirely, pipeline commands and git invocations can lose authentication/context and fail in environments that previously worked. This regression is introduced here and is also duplicated in the blocking capture path used by git.
Useful? React with 👍 / 👎.
Summary
Spawn
\$SHELL -ilsonce per working directory, capture its environment viaenv -0to a tempfile, and cache the snapshot for an hour. Bothrun_pipeline_command(async) andgit/cli::run(sync) now apply the cached env to their subprocesses, so Hermit-managed PATH and other.zshrc-driven vars reach pipeline steps, git, LFS filters, credential helpers, and any hook-fired binaries — without the per-call shell-init cost. On macOS this fixes the silent breakage when Tauri inherits the Finder/Dock PATH (/usr/bin:/bin:/usr/sbin:/sbin).Changes
shell_envmodule (src-tauri/src/shell_env.rs): per-cwdShellEnvCachewith 1h TTL,watch-channel coalescing for concurrent first-callers, andapply_to/apply_to_stdhelpers. Module lives next to its only caller (moved out ofacp-client).run_pipeline_commandapplies the cached snapshot to a nativesh -cchild; falls back tosh -lcif capture fails.ShellEnvCache::get_blockingmirror letsgit/cli::runreuse the same snapshot map synchronously. Gatedcfg(not(test))so per-test tempdirs don't pay capture cost.InFlightGuardevicts staleInFlight(rx)entries whengetis dropped mid-await (otherwise waiters hot-spin on a dead receiver — real path because pipeline steps are cancellable).run_pipeline_command_with_cacheintegration (snapshot path, fallback path, cancellation under snapshot, working-dir survivesapply_to). Concurrency tests use awith_shell_and_ttlfake-shell seam for determinism.tokiofsfeature fromacp-client.Test plan
cargo test -p staged(crates-test green on push)pre-commithook that shells out to a Hermit tool; confirm the hook finds it🤖 Generated with Claude Code