Skip to content

feat(staged): cache interactive-login-shell env per project for pipelines and git ops#744

Open
matt2e wants to merge 7 commits into
mainfrom
pipeline-git-ops
Open

feat(staged): cache interactive-login-shell env per project for pipelines and git ops#744
matt2e wants to merge 7 commits into
mainfrom
pipeline-git-ops

Conversation

@matt2e
Copy link
Copy Markdown
Contributor

@matt2e matt2e commented May 20, 2026

Summary

Spawn \$SHELL -ils once per working directory, capture its environment via env -0 to a tempfile, and cache the snapshot for an hour. Both run_pipeline_command (async) and git/cli::run (sync) now apply the cached env to their subprocesses, so Hermit-managed PATH and other .zshrc-driven vars reach pipeline steps, git, LFS filters, credential helpers, and any hook-fired binaries — without the per-call shell-init cost. On macOS this fixes the silent breakage when Tauri inherits the Finder/Dock PATH (/usr/bin:/bin:/usr/sbin:/sbin).

Changes

  • shell_env module (src-tauri/src/shell_env.rs): per-cwd ShellEnvCache with 1h TTL, watch-channel coalescing for concurrent first-callers, and apply_to / apply_to_std helpers. Module lives next to its only caller (moved out of acp-client).
  • Pipeline integration: run_pipeline_command applies the cached snapshot to a native sh -c child; falls back to sh -lc if capture fails.
  • Git CLI integration: new ShellEnvCache::get_blocking mirror lets git/cli::run reuse the same snapshot map synchronously. Gated cfg(not(test)) so per-test tempdirs don't pay capture cost.
  • Cancellation fix: an InFlightGuard evicts stale InFlight(rx) entries when get is dropped mid-await (otherwise waiters hot-spin on a dead receiver — real path because pipeline steps are cancellable).
  • Test suite: four-wave coverage — TTL expiry, invalidate semantics, single-quote escaping edge cases, working-dir/HOME/USER passthrough, concurrent coalescing, cancellation, error propagation, tempfile cleanup, NUL-framed values, and run_pipeline_command_with_cache integration (snapshot path, fallback path, cancellation under snapshot, working-dir survives apply_to). Concurrency tests use a with_shell_and_ttl fake-shell seam for determinism.
  • Cleanup: drop now-unused tokio fs feature from acp-client.

Test plan

  • cargo test -p staged (crates-test green on push)
  • crates-fmt / crates-lint / differ-ci / staged-ci all green on push
  • Manual: launch staged from Finder on macOS, run a pipeline step that needs a Hermit-shimmed binary; confirm it resolves
  • Manual: run a commit through staged with a pre-commit hook that shells out to a Hermit tool; confirm the hook finds it

🤖 Generated with Claude Code

matt2e added 7 commits May 19, 2026 17:22
…ine steps

Spawn `$SHELL -ils` once per working directory, capture its environment via
`env -0` to a tempfile, and cache the snapshot for an hour. `run_pipeline_command`
now applies the cached env to a native `sh -c` subprocess, so Hermit-managed
PATH and other `.zshrc`-driven env vars are visible to pipeline steps (and any
git hooks they trigger) without paying the per-call shell-init cost. Concurrent
first-callers for the same working directory share a single capture via a
`watch` channel. Falls back to the old `sh -lc` behaviour if capture fails.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
Per-cwd interactive-login-shell env capture is general-purpose pipeline
plumbing, not ACP-shaped: `apps/staged/src-tauri/src/session_runner.rs`
is the sole consumer and `acp-client` itself never calls `ShellEnvCache`.
Colocate the module with its only caller and drop the cross-crate
re-export.

No behaviour change. Tempfile prefix renamed `acp-shell-env-*` →
`staged-shell-env-*`. Adds the `fs` feature to `src-tauri`'s tokio
features for `tokio::fs::read`/`remove_file`.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
`shell_env` moved to `src-tauri` in 68f3fef, taking the only consumers
of `tokio::fs::read`/`remove_file` with it. Drop the now-orphaned `fs`
feature so `acp-client`'s tokio features match what it actually uses.

No behaviour change.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
`ShellEnvCache::get` inserts a `CachedEntry::InFlight(rx)` into the map
before awaiting `capture_shell_env`, then promotes it to `Ready` (or
removes it) in the `Ok`/`Err` arms. Neither arm runs when the future is
dropped mid-await — and `run_pipeline_command` awaits `get` inside a
cancellable pipeline step, so cancellation is a real path.

When that happened, `tx` dropped without sending, but the `InFlight(rx)`
entry stayed in the map. Subsequent callers cloned the dead receiver,
saw `rx.changed()` return `Err`, fell through with `rx.borrow()` still
`None`, and retried the outer loop — finding the same dead entry. Hot
spin until the cache was dropped.

Bind the `InFlight` lifetime to a stack-allocated `InFlightGuard` whose
`Drop` removes the entry unless it has been promoted to `Ready`. Declared
inside the `Capture` arm body after the `tx` match binding, so on
cancellation the guard drops first (evict) and `tx` drops second (signal
waiters `Err`). Waiters wake, see no entry on retry, and become the next
Capturer. Also handles a panic inside `capture_shell_env` for free, and
lets the explicit `map.remove` in the `Err` arm go away.

Regression test: spawn a capturer, `yield_now` so it inserts `InFlight`
and parks inside `capture_shell_env`, abort it, then assert (a) the map
has no `InFlight` for the dir and (b) a subsequent `get` completes within
a bounded timeout instead of spinning.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
`git/cli::run` is the central choke point for every `cli::run` callsite
in `apps/staged/src-tauri/src/git/*` — commits, worktree adds/removes,
checkouts, fetches, fast-forward merges, cherry-picks, clones via the
`git clone` fallback path in `github.rs`. All of those previously
inherited the Tauri process's PATH, which on macOS launched from
Finder/Dock is `/usr/bin:/bin:/usr/sbin:/sbin` — so Hermit-shimmed git,
LFS filters, credential helpers, and any binary invoked by a fired hook
(pre-commit, commit-msg, post-commit, post-checkout, post-merge) would
silently miss anything `~/.zshrc` would have put on PATH.

Add `ShellEnvCache::get_blocking`, a synchronous mirror of `get` that
reuses the same per-cwd `Ready` snapshot map. On a fresh miss it spawns
`$SHELL -ils` synchronously and stores the result; it does not engage
the `InFlight` watch-channel coalescing — a sync caller that races an
async caller will do its own capture and overwrite, which is harmless
because both shells produce equivalent env. `cli::run` now applies the
snapshot via the new `ShellEnv::apply_to_std`, falling back to the
prior `strip_git_env`-on-inherited-env behaviour if capture fails.

The integration is gated `cfg(not(test))` because each unit test runs
in a fresh tempdir, so the cache always misses and would add hundreds
of ms per test for zero test value. Production builds always take the
cache path.

Factor `dump_path`/`dump_script`/`parse_env_dump` out of
`capture_shell_env` so the new `capture_shell_env_blocking` shares the
same tempfile naming, env-dump script, and parser — keeping the two
capture paths in lockstep. Tempfile prefix unchanged.

Also lands two testability seams that were already staged in the
working tree from the test plan: `ShellEnvCache::with_shell_and_ttl`
(point the cache at a hermetic fake shell) and
`run_pipeline_command_with_cache` (inject a pre-seeded cache instead
of the global singleton). No production behaviour change from either.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
Implements the four-wave test plan for the per-cwd `ShellEnvCache` and the
`run_pipeline_command_with_cache` integration seam landed in 0e9ad29. The
existing tests covered the happy path and the cancellation regression from
9e6e4c8, but left concurrency, TTL expiry, failure modes, fallback
behavior, and tempfile cleanup unverified.

Wave 1 (pure unit, no fake shell):
- `ttl_expiry_triggers_recapture` — exercises the stale-`Ready` branch
- `invalidate_all_clears_every_entry` / `invalidate_only_affects_target_dir`
- `different_dirs_cached_independently`
- `apply_to_clears_existing_env` — locks in `env_clear` against a future
  drop, complementing the existing PATH-is-present assertion
- `apply_to_then_env_lets_caller_override` — docstring contract on
  post-`apply_to` overrides
- `single_quote_empty_input` / `single_quote_backslash_unchanged` /
  `single_quote_pure_quote`

Wave 2 (single capture):
- `working_dir_argument_is_honoured` — `PWD` matches `working_dir`
- `home_and_user_passed_through` — guards the `env_clear` whitelist
- `promoted_guard_keeps_ready_entry` — locks `promoted = true` semantics

Wave 3 (concurrency + failure modes; uses the `with_shell_and_ttl` seam):
- `concurrent_first_callers_share_one_capture` — all callers see the same
  `captured_at` (watch-channel coalescing)
- `waiter_wakes_when_capturer_cancels` — bounded-timeout assertion that
  a Waiter doesn't stall when the Capturer is aborted
- `capture_err_propagates_and_is_not_cached` — both concurrent waiters see
  `Err` and a third call recaptures (no negative caching)
- `tempfile_cleaned_up_on_success` / `tempfile_cleaned_up_on_failure` —
  set-difference snapshot of `staged-shell-env-*` so parallel tests don't
  interfere
- `values_with_newlines_round_trip` — locks in NUL-framing over
  line-based parsing

Wave 4 (`run_pipeline_command_with_cache` integration):
- `snapshot_path_cached_env_reaches_child` — child sees a token baked into
  the cached snapshot
- `fallback_path_when_cache_returns_err` — falling back to `sh -lc` when
  capture fails still runs the command
- `cancellation_under_snapshot_branch` — `kill_on_drop` + select arm
  still cancel under the snapshot path (the existing test mostly hits the
  fallback branch)
- `current_dir_survives_apply_to` — `pwd` reports the requested dir

Pre-warm the global cache in `pipeline_command_cancellation_stops_current_step`
so its 4-second deadline measures pure cancellation latency rather than the
first-time shell-init cost, which can blow the budget under the heavier
parallel-test load this PR adds.

All wave-3/4 fake-shell tests are `#[cfg(unix)]` (PermissionsExt) and use
a tempfile-backed script so they don't depend on the developer's `$SHELL`
or `.zshrc`.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
The `with_shell_and_ttl` seam was introduced in 0e9ad29 specifically so
concurrency tests could be deterministic, but `concurrent_first_callers_share_one_capture`
and `waiter_wakes_when_capturer_cancels` still used `ShellEnvCache::new()`
and the real `$SHELL`. On a fast capture (the common case on dev machines
with a warm cache), call 0 finished before calls 1–7 arrived, so B5's
"all `captured_at` equal" assertion passed via the cache-hit path without
ever exercising the watch-channel coalescing branch.

Point both tests at a fake shell whose `sleep 0.5` guarantees the
Capturer parks inside `capture_shell_env` while the other 7 callers (B5)
or the parallel Waiter (B6) arrive on the same `InFlight(rx)`. B6's
recapture path becomes fast and deterministic too, so its post-abort
timeout shrinks from 30s to 3s — wide enough for the expected ~600ms
path (abort → guard evicts InFlight → waiter retries → new capture's
`sleep 0.5` + parse) but tight enough to catch slower-but-not-infinite
regressions, not just the hard hangs the cancellation fix originally
targeted.

Also drop `single_quote_pure_quote` (F19): the existing
`single_quote_escapes_quotes` already covers the close/escape/open
dance via the mid-string `'quote'` case, so F19 just costs a line.

Signed-off-by: Matt Toohey <contact@matttoohey.com>
@matt2e matt2e requested review from baxen and wesbillman as code owners May 20, 2026 06:04
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d643078c87

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +309 to +312
.env_clear()
.env("HOME", std::env::var("HOME").unwrap_or_default())
.env("USER", std::env::var("USER").unwrap_or_default())
.env("SHELL", shell)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve inherited env vars when capturing shell snapshots

The capture process starts from env_clear() and re-adds only HOME, USER, and SHELL, so any session-scoped variables from the parent process (for example SSH_AUTH_SOCK, proxy settings, or token vars injected when launching the app from a terminal) are dropped from the snapshot. Because apply_to/apply_to_std later replace child environments entirely, pipeline commands and git invocations can lose authentication/context and fail in environments that previously worked. This regression is introduced here and is also duplicated in the blocking capture path used by git.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant