feat(agents-runtime): Sandbox primitive + Docker/E2B providers + sandbox profile picker#4369

Draft
msfstef wants to merge 25 commits into main from msfstef/agent-sandboxing-1

@msfstef
Contributor

@msfstef msfstef commented May 20, 2026

Summary

Adds the Sandbox primitive to the agents runtime — a pluggable abstraction that isolates the filesystem, process, and network operations performed by LLM-driven tool calls — and wires it end-to-end through the runtime, agents-server, and new-session UI.

The primitive

@electric-ax/agents-runtime/sandbox exposes a Sandbox interface: exec, FS methods (readFile/writeFile/readdir/exists/remove/stat/mkdir), fetch, getUrl (port forwarding), updateNetworkPolicy, and dispose. SandboxError carries a policy | runtime | unavailable kind.
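
As a rough sketch of the surface described above, the interface might look something like the following. Only the method names and the three error kinds come from this description; every parameter and return shape (`ExecResult`, the option objects, the `fetch` response) is an assumption made for illustration and may not match the package's actual typings.

```typescript
// Hypothetical shapes: only the method names and the error kinds are from the PR text.
type SandboxErrorKind = "policy" | "runtime" | "unavailable";

class SandboxError extends Error {
  readonly kind: SandboxErrorKind;
  constructor(kind: SandboxErrorKind, message: string) {
    super(message);
    this.name = "SandboxError";
    this.kind = kind;
  }
}

// Assumed result shape for exec().
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

interface Sandbox {
  exec(command: string, opts?: { cwd?: string; timeoutMs?: number }): Promise<ExecResult>;
  readFile(path: string): Promise<string>;
  writeFile(path: string, data: string): Promise<void>;
  readdir(path: string): Promise<string[]>;
  exists(path: string): Promise<boolean>;
  remove(path: string, opts?: { recursive?: boolean }): Promise<void>;
  stat(path: string): Promise<{ size: number; isDirectory: boolean }>;
  mkdir(path: string, opts?: { recursive?: boolean }): Promise<void>;
  fetch(url: string): Promise<{ status: number; text(): Promise<string> }>;
  getUrl(opts: { port: number; protocol?: "http" | "https" }): Promise<string>;
  updateNetworkPolicy(policy: unknown): Promise<void>;
  dispose(): Promise<void>;
}

// Callers can branch on `kind` instead of parsing error messages.
function isPolicyDenial(err: unknown): boolean {
  return err instanceof SandboxError && err.kind === "policy";
}
```

Keeping the error classification in a single `kind` field lets tools distinguish "the sandbox refused this" from "the sandbox broke" without provider-specific checks.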

Providers

  • unrestrictedSandbox — explicit pass-through over node:fs / child_process. The name is the warning; it's the built-in default.
  • dockerSandbox — hardened container isolation via dockerode (optional peer dep). Constrains network, CPU, memory, and processes; ships an SSRF-guarded egress proxy and a read-policy. Recommended path for multi-entity hosting. Exported under the /sandbox/docker subpath.
  • remoteSandbox({provider: 'e2b'}) — adapter for E2B's npm SDK (optional peer dep). The RemoteSandboxClient interface makes adding Vercel/Daytona/etc. mechanical.

An earlier iteration shipped a nativeSandbox provider (Seatbelt/bubblewrap via @anthropic-ai/sandbox-runtime). It was removed in favor of Docker as the hardened-isolation path; the dependency is gone.

Sandbox profiles (advertise / validate / pick)

  • Runtimes register named SandboxProfiles (name, label, description?, local factory). Built-ins: local (always) and docker (only registered when the Docker daemon is reachable, so the UI never offers a non-functional choice).
  • The runtime advertises the descriptive fields (not the factory closures) to the agents-server via runner registration. New migration 0010_sandbox_profiles adds runners.sandbox_profiles and entities.sandbox.
  • Spawn requests carry sandbox.profile; the server validates the choice against the target runner's advertised set (or, for unpinned dispatch, a tenant-wide check) and rejects unserviceable choices up front.
  • The new-session UI reads the selected runner's advertised profiles and renders a picker.
  • processWake resolves the chosen profile and constructs the sandbox once per wake-session, disposing it in the outer finally (handlers must not call dispose()).
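
The advertise / validate / construct-once flow above can be sketched as follows. The profile field names follow the description; `advertise`, `validateProfileChoice`, and `runWake` are hypothetical helper names, and the real `processWake` wiring is not shown in this PR text.

```typescript
// Hypothetical types: field names follow the description (name, label, description?, factory).
interface DisposableSandbox {
  dispose(): Promise<void>;
}

interface SandboxProfile {
  name: string;
  label: string;
  description?: string;
  factory: (workingDirectory: string) => Promise<DisposableSandbox>;
}

// Only descriptive fields cross the wire; the factory closure stays in the runtime.
type AdvertisedProfile = Omit<SandboxProfile, "factory">;

function advertise(profiles: SandboxProfile[]): AdvertisedProfile[] {
  return profiles.map(({ name, label, description }) => ({
    name,
    label,
    ...(description !== undefined ? { description } : {}),
  }));
}

// Server side: reject a spawn whose profile the target runner never advertised.
function validateProfileChoice(requested: string, advertised: AdvertisedProfile[]): void {
  if (!advertised.some((p) => p.name === requested)) {
    throw new Error(`sandbox profile "${requested}" is not offered by this runner`);
  }
}

// Runtime side: construct once per wake and dispose in the outer finally.
async function runWake<T>(
  profile: SandboxProfile,
  workingDirectory: string,
  handler: (sandbox: DisposableSandbox) => Promise<T>,
): Promise<T> {
  const sandbox = await profile.factory(workingDirectory);
  try {
    return await handler(sandbox); // handlers must not call dispose() themselves
  } finally {
    await sandbox.dispose();
  }
}
```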

Tool refactor + security fixes (folded in)

  • All tool factories (createFetchUrlTool, read/write/edit, bash) now require a Sandbox parameter and route through it.
  • bash no longer forwards process.env to children (closes $ANTHROPIC_API_KEY exfil); bash description corrected (no longer claims to be sandboxed).
  • read/write/edit reject symlink escapes from the workspace at the tool layer (safe-path.ts).
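
A minimal sketch of a realpath-based workspace check, assuming a POSIX-style filesystem. The actual helper in safe-path.ts is not shown here; `resolveInsideWorkspace` is a hypothetical stand-in that illustrates the technique, not the PR's implementation.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: resolves `requested` against `workspace` and rejects any
// path whose symlink-resolved location falls outside the workspace.
function resolveInsideWorkspace(workspace: string, requested: string): string {
  const root = fs.realpathSync(workspace);
  let existing = path.resolve(root, requested);
  const tail: string[] = [];

  for (;;) {
    try {
      const resolved = path.join(fs.realpathSync(existing), ...tail);
      const rel = path.relative(root, resolved);
      if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
        throw new Error(`path escapes workspace: ${requested}`);
      }
      return resolved;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
      // A dangling symlink exists per lstat but not per realpath: refuse it,
      // because a write through it could land outside the workspace.
      if (fs.lstatSync(existing, { throwIfNoEntry: false })?.isSymbolicLink()) {
        throw new Error(`dangling symlink in path: ${requested}`);
      }
      // Not created yet (e.g. a new file): canonicalize the nearest existing ancestor.
      const parent = path.dirname(existing);
      if (parent === existing) throw err;
      tail.unshift(path.basename(existing));
      existing = parent;
    }
  }
}
```

Walking up to the nearest existing ancestor lets the same check cover writes to files that do not exist yet, while still resolving any symlinked directory on the way.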

Built-in entities (Horton, Worker) default to unrestrictedSandbox via chooseDefaultSandbox(workingDirectory). Stronger isolation is opt-in by selecting the docker profile or constructing dockerSandbox / remoteSandbox directly.

What this primitive is and is not

This primitive targets host isolation for LLM-driven tool calls: escape from the cwd, env-var exfiltration, arbitrary network egress, and symlink traversal. It does not address prompt-injection-driven misuse of otherwise-legitimate tools.

Provider-specific limitations:

  • unrestrictedSandbox provides no host isolation — in-process tool-layer defenses (env scrubbing, symlink resolution, fetch SSRF guards) are the only protection.
  • sandbox.fetch() on remoteSandbox runs in the host Node process, not inside the VM. To route egress through the VM, use sandbox.exec('curl ...').

Test plan

  • Cross-provider conformance suite pins the Sandbox contract across providers; per-provider suites for docker (incl. live-daemon smoke), unrestricted, profiles, tool-refactor, and symlink safety.

  • Server-side spawn validation covered by electric-agents-sandbox-spawn.test.ts + runners-router.test.ts.

  • Verified locally after rebase: agents-runtime, agents-server, agents-server-ui typecheck clean; sandbox suites green (82 passed / 2 skipped, incl. Docker conformance against a live daemon); server spawn + runners suites green (26 passed).

  • CI matrix exercises the Docker path on Linux

  • Manual smoke test of remoteSandbox({provider: 'e2b'}) against a real E2B account

🤖 Generated with Claude Code

@msfstef msfstef self-assigned this May 20, 2026
@msfstef msfstef force-pushed the msfstef/agent-sandboxing-1 branch from c6a9ffc to 91303cc Compare May 20, 2026 09:02
@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 73.11492% with 517 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.22%. Comparing base (8803b36) to head (91b0613).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/sandbox/remote/e2b.ts | 0.64% | 153 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/docker.ts | 86.06% | 80 Missing ⚠️ |
| ...ackages/agents-runtime/src/sandbox/docker/proxy.ts | 55.08% | 75 Missing ⚠️ |
| ...-server-ui/src/components/views/NewSessionView.tsx | 0.00% | 50 Missing ⚠️ |
| ...ackages/agents-runtime/src/sandbox/unrestricted.ts | 84.83% | 32 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/docker/fs.ts | 88.84% | 31 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/remote.ts | 90.43% | 20 Missing ⚠️ |
| packages/agents/src/bootstrap.ts | 36.00% | 16 Missing ⚠️ |
| packages/agents-runtime/src/process-wake.ts | 46.15% | 14 Missing ⚠️ |
| packages/agents-server/src/entity-registry.ts | 42.10% | 11 Missing ⚠️ |
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4369      +/-   ##
==========================================
+ Coverage   59.46%   60.22%   +0.75%     
==========================================
  Files         304      317      +13     
  Lines       30626    32460    +1834     
  Branches     8335     8776     +441     
==========================================
+ Hits        18211    19548    +1337     
- Misses      12397    12894     +497     
  Partials       18       18              
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | `66.78% <33.33%> (-0.75%)` ⬇️ |
| packages/agents-mcp | `77.54% <ø> (ø)` |
| packages/agents-runtime | `80.04% <76.41%> (-0.64%)` ⬇️ |
| packages/agents-server | `74.44% <73.80%> (-0.01%)` ⬇️ |
| packages/agents-server-ui | `6.16% <0.00%> (-0.05%)` ⬇️ |
| packages/electric-ax | `43.81% <ø> (ø)` |
| packages/experimental | `87.73% <ø> (ø)` |
| packages/react-hooks | `86.48% <ø> (ø)` |
| packages/start | `82.83% <ø> (ø)` |
| packages/typescript-client | `94.39% <ø> (ø)` |
| packages/y-electric | `56.05% <ø> (ø)` |
| typescript | `60.22% <73.11%> (+0.75%)` ⬆️ |
| unit-tests | `60.22% <73.11%> (+0.75%)` ⬆️ |

Flags with carried forward coverage won't be shown.

@github-actions
Contributor

github-actions Bot commented May 20, 2026

Electric Agents Desktop Builds

Build artifacts for commit 91b0613.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Failed | Unavailable |
| macOS Intel | Failed | Unavailable |
| Windows x64 | Failed | Unavailable |
| Linux x64 | Failed | Unavailable |

Workflow run

@netlify

netlify Bot commented May 20, 2026

Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 8d6ed8f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a0ee2c2f10d1200084e5a9d |
| 😎 Deploy Preview | https://deploy-preview-4369--electric-next.netlify.app |
@msfstef msfstef force-pushed the msfstef/agent-sandboxing-1 branch from 4beddcf to 8d6ed8f Compare May 21, 2026 10:47
msfstef and others added 23 commits May 25, 2026 11:12
Introduce the Sandbox interface and an unrestrictedSandbox provider as the
plumbing for host-isolation work. No default behavior change — all built-in
entities (Horton, Worker) explicitly construct unrestrictedSandbox so they
behave identically to before. See plans/sandbox-design.md.

- New: packages/agents-runtime/src/sandbox/{types,unrestricted}.ts and the
  public /sandbox subpath aggregator.
- Tool factories (bash, read, write, edit, fetch_url) now take Sandbox
  instead of a workingDirectory string. They delegate FS/exec/fetch to it.
- bash no longer forwards process.env to children. Scrubbed env
  (PATH/HOME/USER/LANG/TERM) only. Closes env-var exfil via "echo \$KEY".
- bash description string stops claiming a sandbox that wasn't there.
- read/write/edit add realpath-based path resolution via resolveSafePath
  to block symlink-escape from the workspace.
- The standalone fetchUrlTool export is removed; callers must construct
  via createFetchUrlTool(sandbox).
- Horton/Worker construct unrestrictedSandbox per wake and dispose in a
  finally block. Conformance tests updated to the new signatures.
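
A minimal sketch of the env scrubbing described above, assuming a POSIX `sh`. The five forwarded variable names are the ones listed in this commit message; `buildScrubbedEnv` and `runScrubbed` are hypothetical names, not the tool's actual internals.

```typescript
import { spawnSync } from "node:child_process";

// The five forwarded names are taken from the commit message above.
const FORWARDED_ENV_KEYS = ["PATH", "HOME", "USER", "LANG", "TERM"] as const;

function buildScrubbedEnv(parent: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of FORWARDED_ENV_KEYS) {
    const value = parent[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}

// Hypothetical runner: passing `env` explicitly replaces the inherited environment,
// so secrets such as API keys in process.env never reach the child.
function runScrubbed(command: string, cwd: string): string {
  const result = spawnSync("sh", ["-c", command], {
    cwd,
    env: buildScrubbedEnv(process.env),
    encoding: "utf8",
  });
  return result.stdout;
}
```

An allowlist of forwarded names is used rather than a denylist of known secrets, so a newly added secret is excluded by default.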

Tests: 48 new tests across sandbox-unrestricted, sandbox-tool-refactor,
and sandbox-tool-symlink-safety. Existing test suites for bash/write/edit
updated for the new tool signatures. Full agents-runtime suite + agents
suite green; typecheck clean across runtime, agents, and conformance-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (PR 6b)

Adds nativeSandbox(), a Sandbox provider that wraps Anthropic's
sandbox-runtime library to enforce host isolation through OS-level
primitives (Seatbelt on macOS, bubblewrap on Linux/WSL2).

Architecture:
- New dependency: @anthropic-ai/sandbox-runtime@0.0.52 (Apache-2.0, pinned).
- src/sandbox/native.ts: implements Sandbox over SandboxManager. Translates
  our config (workingDirectory, allowedHosts, extraReadPaths) into the
  library's config shape so customers never see the library's API.
- Lazy initialization: SandboxManager is only set up on the first exec()
  call. readFile / writeFile / mkdir / fetch are enforced at the TS layer
  (path canonicalization + deny overlay; hostname allowlist for fetch).
  No proxy startup cost for handlers that don't spawn subprocesses.
- Refcount + single-instance enforcement: one workingDirectory can be
  actively exec'd through the OS sandbox at a time in one Node process.
  Concurrent exec from a conflicting workingDirectory throws
  SandboxError({kind: 'unavailable'}).
- Default deny overlay covers ~/.ssh, ~/.aws, ~/.config/{gcloud,op,gh},
  ~/.kube, ~/.docker, ~/.netrc, ~/.npmrc, ~/.pgpass, ~/.huggingface,
  and ~/Library/Application Support. Documented as incomplete in
  plans/sandbox-design.md §5.2; the v2 fix is a curated read-allowlist.
- name: 'native:macos-seatbelt' on Darwin, 'native:linux-bwrap-only'
  elsewhere — makes the bwrap-only Linux limitation legible in logs.
- Throws SandboxError({kind: 'unavailable'}) on unsupported platforms
  (Windows) with an actionable error pointing to unrestrictedSandbox or
  remoteSandbox.

Tests (test/sandbox-native.test.ts):
- Identity, FS policy (deny overlay, allowed reads/writes), fetch policy.
- Lifecycle: re-construction after dispose, concurrent-exec rejection.
- Real OS sandbox integration tests (skipped on unsupported platforms):
  basic echo, /etc/sudoers blocked, writes inside cwd allowed.

No default change for Horton/Worker — they still use unrestrictedSandbox.
PR 6d will flip the default and add the Horton home-as-cwd fix.

Also: write-tool test updated to compare canonical (realpath-resolved)
paths in readSet, matching PR 6a's symlink-safety semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds remoteSandbox(), a Sandbox provider that delegates host isolation to a
remote workspace (microVM/container) at a SaaS provider. v1 ships an E2B
adapter; additional providers (Vercel, Daytona) are mechanical to add via
the RemoteSandboxClient interface.

Architecture:
- src/sandbox/remote/types.ts: RemoteSandboxClient interface — the narrow
  contract each provider adapter implements (exec, readFile, writeFile,
  mkdir, kill).
- src/sandbox/remote/e2b.ts: createE2BClient and adaptE2B. Dynamically
  imports the 'e2b' package so it remains an *optional peer dependency*.
  Customers using the remote provider install e2b separately; no install
  cost for everyone else.
- src/sandbox/remote.ts: provider-neutral remoteSandbox factory and the
  RemoteSandbox class implementing the Sandbox interface. FS paths are
  VM-rooted (default cwd '/work'). Writes outside the working directory
  are rejected at the TS layer. dispose() calls client.kill() once;
  subsequent operations throw SandboxError({kind:'runtime'}).

The 'client' opt accepts a pre-constructed RemoteSandboxClient, used by
tests (a fake client tracks all calls and serves an in-memory FS) and by
customers who want to wrap the provider SDK with retry/observability
before handing it to us.

sandbox.fetch() runs in the host Node process with a TS-level hostname
allowlist — *not* inside the VM. Documented caveat: to route outbound
traffic through the VM, use sandbox.exec('curl ...'). v1.1 may add a
VM-routed fetch.

Tests (test/sandbox-remote.test.ts, 9 cases):
- Identity (name reflects provider).
- exec delegation with default + override cwd.
- writeFile/readFile roundtrip; writeFile outside cwd rejected.
- mkdir delegation, including recursive walk.
- fetch hostname allowlist rejection.
- dispose calls kill exactly once even on repeat.
- Unknown provider name throws SandboxError({kind:'unavailable'}).

No real e2b account/SDK is needed for the test suite — all tests use the
in-memory fake client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR 6d)

Wires the native sandbox in as the default for built-in entities (Horton,
Worker) on macOS and Linux. Behavior change: LLM-driven bash/read/write/
edit/fetch_url tools now run inside Seatbelt (macOS) or bubblewrap (Linux)
by default, with the env-scrubbing + symlink-safety from PR 6a and the
default deny overlay from PR 6b.

- New: src/sandbox/default.ts — chooseDefaultSandbox(workingDirectory, env?)
  helper. Picks nativeSandbox when SandboxManager.isSupportedPlatform()
  returns true; otherwise unrestrictedSandbox.
- Panic-revert env switch: ELECTRIC_AGENTS_UNRESTRICTED=1 (also accepts
  'true'/'yes'/'on', case-insensitive) forces unrestrictedSandbox on any
  platform. Documented as the emergency lever when the native engine
  misbehaves; not promoted in customer-facing docs.
- Horton and Worker handlers replace their direct unrestrictedSandbox
  construction with a chooseDefaultSandbox call. No other change to the
  handler logic; the try/finally dispose pattern from PR 6a stays.

Tests (test/sandbox-default.test.ts, 5 cases):
- Native chosen on supported platforms.
- ELECTRIC_AGENTS_UNRESTRICTED=1 forces unrestricted.
- Case-insensitive truthy values (true, yes, on) all force unrestricted.
- Unrestricted picked when isNativeSupported() returns false (Windows
  shape via the testing override).
- ELECTRIC_AGENTS_UNRESTRICTED=0 does NOT trigger the panic switch.
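
The panic-switch parsing exercised by these cases can be sketched as below. The accepted values come from this commit message; `isUnrestrictedForced` and `pickDefault` are hypothetical names, and the real `chooseDefaultSandbox` returns a sandbox instance rather than a label.

```typescript
// Accepted truthy spellings, per the commit message (matched case-insensitively).
const TRUTHY = new Set(["1", "true", "yes", "on"]);

function isUnrestrictedForced(env: Record<string, string | undefined>): boolean {
  const raw = env.ELECTRIC_AGENTS_UNRESTRICTED;
  return raw !== undefined && TRUTHY.has(raw.toLowerCase());
}

type DefaultChoice = "native" | "unrestricted";

// The env switch wins over platform support; otherwise prefer native when usable.
function pickDefault(
  env: Record<string, string | undefined>,
  nativeSupported: boolean,
): DefaultChoice {
  if (isUnrestrictedForced(env)) return "unrestricted";
  return nativeSupported ? "native" : "unrestricted";
}
```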

The agents-desktop home-as-cwd fix (main.ts:1939 'app.getPath(home)'
fallback) is deferred to a separate, smaller desktop PR — it's a UX
change with its own implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egatives

Closes two test-coverage gaps that surfaced during PR 6a-6d review.

sandbox-conformance.test.ts (20 cases):
- Parameterizes a single set of scenarios over unrestricted, native (real
  OS sandbox, gated by SandboxManager.isSupportedPlatform), and remote
  (driven by an in-memory fake matching RemoteSandboxClient).
- Asserts the cross-provider contract: writeFile+readFile roundtrip, exec
  returns an exitCode, dispose is safe, name/workingDirectory exposed,
  readFile ENOENT propagates.
- Encodes the *deliberate* semantic difference: writeFile outside cwd
  rejects for native/remote (policy-bearing providers) but succeeds for
  unrestricted (which delegates to node:fs — path security lives in the
  tool layer's resolveSafePath helper).
- Symlink-escape sub-suite for non-remote providers documents that
  unrestricted does not block symlinks at the sandbox layer (tool layer
  handles it) while native does.

sandbox-native-os.test.ts (5 cases, real OS sandbox only):
- bash does not inherit arbitrary parent env vars (closes the
  __SANDBOX_OS_TEST_SECRET__ exfil path via the OS sandbox).
- bash cannot write outside the working directory at the OS level.
- bash cannot follow a symlink whose target is in the default deny
  overlay. Comments explicitly note the v1 denylist's limitation:
  symlinks to arbitrary /tmp paths *are* readable (option 1 in
  plans/sandbox-design.md §5.2); only paths inside the deny set are
  blocked. v2 read-allowlist would change this.
- bash with no allowedHosts cannot reach the network (verifies
  https://1.1.1.1 is refused).
- readFile through the TS adapter denies known credential paths under
  home (~/.ssh, ~/.aws, ~/.config/gcloud).

Remaining coverage gaps after this commit:
- remoteSandbox against the real E2B SDK is still untested (needs an
  account). adaptE2B's type translations could drift without us noticing.
- The Linux bwrap path is not exercised on this machine (macOS dev env).
- Horton/Worker full integration through a fake LLM is blocked by the
  pre-existing better-sqlite3 missing-module error in packages/agents.

Test totals: 78 sandbox tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The e2b peer dep added in PR 6c was missing from the lockfile (an earlier
checkout reverted the install change). This commit lands the lock entries
for e2b@2.21.0 and its transitive deps so a fresh pnpm install resolves
consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three integration adjustments after rebasing on origin/main:

- Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was
  a characterization test from #4354 that documents pre-fix symlink-escape
  behavior with an explicit "update when realpath resolution lands" note.
  PR 6a's resolveSafePath helper is that fix; the file's expectations are
  now contradicted by sandbox-tool-symlink-safety.test.ts.

- Trim packages/agents-runtime/test/bash-tool.test.ts: the two
  characterization tests from #4354 that documented the bash env-leak bug
  are removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has
  the corresponding assertion ('does not forward arbitrary process.env to
  children'). The first test in the file (cwd + HOME exposure) stays.

- Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new
  createFetchUrlTool(sandbox, opts) signature. The assertions still hold
  for unrestrictedSandbox (NetPolicy SSRF protection is deferred); the
  test is now explicit about that scope.

- Remove the @ts-expect-error directive on the dynamic e2b import in
  src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves
  the package and the directive is unused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…proxy

Previously sandbox.fetch() on nativeSandbox enforced its hostname policy
via a manual `Set<string>.has(url.hostname)` exact-match check that
duplicated (badly) what `@anthropic-ai/sandbox-runtime`'s HTTP proxy
already does for subprocess egress.

The two pathways now share one policy enforcer (the library's proxy)
with consistent semantics:
- Wildcard patterns (e.g. `*.example.com`)
- IP canonicalization (e.g. `2852039166` → `169.254.169.254`)
- Denied-domains taking precedence over allowed
- Control-character host rejection
- IPv6 zone-ID payload rejection
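
To illustrate why IP canonicalization matters for an allowlist, here is a small sketch of the decimal-integer case from the list above. This is not the library's code (its implementation is not shown in this PR); `canonicalizeDecimalIPv4` is a hypothetical helper covering only that one encoding.

```typescript
// Illustrative only: converts a host written as a single decimal integer
// into dotted-quad form, or returns null if it is not such a host.
function canonicalizeDecimalIPv4(host: string): string | null {
  if (!/^\d+$/.test(host)) return null;
  const n = Number(host);
  if (!Number.isInteger(n) || n > 0xffffffff) return null;
  return [24, 16, 8, 0].map((shift) => (n >>> shift) & 0xff).join(".");
}

console.log(canonicalizeDecimalIPv4("2852039166")); // prints "169.254.169.254"
```

An exact-match check on the raw hostname would treat `2852039166` as an unknown name, while a resolver would happily connect to the link-local metadata address it encodes.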

Implementation:
- Add `undici` as a direct dependency for `ProxyAgent`.
- On `ensureInitialized`, read `SandboxManager.getProxyPort()` and build
  a `ProxyAgent('http://127.0.0.1:PORT')` dispatcher.
- `sandbox.fetch()` passes `{ dispatcher }` to global fetch so undici
  routes the request through the same proxy that gates `bash`-emitted
  egress.
- A 403 with `x-srt-denied` header or undici proxy-refusal error is
  translated to `SandboxError({kind: 'policy'})` so callers still see a
  consistent policy-rejection shape.
- `dispatcher.close()` runs in `dispose()` to release sockets.

Linux Unix-socket gap documented: `getLinuxHttpSocketPath()` returns
a Unix socket on Linux which `ProxyAgent` does not consume directly.
For now sandbox.fetch on Linux falls back to direct (non-proxy)
network access. exec-driven egress on Linux still routes through the
proxy correctly via the bind-mounted unix socket. A custom undici
dispatcher targeting the unix socket would close this; tracked for a
follow-up.

Tests (test/sandbox-native-proxy-fetch.test.ts, 3 cases):
- Allowed host through the proxy reaches a local HTTP server.
- Disallowed host is refused with SandboxError({kind:'policy'}).
- Wildcard patterns in allowedHosts (e.g. '*.example.com') are accepted
  by the library's validator and config — confirming we no longer
  shadow the matcher with our own naïve exact-match check.

Existing tests in sandbox-native.test.ts now exercise the proxy
rejection path end-to-end (a real undici/proxy round-trip), not a
synthetic Set.has() check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e kill

Two CI-surfaced bugs:

1. **nativeSandbox crashed on Linux runners without `bubblewrap` installed.**
   `SandboxManager.isSupportedPlatform()` returns true on any Linux but
   the actual `initialize()` call throws when bwrap isn't on PATH. Tests
   gated on `isSupportedPlatform()` ran on the Linux runner and exploded
   instead of skipping.

   Fix: in the nativeSandbox factory and in chooseDefaultSandbox, call
   `SandboxManager.checkDependencies()` and surface a missing dependency
   as `SandboxError({kind: 'unavailable'})` *before* `initialize()`.
   The test gates (sandbox-native, sandbox-native-os,
   sandbox-native-proxy-fetch, sandbox-conformance, sandbox-default)
   also use `checkDependencies()` so they skip cleanly on hosts where
   the runtime tools aren't installed.

   chooseDefaultSandbox now falls back to unrestrictedSandbox on a
   Linux host without bwrap rather than throwing — keeps the "default
   to native, panic to unrestricted" contract from PR 6d intact even
   when the native engine is unusable.

2. **timeoutMs test hung on Linux until vitest's 5s default fired.**
   `spawn('sh', ['-c', 'sleep 5'])` then `child.kill('SIGTERM')` kills
   `sh` immediately but leaves `sleep` orphaned, still holding the
   stdio pipes — the `close` event waits for the grandchild to finish
   naturally. macOS happened to terminate the tree differently so the
   bug only surfaced on the Ubuntu runner.

   Fix: spawn with `detached: true` to create a new process group, then
   send the signal to `-pid` so the whole tree dies. SIGTERM first,
   escalating to SIGKILL after 500ms if anything is still hanging on.
   Applied symmetrically in unrestricted.ts and native.ts.
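
   A minimal sketch of the process-group kill described in fix 2, assuming a POSIX host. `execWithTimeout` is a hypothetical name; the 500ms escalation delay is the one stated above.

```typescript
import { spawn } from "node:child_process";

function execWithTimeout(
  command: string,
  timeoutMs: number,
): Promise<{ exitCode: number | null; timedOut: boolean }> {
  return new Promise((resolve, reject) => {
    // detached: true puts the shell and its descendants in a new process group.
    const child = spawn("sh", ["-c", command], {
      detached: true,
      stdio: ["ignore", "pipe", "pipe"],
    });
    let timedOut = false;
    let escalation: NodeJS.Timeout | undefined;

    // A negative pid targets the whole group, so grandchildren die too.
    const killGroup = (signal: NodeJS.Signals): void => {
      if (child.pid === undefined) return;
      try {
        process.kill(-child.pid, signal);
      } catch {
        // The group has already exited.
      }
    };

    const timer = setTimeout(() => {
      timedOut = true;
      killGroup("SIGTERM");
      escalation = setTimeout(() => killGroup("SIGKILL"), 500);
    }, timeoutMs);

    const cleanup = (): void => {
      clearTimeout(timer);
      if (escalation) clearTimeout(escalation);
    };

    // Drain output so a full pipe cannot stall the child.
    child.stdout?.resume();
    child.stderr?.resume();
    child.on("error", (err) => {
      cleanup();
      reject(err);
    });
    child.on("close", (exitCode) => {
      cleanup();
      resolve({ exitCode, timedOut });
    });
  });
}
```

   Because `close` only fires once every holder of the stdio pipes has exited, signalling just the shell leaves the promise pending until the orphaned grandchild finishes on its own.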

Also: adds the missing changeset entry (`Check Changeset` CI failure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The factory's eager `checkDependencies()` (added in the previous commit
to surface 'unavailable' clearly to users) means that ALL tests in
sandbox-native.test.ts crash on a host without bubblewrap, not just the
ones that actually exec under the OS sandbox. The CI Linux runner
exposed this — the inner `identity`, `filesystem policy`, `fetch
policy`, and `lifecycle` describes were previously running on the lazy
assumption and need the outer gate now.

Coverage of the same TS-policy assertions remains on unsupported hosts
via sandbox-conformance.test.ts's unrestricted + fake-remote providers,
which are gated per-provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ AbortSignal

Additive interface changes: no rename of exec, no namespacing. Brings the
adapter contract in line with the May 2026 industry lowest common
denominator (LCD) across Vercel/Cloudflare/E2B/Daytona/ComputeSDK, so
future adapters and tools can rely on a broader filesystem surface.

- types.ts: add readdir/exists/remove/stat to Sandbox, DirEntry/FileStat,
  signal?:AbortSignal on SandboxExecOpts.
- unrestricted/native: implement new methods. AbortSignal escalates SIGTERM
  then SIGKILL through the existing kill-tree path. FS errors normalized to
  SandboxError('runtime') at the adapter boundary so conformance assertions
  are stable across providers.
- remote + RemoteSandboxClient: extend contract; E2B adapter prefers
  files.list/exists/remove/getInfo when available and falls back to shell
  commands (BusyBox/GNU stat compatible) for older SDK shapes.
- write tool: switch read-before-write existence probe to sandbox.exists()
  rather than ENOENT detection on readFile.
- conformance: scenarios for exists/stat/readdir/remove/remove-recursive and
  an exec(AbortSignal) abort case (skipped semantically for the in-memory
  remote fake).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review pass on the previous commit flagged that throwing on policy-denied
paths in exists() drifts from the 2026 LCD semantics shared by Vercel,
Cloudflare, and E2B — they all treat exists() as a safe-probe that returns
false in both the missing and denied cases. Flipping native to match;
unrestricted has no policy boundary so its behavior is unchanged.

Also adds SandboxExecResult.aborted so callers can disambiguate caller-side
AbortSignal cancellation from naturally-delivered signals and from timeoutMs
expiry — the OS signal field is unreliable for that purpose under musl /
on Alpine builds.

E2B shell fallbacks hardened: readdir now uses `find -print0` to be
newline-safe and preserves the file/dir/symlink distinction via `%y`;
stat() validates a 3-field output structure before parsing instead of
unioning two mutually-incompatible formats; failure paths synthesize
classified errno codes (ENOENT/EACCES/EIO) so SandboxError messages are
never blank.

Conformance gains four cases: stat on missing path, remove on missing
path, remove on non-empty dir without recursive, and a pre-aborted exec
that returns immediately. The mid-flight abort test now asserts the new
aborted boolean. Remote (fake) is skipIf'd on abort cases since the fake
client has no abort plumbing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…licy)

Two interface additions that line the Sandbox contract up with the 2026
LCD (Vercel/Cloudflare/E2B publish both as first-class methods).

- types.ts: NetworkPolicy discriminated union (allow-all/deny-all/allowlist),
  getUrl({port, protocol}) and updateNetworkPolicy(policy) on Sandbox.
- unrestricted: getUrl returns loopback URL; updateNetworkPolicy records
  policy without enforcement (documented no-op — unrestricted has no
  boundary).
- native: getUrl returns loopback (Seatbelt + bwrap both leave 127.0.0.1
  reachable from inside). updateNetworkPolicy wires SandboxManager
  .updateConfig() — the library API confirmed at sandbox-manager.d.ts:36
  exposes mid-session reconfiguration of the MITM proxy's allowedDomains.
  NativeSandboxOpts.allowedHosts is preserved but deprecated in favor of
  initialNetworkPolicy; the latter wins when both supplied.
- remote: TS-side allowedHosts Set replaced with a NetworkPolicy state
  machine; updateNetworkPolicy reconfigures the host-process gate and
  logs once that VM-side egress is not reconfigured (E2B doesn't expose
  the necessary API). getUrl delegates to a new optional client.getUrl()
  hook so the contract remains pluggable.
- RemoteSandboxClient: add optional getUrl({port, protocol}); falls back
  to SandboxError('unavailable') when absent.
- conformance: add ProviderCapabilities descriptor (supportsAbort/
  supportsRealGetUrl/enforcesNetworkPolicy) so per-provider quirks become
  declarative instead of name-string branching. Two new scenarios:
  getUrl returns a port-bearing URL (or rejects unavailable), and
  updateNetworkPolicy(deny-all) flips subsequent fetch to policy-rejected.
  Abort-skip predicates migrated to capability checks.
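
A sketch of how a NetworkPolicy discriminated union and a host-process gate could fit together. The three variants are named in this commit; the `kind` tag, the `hosts` field, the wildcard handling, and the `PolicyGate` class are assumptions made for illustration.

```typescript
// Variant names are from the commit message; the field names are assumed.
type NetworkPolicy =
  | { kind: "allow-all" }
  | { kind: "deny-all" }
  | { kind: "allowlist"; hosts: string[] };

function isHostAllowed(policy: NetworkPolicy, hostname: string): boolean {
  switch (policy.kind) {
    case "allow-all":
      return true;
    case "deny-all":
      return false;
    case "allowlist": {
      const host = hostname.toLowerCase();
      return policy.hosts.some((pattern) => {
        const p = pattern.toLowerCase();
        // "*.example.com" matches subdomains only, not the apex.
        return p.startsWith("*.") ? host.endsWith(p.slice(1)) : host === p;
      });
    }
  }
}

// Hypothetical gate: updateNetworkPolicy() would swap the policy in place so that
// subsequent fetch() calls are checked against the new value.
class PolicyGate {
  private policy: NetworkPolicy;
  constructor(initial: NetworkPolicy) {
    this.policy = initial;
  }
  update(policy: NetworkPolicy): void {
    this.policy = policy;
  }
  allows(url: string): boolean {
    return isHostAllowed(this.policy, new URL(url).hostname);
  }
}
```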

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A fourth sandbox provider that runs each Sandbox instance inside a
dedicated Docker container. Targets local development and self-hosted
deployments where neither macOS Seatbelt nor a paid microVM provider is
appropriate. Built on dockerode (added as an optional peer dependency).

Hardening
---------
The HostConfig is hardcoded and not overrideable from DockerSandboxOpts:

  - CapDrop: ['ALL'], CapAdd: []   — no caps means mount/chroot/su fail
  - SecurityOpt: ['no-new-privileges:true']
  - Privileged: false
  - PidsLimit / Memory / NanoCpus  — sensible defaults, opt-out via opts.resources
  - Ulimits: nofile=2048, nproc=1024
  - IpcMode: 'none'
  - AutoRemove: true (ephemeral)
  - NetworkMode: 'none' by default; switches to 'bridge' only when an
    allowlist or exposedPorts is set
  - Default image pinned by digest (node:20-alpine multi-arch manifest)
  - extraMounts entries enforce readOnly: true at the type level and
    reject any hostPath matching /docker\.sock$/ at runtime so the docker
    socket cannot be exposed back to sandboxed code.
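
The hardening list above maps onto a Docker Engine API HostConfig roughly as sketched here. Field names follow the Engine API; the `ExtraMount` shape, the helper names, and setting the soft and hard ulimits to the same value are assumptions, and the resource limits (PidsLimit / Memory / NanoCpus) are left out because their defaults are not stated.

```typescript
// Settings taken from the list above, expressed as a Docker Engine API HostConfig.
const HARDENED_HOST_CONFIG = {
  CapDrop: ["ALL"],
  CapAdd: [] as string[],
  SecurityOpt: ["no-new-privileges:true"],
  Privileged: false,
  IpcMode: "none",
  AutoRemove: true,
  NetworkMode: "none",
  Ulimits: [
    { Name: "nofile", Soft: 2048, Hard: 2048 },
    { Name: "nproc", Soft: 1024, Hard: 1024 },
  ],
};

// Hypothetical mount shape: `readOnly: true` as a literal type makes a writable
// mount a compile-time error.
interface ExtraMount {
  hostPath: string;
  containerPath: string;
  readOnly: true;
}

function toBinds(mounts: ExtraMount[]): string[] {
  return mounts.map((m) => {
    // Runtime guard: never hand the Docker socket to sandboxed code.
    if (/docker\.sock$/.test(m.hostPath)) {
      throw new Error(`refusing to mount the Docker socket: ${m.hostPath}`);
    }
    return `${m.hostPath}:${m.containerPath}:ro`;
  });
}

// Bridge networking is enabled only when an allowlist or exposed ports require it.
function buildHostConfig(mounts: ExtraMount[], needsNetwork: boolean) {
  return {
    ...HARDENED_HOST_CONFIG,
    NetworkMode: needsNetwork ? "bridge" : "none",
    Binds: toBinds(mounts),
  };
}
```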

ReadonlyRootfs is intentionally *not* enabled by default — dockerode's
putArchive operates at the storage-driver layer and gets rejected on a
read-only rootfs even when the target is a tmpfs/volume. Operators who
want it must drive all writes via sandbox.exec.

Filesystem
----------
src/sandbox/docker/fs.ts provides putFile/getFile via dockerode's
putArchive/getArchive (small in-memory tar writer, no extra dep) and
exec-based readdir/exists/remove/stat. readdir does three POSIX
`find -type X -print0` passes so it works on both GNU find and BusyBox
(alpine) — newline-safe via NUL delimiting.
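
A short sketch of the NUL-delimited parsing step, assuming each `find` pass returns absolute paths separated by NUL bytes. `parseNulDelimited` and the `DirEntry` shape are hypothetical; the exact `find` invocations used by fs.ts are not reproduced here.

```typescript
type EntryType = "file" | "dir" | "symlink";

interface DirEntry {
  name: string;
  type: EntryType;
}

// Splitting on NUL rather than newline keeps file names containing "\n" intact.
function parseNulDelimited(output: string, type: EntryType): DirEntry[] {
  return output
    .split("\0")
    .filter((p) => p.length > 0)
    .map((p) => ({ name: p.slice(p.lastIndexOf("/") + 1), type }));
}

// One pass per entry type, merged into a single listing sorted by name.
function mergePasses(files: string, dirs: string, links: string): DirEntry[] {
  return [
    ...parseNulDelimited(files, "file"),
    ...parseNulDelimited(dirs, "dir"),
    ...parseNulDelimited(links, "symlink"),
  ].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```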

Network
-------
src/sandbox/docker/proxy.ts ships a minimal HTTP/HTTPS forward proxy
(~150 LoC, node:http + CONNECT) with a dynamic allowlist that
updateNetworkPolicy mutates in place. Container env is set with
HTTP_PROXY=http://host.docker.internal:<port> so tools that respect
proxy env vars (curl, python requests, undici with ProxyAgent, browser
clients) route through it. Programs that bypass HTTP_PROXY (raw TCP,
Node's built-in fetch without explicit setGlobalDispatcher) leak
through Docker's NAT — documented gap; v2 needs a sidecar netns / nft
filter for full sealing.

Lifecycle
---------
One long-lived container per Sandbox instance (PID 1 = sleep keepalive).
Each exec uses container.exec() with stream demux. timeoutMs and
AbortSignal both wire to a kill-everything-but-PID-1 helper that
enumerates /proc and SIGKILLs to side-step a dockerode stream-close
race. dispose() removes the container even if AutoRemove already fired.

Testing
-------
- test/helpers/docker-probe.ts: top-level await isDockerAvailable() so
  describe.skipIf works at import time. Tests skip cleanly with a
  warning when the daemon is unreachable.
- test/sandbox-docker.test.ts: 13 integration tests against real
  Docker — roundtrip, hardening (caps, docker.sock, mount, chroot),
  timeouts, AbortSignal, port forwarding, policy enforcement at the
  proxy boundary, leftover-container sweep. Runs in <10s on a warm
  machine.
- test/sandbox-docker-smoke.test.ts: ad-hoc smoke probes that exercise
  CapEff/CapPrm/CapBnd ≡ 0, container /etc/passwd isolation, /Users
  invisibility, raw-CONNECT proxy allow vs deny.

All 132 sandbox tests pass on macOS + OrbStack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cker

Add `KNOWN_ADAPTERS` const to the public sandbox barrel and assert in the
conformance suite that every slug is exercised by exactly one provider —
adding a new adapter without registering it in the conformance suite
will now fail CI.

Wire the docker adapter into the cross-provider conformance loop with
a `dockerAvailable`-gated `enabled` flag. The describe gating relies on
the existing top-level await probe at `test/helpers/docker-probe.ts`
(skips cleanly on machines without Docker, no CI workflow change needed:
Ubuntu runners have Docker pre-installed, macOS runners don't and skip
gracefully).

Replace remaining provider-name string equality checks (`provider.name
=== 'remote (fake)'`, `=== 'unrestricted'`) with two declarative axes
on `ProviderFactory`:
  - `adapter: KnownAdapter` — KNOWN_ADAPTERS slug, used by the symlink
    test branch and the unrestricted-no-policy branch
  - `outsideKind: 'host-tempdir' | 'etc-passwd'` — controls which path
    the "writeFile outside the working directory" probe uses, replacing
    the brittle name match
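
A sketch of the two declarative axes and the "exactly one provider per slug" assertion, under stated assumptions: the slug values below are placeholders (the real `KNOWN_ADAPTERS` list lives in the sandbox barrel), and `adaptersNotCoveredExactlyOnce` and `outsidePath` are hypothetical helpers, not names from the suite.

```typescript
// Placeholder slugs for illustration only.
const KNOWN_ADAPTERS = ["unrestricted", "remote", "docker"] as const;
type KnownAdapter = (typeof KNOWN_ADAPTERS)[number];

interface ProviderFactory {
  name: string; // display name only; test branches never match on it
  adapter: KnownAdapter;
  outsideKind: "host-tempdir" | "etc-passwd";
  enabled: boolean;
}

// Returns every slug that is registered zero times or more than once.
// A conformance suite would assert that this list is empty.
function adaptersNotCoveredExactlyOnce(factories: ProviderFactory[]): KnownAdapter[] {
  return KNOWN_ADAPTERS.filter(
    (slug) => factories.filter((f) => f.adapter === slug).length !== 1,
  );
}

// Picks the path used by the "writeFile outside the working directory" probe.
// The concrete paths here are assumptions.
function outsidePath(kind: ProviderFactory["outsideKind"], hostTmp: string): string {
  return kind === "etc-passwd" ? "/etc/passwd" : `${hostTmp}/outside.txt`;
}
```

Branching on `adapter` and `outsideKind` keeps the test logic stable when a provider's display name changes.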

73 conformance scenarios now run against the docker provider (in ~12s
on a warm machine, OrbStack on M-series). All 151 sandbox tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…low-all

Address the high-severity findings from the final security review.

Docker read-side policy (R1)
----------------------------
readdir, stat, and readFile now call assertReadable() to enforce the
workingDirectory boundary, matching native's behavior. Previously
docker.readdir('/etc') would silently succeed and enumerate the
container's filesystem — only writeFile / mkdir / remove were gated.
exists() also goes through the new isReadable() helper but returns
false on denial (safe-probe semantics, consistent with the rest of
the adapter set).
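
The split between a throwing check and a safe probe can be sketched as below. This is an illustration only: the real `assertReadable` / `isReadable` signatures are not shown in this PR text, the local `SandboxError` is a stand-in for the runtime's class, and the check is purely lexical (it does not resolve symlinks).

```typescript
type SandboxErrorKind = "policy" | "runtime" | "unavailable";

// Local stand-in for the runtime's SandboxError.
class SandboxError extends Error {
  readonly kind: SandboxErrorKind;
  constructor(kind: SandboxErrorKind, message?: string) {
    super(message ?? kind);
    this.name = "SandboxError";
    this.kind = kind;
  }
}

// Collapse ".", "..", and repeated slashes in a POSIX path.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Safe probe: true only when target resolves inside workingDirectory.
function isReadable(workingDirectory: string, target: string): boolean {
  const root = normalizePosix(workingDirectory);
  // Relative targets are resolved against the working directory.
  const abs = normalizePosix(target.startsWith("/") ? target : `${root}/${target}`);
  return abs === root || abs.startsWith(root === "/" ? "/" : root + "/");
}

// Throwing variant for readFile / readdir / stat.
function assertReadable(workingDirectory: string, target: string): void {
  if (!isReadable(workingDirectory, target)) {
    throw new SandboxError("policy", `read outside working directory: ${target}`);
  }
}
```

Under this sketch, exists() would return `isReadable(...)` combined with the actual filesystem check, so a denied path is indistinguishable from a missing one.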

Proxy SSRF guards (R2)
----------------------
The docker allowlist proxy now refuses CONNECT and plain-HTTP requests
to literal private and special-purpose addresses: RFC1918 (10/8,
172.16/12, 192.168/16), CGNAT (100.64/10), loopback (127/8),
unspecified (0/8), link-local including the cloud metadata endpoint
(169.254/16, with 169.254.169.254 used by AWS/GCP), IPv6 loopback
(::1), and IPv6 link-local / unique-local (fe80::/10, fc00::/7). The
guard runs after the hostname allowlist and rejects regardless of the
policy decision: even if a user explicitly allows 169.254.169.254 they
cannot reach it.
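
A minimal sketch of such a literal-address guard, assuming the ordering described above (allowlist first, guard second). `isBlockedHost` and `decide` are hypothetical names, and the sketch deliberately ignores IPv4-mapped IPv6 literals and hostnames that resolve to private addresses.

```typescript
// Parse a dotted-quad IPv4 literal into an unsigned 32-bit value, or null.
function parseIpv4(host: string): number | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Blocked IPv4 ranges as [base address, prefix length].
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["100.64.0.0", 10], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.168.0.0", 16],
];

function inCidr(ip: number, base: number, prefix: number): boolean {
  const size = 2 ** (32 - prefix);
  return Math.floor(ip / size) === Math.floor(base / size);
}

function isBlockedHost(host: string): boolean {
  const h = host.toLowerCase().replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  const v4 = parseIpv4(h);
  if (v4 !== null) {
    return BLOCKED_V4.some(([base, prefix]) => inCidr(v4, parseIpv4(base)!, prefix));
  }
  if (h.includes(":")) {
    if (h === "::1") return true;
    // fe80::/10 spans first hextets fe80..febf; fc00::/7 spans fc00..fdff.
    const first = parseInt(h.split(":")[0] || "0", 16);
    return (first >= 0xfe80 && first <= 0xfebf) || (first >= 0xfc00 && first <= 0xfdff);
  }
  return false; // plain hostnames are left to the allowlist
}

// Allowlist first, then the guard: an explicit allow cannot override the guard.
function decide(host: string, allowed: Set<string>): "allow" | "deny" {
  if (!allowed.has(host)) return "deny";
  return isBlockedHost(host) ? "deny" : "allow";
}
```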

Plain-HTTP proxying now overrides the caller-supplied Host header with
the target's authority, so an attacker can no longer pair an
allowlisted absolute URL with a different vhost named in the Host
header. The proxy-authorization and proxy-connection hop-by-hop headers
are also stripped before forwarding.
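
The header handling for plain-HTTP forwarding can be sketched as a pure function. `forwardHeaders` and `HeaderMap` are hypothetical names, and header values are simplified to single strings; the real proxy works on node:http request objects.

```typescript
type HeaderMap = Record<string, string>;

// Build the header set sent upstream for an absolute-URL proxy request.
function forwardHeaders(incoming: HeaderMap, absoluteUrl: string): HeaderMap {
  const target = new URL(absoluteUrl);
  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    // Drop the caller's Host and the proxy-specific hop-by-hop headers.
    if (lower === "host" || lower === "proxy-authorization" || lower === "proxy-connection") {
      continue;
    }
    out[lower] = value;
  }
  // Host always comes from the URL that the allowlist actually checked.
  out["host"] = target.host;
  return out;
}
```

Deriving Host from the request URL means the allowlist decision and the upstream vhost always refer to the same authority.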

Port bindings bind to 127.0.0.1 only (was 0.0.0.0) so dev-machine
sandboxes don't expose services across the LAN.

Native allow-all (correctness)
------------------------------
The upstream @anthropic-ai/sandbox-runtime config validator rejects
bare '*' in network.allowedDomains as "too broad". Our previous
policyToAllowedDomains returned ['*'] for mode:'allow-all', which would
have silently broken init. Throw SandboxError('unavailable') with a
pointer to unrestrictedSandbox instead.

New conformance — sandbox-docker.test.ts:
- "read-side methods enforce the working directory boundary": asserts
  readFile / readdir / stat throw policy and exists() returns false
  for /etc paths.

New smoke — sandbox-docker-smoke.test.ts: even when explicitly allowed,
CONNECT to 169.254.169.254, 127.0.0.1, and 10.0.0.1 all return 403.

149 + 2 new tests pass; conformance still green across all 4 adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The underlying SandboxManager from @anthropic-ai/sandbox-runtime is a
process-global singleton: two nativeSandbox instances with different
working directories conflict and throw SandboxError('unavailable'). The
agents-runtime hosts many agent entities concurrently, each with its own
working directory, so this constraint is incompatible with the product.

dockerSandbox now covers the strong-isolation use case (no singleton,
multi-instance safe). unrestrictedSandbox + tool-layer policy (env
scrubbing, symlink resolution, fetch SSRF guards) covers the dev case.

- Delete src/sandbox/native.ts and the three native test files.
- Drop 'native' from KNOWN_ADAPTERS; drop nativeSandbox /
  NativeSandboxOpts / ChooseDefaultSandboxOpts exports.
- Simplify chooseDefaultSandbox to always return unrestrictedSandbox.
  Remove the ELECTRIC_AGENTS_UNRESTRICTED env var — it only existed to
  revert from native to unrestricted, which is now the default.
- Drop the native provider entry from the conformance suite; the
  KNOWN_ADAPTERS round-trip assertion now covers unrestricted/remote/docker.
- Drop @anthropic-ai/sandbox-runtime from dependencies; regenerate
  pnpm-lock.yaml.
- Update sandbox-design.md and the changeset to reflect the new lineup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
processWake now constructs ctx.sandbox at wake-session start (via the
entity-type's defaultSandbox factory, falling back to the runtime-level
default) and disposes it in the outer finally. Inter-wake state
preservation is the provider's responsibility: remote provider
factories derive their reattach identity from entityUrl so ephemeral
hosts (Cloudflare Workers, Lambda) re-find warm sandboxes across cold
starts. Local providers (docker, unrestricted) pay full create/dispose
per wake-session — accepted for v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vite-bundled callers (the desktop renderer) that import
@electric-ax/agents-runtime/sandbox were pulling dockerode and its
native dependencies (cpufeatures.node, ssh2) into their bundle via the
docker provider's re-exports. Move dockerSandbox, DockerSandboxOpts,
and isDockerAvailable to a separate /sandbox/docker subpath so only
callers that actually use the docker provider pay for it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runners now advertise the sandbox profiles they support, the UI lets
a user pick one when spawning an entity, and the chosen profile is
persisted on the entity row and consumed at wake time.

- Runtime adds `SandboxProfile` (`name`, `label`, `description?`,
  `factory`). `createRuntimeRouter({ sandboxProfiles })` registers
  the set; `processWake` looks up the profile named on
  `entity.sandbox.profile` and falls back to `unrestrictedSandbox`
  at cwd when nothing was selected.
- Server-side: extend `runners.sandbox_profiles` (jsonb) so each
  runner declares its advertised set; extend the spawn body with
  `sandbox: { profile }` and persist on a new `entities.sandbox`
  column. Spawn-time validation checks the profile against the
  pinned runner's set (per-runner) or, for unpinned dispatch, the
  tenant-wide set. Bad picks reject with 400 instead of failing late
  on first wake.
- Runner registration (`POST /_electric/runners`) accepts
  `sandbox_profiles`; built-in agents server forwards
  `runtime.sandboxProfileDescriptors` so its bundled runner
  advertises `local` (always) and `docker` (when the daemon is
  reachable). Horton's docker profile auto-mounts the user's
  `workingDirectory` read-write and runs with nothing mounted when
  none is set.
- UI: provider syncs `runners.sandbox_profiles` via the existing
  Electric runners shape and threads it through the new-session
  view. Horton's composer pill defaults to the first profile;
  full-schema spawn form uses a new `extraRows` slot on
  `SchemaForm`. The entity timeline header surfaces the active
  profile as a read-only badge.
- Docker sandbox's `extraMounts` relaxes `readOnly: true` (literal)
  to `readOnly?: boolean` so the Horton mount can be RW.
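
The spawn-time check from the server-side bullet above could look like the following. It is a sketch under assumptions: `validateSpawnProfile` and the result shape are hypothetical, and the caller is assumed to have already chosen the advertised set (the pinned runner's set, or the tenant-wide set for unpinned dispatch).

```typescript
// Descriptor as advertised by a runner (the factory stays runner-side).
interface SandboxProfileDescriptor {
  name: string;
  label: string;
  description?: string;
}

type SpawnCheck =
  | { ok: true; profile: string | undefined }
  | { ok: false; status: 400; error: string };

function validateSpawnProfile(
  requested: string | undefined,
  advertised: SandboxProfileDescriptor[],
): SpawnCheck {
  // Nothing selected: accepted, the runtime falls back at wake time.
  if (requested === undefined) return { ok: true, profile: undefined };
  if (advertised.some((p) => p.name === requested)) {
    return { ok: true, profile: requested };
  }
  const names = advertised.map((p) => p.name).join(", ");
  return {
    ok: false,
    status: 400,
    error: `unknown sandbox profile "${requested}"; available: ${names}`,
  };
}
```

Rejecting at spawn keeps a bad profile name from surfacing only when the entity first wakes.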

Entity-type-level sandbox policy is intentionally omitted: profiles
are a per-runner concern, and any entity dispatched to a runner can
use any profile that runner advertises. Type-level "must run in
docker" enforcement is a one-liner inside the handler when needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebasing the sandbox branch onto main pulled in main's dependency
drift. Rebase the lockfile on main's resolutions and let pnpm add only
the sandbox peer deps (dockerode, e2b, @types/dockerode) so existing
ranges (notably @tanstack/db@0.6.6) are not bumped into breaking
versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sandbox design doc under plans/ was removed earlier in the branch,
leaving dangling pointers in the Sandbox primitive doc comment, the
remoteSandbox doc comment, and the changeset. Remove the pointers and
fold the still-relevant guidance inline; expand the changeset to cover
the sandbox-profile picker work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msfstef msfstef force-pushed the msfstef/agent-sandboxing-1 branch from b4082a4 to 91b0613 Compare May 25, 2026 08:58
@msfstef msfstef changed the title feat(agents-runtime): Sandbox primitive + native (Seatbelt/bwrap) and E2B remote providers feat(agents-runtime): Sandbox primitive + Docker/E2B providers + sandbox profile picker May 25, 2026
msfstef and others added 2 commits May 25, 2026 14:59
Introduce a sandbox identity (sandboxKey) that defaults to the entity URL
but can be shared, so an agent and the subagents it spawns operate on one
filesystem.

- runtime: SandboxFactoryParams gains sandboxKey/shared; process-wake
  resolves them from the entity's sandbox; SpawnSandboxOption +
  ctx.spawn(..., { sandbox: 'inherit' }).
- docker provider: persistent, reattachable containers keyed by reuseKey,
  a process-local refcounted live-container registry so concurrent sibling
  wakes share one container, and reapIdleDockerSandboxes for idle cleanup.
- server: spawn accepts sandbox { profile, key, inherit };
  resolveSandboxForSpawn resolves inherit (a graceful no-op when the
  parent has no shared sandbox) behind a single-runner co-location guard.
- agents: spawn_worker dispatches the worker into the parent's sandbox.
- fix: include sandbox_profiles in the runners shape-proxy column allowlist
  so advertised profiles actually reach the UI.
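
The refcounted live-container registry from the docker provider bullet can be illustrated with a small generic class. This is a sketch, not the provider's code: `RefCountedRegistry` is a hypothetical name, creation is synchronous for brevity (a real container start is async), and what happens after the last release (keep the container for reattach, or leave it to an idle reaper) is left to the caller.

```typescript
class RefCountedRegistry<T> {
  private readonly live = new Map<string, { value: T; refs: number }>();

  // Return the live value for key, creating it on first use.
  acquire(key: string, create: () => T): T {
    const existing = this.live.get(key);
    if (existing) {
      existing.refs += 1;
      return existing.value;
    }
    const value = create();
    this.live.set(key, { value, refs: 1 });
    return value;
  }

  // Drop one reference. Returns true when the last holder released
  // and the entry was removed from the registry.
  release(key: string): boolean {
    const entry = this.live.get(key);
    if (!entry) return false;
    entry.refs -= 1;
    if (entry.refs > 0) return false;
    this.live.delete(key);
    return true;
  }

  refCount(key: string): number {
    return this.live.get(key)?.refs ?? 0;
  }
}
```

Keying by the shared sandbox identity means a parent and a sibling wake that resolve to the same key get the same handle instead of racing to create two containers.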

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- add dockerode as a dependency and externalize it (plus its native
  ssh2/cpu-features deps) from the Electron main bundle, which
  inlineDynamicImports would otherwise try to bundle and fail on.
- UI: key each session's sandbox by its URL so it persists across the
  agent's wakes and subagents inherit the same container; render the
  picker's selected option by label rather than the raw profile name;
  coerce a null/absent sandbox_profiles column to an empty list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>