diff --git a/docs/plans/native-kubernetes-cleanroom.md b/docs/plans/native-kubernetes-cleanroom.md
new file mode 100644
index 00000000..70c66164
--- /dev/null
+++ b/docs/plans/native-kubernetes-cleanroom.md
@@ -0,0 +1,730 @@
+# Native Kubernetes Cleanroom Plan
+
+**Status:** Proposed
+**Last reviewed:** 2026-06-01
+**Spec references:** `docs/api.md`, `docs/backends.md`, `docs/policy.md`, `docs/caching.md`, `docs/remote-access.md`, `docs/observability.md`
+**Related plans:** `docs/plans/layered-caching.md`, `docs/plans/zfs-stage-cache-replication.md`, `docs/plans/multi-principal-control-server.md`, `docs/plans/sandbox-suspend-wake.md`
+
+## Summary
+
+Make Cleanroom a Kubernetes-native sandbox runtime instead of a daemon that
+creates VMs hidden from the Kubernetes scheduler.
+
+Kubernetes should own desired state, scheduling, bin packing, coarse tenancy,
+load balancing, lifecycle visibility, and cluster-level failure recovery.
+Cleanroom should own the runtime semantics that are specific to secure
+repository sandboxes: immutable repository policy, microVM isolation,
+deny-by-default egress, host-side gateway and cache mediation, exact-principal
+resource ownership, execution and file APIs, snapshots, and backend-specific
+runtime setup.
+
+The preferred northbound integration is the Kubernetes SIG Agent Sandbox API.
+Cleanroom should fit underneath that API as a runtime implementation before
+inventing a competing agent-facing object model. If Cleanroom later needs its
+own CRDs, they should follow the same Kubernetes conventions and stay narrow.
+
+KubeVirt is not a replacement for this split. It is a possible VM substrate
+under Cleanroom for clusters that want Kubernetes-managed QEMU/KVM VMs.
+Firecracker remains the first backend target because Cleanroom already has
+Firecracker runtime, gateway, and cache behavior.
+
+## Problem
+
+Cleanroom's current control model is server-oriented. The server process is
+authoritative for sandbox lifecycle and execution state, and clients create
+sandboxes through the Cleanroom API. That model works for local hosts and
+shared servers, but it is awkward in Kubernetes at large scale.
+
+The main failure mode is hidden capacity. If a small Kubernetes pod or central
+service can start arbitrary VMs on a node, the Kubernetes scheduler cannot make
+correct placement decisions. CPU, memory, disk, KVM capacity, TAP devices, ZFS
+clone capacity, cache locality, and node-specific backend support become
+invisible to bin packing, quotas, autoscaling, preemption, and disruption
+handling.
+
+A Kubernetes-native design should make every sandbox visible as desired state
+and as a schedulable resource envelope before the VM starts. The node runtime
+should reconcile only the sandboxes Kubernetes assigned to that node.
+
+## Goals
+
+- Represent each Cleanroom sandbox as one Kubernetes-schedulable unit.
+- Let Kubernetes perform normal placement, bin packing, quota enforcement,
+  topology spreading, priority, preemption, and autoscaler integration.
+- Keep Cleanroom's policy, gateway, cache, execution, auth, and backend
+  invariants intact.
+- Prefer Agent Sandbox `Sandbox`, `SandboxTemplate`, `SandboxClaim`, and
+  `SandboxWarmPool` as the user-facing Kubernetes API.
+- Keep privileged host operations inside a node-local Cleanroom runtime.
+- Keep user-facing pods unprivileged or narrowly scoped.
+- Surface lifecycle through Kubernetes status conditions and events.
+- Use Kubernetes `Service` and Gateway API patterns for routing instead of
+  ad hoc per-host port allocation.
+- Use Kubernetes namespace, RBAC, quota, admission, and NetworkPolicy as the
+  coarse cluster boundary.
+- Fail closed when a backend, node capability, policy, owner, cache lineage, or
+  gateway scope cannot be validated.
+
+## Non-goals
+
+- Do not build a custom scheduler in the first version.
+- Do not make a central `cleanroom serve` deployment start hidden VMs on
+  arbitrary nodes.
+- Do not replace Cleanroom repository policy with Kubernetes NetworkPolicy.
+- Do not expose host privileged sockets or ZFS/KVM operations to user pods.
+- Do not store large logs, stdout, filesystem blobs, policy blobs, or cache
+  records in Kubernetes status.
+- Do not make warm pools hand out pre-owned user sandboxes in the first version.
+- Do not require KubeVirt for the first Kubernetes deployment.
+- Do not make KubeVirt the public product API.
+- Do not add compatibility paths for older Kubernetes-specific prototypes until
+  a first production shape exists.
+
+## Target Model
+
+The target shape is a normal Kubernetes reconciliation stack:
+
+```text
+SandboxClaim or CleanroomSandbox
+        |
+Cleanroom Kubernetes controller
+        |
+scheduled sandbox workload
+        |
+node-local Cleanroom runtime
+        |
+Firecracker, KubeVirt, or another backend
+        |
+Cleanroom guest agent, gateway, cache, execution, files
+```
+
+The cluster controller observes declarative Kubernetes objects, validates the
+request, compiles Cleanroom policy, creates child resources, and reports status.
+The Kubernetes scheduler places the sandbox workload. The node-local runtime
+starts and manages the VM only after Kubernetes has assigned the workload to a
+node with the required capabilities.
+
+### Ownership Boundaries
+
+Kubernetes owns:
+
+- desired state objects
+- scheduling and bin packing
+- namespace-level tenancy
+- RBAC and admission
+- resource quotas and limit ranges
+- workload replacement after pod or node failures
+- Services, Gateway routes, and outer NetworkPolicies
+- status conditions and events visible to cluster operators
+
+Cleanroom owns:
+
+- repository policy compilation and policy hashes
+- backend capability checks
+- sandbox VM lifecycle on the assigned node
+- guest execution and file operations
+- exact-principal resource ownership
+- gateway authorization envelopes
+- dependency and service cache keys
+- stage-cache lineage and import validation
+- backend-specific storage, networking, and guest-agent behavior
+
+## API Strategy
+
+### Preferred Northbound API: Agent Sandbox
+
+The first Kubernetes integration should target Agent Sandbox rather than define
+a competing high-level API.
+
+The mapping is:
+
+| Agent Sandbox concept | Cleanroom responsibility |
+| --- | --- |
+| `SandboxTemplate` | Runtime image, adapter pod shape, backend class, coarse NetworkPolicy, service exposure defaults |
+| `SandboxClaim` | Request for one policy-bound Cleanroom sandbox and its owner identity |
+| `SandboxWarmPool` | Warm adapter capacity and backend/runtime cache warmup |
+| `Sandbox` status | Readiness, suspension, failure, service identity, route metadata |
+| SDK command/file operations | Cleanroom execution and file APIs |
+
+Where the Agent Sandbox API has no field for a Cleanroom-specific setting,
+start with an annotation or template convention. If a field becomes durable,
+prefer upstreaming it or adding a small Cleanroom extension CRD rather than
+depending on undocumented annotation contracts.
+
+### Optional Cleanroom CRD
+
+If a native Cleanroom object is needed, keep it close to Kubernetes API
+conventions and avoid imperative verbs:
+
+```yaml
+apiVersion: cleanroom.buildkite.com/v1alpha1
+kind: CleanroomSandbox
+metadata:
+  name: buildkite-test-abc123
+spec:
+  repository:
+    url: https://github.com/buildkite/cleanroom.git
+    commit: abc123
+  policy:
+    configMapRef:
+      name: cleanroom-policy
+      key: cleanroom.yaml
+  backendClassName: firecracker-zfs
+  resources:
+    vcpus: 4
+    memory: 8Gi
+    disk: 32Gi
+  operatingMode: Running
+status:
+  observedGeneration: 4
+  sandboxID: sbx_01abc
+  policyHash: sha256:...
+  conditions:
+    - type: PolicyCompiled
+      status: "True"
+      reason: Valid
+    - type: Ready
+      status: "True"
+      reason: RuntimeReady
+```
+
+Rules:
+
+- `spec` is desired state.
+- `status` is observed state.
+- lifecycle changes use declarative fields such as `operatingMode`, not
+  imperative status writes.
+- finalizers guard runtime cleanup.
+- child resources use owner references.
+- status contains IDs, hashes, condition summaries, and route metadata only.
+
+## Scheduling And Capacity
+
+Every sandbox must have a Kubernetes-visible resource envelope before
+placement:
+
+```yaml
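+resources:
+  requests:
+    cpu: "4"
+    memory: 8Gi
+    ephemeral-storage: 32Gi
+    cleanroom.buildkite.com/kvm: "1"
+    cleanroom.buildkite.com/vm-slot: "1"
+  limits:
+    # Extended resources cannot be overcommitted, so their limits must be set
+    # and must equal the requests.
+    cleanroom.buildkite.com/kvm: "1"
+    cleanroom.buildkite.com/vm-slot: "1"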
+```
+
+The controller derives this from:
+
+- `cleanroom.yaml` policy resource floors
+- backend runtime defaults
+- backend class minimums
+- repository bootstrap requirements
+- Docker service requirements
+- cache-output volume floors where they imply local disk pressure
+
+Use standard Kubernetes placement features first:
+
+- node labels for backend capability:
+  - `cleanroom.buildkite.com/backend.firecracker=true`
+  - `cleanroom.buildkite.com/storage.zfs=true`
+  - `cleanroom.buildkite.com/kvm=true`
+- taints and tolerations for dedicated sandbox nodes
+- node affinity for backend classes
+- topology spread constraints for availability
+- priority classes for interactive versus batch sandboxes
+- namespace `ResourceQuota` and `LimitRange`
+- Cluster Autoscaler or Karpenter, driven by visible pending workloads
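+
+As a rough illustration of how these placement features combine, an adapter
+pod might look like the sketch below. The namespace, image, taint key,
+PriorityClass name, and pod label are hypothetical placeholders, not names
+this plan defines. Only the node labels and extended resource names come from
+this document.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: cleanroom-sandbox-abc123
+  namespace: sandboxes                        # hypothetical namespace
+  labels:
+    app.kubernetes.io/name: cleanroom-adapter # hypothetical label
+spec:
+  priorityClassName: cleanroom-interactive    # hypothetical PriorityClass
+  nodeSelector:
+    cleanroom.buildkite.com/backend.firecracker: "true"
+    cleanroom.buildkite.com/kvm: "true"
+  tolerations:
+    - key: cleanroom.buildkite.com/sandbox    # hypothetical taint key
+      operator: Exists
+      effect: NoSchedule
+  topologySpreadConstraints:
+    - maxSkew: 1
+      topologyKey: topology.kubernetes.io/zone
+      whenUnsatisfiable: ScheduleAnyway
+      labelSelector:
+        matchLabels:
+          app.kubernetes.io/name: cleanroom-adapter
+  containers:
+    - name: cleanroom-adapter
+      image: registry.example.com/cleanroom-agent-sandbox-adapter:dev # placeholder
+      resources:
+        requests:
+          cpu: "4"
+          memory: 8Gi
+          ephemeral-storage: 32Gi
+          cleanroom.buildkite.com/kvm: "1"
+          cleanroom.buildkite.com/vm-slot: "1"
+        limits:
+          # Extended resources need limits equal to their requests.
+          cleanroom.buildkite.com/kvm: "1"
+          cleanroom.buildkite.com/vm-slot: "1"
+```
+
+A pod like this stays `Pending` until a labeled node with free extended
+resources exists, which is the signal an autoscaler can act on.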
+
+Use a device plugin for scarce node-local resources in the first version:
+
+- KVM slot availability
+- maximum VM concurrency
+- optionally, TAP device or gateway slot limits
+
+Use Dynamic Resource Allocation later if allocation needs structured
+parameters, such as storage driver, cache locality, or per-sandbox device
+metadata that a simple extended resource cannot express.
+
+## Controller Design
+
+The cluster controller runs as a normal Kubernetes controller with leader
+election, watches, work queues, finalizers, status updates, and events.
+
+Responsibilities:
+
+- Watch Agent Sandbox objects and optional Cleanroom CRDs.
+- Resolve repository policy source.
+- Validate and compile policy using Cleanroom's existing parser and compiler.
+- Resolve effective resources and backend class.
+- Create or patch the scheduled sandbox workload.
+- Create or patch Services, Gateway routes, NetworkPolicies, and narrow
+  Secrets or projected tokens.
+- Stamp owner metadata and policy hash.
+- Set status conditions with `observedGeneration`.
+- Emit Kubernetes Events for material lifecycle transitions.
+- Requeue for expiration, suspend/wake, retryable runtime failures, and cleanup.
+- Run finalizer cleanup for runtime resources and route resources.
+
+The controller should not:
+
+- start VMs directly
+- run privileged host setup
+- stream command output through CRD status
+- choose nodes outside Kubernetes scheduling
+- mutate policy after sandbox creation
+- grant broad host access to adapter pods
+
+## Node Runtime
+
+Run the node runtime as a DaemonSet on nodes that advertise Cleanroom
+capability.
+
+Responsibilities:
+
+- expose a node-local, authenticated control endpoint
+- validate that requested sandboxes are scheduled to the local node
+- prepare Firecracker/KVM networking
+- manage rootfs and cache-output volumes
+- manage ZFS/file snapshot drivers
+- start, suspend, resume, and terminate VMs
+- register and release gateway scopes
+- publish runtime metrics and health
+- clean up orphaned VMs, TAP devices, firewall rules, and temporary volumes
+
+The runtime is the only component with host privileges required for KVM,
+networking, firewall, and storage operations. The adapter pod should receive a
+per-sandbox token or projected service account identity that authorizes only
+that sandbox.
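+
+A minimal sketch of such a DaemonSet follows, assuming the Firecracker node
+label from the scheduling section. The namespace, image, taint key, and the
+choice of a fully privileged container are illustrative assumptions. A real
+deployment may be able to narrow privileges to specific capabilities and
+devices.
+
+```yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: cleanroom-node-runtime
+  namespace: cleanroom-system                 # hypothetical namespace
+spec:
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: cleanroom-node-runtime
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: cleanroom-node-runtime
+    spec:
+      # Run only on nodes that advertise Firecracker capability.
+      nodeSelector:
+        cleanroom.buildkite.com/backend.firecracker: "true"
+      tolerations:
+        - key: cleanroom.buildkite.com/sandbox # hypothetical taint key
+          operator: Exists
+          effect: NoSchedule
+      containers:
+        - name: runtime
+          image: registry.example.com/cleanroom-node-runtime:dev # placeholder
+          securityContext:
+            privileged: true                  # host privileges stay in this pod only
+          volumeMounts:
+            - name: kvm
+              mountPath: /dev/kvm
+      volumes:
+        - name: kvm
+          hostPath:
+            path: /dev/kvm
+            type: CharDevice
+```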
+
+## Scheduled Workload Shape
+
+The first version should use an adapter pod as the schedulable workload:
+
+```text
+Pod cleanroom-sandbox-abc123
+  container cleanroom-adapter
+    - exposes Agent Sandbox runtime HTTP surface
+    - talks to local Cleanroom node runtime
+    - maps run/read/write/list/tunnel operations to Cleanroom APIs
+```
+
+The adapter pod carries the resource requests for the underlying VM. That makes
+Kubernetes bin packing approximately correct even though the untrusted workload
+runs in a VM started by the node runtime.
+
+The adapter pod must not be privileged. It should not have raw access to the
+node runtime socket unless that socket enforces per-sandbox authorization.
+
+## Networking And Load Balancing
+
+Use Kubernetes networking for entry and routing:
+
+- create one Service per sandbox for stable in-cluster identity when needed
+- use a shared router plus Gateway API for external access
+- route by sandbox ID, claim name, or stable service name
+- set readiness based on the sandbox's `Ready` condition
+- keep per-sandbox direct load balancers out of the default path
+
+Use Kubernetes NetworkPolicy as the outer perimeter:
+
+- default deny namespace policy
+- allow router to adapter service
+- allow adapter to node runtime on the same node where supported
+- allow adapter/runtime to Cleanroom gateway services
+- deny metadata service and cluster-internal ranges unless explicitly required
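+
+The first two rules above could be sketched as the following pair of
+NetworkPolicies. The namespaces, pod labels, and port are hypothetical.
+Enforcement also depends on the cluster's CNI implementing NetworkPolicy.
+
+```yaml
+# Default deny: selects every pod in the namespace and allows no traffic.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: default-deny
+  namespace: sandboxes                        # hypothetical namespace
+spec:
+  podSelector: {}
+  policyTypes:
+    - Ingress
+    - Egress
+---
+# Allow only the shared router to reach adapter pods.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-router-to-adapter
+  namespace: sandboxes
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: cleanroom-adapter # hypothetical label
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: cleanroom-system # hypothetical namespace
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: cleanroom-router      # hypothetical label
+      ports:
+        - protocol: TCP
+          port: 8080                          # hypothetical adapter port
+```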
+
+Use Cleanroom policy inside the runtime:
+
+- exact host and port egress allowlists
+- stage-scoped network rules
+- gateway-mediated Git, OCI, Go, RubyGems, Docker Hub, and fetch traffic
+- owner-aware gateway envelopes
+- deny-by-default behavior on unsupported backend capabilities
+
+Kubernetes NetworkPolicy should not be treated as a replacement for Cleanroom's
+repository policy. It is too coarse for stage-scoped repository egress and
+host-side credential mediation.
+
+## Storage And Caching
+
+Keep storage layers explicit:
+
+- Kubernetes PVCs are for lifecycle-visible sandbox state that must survive pod
+  restart, reschedule, or suspension.
+- node-local disks are for backend runtime artifacts and hot caches.
+- Cleanroom stage-cache metadata remains Cleanroom-owned because it is keyed by
+  policy, repository, owner, backend, runtime lineage, and storage driver.
+- CSI snapshots can be used where they match the backend's semantics, but they
+  do not replace Cleanroom user snapshots or system stage caches by default.
+
+Warm pools should start with:
+
+- prewarmed adapter pods
+- pulled adapter images
+- prepared Cleanroom base images
+- populated transport caches
+- populated stage caches where owner and lineage are safe
+
+Warm pools should not initially provide pre-owned user sandboxes. Adoption would
+need exact principal stamping, gateway envelope replacement, cache partitioning,
+and file state guarantees before it is safe.
+
+## Security Model
+
+Kubernetes handles coarse access:
+
+- namespace isolation
+- RBAC for claim/template operations
+- admission policy for allowed backend classes and resource ceilings
+- ResourceQuota and LimitRange
+- NetworkPolicy and Gateway policy
+- service account identity
+
+Cleanroom handles sandbox access:
+
+- exact owner principal for sandboxes, executions, snapshots, files, streams,
+  and cache metadata
+- request-time authorization before repository mirrors, host credentials,
+  snapshots, stage caches, or backend work are used
+- immutable policy hash per sandbox
+- backend capability validation before provisioning
+- owner-aware gateway scopes
+- fail-closed cache import and gateway behavior
+
+Admission should reject:
+
+- unsupported backend classes
+- resource requests above namespace or tenant limits
+- unpinned or disallowed images when policy requires pinned images
+- dangerous allow-all egress unless the namespace is explicitly permitted
+- templates that mount privileged host paths into adapter pods
+- adapter pod specs that request privileged mode outside the node runtime
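+
+The last two rejections could be expressed with a `ValidatingAdmissionPolicy`,
+assuming a cluster version that serves `admissionregistration.k8s.io/v1` for
+this kind. This is a sketch, not the plan's chosen mechanism. The policy name
+and the namespace label are hypothetical, and init and ephemeral containers
+would need equivalent checks.
+
+```yaml
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingAdmissionPolicy
+metadata:
+  name: cleanroom-adapter-unprivileged        # hypothetical name
+spec:
+  failurePolicy: Fail                         # fail closed if evaluation errors
+  matchConstraints:
+    resourceRules:
+      - apiGroups: [""]
+        apiVersions: ["v1"]
+        operations: ["CREATE", "UPDATE"]
+        resources: ["pods"]
+  validations:
+    - expression: >-
+        object.spec.containers.all(c,
+        !has(c.securityContext) ||
+        !has(c.securityContext.privileged) ||
+        !c.securityContext.privileged)
+      message: "adapter pods must not run privileged containers"
+    - expression: >-
+        !has(object.spec.volumes) ||
+        object.spec.volumes.all(v, !has(v.hostPath))
+      message: "adapter pods must not mount host paths"
+---
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingAdmissionPolicyBinding
+metadata:
+  name: cleanroom-adapter-unprivileged
+spec:
+  policyName: cleanroom-adapter-unprivileged
+  validationActions: ["Deny"]
+  matchResources:
+    namespaceSelector:
+      matchLabels:
+        cleanroom.buildkite.com/sandboxes: "enabled" # hypothetical namespace label
+```
+
+Binding by namespace label keeps the node runtime's own namespace outside the
+policy, so the privileged DaemonSet is not rejected by the same rule.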
+
+## Backend Notes
+
+### Firecracker
+
+Firecracker is the first Kubernetes backend target.
+
+Expected shape:
+
+- adapter pod is the Kubernetes-scheduled unit
+- node runtime starts Firecracker on the assigned node
+- KVM and VM slots are represented through extended resources
+- ZFS-backed hosts use clone-backed stage-cache materialization
+- file-backed hosts remain functional but should report degraded cache
+  materialization capability
+
+### KubeVirt
+
+KubeVirt is a later backend option, not the primary product API.
+
+Expected shape:
+
+- the controller creates a KubeVirt `VirtualMachine` or
+  `VirtualMachineInstance` as the scheduled VM resource
+- Cleanroom still compiles policy, provides execution/file APIs, controls the
+  gateway, and owns cache semantics
+- the KubeVirt backend adapter maps Cleanroom sandbox lifecycle to KubeVirt
+  lifecycle and status
+
+Use this path only if Kubernetes-managed QEMU/KVM lifecycle, storage, or live
+migration materially beats the Firecracker node-runtime path for a target
+deployment.
+
+### Backend Neutrality
+
+Public Kubernetes-facing resources should name backend classes, not internal
+runtime knobs. Backend-specific fields belong in `CleanroomBackendClass` style
+runtime config or adapter internals.
+
+## Observability
+
+Expose both Kubernetes-native and Cleanroom-native signals.
+
+Kubernetes-native:
+
+- `status.conditions` for `PolicyCompiled`, `Scheduled`, `RuntimeReady`,
+  `Ready`, `Suspended`, `Failed`, and `Expired`
+- Events for policy validation failure, scheduling failure, runtime launch,
+  ready, suspend, wake, termination, and cleanup failure
+- metrics for controller reconcile latency, queue depth, errors, and condition
+  transitions
+
+Cleanroom-native:
+
+- existing OTLP spans and metrics for sandbox creation, execution, gateway
+  requests, cache lookup/import/export, and launch phases
+- structured logs with `sandbox_id`, `execution_id`, `backend`, `reason_code`,
+  Kubernetes namespace/name/UID, and owner principal
+- retained execution observability artifacts outside CRD status
+
+Metric labels must stay low-cardinality. Kubernetes object UID, sandbox ID, and
+execution ID belong in logs and traces, not in Prometheus labels.
+
+## Current State
+
+Cleanroom already has several pieces this plan can reuse:
+
+- backend-neutral policy and resource floors in repository config
+- ConnectRPC sandbox and execution APIs
+- exact-principal auth for shared control servers
+- gateway routes for Git, OCI, Docker Hub, Go modules, RubyGems, and fetches
+- layered cache and stage-cache metadata
+- Firecracker backend with KVM, TAP devices, host firewalling, and file/ZFS
+  snapshot drivers
+- suspend/wake lifecycle states and RPCs
+- observability contracts for traces, metrics, logs, and retained execution
+  diagnostics
+
+The missing Kubernetes-native pieces are:
+
+- controller reconciliation around Kubernetes objects
+- scheduler-visible sandbox resource envelopes
+- node-local runtime authorization for scheduled workloads
+- adapter pod runtime surface for Agent Sandbox
+- device plugin or DRA capacity exposure
+- Service/Gateway routing integration
+- Kubernetes status conditions and events
+
+## Delivery Strategy
+
+### Phase 1: Agent Sandbox Adapter Prototype
+
+Build the smallest end-to-end integration without new Cleanroom CRDs.
+
+Scope:
+
+- package a `cleanroom-agent-sandbox-adapter` image
+- run Cleanroom node runtime as a DaemonSet on one Firecracker-capable node pool
+- define an Agent Sandbox `SandboxTemplate` for the adapter pod
+- map Agent Sandbox run and file operations to Cleanroom execution and file APIs
+- create one Cleanroom sandbox per claimed adapter pod
+- set realistic pod resource requests from static template values
+
+Definition of done:
+
+- a `SandboxClaim` becomes ready
+- an SDK command runs inside a Cleanroom VM
+- file read/write flows through Cleanroom APIs
+- Kubernetes schedules the adapter pod based on declared CPU, memory, storage,
+  and extended resources
+- deleting the claim terminates the Cleanroom sandbox and releases runtime
+  resources
+
+### Phase 2: Policy And Resource Reconciliation
+
+Move from static template resources to policy-derived resources.
+
+Scope:
+
+- add a controller that resolves repository policy for a claim
+- compile and persist the immutable policy hash
+- derive effective CPU, memory, disk, Docker, and backend capability
+  requirements
+- patch or create the scheduled workload with those requests before VM start
+- expose `PolicyCompiled` and `Ready` conditions
+
+Definition of done:
+
+- an invalid policy fails the claim before any runtime work is scheduled
+- resource requests match the compiled policy and backend floors
+- policy hash and effective resources are visible in status or annotations
+- backend capability mismatch fails closed with an explainable condition
+
+### Phase 3: Scheduler-Visible Node Capacity
+
+Expose scarce runtime capacity to Kubernetes.
+
+Scope:
+
+- add node labels for backend support and storage driver support
+- add taints for dedicated sandbox nodes
+- add a device plugin for KVM and VM slots
+- optionally expose local cache/storage capacity as a coarse extended resource
+- document namespace `ResourceQuota` examples
+
+Definition of done:
+
+- pending claims remain pending when no capable node has capacity
+- Kubernetes bin packs multiple claims on a capable node without overcommitting
+  hidden VM slots
+- autoscaler can react to pending adapter pods
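+
+One possible shape for the namespace `ResourceQuota` example in scope is
+shown below. The namespace and all quantities are hypothetical. Quota on
+extended resources such as VM slots uses the `requests.` prefix.
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: cleanroom-sandbox-quota               # hypothetical name
+  namespace: team-a                           # hypothetical tenant namespace
+spec:
+  hard:
+    pods: "32"
+    requests.cpu: "64"
+    requests.memory: 128Gi
+    requests.ephemeral-storage: 512Gi
+    # Caps concurrent sandboxes for the namespace regardless of node capacity.
+    requests.cleanroom.buildkite.com/vm-slot: "16"
+```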
+
+### Phase 4: Routing And Network Policy
+
+Make ingress and egress cluster-native without weakening Cleanroom policy.
+
+Scope:
+
+- create per-sandbox Services or stable Service records
+- route external traffic through a shared router and Gateway API
+- generate or document outer NetworkPolicies
+- keep Cleanroom gateway and egress policy as the inner enforcement layer
+
+Definition of done:
+
+- clients can reach a sandbox through a stable Kubernetes route
+- router targets only ready sandboxes
+- NetworkPolicy blocks direct unwanted paths
+- Cleanroom still denies policy-disallowed egress inside the VM
+
+### Phase 5: Warm Capacity
+
+Use Kubernetes warm pools and autoscaling for latency without violating owner
+isolation.
+
+Scope:
+
+- warm adapter pods
+- pre-pull images
+- pre-materialize base runtime artifacts
+- pre-populate transport caches
+- add cache warming jobs or hooks where cache ownership is safe
+
+Definition of done:
+
+- warm claims have lower time to first instruction
+- no warm pool member contains another principal's retained files, credentials,
+  gateway scopes, or owner-partitioned cache entries
+- template updates roll through warm pools and replace stale members without
+  disrupting existing claims
+
+### Phase 6: Optional KubeVirt Backend
+
+Evaluate KubeVirt as a backend only after the Firecracker path is proven.
+
+Scope:
+
+- implement a backend adapter that creates and observes KubeVirt VM resources
+- map Cleanroom lifecycle to KubeVirt lifecycle
+- retain Cleanroom execution, file, policy, gateway, and cache semantics
+- compare density, startup time, storage, networking, migration, and operator
+  complexity against Firecracker
+
+Definition of done:
+
+- one policy-bound Cleanroom sandbox runs through KubeVirt
+- the same Cleanroom API contract works across Firecracker and KubeVirt
+- unsupported KubeVirt features fail closed with clear conditions
+
+## Verification
+
+Unit tests:
+
+- policy-to-resource-envelope resolution
+- backend class validation
+- condition transitions
+- owner and policy hash stamping
+- finalizer cleanup planning
+- adapter authorization checks
+
+Controller integration tests:
+
+- claim creates child workload and status
+- invalid policy sets failure condition and creates no runtime workload
+- deletion runs finalizers and removes child resources
+- suspend and resume update desired state and conditions
+- status updates use `observedGeneration`
+
+Cluster smoke tests:
+
+- kind or real-cluster adapter smoke for API behavior
+- Firecracker-capable node-pool smoke for actual VM launch
+- multiple concurrent claims prove scheduler-visible bin packing
+- quota exhaustion leaves claims pending or denied predictably
+- router/Gateway smoke for command and port traffic
+- NetworkPolicy smoke proves direct disallowed paths fail
+
+Performance tests:
+
+- time to first instruction for cold and warm claims
+- controller reconcile throughput under claim bursts
+- node runtime launch concurrency
+- cache hit and miss latency
+- autoscaler behavior when node pools are empty
+
+Security tests:
+
+- one principal cannot read another principal's sandbox, execution, files,
+  snapshots, or cache entries
+- adapter token cannot control a sandbox other than its own
+- dangerous egress override is denied by admission unless explicitly allowed
+- gateway denies ownerless or mismatched scopes when auth is enabled
+- unprivileged adapter pod cannot access host runtime operations directly
+
+## Key Learnings From Pressure-Testing
+
+- The main Kubernetes objection would be hidden capacity. The plan therefore
+  makes one sandbox equal one scheduled workload and requires honest resource
+  requests before VM launch.
+- A custom scheduler would add unnecessary complexity early. The plan starts
+  with normal scheduler primitives, node labels, taints, quotas, and a device
+  plugin before considering DRA or scheduler extensions.
+- Warm pools are useful but dangerous if they contain principal-bound state.
+  The first warm-pool slice warms adapter/runtime/cache capacity, not adopted
+  user sandboxes.
+- Kubernetes NetworkPolicy is useful as a perimeter, but it cannot express
+  Cleanroom's stage-scoped repository and gateway policy. The plan keeps both
+  layers separate.
+- KubeVirt is valuable as a backend candidate, but making it the public API
+  would bypass Agent Sandbox and would not provide Cleanroom policy, gateway,
+  cache, or ownership semantics by itself.
+- CRD status can become a data sink. The plan keeps large artifacts in
+  Cleanroom stores and exposes only condition summaries, IDs, hashes, and
+  routes through Kubernetes objects.
+
+## Resolved Decisions
+
+- Use Agent Sandbox as the preferred user-facing Kubernetes API.
+- Keep Cleanroom as the runtime semantics layer.
+- Start with Firecracker and a node-local runtime.
+- Treat KubeVirt as a later backend option.
+- Use adapter pods as the first schedulable unit.
+- Make resource requests honest enough for Kubernetes bin packing.
+- Use device plugins for first-slice scarce node-local capacity.
+- Keep Cleanroom policy enforcement inside the runtime.
+- Use Kubernetes Services and Gateway API for routing.
+
+## Deferred Work
+
+- Dynamic Resource Allocation for structured backend capacity.
+- KubeVirt backend implementation.
+- Controlled sharing across principals.
+- Cross-node live migration or hibernation.
+- Cross-host cache scheduler beyond configured cache peers.
+- First-class UI for Kubernetes-native sandbox status.
+- Multi-sandbox-per-pod density optimization.
+
+## Open Questions
+
+### Blocking The First Slice
+
+- Should the first adapter target Agent Sandbox's existing runtime HTTP surface
+  exactly, or should it expose a narrower compatibility shim first?
+
+Recommended default: target the existing runtime surface for run and file
+operations only. Defer tunnels and richer interactive behavior until the basic
+claim lifecycle is proven.
+
+- Should the first node runtime endpoint be a Unix socket mounted into the
+  adapter pod or a localhost/node-local HTTPS endpoint?
+
+Recommended default: use a node-local HTTPS endpoint with per-sandbox bearer
+tokens. Avoid host socket mounts into user-visible pods unless the
+authorization boundary is already strong.
+
+### Needed Before Production
+
+- Which Kubernetes versions and managed providers are in the support matrix?
+- Which CNI implementations must pass NetworkPolicy and routing tests?
+- Should cache storage be local PV, hostPath managed by the node runtime, or a
+  CSI-backed volume class in the first production deployment?
+- Should KubeVirt be evaluated for migration and storage reasons before any
+  production launch, or only after Firecracker proves insufficient?
+
+### Safe To Defer
+
+- Whether DRA replaces the first device plugin.
+- Whether warm pools can safely adopt pre-created Cleanroom sandboxes.
+- Whether Cleanroom needs its own CRD after Agent Sandbox integration.
+- Whether live migration belongs in Cleanroom, KubeVirt, or outside v1 scope.