fix(sandbox): recover git/daemon proxy when preview handle is stale #3606

Open
guitavano wants to merge 1 commit into main from fix/sandbox-git-410-stale-handle
@guitavano
Contributor

@guitavano guitavano commented Jun 1, 2026

Summary

  • Resurrect or adopt a live sandbox claim in proxyDaemonRequest before returning 404, matching the recovery path preview already uses
  • Retry git/daemon proxy routes once via adoptLiveClaim before mapping 404 → 410 Gone
  • Self-heal the Save changes / publish dialog by re-running SANDBOX_START once when git APIs return 410

Fixes the production issue where preview stays up but GET /api/:org/sandbox/:virtualMcpId/:branch/git/status returns 410 with "Sandbox handle is gone".
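The server-side recovery described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in `sandbox-proxy.ts`: `proxyWithAdoption`, `ProxyResult`, `ClaimContext`, the `forward` callback, and the boolean return type of `adoptLiveClaim` are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real runner and proxy types in apps/mesh may differ.
type ProxyResult = { status: number; body: string };
type ClaimContext = { userId: string; projectRef: string };

interface Runner {
  // Optional, mirroring the optional call `runner.adoptLiveClaim?.(...)` in the PR.
  // Assumed here to resolve to true when a live claim was adopted.
  adoptLiveClaim?: (ctx: ClaimContext, claimName: string) => Promise<boolean>;
}

// Forward once; on 404 try to adopt the live claim and forward once more.
// Only when that also fails is the 404 mapped to 410 Gone.
async function proxyWithAdoption(
  runner: Runner,
  ctx: ClaimContext,
  claimName: string,
  forward: () => Promise<ProxyResult>,
): Promise<ProxyResult> {
  const first = await forward();
  if (first.status !== 404) return first;

  let adopted = false;
  try {
    adopted = (await runner.adoptLiveClaim?.(ctx, claimName)) ?? false;
  } catch {
    // An adoption failure is treated as "not adopted" rather than becoming a 500.
    adopted = false;
  }

  if (adopted) {
    const second = await forward();
    if (second.status !== 404) return second;
  }
  return { status: 410, body: "Sandbox handle is gone" };
}
```

Bounding the retry to a single attempt keeps a truly deleted sandbox on the existing 410 path, so reprovisioning is still triggered.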

Test plan

  • Reproduce stale-handle case: preview iframe loads, open Save changes — git status should succeed instead of 410
  • Confirm git/status, git/diff, and publish still work on a healthy sandbox
  • Confirm truly deleted sandboxes still return 410 and trigger reprovision
  • Deploy to studio.decocms.com and verify the reported URL no longer 410s while preview is live

Made with Cursor


Summary by cubic

Recover git/daemon proxy when a sandbox handle is stale so Save changes and Publish keep working while preview is live. The UI now self-heals once before surfacing 410 Gone.

  • Bug Fixes
    • On 404 from daemon/git proxy, try adoptLiveClaim and retry once; only then map 404 → 410.
    • Added adoptLiveClaim to the agent sandbox provider and mirrored preview resurrection in proxyDaemonRequest, including retries after 401 or port-forward failures.
    • git/status and git/diff pass { userId, projectRef } to enable adoption and retry.
    • Publish dialog detects unreachable sandbox, runs a single SANDBOX_START via useSandboxStart (@decocms/mesh-sdk + SELF_MCP_ALIAS_ID), then retries loading changes.
    • Truly deleted sandboxes still return 410 and trigger reprovision.
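On the client side, the "self-heal once" behaviour of the publish dialog could look something like the sketch below. It does not use `useSandboxStart` or `@decocms/mesh-sdk`; `loadChangesWithSelfHeal`, `isSandboxGone`, and the `status` property on the thrown error are hypothetical names chosen for illustration.

```typescript
// Assumption: the git API client throws an error object carrying the HTTP status.
function isSandboxGone(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 410
  );
}

// Load changes; on a 410, start the sandbox exactly once and try again.
// A second 410 is not caught, so the caller still sees the failure.
async function loadChangesWithSelfHeal<T>(
  loadChanges: () => Promise<T>,
  startSandbox: () => Promise<void>,
): Promise<T> {
  try {
    return await loadChanges();
  } catch (err) {
    if (!isSandboxGone(err)) throw err;
    await startSandbox(); // stands in for the single SANDBOX_START call
    return await loadChanges();
  }
}
```

Because the second attempt is outside the `try`, the dialog cannot loop on a sandbox that stays unreachable.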

Written for commit 08894ab. Summary will update on new commits.

Preview gateway traffic could stay live while mesh daemon/git paths still returned 410 on stale handles. Resurrect or adopt the live claim before surfacing 410 Gone, and self-heal the publish dialog on 410.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

| Reaction | Action |
| --- | --- |
| 👍 | Run quick benchmark (10 & 128 tools) |

Benchmark will run on the next push after you react.

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

Release Options

Suggested: Patch (2.376.2) — based on fix: prefix

React with an emoji to override the release type:

| Reaction | Type | Next Version |
| --- | --- | --- |
| 👍 | Prerelease | 2.376.2-alpha.1 |
| 🎉 | Patch | 2.376.2 |
| ❤️ | Minor | 2.377.0 |
| 🚀 | Major | 3.0.0 |

Current version: 2.376.1

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/mesh/src/api/routes/sandbox-proxy.ts">

<violation number="1" location="apps/mesh/src/api/routes/sandbox-proxy.ts:202">
P1: `adoptLiveClaim` errors are unhandled in `proxyDaemon`, which can turn a recoverable 404 path into a 500 response.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Comment on lines +202 to 205
    const adopted = await runner.adoptLiveClaim?.(
      { userId, projectRef },
      claimName,
    );
Contributor

P1: adoptLiveClaim errors are unhandled in proxyDaemon, which can turn a recoverable 404 path into a 500 response.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/api/routes/sandbox-proxy.ts, line 202:

<comment>`adoptLiveClaim` errors are unhandled in `proxyDaemon`, which can turn a recoverable 404 path into a 500 response.</comment>

<file context>
@@ -193,14 +199,32 @@ async function proxyDaemon(
-      },
-      410,
-      SANDBOX_PROXY_CACHE_HEADERS,
+    const adopted = await runner.adoptLiveClaim?.(
+      { userId, projectRef },
+      claimName,
</file context>
Suggested change
-    const adopted = await runner.adoptLiveClaim?.(
-      { userId, projectRef },
-      claimName,
-    );
+    let adopted = false;
+    try {
+      adopted =
+        (await runner.adoptLiveClaim?.({ userId, projectRef }, claimName)) ??
+        false;
+    } catch {
+      adopted = false;
+    }

@guitavano
Contributor Author

Architectural note

The preview gateway cannot be out of sync with the daemon proxy — preview traffic is served by the daemon (port 9000, reverse-proxying to the dev server). Both paths must resolve the same handle to the same live pod.

What was happening in prod:

  • Preview kept working because the gateway URL (sandboxMap.previewUrl) still routed to a live claim/pod.
  • Git/daemon APIs (/git/status, etc.) went through mesh's proxyDaemonRequest, which held a stale or empty records cache and returned 404 → 410 before trying to recover.

The handle is deterministic (computeClaimHandle) — it does not change on reprovision. The bug wasn't a "new handle vs old handle" mismatch; it was mesh losing track of the same handle while the gateway still reached the daemon.

This PR aligns the daemon/git proxy recovery path with what preview already does: resurrect from state-store, adopt the live K8s claim, then retry — so gateway and daemon proxy always converge on the same handle → same pod.
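As a rough picture of the lookup order described above (in-memory records first, then the state store, then the live K8s claim), consider the sketch below. Every name in it (`SandboxRecord`, `HandleSources`, `resolveRecord`, `loadFromStateStore`, `adoptFromCluster`) is hypothetical and only illustrates the ordering, not mesh's actual provider API.

```typescript
// Hypothetical record shape: one deterministic handle pointing at one live pod.
type SandboxRecord = { handle: string; podUrl: string };

interface HandleSources {
  cache: Map<string, SandboxRecord>; // in-memory records, may be stale or empty
  loadFromStateStore: (handle: string) => Promise<SandboxRecord | null>;
  adoptFromCluster: (handle: string) => Promise<SandboxRecord | null>;
}

// Resolve a handle by trying each source in order. Whatever is recovered is
// written back to the cache so later requests hit the same record directly.
async function resolveRecord(
  handle: string,
  sources: HandleSources,
): Promise<SandboxRecord | null> {
  const cached = sources.cache.get(handle);
  if (cached) return cached;

  const recovered =
    (await sources.loadFromStateStore(handle)) ??
    (await sources.adoptFromCluster(handle));

  if (recovered) sources.cache.set(handle, recovered);
  return recovered; // null means the sandbox is really gone
}
```

Since the handle does not change on reprovision, the cache key stays valid across recoveries; only the record behind it needs refreshing.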
