browserbase · aq17 · May 26, 2026 · Jun 1, 2026 · Jun 1, 2026 · Jun 1, 2026
diff --git a/skills/autobrowse/.env.example b/skills/autobrowse/.env.example
@@ -1,3 +1,8 @@
 ANTHROPIC_API_KEY=sk-ant-...
 BROWSERBASE_API_KEY=bb_live_...
 BROWSERBASE_PROJECT_ID=your-project-id
+
+# Throwaway inbox provisioning (only for signup/login/MFA tasks).
+# Get a free key at https://agentmail.to. Browserbase deployments inject
+# their own pooled key; regular users provide their own.
+AGENTMAIL_API_KEY=
diff --git a/skills/autobrowse/.gitignore b/skills/autobrowse/.gitignore
@@ -4,3 +4,4 @@ tasks/
 traces/
 *.log
 .DS_Store
+.inbox.json
diff --git a/skills/autobrowse/SKILL.md b/skills/autobrowse/SKILL.md
@@ -72,12 +72,31 @@ List available tasks:
 ls ./autobrowse/tasks/
 ```
 
+### Step 2.5 — (Only if the task needs email) Provision a throwaway inbox
+
+If the workflow requires **registering an account, logging in, or email / MFA verification**, give the inner agent its own disposable inbox. The inner agent never sees an email credential — `scripts/inbox.mjs` mints a throwaway AgentMail inbox and only the address is injected into the run.
+
+Requires `AGENTMAIL_API_KEY` in the environment (see `.env.example`). No key? Get one free at https://agentmail.to. (Browserbase deployments inject a pooled key automatically.) Then, once per task, before the loop:
+
+```bash
+node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs create --workspace ./autobrowse --task <task>
+# prints the inbox address, e.g. ab-3f9k2@agentmail.to
+```
+
+Capture it and pass `--inbox-email` to **every** `evaluate.mjs` run for this task (see "Run the inner agent"). The address is also available to task.md authors as `{{inbox_email}}`.
+
+The inbox is **loop-only** — it exists just so exploration can complete signup/MFA. Always release it when the loop ends (see "Clean up the inbox"). Graduated skills do not depend on it; end users supply their own email/credentials at run time.
+
+> **Concurrency limit:** AgentMail's free tier caps at 3 inboxes per account. Sequential loops self-heal (a stale inbox is swept on the next `create`), but **do not run more than 3 email-needing tasks in parallel** (`--all` / `--tasks`) — the 4th `create` will fail. Run them in smaller batches, or raise the cap with a paid AgentMail plan.
+
 ### Step 3 — Multi-task: spawn parallel sub-agents
 
 If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:
 
 > "You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.
 >
+> If this task needs signup/login/MFA, run `inbox.mjs create` once before the loop, pass `--inbox-email <addr>` to every evaluate.mjs run, and run `inbox.mjs release` when the loop ends (even on failure).
+>
 > When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md — write a self-contained skill.
 >
 > At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."
@@ -104,6 +123,8 @@ Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the temp
 node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
 # or for bot-protected sites:
 node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
+# if you provisioned an inbox in Step 2.5, pass it on every run:
+node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --inbox-email <addr>
 ```
 
 This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.
@@ -221,6 +242,18 @@ ls ~/.claude/skills/<task-name>/SKILL.md
 
 The skill is now available as `/<task-name>` in Claude Code.
 
+> **Email/MFA tasks — graduation note:** the throwaway inbox is loop-only. The graduated SKILL.md must **not** reference `inbox.mjs` or the autobrowse inbox. Instead, document that the end user supplies their own email/credentials at run time (or reuses an authenticated session via `/cookie-sync`), and note in "Site-Specific Gotchas" that the flow requires email/MFA verification.
+
+### Clean up the inbox
+
+If you provisioned an inbox in Step 2.5, release it once the loop ends — **whether it graduated, failed, or hit max iterations**:
+
+```bash
+node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs release --workspace ./autobrowse --task <task>
+```
+
+This deletes the throwaway inbox and removes its local `.inbox.json`. It's best-effort and safe to run even if no inbox exists. (Abandoned inboxes are also swept automatically on the next `create`, but release promptly to stay under the 3-inbox cap.)
+
 ---
 
 ## Final report (multi-task mode)
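
Reviewer sketch (not part of the patch): the create / evaluate / release lifecycle that SKILL.md describes can be modelled as a wrapper that always releases the inbox, even when the loop throws. This is a minimal illustration under stated assumptions: `withInbox` and `run` are hypothetical names, `run` stands in for whatever actually executes `inbox.mjs` with the given arguments, and `create` is assumed to print only the inbox address on stdout.

```javascript
// Hypothetical helper: wraps an autobrowse loop so the throwaway inbox is
// released whether the loop graduates, fails, or hits max iterations.
// `run(args)` is an injected function that executes inbox.mjs with `args`
// and returns its stdout as a string (assumption, not the real harness API).
function withInbox(run, { task, workspace }, loop) {
  const scope = ["--workspace", workspace, "--task", task];
  // Assumption: `create` prints just the address, e.g. ab-3f9k2@agentmail.to
  const address = run(["create", ...scope]).trim();
  try {
    // The loop should pass `address` as --inbox-email on every evaluate run.
    return loop(address);
  } finally {
    // Runs on success and on failure alike.
    run(["release", ...scope]);
  }
}

// Demo with a fake runner that records which subcommands were invoked.
const calls = [];
const fakeRun = (args) => {
  calls.push(args[0]);
  return "ab-3f9k2@agentmail.to\n";
};

try {
  withInbox(fakeRun, { task: "signup", workspace: "./autobrowse" }, () => {
    throw new Error("loop failed");
  });
} catch (err) {
  console.log(err.message); // loop failed
}
console.log(calls.join(",")); // create,release
```

In a real driver, `run` would likely shell out to `node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs`. Note that an error thrown by `release` inside `finally` would mask the loop's own error, which is one reason the patch describes release as best-effort.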

diff --git a/skills/autobrowse/references/example-task.md b/skills/autobrowse/references/example-task.md
@@ -13,6 +13,12 @@ List the data the agent needs (credentials, form values, etc.):
 - Field 1: value
 - Field 2: value
 
+If the task requires registering an account, logging in, or email/MFA
+verification, provision a throwaway inbox before the loop (see SKILL.md). The
+agent then receives `{{inbox_email}}` automatically. Use it for any email field:
+
+- Email: {{inbox_email}}
+
 ## Steps
 
 1. Navigate to the URL

diff --git a/skills/autobrowse/scripts/evaluate.mjs b/skills/autobrowse/scripts/evaluate.mjs
@@ -48,6 +48,7 @@ const TOOLS = [
       "  browse get url/title/text  — Get page info\n" +
       "  browse mouse drag <x1> <y1> <x2> <y2> — Drag (for sliders)\n" +
       "  browse back/reload/stop    — Navigation/session control\n\n" +
+      "If a throwaway inbox was provisioned for this task (see the Agent Inbox section, when present), you may also run `node <path>/scripts/inbox.mjs wait-otp|wait-link|latest ...` through this tool to read verification emails.\n\n" +
       "Critical: Always `browse snapshot` after every action — refs invalidate on DOM changes.",
     input_schema: {
       type: "object",
@@ -81,6 +82,8 @@ Options:
   --env local|remote   Browser environment (default: local)
   --model <model>      Claude model for the inner agent (default: ${DEFAULT_MODEL})
   --run-number N       Force a specific run number (default: auto-increment)
+  --inbox-email <addr> Throwaway inbox address for signup/login/MFA tasks
+                       (provision it first via scripts/inbox.mjs create)
   --help               Show this help message
 
 Environment variables:
@@ -159,6 +162,29 @@ function getNextRunNumber(tracesDir) {
 }
 
 const ALLOWED_COMMAND = "browse";
+// Absolute path to the throwaway-inbox helper. The agent may shell out to it
+// (e.g. `node <abs>/scripts/inbox.mjs wait-otp ...`) when a task involves
+// signup/login/MFA and an inbox was provisioned for the run.
+const INBOX_SCRIPT = path.join(SKILL_DIR, "scripts", "inbox.mjs");
+
+function isAllowedCommand(executable, args) {
+  if (executable === ALLOWED_COMMAND) return true;
+  // node <abs-path-to-inbox.mjs> ...
+  if (executable === "node" && args[0] && path.resolve(args[0]) === INBOX_SCRIPT) return true;
+  return false;
+}
+
+// inbox.mjs wait-otp/wait-link poll for an email for up to --within seconds,
+// which can exceed the default 30s exec cap. Give them their full window plus
+// headroom so the harness doesn't kill them mid-poll (the ETIMEDOUT bug).
+function execTimeoutFor(executable, args) {
+  const isInbox = executable === "node" && args[0] && path.resolve(args[0]) === INBOX_SCRIPT;
+  const isWait = isInbox && (args.includes("wait-otp") || args.includes("wait-link"));
+  if (!isWait) return EXEC_TIMEOUT_MS;
+  const i = args.indexOf("--within");
+  const within = i !== -1 ? parseInt(args[i + 1], 10) : 60;
+  return Math.max(EXEC_TIMEOUT_MS, (Number.isFinite(within) ? within : 60) * 1000 + 15_000);
+}
 
 function parseCommand(command) {
   const args = [];
@@ -250,15 +276,15 @@ function executeCommand(command) {
   }
 
   const [executable, ...args] = parsed.args;
-  if (executable !== ALLOWED_COMMAND) {
-    return { output: `BLOCKED: only browse commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
+  if (!isAllowedCommand(executable, args)) {
+    return { output: `BLOCKED: only browse and inbox.mjs commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
   }
 
   const start = Date.now();
   try {
     const output = execFileSync(executable, args, {
       encoding: "utf-8",
-      timeout: EXEC_TIMEOUT_MS,
+      timeout: execTimeoutFor(executable, args),
       stdio: ["pipe", "pipe", "pipe"],
       maxBuffer: 1024 * 1024,
     });
@@ -271,7 +297,34 @@ function executeCommand(command) {
   }
 }
 
-function buildSystemPrompt(strategy, traceDir, browseEnv) {
+function buildInboxSection(inboxEmail, workspace, taskName) {
+  if (!inboxEmail) return "";
+  return `
+# Agent Inbox
+
+You have been provisioned a throwaway email inbox for this task:
+
+  ${inboxEmail}
+
+Use this address for any signup, login, or MFA / email-verification step — type it into email fields exactly as shown. To read mail that arrives (verification links, one-time codes), shell out via the execute tool:
+
+- Wait for an OTP / verification code:
+  \`node ${INBOX_SCRIPT} wait-otp --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
+  Prints just the extracted code on stdout (or fails after the timeout). Use the sending domain you expect, e.g. \`--from stripe.com\`. Default matches a 4–8 digit code; pass \`--regex "<pattern>"\` for alphanumeric codes.
+
+- Wait for a verification / magic link, then open it:
+  \`node ${INBOX_SCRIPT} wait-link --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
+  Prints just the first URL found (optionally filter with \`--match <substr>\`, e.g. \`--match verify\`). Then \`browse open <that-url>\` to complete verification.
+
+- Read the most recent message raw (fallback if the helpers above miss):
+  \`node ${INBOX_SCRIPT} latest --workspace ${workspace} --task ${taskName}\`
+  Prints the newest message as JSON (from, subject, text, html).
+
+Do not call AgentMail or any other email API directly — only the commands above.
+`;
+}
+
+function buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection) {
   const openFlag = browseEnv === "remote" ? "--remote" : "--local";
   const envDesc = browseEnv === "remote"
     ? `Use **remote mode** (Browserbase) — Browserbase Identity, Verified browsers, CAPTCHA solving, residential proxies:
@@ -352,7 +405,7 @@ ${envDesc}
 - **Page seems empty**: Try \`browse wait timeout 1000\` then \`browse snapshot\`; if you know the target element, use \`browse wait selector "<selector>"\`
 - **Dropdown didn't open**: Wait briefly, then snapshot to check
 - **Slider won't move with click**: Use \`browse press ArrowRight\` / \`browse press ArrowLeft\` after clicking the slider thumb
-
+${inboxSection}
 # Current Navigation Strategy
 
 The following strategy has been learned from previous iterations. Follow these guidelines:
@@ -401,7 +454,9 @@ async function main() {
 
   const strategy = fs.readFileSync(strategyFile, "utf-8");
   const task = fs.readFileSync(taskFile, "utf-8");
-  const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv);
+  const inboxEmail = getArg("inbox-email");
+  const inboxSection = buildInboxSection(inboxEmail, workspace, taskName);
+  const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection);
 
   console.error(`\n${"=".repeat(60)}`);
   console.error(`  AUTOBROWSE — ${taskName} — Run ${runNumber}`);
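
Reviewer sketch (not part of the patch): the timeout arithmetic in `execTimeoutFor` is easier to check with concrete values. The snippet below isolates only the wait-command branch. `waitTimeoutMs` is a hypothetical name, and `EXEC_TIMEOUT_MS = 30_000` is an assumption taken from the "default 30s exec cap" comment, since the constant's definition is outside this diff.

```javascript
// Assumed value; the real constant is defined elsewhere in evaluate.mjs.
const EXEC_TIMEOUT_MS = 30_000;
const HEADROOM_MS = 15_000; // extra time so the child can exit on its own
const DEFAULT_WITHIN_S = 60; // fallback when --within is missing or invalid

// Mirrors the wait-otp / wait-link branch of execTimeoutFor: the poll window
// in milliseconds plus headroom, never lower than the default exec cap.
function waitTimeoutMs(args) {
  const i = args.indexOf("--within");
  const parsed = i !== -1 ? parseInt(args[i + 1], 10) : DEFAULT_WITHIN_S;
  const within = Number.isFinite(parsed) ? parsed : DEFAULT_WITHIN_S;
  return Math.max(EXEC_TIMEOUT_MS, within * 1000 + HEADROOM_MS);
}

console.log(waitTimeoutMs(["wait-otp", "--within", "60"]));   // 75000
console.log(waitTimeoutMs(["wait-otp", "--within", "5"]));    // 30000
console.log(waitTimeoutMs(["wait-link", "--within", "abc"])); // 75000
console.log(waitTimeoutMs(["wait-link"]));                    // 75000
```

A short `--within` never drops the timeout below the default cap, and a missing or non-numeric value falls back to the 60-second window rather than producing `NaN`.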