5 changes: 5 additions & 0 deletions skills/autobrowse/.env.example
@@ -1,3 +1,8 @@
ANTHROPIC_API_KEY=sk-ant-...
BROWSERBASE_API_KEY=bb_live_...
BROWSERBASE_PROJECT_ID=your-project-id

# Throwaway inbox provisioning (only for signup/login/MFA tasks).
# Get a free key at https://agentmail.to. Browserbase deployments inject
# their own pooled key; regular users provide their own.
AGENTMAIL_API_KEY=
1 change: 1 addition & 0 deletions skills/autobrowse/.gitignore
@@ -4,3 +4,4 @@ tasks/
traces/
*.log
.DS_Store
.inbox.json
33 changes: 33 additions & 0 deletions skills/autobrowse/SKILL.md
@@ -72,12 +72,31 @@ List available tasks:
ls ./autobrowse/tasks/
```

### Step 2.5 — (Only if the task needs email) Provision a throwaway inbox

If the workflow requires **registering an account, logging in, or email / MFA verification**, give the inner agent its own disposable inbox. The inner agent never sees an email credential — `scripts/inbox.mjs` mints a throwaway AgentMail inbox and only the address is injected into the run.

Requires `AGENTMAIL_API_KEY` in the environment (see `.env.example`). No key? Get one free at https://agentmail.to. (Browserbase deployments inject a pooled key automatically.) Then, once per task, before the loop:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs create --workspace ./autobrowse --task <task>
# prints the inbox address, e.g. ab-3f9k2@agentmail.to
```

Capture it and pass `--inbox-email` to **every** `evaluate.mjs` run for this task (see "Run the inner agent"). The address is also available to task.md authors as `{{inbox_email}}`.

The inbox is **loop-only** — it exists just so exploration can complete signup/MFA. Always release it when the loop ends (see "Clean up the inbox"). Graduated skills do not depend on it; end users supply their own email/credentials at run time.

> **Concurrency limit:** AgentMail's free tier caps at 3 inboxes per account. Sequential loops self-heal (a stale inbox is swept on the next `create`), but **do not run more than 3 email-needing tasks in parallel** (`--all` / `--tasks`) — the 4th `create` will fail. Run them in smaller batches, or raise the cap with a paid AgentMail plan.

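One way to stay under the cap is to split the email-needing tasks into batches of at most three and run the batches one after another. A minimal sketch, assuming a hypothetical async `runTask` function that performs the full loop for one task (create, evaluate, release); neither `inBatches` nor `runAllInBatches` is part of the skill:

```javascript
// Hypothetical helpers, not part of autobrowse: keep the number of
// concurrently provisioned inboxes at or below the free-tier cap.
const MAX_PARALLEL_INBOXES = 3;

// Split a list into consecutive chunks of at most `size` items.
function inBatches(items, size = MAX_PARALLEL_INBOXES) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run each batch in parallel, but the batches themselves sequentially.
// `runTask` is an assumed async function supplied by the caller.
async function runAllInBatches(tasks, runTask) {
  const results = [];
  for (const batch of inBatches(tasks)) {
    // allSettled so one failing task does not abort the rest of its batch.
    results.push(...(await Promise.allSettled(batch.map(runTask))));
  }
  return results;
}
```

With seven email-needing tasks this yields batches of 3, 3, and 1, so no more than three inboxes exist at any moment, provided each task releases its inbox before its `runTask` promise settles.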
### Step 3 — Multi-task: spawn parallel sub-agents

If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:

> "You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.
>
> If this task needs signup/login/MFA, run `inbox.mjs create` once before the loop, pass `--inbox-email <addr>` to every evaluate.mjs run, and run `inbox.mjs release` when the loop ends (even on failure).
>
> When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md — write a self-contained skill.
>
> At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."
@@ -104,6 +123,8 @@ Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the temp
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
# or for bot-protected sites:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
# if you provisioned an inbox in Step 2.5, pass it on every run:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --inbox-email <addr>
```

This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.
@@ -221,6 +242,18 @@ ls ~/.claude/skills/<task-name>/SKILL.md

The skill is now available as `/<task-name>` in Claude Code.

> **Email/MFA tasks — graduation note:** the throwaway inbox is loop-only. The graduated SKILL.md must **not** reference `inbox.mjs` or the autobrowse inbox. Instead, document that the end user supplies their own email/credentials at run time (or reuses an authenticated session via `/cookie-sync`), and note in "Site-Specific Gotchas" that the flow requires email/MFA verification.

### Clean up the inbox

If you provisioned an inbox in Step 2.5, release it once the loop ends — **whether it graduated, failed, or hit max iterations**:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs release --workspace ./autobrowse --task <task>
```

This deletes the throwaway inbox and removes its local `.inbox.json`. It's best-effort and safe to run even if no inbox exists. (Abandoned inboxes are also swept automatically on the next `create`, but release promptly to stay under the 3-inbox cap.)

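The create, evaluate, release lifecycle maps naturally onto `try`/`finally`, which runs the release step on every exit path. A minimal sketch, assuming a hypothetical `run(script, args)` callback that executes one of the skill's scripts and returns its stdout (in a real driver it might wrap `child_process.execFileSync`), and assuming `create` prints only the address on stdout. Only subcommands and flags that appear in this document are used:

```javascript
// Hypothetical driver, not part of autobrowse. `run(script, args)` is an
// assumed callback that executes a skill script and returns its stdout.
function runLoopWithInbox({ run, task, workspace, iterations }) {
  const scope = ["--workspace", workspace, "--task", task];
  // Assumption: `create` prints just the inbox address, newline-terminated.
  const inboxEmail = run("inbox.mjs", ["create", ...scope]).trim();
  try {
    for (let i = 0; i < iterations; i++) {
      // Pass the same address to every evaluate.mjs run for this task.
      run("evaluate.mjs", [...scope, "--inbox-email", inboxEmail]);
    }
  } finally {
    // Runs whether the loop finished normally or an evaluate run threw.
    run("inbox.mjs", ["release", ...scope]);
  }
  return inboxEmail;
}
```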
---

## Final report (multi-task mode)
6 changes: 6 additions & 0 deletions skills/autobrowse/references/example-task.md
@@ -13,6 +13,12 @@ List the data the agent needs (credentials, form values, etc.):
- Field 1: value
- Field 2: value

If the task requires registering an account, logging in, or email/MFA
verification, provision a throwaway inbox before the loop (see SKILL.md) and the
agent receives `{{inbox_email}}` automatically — use it for any email field:

- Email: {{inbox_email}}

## Steps

1. Navigate to the URL
67 changes: 61 additions & 6 deletions skills/autobrowse/scripts/evaluate.mjs
@@ -48,6 +48,7 @@ const TOOLS = [
" browse get url/title/text — Get page info\n" +
" browse mouse drag <x1> <y1> <x2> <y2> — Drag (for sliders)\n" +
" browse back/reload/stop — Navigation/session control\n\n" +
"If a throwaway inbox was provisioned for this task (see the Agent Inbox section, when present), you may also run `node <path>/scripts/inbox.mjs wait-otp|wait-link|latest ...` through this tool to read verification emails.\n\n" +
"Critical: Always `browse snapshot` after every action — refs invalidate on DOM changes.",
input_schema: {
type: "object",
@@ -81,6 +82,8 @@ Options:
--env local|remote Browser environment (default: local)
--model <model> Claude model for the inner agent (default: ${DEFAULT_MODEL})
--run-number N Force a specific run number (default: auto-increment)
--inbox-email <addr> Throwaway inbox address for signup/login/MFA tasks
(provision it first via scripts/inbox.mjs create)
--help Show this help message

Environment variables:
@@ -159,6 +162,29 @@ function getNextRunNumber(tracesDir) {
}

const ALLOWED_COMMAND = "browse";
// Absolute path to the throwaway-inbox helper. The agent may shell out to it
// (e.g. `node <abs>/scripts/inbox.mjs wait-otp ...`) when a task involves
// signup/login/MFA and an inbox was provisioned for the run.
const INBOX_SCRIPT = path.join(SKILL_DIR, "scripts", "inbox.mjs");

function isAllowedCommand(executable, args) {
  if (executable === ALLOWED_COMMAND) return true;
  // node <abs-path-to-inbox.mjs> ...
  if (executable === "node" && args[0] && path.resolve(args[0]) === INBOX_SCRIPT) return true;
  return false;
}

// inbox.mjs wait-otp/wait-link block for up to --within seconds polling for an
// email — longer than the default 30s exec cap. Give them their full window
// plus headroom so the harness doesn't kill them mid-poll (the ETIMEDOUT bug).
function execTimeoutFor(executable, args) {
  const isInbox = executable === "node" && args[0] && path.resolve(args[0]) === INBOX_SCRIPT;
  const isWait = isInbox && (args.includes("wait-otp") || args.includes("wait-link"));
  if (!isWait) return EXEC_TIMEOUT_MS;
  const i = args.indexOf("--within");
  const within = i !== -1 ? parseInt(args[i + 1], 10) : 60;
  return Math.max(EXEC_TIMEOUT_MS, (Number.isFinite(within) ? within : 60) * 1000 + 15_000);
}
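
To make the arithmetic concrete, here is a standalone sketch of the same calculation for the `wait-otp` / `wait-link` branch. It assumes `EXEC_TIMEOUT_MS` is 30 000 ms (the comment above mentions a 30 s default, but the constant's definition is not shown in this diff); `waitTimeoutMs` is a hypothetical name used only for illustration:

```javascript
const EXEC_TIMEOUT_MS = 30_000; // assumed value, per the "default 30s exec cap" comment
const HEADROOM_MS = 15_000;     // same 15 s headroom as execTimeoutFor

// Poll window in seconds (the raw --within argument) -> harness timeout in ms.
function waitTimeoutMs(withinArg) {
  const within = parseInt(withinArg, 10);
  const seconds = Number.isFinite(within) ? within : 60; // fall back to 60 s
  return Math.max(EXEC_TIMEOUT_MS, seconds * 1000 + HEADROOM_MS);
}

console.log(waitTimeoutMs("60"));      // 75000
console.log(waitTimeoutMs("120"));     // 135000
console.log(waitTimeoutMs("5"));       // 30000 (never below the default cap)
console.log(waitTimeoutMs(undefined)); // 75000 (missing value -> 60 s fallback)
```

The `Math.max` keeps short windows from lowering the timeout below the default, and the fallback means a missing or unparseable `--within` value behaves like `--within 60`.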

function parseCommand(command) {
const args = [];
@@ -250,15 +276,15 @@ function executeCommand(command) {
}

const [executable, ...args] = parsed.args;
- if (executable !== ALLOWED_COMMAND) {
- return { output: `BLOCKED: only browse commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
if (!isAllowedCommand(executable, args)) {
return { output: `BLOCKED: only browse and inbox.mjs commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
}

const start = Date.now();
try {
const output = execFileSync(executable, args, {
encoding: "utf-8",
- timeout: EXEC_TIMEOUT_MS,
timeout: execTimeoutFor(executable, args),
stdio: ["pipe", "pipe", "pipe"],
maxBuffer: 1024 * 1024,
});
@@ -271,7 +297,34 @@ function executeCommand(command) {
}
}

- function buildSystemPrompt(strategy, traceDir, browseEnv) {
function buildInboxSection(inboxEmail, workspace, taskName) {
if (!inboxEmail) return "";
return `
# Agent Inbox

You have been provisioned a throwaway email inbox for this task:

${inboxEmail}

Use this address for any signup, login, or MFA / email-verification step — type it into email fields exactly as shown. To read mail that arrives (verification links, one-time codes), shell out via the execute tool:

- Wait for an OTP / verification code:
\`node ${INBOX_SCRIPT} wait-otp --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
Prints just the extracted code on stdout (or fails after the timeout). Use the sending domain you expect, e.g. \`--from stripe.com\`. Default matches a 4–8 digit code; pass \`--regex "<pattern>"\` for alphanumeric codes.

- Wait for a verification / magic link, then open it:
\`node ${INBOX_SCRIPT} wait-link --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
Prints just the first URL found (optionally filter with \`--match <substr>\`, e.g. \`--match verify\`). Then \`browse open <that-url>\` to complete verification.

- Read the most recent message raw (fallback if the helpers above miss):
\`node ${INBOX_SCRIPT} latest --workspace ${workspace} --task ${taskName}\`
Prints the newest message as JSON (from, subject, text, html).

Do not call AgentMail or any other email API directly — only the commands above.
`;
}
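
The prompt above says `wait-otp` matches a 4–8 digit code by default and accepts `--regex` for other formats. The implementation of `inbox.mjs` is not shown in this diff, so the following is only a sketch of how such an extraction could work; `extractOtp` and `DEFAULT_OTP_PATTERN` are hypothetical names:

```javascript
// Hypothetical sketch; not the actual inbox.mjs implementation.
// Default: the first standalone run of 4 to 8 digits in the message text.
const DEFAULT_OTP_PATTERN = /\b\d{4,8}\b/;

function extractOtp(text, pattern = DEFAULT_OTP_PATTERN) {
  const match = text.match(pattern);
  if (!match) return null;
  // Prefer the first capture group when a custom pattern defines one.
  return match[1] ?? match[0];
}

console.log(extractOtp("Your verification code is 482913.")); // 482913
console.log(extractOtp("Use code AB12-CD34 to sign in", /\b[A-Z0-9]{4}-[A-Z0-9]{4}\b/)); // AB12-CD34
console.log(extractOtp("No code here")); // null
```

A digit-only default like this would also match a four-digit year that appears before the code, which is one reason a custom pattern can be worth passing for senders whose emails contain other numbers.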

function buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection) {
const openFlag = browseEnv === "remote" ? "--remote" : "--local";
const envDesc = browseEnv === "remote"
? `Use **remote mode** (Browserbase) — Browserbase Identity, Verified browsers, CAPTCHA solving, residential proxies:
@@ -352,7 +405,7 @@ ${envDesc}
- **Page seems empty**: Try \`browse wait timeout 1000\` then \`browse snapshot\`; if you know the target element, use \`browse wait selector "<selector>"\`
- **Dropdown didn't open**: Wait briefly, then snapshot to check
- **Slider won't move with click**: Use \`browse press ArrowRight\` / \`browse press ArrowLeft\` after clicking the slider thumb

${inboxSection}
# Current Navigation Strategy

The following strategy has been learned from previous iterations. Follow these guidelines:
@@ -401,7 +454,9 @@ async function main() {

const strategy = fs.readFileSync(strategyFile, "utf-8");
const task = fs.readFileSync(taskFile, "utf-8");
- const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv);
const inboxEmail = getArg("inbox-email");
const inboxSection = buildInboxSection(inboxEmail, workspace, taskName);
const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection);

console.error(`\n${"=".repeat(60)}`);
console.error(` AUTOBROWSE — ${taskName} — Run ${runNumber}`);