diff --git a/README.md b/README.md index 5bc15db..6b06aea 100644 --- a/README.md +++ b/README.md @@ -20,6 +20,7 @@ This plugin includes the following skills (see `skills/` for details): | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects | | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session | | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs | +| [browser-use-to-stagehand](skills/browser-use-to-stagehand/SKILL.md) | Migrate browser-use (Python) automation to Stagehand v3 (TypeScript) on Browserbase — maps features and picks the right determinism level per step | | [agent-experience](skills/agent-experience/SKILL.md) | Audit how agent-friendly a product, SDK, or docs site is — drops Claude subagents at it with tiny prompts, captures their traces, and scores setup friction, speed, error recovery, and doc quality | | [company-research](skills/company-research/SKILL.md) | Discover target companies matching your ICP using the Browserbase Search API, deep-research each one, and score fit into a research report and CSV | | [event-prospecting](skills/event-prospecting/SKILL.md) | Extract speakers from a conference page, filter their companies against your ICP, and deep-research the best-fit people into a person-first prospecting report | diff --git a/skills/browser-use-to-stagehand/EXAMPLES.md b/skills/browser-use-to-stagehand/EXAMPLES.md new file mode 100644 index 0000000..806ac39 --- /dev/null +++ b/skills/browser-use-to-stagehand/EXAMPLES.md @@ -0,0 +1,258 @@ +# Examples: browser-use → Stagehand + +Before/after pairs showing the migration patterns. Each "before" is a browser-use (Python) script; +each "after" is its Stagehand v3 (TypeScript) rewrite on Browserbase. 
Illustrative — validate +against your real site and tighten the `act(...)` prompts to the actual on-page labels. + +See [SKILL.md](SKILL.md) for the workflow, [the guide](references/guide.md) for the philosophy + +feature mapping, and [references/determinism.md](references/determinism.md) for the decision framework. + +## Running an "after" example + +```bash +npm install @browserbasehq/stagehand zod +npm install -D tsx dotenv +``` + +`.env`: +```bash +BROWSERBASE_API_KEY=... +BROWSERBASE_PROJECT_ID=... +ANTHROPIC_API_KEY=... # or OPENAI_API_KEY, matching the model string in the file +``` + +```bash +npx tsx example.ts +``` + +Swap `env: "BROWSERBASE"` for `env: "LOCAL"` (with Chrome installed) to run locally during dev. + +--- + +## 1. Simple task + +A fully-agentic task becomes a deterministic `page.goto` + one `act()` — no agent loop. + +**Before — browser-use** +```python +import asyncio + +from browser_use import Agent, ChatAnthropic + + +async def main() -> None: + agent = Agent( + task="Go to Hacker News (news.ycombinator.com) and open the top story", + llm=ChatAnthropic(model="claude-sonnet-4-6"), + ) + history = await agent.run(max_steps=20) + print(history.final_result()) + + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**After — Stagehand v3** +```typescript +import "dotenv/config"; +import { Stagehand } from "@browserbasehq/stagehand"; + +async function main() { + const stagehand = new Stagehand({ + env: "BROWSERBASE", // use "LOCAL" for local dev with a real Chrome + model: "anthropic/claude-sonnet-4-6", + }); + await stagehand.init(); + try { + const page = stagehand.context.pages()[0]; + + await page.goto("https://news.ycombinator.com"); // deterministic + await stagehand.act("click the top story's title link"); // AI: markup varies + + console.log("Opened:", page.url()); + } finally { + await stagehand.close(); + } +} + +main().catch((err) => { + console.error(err); + process.exit(1); +}); +``` + +--- + +## 2. 
Structured extraction + +A Pydantic `output_model_schema` becomes a zod schema passed to `extract()` — no agent loop, just navigate then read. + +**Before — browser-use** +```python +import asyncio + +from pydantic import BaseModel + +from browser_use import Agent, ChatOpenAI + + +class Story(BaseModel): + title: str + points: int + comments: int + + +class TopStories(BaseModel): + stories: list[Story] + + +async def main() -> None: + agent = Agent( + task="Go to Hacker News and return the top 5 stories with title, points, and comment count", + llm=ChatOpenAI(model="gpt-5"), + output_model_schema=TopStories, + ) + history = await agent.run() + data = history.structured_output # TopStories instance + for story in data.stories: + print(story.title, story.points, story.comments) + + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**After — Stagehand v3** +```typescript +import "dotenv/config"; +import { Stagehand } from "@browserbasehq/stagehand"; +import { z } from "zod"; + +async function main() { + const stagehand = new Stagehand({ env: "BROWSERBASE", model: "openai/gpt-5" }); + await stagehand.init(); + try { + const page = stagehand.context.pages()[0]; + await page.goto("https://news.ycombinator.com"); + await page.waitForLoadState("domcontentloaded"); // settle before the AI snapshot + + const stories = await stagehand.extract( + "extract the top 5 stories with their title, points, and comment count", + z.array( + z.object({ + title: z.string(), + points: z.number(), + comments: z.number().describe("number of comments"), + }), + ), + ); + + for (const s of stories) console.log(s.title, s.points, s.comments); + } finally { + await stagehand.close(); + } +} + +main().catch((err) => { + console.error(err); + process.exit(1); +}); +``` + +--- + +## 3. Login with sensitive data + +`sensitive_data` → `variables` (secrets never reach the LLM). The known form is driven with +narrow, single-purpose `act()` steps instead of an agent loop. 
`allowed_domains` has no direct Stagehand equivalent — replace it with +a `page.url()` host check. For repeat runs, reuse a Browserbase **Context** to skip re-login. + +**Before — browser-use** +```python +import asyncio +import os + +from browser_use import Agent, Browser, ChatAnthropic + + +async def main() -> None: + sensitive_data = { + "https://example.com": { + "x_user": os.environ["APP_USER"], + "x_pass": os.environ["APP_PASS"], + } + } + agent = Agent( + task="Log into example.com using username x_user and password x_pass, then open the dashboard", + llm=ChatAnthropic(model="claude-sonnet-4-6"), + sensitive_data=sensitive_data, + use_vision=False, # don't leak secrets via screenshots + browser=Browser(allowed_domains=["https://*.example.com"]), + ) + await agent.run() + + +if __name__ == "__main__": + asyncio.run(main()) +``` + +**After — Stagehand v3** +```typescript +import "dotenv/config"; +import { Stagehand } from "@browserbasehq/stagehand"; + +async function main() { + const stagehand = new Stagehand({ + env: "BROWSERBASE", + model: "anthropic/claude-sonnet-4-6", + // Reuse auth across runs with a Context: + // browserbaseSessionCreateParams: { + // projectId: process.env.BROWSERBASE_PROJECT_ID!, + // browserSettings: { context: { id: process.env.BB_CONTEXT_ID!, persist: true } }, + // }, + }); + await stagehand.init(); + try { + const page = stagehand.context.pages()[0]; + await page.goto("https://example.com/login"); + + await stagehand.act("type %username% into the email field", { + variables: { username: process.env.APP_USER! }, + }); + await stagehand.act("type %password% into the password field", { + variables: { password: process.env.APP_PASS! }, + }); + await stagehand.act("click the sign in button"); + + await page.waitForLoadState("domcontentloaded"); + + // Best-effort stand-in for allowed_domains=["https://*.example.com"]. 
browser-use + // enforces the allow-list across the ENTIRE run; a host check only covers the moment + // it runs, so call it after *every* navigation — not just sign-in. For real continuous + // enforcement use Browserbase proxy domain rules (api-mapping §5); this throw is only a + // tripwire and is flagged "needs human review" in the migration summary. + const assertAllowedHost = () => { + const host = new URL(page.url()).hostname; + if (host !== "example.com" && !host.endsWith(".example.com")) { + throw new Error(`navigated off the allow-list: ${page.url()}`); + } + }; + assertAllowedHost(); + console.log("Logged in:", page.url()); + + // Second half of the task ("…then open the dashboard") — don't stop at login. + await stagehand.act("open the dashboard"); + await page.waitForLoadState("domcontentloaded"); + assertAllowedHost(); // re-check: the guardrail must cover this navigation too + console.log("Dashboard:", page.url()); + } finally { + await stagehand.close(); + } +} + +main().catch((err) => { + console.error(err); + process.exit(1); +}); +``` diff --git a/skills/browser-use-to-stagehand/LICENSE.txt b/skills/browser-use-to-stagehand/LICENSE.txt new file mode 100644 index 0000000..f2f4397 --- /dev/null +++ b/skills/browser-use-to-stagehand/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Browserbase, Inc. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/browser-use-to-stagehand/SKILL.md b/skills/browser-use-to-stagehand/SKILL.md new file mode 100644 index 0000000..26b4f81 --- /dev/null +++ b/skills/browser-use-to-stagehand/SKILL.md @@ -0,0 +1,197 @@ +--- +name: browser-use-to-stagehand +description: Migrate browser-use (Python) browser-automation scripts to Stagehand v3 (TypeScript) on Browserbase. Use when the user wants to convert, port, rewrite, or migrate a browser-use Agent script to Stagehand, map browser-use features/APIs to Stagehand primitives (act/extract/observe/agent), or move agentic browser automation onto Browserbase with more determinism. Triggers on "browser-use", "browser_use", or "Agent(task=...)". +compatibility: "The skill itself uses only Read/Write/Edit/Grep/Bash — no install step. The Stagehand code it generates needs Node 18+, `@browserbasehq/stagehand` (v3) and `zod`, plus `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` and a model-provider key (e.g. `ANTHROPIC_API_KEY`) to run. The optional trace-assisted path uses the Browserbase SDK or the sibling `browser-trace` skill." +license: MIT +allowed-tools: Read, Write, Edit, Grep, Bash +--- + +# browser-use → Stagehand on Browserbase (`/browser-use-to-stagehand`) + +Convert a browser-use (Python) script into an idiomatic **Stagehand v3 (TypeScript)** script on +**Browserbase**, choosing the right level of determinism at each step rather than producing a +one-to-one agentic copy. 
+ +**Core principle:** browser-use is agentic-by-default (the LLM decides every action). Stagehand +lets you choose how much AI to use. A good migration replaces opaque agent loops with an +inspectable, mostly-deterministic pipeline — using AI only where the page is genuinely +unpredictable. This is a refactor with judgment, not a transpile. + +> **Source of truth & versions.** This skill's durable value is the *judgment* — the determinism +> spectrum and the decompose-vs-agent decision — not the API specifics, which drift every release. +> The code mappings here are a **snapshot validated against `@browserbasehq/stagehand` 3.6.x and +> browser-use 0.13.x (2026-06)**. On any conflict, the **live docs win** — always verify against the +> installed package and these sources before emitting code: +> - Stagehand v3: · installed types: `node_modules/@browserbasehq/stagehand` +> - Browserbase: +> - browser-use: +> +> If the installed Stagehand major is **not 3**, treat this skill as conceptual only and follow the +> live docs for every signature. + +## Reference files (read as needed) + +- [`references/api-mapping.md`](references/api-mapping.md) — the mechanical browser-use → Stagehand + mapping: variant detection, the full feature table, before/after code, Browserbase platform + options, and v3 version gotchas. **Read this for any non-trivial construct.** +- [`references/determinism.md`](references/determinism.md) — how to choose `agent()` vs + `act`/`extract`/`observe` vs cached `observe`→`act`. The decision tree. **Read this when deciding + how to translate an `Agent(task=…)`.** +- [`references/trace-assisted.md`](references/trace-assisted.md) — the optional "run it on + Browserbase, read the logs, then rewrite" workflow for opaque/flaky scripts. +- [`references/guide.md`](references/guide.md) — the human migration guide: philosophy shift, + feature mapping, the determinism spectrum, and a recommended migration path. 
+- [`references/prompt.md`](references/prompt.md) — a self-contained, tool-agnostic version of this + skill; paste it into any AI assistant along with a browser-use script. +- [`EXAMPLES.md`](EXAMPLES.md) — before/after script pairs. + +## Workflow + +### 1. Get the source +Obtain the browser-use script(s). If the user only described a script, ask for the file(s). Note +the target: **TypeScript Stagehand on Browserbase** unless they say otherwise. + +> **First, gate on scope — is this even migratable?** Not every browser-use file is an +> `Agent(task=…)` script. If the source is **browser-use running as an MCP server** +> (`uvx browser-use --mcp`, a `mcpServers` config) there is **no Stagehand equivalent** — flag it as +> out of scope, don't invent one (see api-mapping §3.7b). If the browser-use call is **embedded in a +> larger app** (a class/tool wrapper, web route, queue task), convert only the browser-use surface and +> preserve the surrounding app glue — see api-mapping §3.8. + +### 2. Detect the browser-use variant +Identify legacy (pre-0.12) vs stable vs Rust beta (only when imports come from `browser_use.beta`) +— see api-mapping §1. Note: the classic top-level `from browser_use import Agent, ChatBrowserUse` +surface is alive and well in 0.13.x — `ChatBrowserUse` alone is **not** a beta tell; only a +`browser_use.beta` import is. All variants translate identically, so when unsure, proceed with the +stable mapping. Normalize legacy names before translating. State which variant you found. + +### 3. Inventory the script +Extract a structured inventory before writing any TypeScript: +- **Task(s)** — the `task=` string(s); split each into its implied ordered steps. +- **Model** — the `Chat*` provider + model id. +- **Browser config** — local vs `cdp_url`/Browserbase; headless; proxies; `user_data_dir`/`storage_state`. +- **Structured output** — any `output_model_schema` Pydantic models. +- **Secrets** — `sensitive_data`, env-var usage, login flows. 
+- **Guardrails** — `allowed_domains`, `max_steps`. +- **Custom actions** — `@tools.action` / `Controller` functions, and whether each is a deterministic + side-effect or an agent capability. +- **Setup** — `initial_actions`, secondary models (`page_extraction_llm`, `planner_llm`). + +### 4. Decide the determinism level per step +For each step from the inventory, apply the decision tree in determinism.md: +- Navigate to a known URL → `page.goto(url)` on the Stagehand page (no AI). +- On-page action → `act("…")`; if it repeats, `observe()` once then replay `act(action)` (no LLM call). +- Reading data → `extract("…", zodSchema)`. +- Genuinely open-ended → keep `stagehand.agent().execute(...)` (tightened with `maxSteps`/`systemPrompt`). + +Default to **decomposition** when the flow is known; keep `agent()` only where it isn't. For a +first lift-and-shift, a faithful `agent()` translation is acceptable — say so and note the +optimization path. + +### 5. Produce the Stagehand v3 rewrite +**First, verify the API.** Before writing, confirm the exact signatures you're about to use against +the installed package (`node_modules/@browserbasehq/stagehand` types) or . +The mappings below are a 3.6.x snapshot; if anything differs in the installed version, the installed +version wins. Then emit runnable TypeScript. Always: +- `import { Stagehand } from "@browserbasehq/stagehand";` and `import { z } from "zod";` when extracting. +- Get the page via `const page = stagehand.context.pages()[0];`. +- Call AI methods on the **instance**: `stagehand.act(...)`, `stagehand.extract(...)`, + `stagehand.observe(...)` — **never** `page.act(...)`. +- Set the model as a `"provider/model"` string. +- Default to `env: "BROWSERBASE"`; show `env: "LOCAL"` as the dev option. +- Pass secrets via `variables` and `process.env`, never hardcoded. +- `await stagehand.init()` at the start, `await stagehand.close()` in a `finally`. + +Include the project setup so it runs (see the templates below). + +### 6. 
Write the migration summary +Alongside the code, produce a short summary: +- **Variant detected** and the determinism choices made (which steps became deterministic vs AI vs agent), with the reasoning. +- **Needs human review** — anything that didn't map 1:1: lost `allowed_domains` guardrails, + custom-action logic, secondary-model intent, ambiguous task strings. +- **Recommended next step** — Browserbase Context for auth reuse, caching for production, or the + trace-assisted path if the flow was opaque. + +### 7. Offer the trace-assisted path (only if warranted) +If the source was one large opaque `Agent(task=…)`, was flaky, or can't be mapped with +confidence, offer the trace-assisted workflow (trace-assisted.md): run the original on Browserbase, pull +`sessions.logs.list`, and rewrite from observed behavior. Don't run anything without the user's go-ahead. + +## Output templates + +**`package.json`** +```json +{ + "name": "stagehand-migration", + "type": "module", + "scripts": { "start": "tsx index.ts" }, + "dependencies": { + "@browserbasehq/stagehand": "^3.0.0", + "dotenv": "^16.0.0", + "zod": "^3.25.0" + }, + "devDependencies": { "tsx": "^4.0.0", "typescript": "^5.0.0" } +} +``` +> Add `"ai": "^5.0.0"` (Vercel AI SDK) **only** if a custom browser-use action maps to an agent +> `tool`. **Pin v5, not v4** — Stagehand 3.6.x bundles `ai` v5 and types `agent({ tools })` as the v5 +> `ToolSet`, where a tool's schema field is **`inputSchema`**. The v4 `tool()` helper emits +> `parameters` instead and will **fail to type-check** against Stagehand's v5 `ToolSet`. If you can't +> control the hoisted `ai` version, skip the `tool()` helper and pass a plain object +> `{ description, inputSchema: zodSchema, execute }` — it satisfies the v5 `ToolSet` regardless of which +> `ai` major resolves. + +**`.env`** +```bash +BROWSERBASE_API_KEY=... +BROWSERBASE_PROJECT_ID=... +ANTHROPIC_API_KEY=... 
# or the provider matching your model string +``` + +**`index.ts` skeleton** (decomposed, the preferred shape) +```typescript +import "dotenv/config"; +import { Stagehand } from "@browserbasehq/stagehand"; +import { z } from "zod"; + +async function main() { + const stagehand = new Stagehand({ + env: "BROWSERBASE", + model: "anthropic/claude-sonnet-4-6", + }); + await stagehand.init(); + try { + const page = stagehand.context.pages()[0]; + + await page.goto("https://example.com"); // deterministic skeleton + await stagehand.act("…"); // AI where the page varies + const data = await stagehand.extract("…", z.object({ /* … */ })); // structured reads + + console.log(data); + } finally { + await stagehand.close(); + } +} + +main().catch((err) => { console.error(err); process.exit(1); }); +``` + +## Validation checklist (before declaring done) +- [ ] AI methods are on the **instance** (`stagehand.act/extract/observe`), not the page. +- [ ] Page obtained via `stagehand.context.pages()[0]`. +- [ ] Model is a `"provider/model"` string; the matching provider key is in `.env`. +- [ ] `extract` uses a zod schema; `zod` is in dependencies. +- [ ] Secrets use `variables` + `process.env`; nothing hardcoded. +- [ ] `init()` / `close()` present; `close()` in `finally`. +- [ ] Each browser-use step is accounted for, placed deliberately on the determinism spectrum. +- [ ] Migration summary lists determinism choices and "needs human review" items. + +## Common mistakes to avoid +- **Copying v2 syntax** (`page.act()`, `stagehand.page`, `modelName`/`modelClientOptions`, + `enableCaching`) from old blog posts. Use v3 — see api-mapping "Version notes". +- **Translating every step into `act()`** — navigate with `page.goto` and cache repeatable steps via `observe`→`act`; don't spend an LLM call on every action. +- **Defaulting everything to `agent()`** — that just reproduces browser-use's non-determinism in a + new framework. Decompose where the flow is known. 
+- **Silently dropping `allowed_domains`** — Stagehand has no domain firewall; flag it for review. +- **Inventing Browserbase/Stagehand options** — if unsure of a field, check + / rather than guessing. diff --git a/skills/browser-use-to-stagehand/references/api-mapping.md b/skills/browser-use-to-stagehand/references/api-mapping.md new file mode 100644 index 0000000..27b0b2a --- /dev/null +++ b/skills/browser-use-to-stagehand/references/api-mapping.md @@ -0,0 +1,505 @@ +# browser-use → Stagehand + Browserbase: API Mapping + +The authoritative, mechanical mapping the `/browser-use-to-stagehand` skill uses to translate code. Pair it +with [`determinism.md`](determinism.md) (which Stagehand primitive to reach for) and +[`trace-assisted.md`](trace-assisted.md) (the optional run-and-observe path). + +Stagehand examples here target **v3** (`@browserbasehq/stagehand` ≥ 3.x), the current major +version. v3 is a rewrite — see the [Version notes](#version-notes-read-before-translating) at +the bottom before trusting any older snippet you find online. + +> ⚠️ **This is a point-in-time snapshot (validated against Stagehand 3.6.x / browser-use 0.13.x, +> 2026-06), not a live spec.** Signatures and option names drift every release. The **live docs +> supersede this table on any conflict** — verify against the installed package +> (`node_modules/@browserbasehq/stagehand`) or / +> / before relying on an exact signature. + +--- + +## 1. Detect the browser-use variant first + +browser-use is mid-transition across three API shapes. Identify which one the source script +uses before translating, because the imports and class names differ. + +| Variant | Tell-tale imports / calls | Notes | +|---|---|---| +| **Legacy** (pre-0.12) | `from browser_use import Browser, BrowserConfig`; `Browser(config=BrowserConfig(...))`; `BrowserContext`; `Controller()`, `@controller.action` | Deprecated. Normalize to current names first (see §6), then translate. 
| +| **Stable** (0.12.x–0.13.x) | `from browser_use import Agent, Browser, BrowserProfile, Tools, ChatOpenAI` ; `Browser(browser_profile=BrowserProfile(...))` ; `Tools()`, `@tools.action` | The default top-level surface, and what most scripts use. The primary migration source. | +| **Rust beta** (opt-in, 0.13.x) | imports of the form **`from browser_use.beta import ...`** | Opt-in new loop. The *only* reliable tell is the `browser_use.beta` import path. Translate the same way — the public surface (`Agent(task=, llm=)`, `agent.run()`) is identical. | + +> **Note on 0.13.x:** the classic top-level `from browser_use import Agent, ChatBrowserUse, ChatOpenAI` +> surface is still the default in 0.13.x — `ChatBrowserUse` is browser-use's hosted-model class and +> appears in *stable* code, **not** just the beta. Don't classify a script as "Rust beta" just because +> it imports `ChatBrowserUse`; require a literal `browser_use.beta` import. When in doubt, use the +> stable mapping (all variants translate the same way anyway). + +Key renames to recognize (all three may appear in a codebase): +- `Browser` is now an **alias for `BrowserSession`** — same class. Old `BrowserContext` is gone (folded into the session). +- `Controller` is a backwards-compat alias for **`Tools`**; `@controller.action` ≡ `@tools.action`. +- In custom actions, the injected browser parameter must be named exactly **`browser_session: BrowserSession`**. + +--- + +## 2. Top-level mapping table + +| browser-use (Python) | Stagehand v3 (TypeScript) / Browserbase | +|---|---| +| `Agent(task=..., llm=...)` + `await agent.run()` | **Decompose** into `stagehand.act` / `extract` / `observe` when the flow is known (preferred); else faithful `stagehand.agent().execute(...)`. See [determinism.md](determinism.md). 
| +| `llm=ChatAnthropic(model="claude-sonnet-4-6")` | `new Stagehand({ model: "anthropic/claude-sonnet-4-6" })` (or per-call `{ model }`) | +| `llm=ChatOpenAI(model="gpt-5")` | `model: "openai/gpt-5"` | +| `llm=ChatGoogle(model="gemini-2.5-flash")` | `model: "google/gemini-2.5-flash"` | +| `llm=ChatBrowserUse()` (default) | pick a provider string, e.g. `"google/gemini-2.5-flash"` (fast/cheap) or `"anthropic/claude-sonnet-4-6"` | +| `agent.run(max_steps=30)` | `agent().execute({ instruction, maxSteps: 30 })` | +| `output_model_schema=MyPydanticModel` | **Preferred:** decompose and `stagehand.extract("...", zodSchema)` for the typed read. For an agentic run, `agent().execute({ output: zodObjectSchema })` — but see the two gotchas below. | +| `history.final_result()` | `extract(...)` return value, or `result.message` from an agent run | +| `history.structured_output` | typed `extract(...)` return, or `result.output` from `agent().execute({ output })` (gotchas below) | +| `Browser()` (local default) | `new Stagehand({ env: "LOCAL", localBrowserLaunchOptions: { ... } })` | +| `Browser(cdp_url=session.connect_url)` (Browserbase) | `new Stagehand({ env: "BROWSERBASE" })` — Stagehand creates & manages the session | +| `BrowserProfile(headless=False)` | `localBrowserLaunchOptions: { headless: false }` | +| `BrowserProfile(proxy=...)` / Browserbase proxies | `browserbaseSessionCreateParams: { proxies: true }` | +| `BrowserProfile(user_data_dir=...)` / `storage_state=...` | Browserbase **Context** (`browserSettings.context: { id, persist: true }`) or LOCAL `localBrowserLaunchOptions.userDataDir` | +| `BrowserProfile(allowed_domains=[...])` | No first-class equivalent — see [§5 Gaps](#5-gaps-no-clean-equivalent) | +| `sensitive_data={...}` | `act("...%key%...", { variables: { key } })` | +| `initial_actions=[...]` | plain deterministic code before the AI calls: `await page.goto(...)`, etc. | +| `Tools()` / `@tools.action(...)` | `agent({ tools: { name: tool({...}) } })` (Vercel AI SDK), or just plain TS for deterministic side-effects | +| `use_vision=True / False / "auto"` | agent `mode: "hybrid"/"cua"` (vision) vs `"dom"` (default, no vision) | +| `page_extraction_llm=...` | `extract("...", schema, { model })` | +| `planner_llm=...` | `agent({ model, executionModel })` — `model` plans, `executionModel` runs the inner act/observe | + +> **Agent `output` gotchas (both bite at runtime, not compile time — verified on 3.6.x):** +> 1. **Requires experimental mode.** `agent().execute({ output })` throws +> `ExperimentalNotConfiguredError` unless the Stagehand constructor sets `experimental: true`. +> Note experimental mode changes execution (it bypasses the managed Stagehand API path), so prefer +> the alternative below for production. **This applies to three agent features — `output` schema, +> custom `tools` (§3.5), and MCP `integrations` (§3.7) all require `experimental: true`** (validated +> together at agent creation). If you keep any of them, set it on the constructor and note the +> managed-API-bypass tradeoff in the migration summary. +> 2. **Must be a zod *object*** (`output?: StagehandZodObject`) — unlike `extract`, a top-level +> `z.array(...)` is rejected. Wrap lists: `z.object({ items: z.array(...) })`. 
+> +> **Recommended non-experimental pattern for a typed result from an agentic task:** run the agent +> without `output`, then do a separate `stagehand.extract("...", zodSchema)` on the final page (or +> over `result.message`). This keeps the managed API path and lets `extract` use a top-level array. + +--- + +## 3. Detailed translations + +### 3.1 The Agent (the central decision) + +A browser-use `Agent` is a fully-agentic loop: the LLM decides every click. In Stagehand you +choose how much of that to keep. 
**Default to decomposition** when the script's intent reveals a +concrete sequence; fall back to `agent()` only for genuinely open-ended tasks. Always note the +choice in the migration summary. (Full decision framework: [determinism.md](determinism.md).) + +**Before — browser-use** +```python +from browser_use import Agent, ChatAnthropic + +agent = Agent( + task="Go to Hacker News and open the top story", + llm=ChatAnthropic(model="claude-sonnet-4-6"), +) +history = await agent.run(max_steps=30) +print(history.final_result()) +``` + +**After — decomposed (preferred: deterministic, debuggable, cheaper)** +```typescript +import { Stagehand } from "@browserbasehq/stagehand"; + +const stagehand = new Stagehand({ env: "BROWSERBASE", model: "anthropic/claude-sonnet-4-6" }); +await stagehand.init(); + +const page = stagehand.context.pages()[0]; +await page.goto("https://news.ycombinator.com"); +await stagehand.act("click the top story link"); + +await stagehand.close(); +``` + +**After — faithful agentic (when the flow is open-ended)** +```typescript +const stagehand = new Stagehand({ env: "BROWSERBASE", model: "anthropic/claude-sonnet-4-6" }); +await stagehand.init(); + +const agent = stagehand.agent(); +const result = await agent.execute({ + instruction: "Go to Hacker News and open the top story", + maxSteps: 30, +}); +console.log(result.message); + +await stagehand.close(); +``` + +### 3.2 Structured output → `extract()` with a zod schema + +Pydantic models become zod schemas. v3 supports a **top-level array schema** (no wrapper object +needed). Prefer extracting *after* navigating to the page deterministically. 
+ +**Before** +```python +from pydantic import BaseModel +from browser_use import Agent, ChatOpenAI + +class Story(BaseModel): + title: str + points: int + +class Stories(BaseModel): + stories: list[Story] + +agent = Agent( + task="Get the top 5 Hacker News stories with title and points", + llm=ChatOpenAI(model="gpt-5"), + output_model_schema=Stories, +) +history = await agent.run() +data = history.structured_output # Stories instance +``` + +**After** +```typescript +import { z } from "zod"; + +const page = stagehand.context.pages()[0]; +await page.goto("https://news.ycombinator.com"); + +const stories = await stagehand.extract( + "extract the top 5 stories with their title and points", + z.array(z.object({ + title: z.string(), + points: z.number(), + })), +); +// stories is fully typed: { title: string; points: number }[] +``` + +Use `.describe()` on fields to steer the model, mirroring Pydantic `Field(description=...)`: +```typescript +z.object({ price: z.string().describe("price including the currency symbol") }) +``` + +### 3.3 Sensitive data / login → `variables` + +browser-use's `sensitive_data` keeps secrets out of the prompt by injecting placeholder keys. +Stagehand's `variables` do the same: the LLM only sees the `%key%` token; the real value is +substituted when the action runs and never reaches the model. It is still typed into the browser, +which on Browserbase is a remote session, so it does not literally stay on your machine. + +**Before** +```python +sensitive_data = { + "https://example.com": {"x_user": "real@email.com", "x_pass": "s3cret"}, +} +agent = Agent( + task="Log into example.com with username x_user and password x_pass", + llm=llm, + sensitive_data=sensitive_data, + use_vision=False, + browser=Browser(allowed_domains=["https://*.example.com"]), +) +await agent.run() +``` + +**After** +```typescript +const page = stagehand.context.pages()[0]; +await page.goto("https://example.com/login"); + +await stagehand.act("type %username% into the email field", { + variables: { username: process.env.APP_USER! 
}, +}); +await stagehand.act("type %password% into the password field", { + variables: { password: process.env.APP_PASS! }, +}); +await stagehand.act("click the sign in button"); +``` +For repeat runs, prefer a **Browserbase Context** so you log in once and reuse the authenticated +state (see §4) — this is the biggest reliability win in most migrations. + +### 3.4 Browser configuration + +**Local (dev)** +```python +# browser-use +browser = Browser(browser_profile=BrowserProfile(headless=False)) +``` +```typescript +// Stagehand +const stagehand = new Stagehand({ + env: "LOCAL", + localBrowserLaunchOptions: { headless: false, viewport: { width: 1280, height: 720 } }, +}); +``` + +**Browserbase (prod)** — in browser-use you create the session yourself and pass `cdp_url`. In +Stagehand, `env: "BROWSERBASE"` creates and manages the session for you. +```python +# browser-use +bb = Browserbase(api_key=os.environ["BROWSERBASE_API_KEY"]) +session = bb.sessions.create(project_id=os.environ["BROWSERBASE_PROJECT_ID"]) +browser = Browser(browser_profile=BrowserProfile(cdp_url=session.connect_url)) +``` +```typescript +// Stagehand — apiKey/projectId default to BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID env vars +const stagehand = new Stagehand({ env: "BROWSERBASE" }); +``` + +### 3.5 Custom actions (`@tools.action`) + +Decide whether the action is a **deterministic side-effect** (just write TypeScript) or a +**capability you want the autonomous agent to choose** (register a tool). 
+ +**Before** +```python +from browser_use import Tools, ActionResult, BrowserSession + +tools = Tools() + +@tools.action("Save the current URL to a file") +async def save_url(browser_session: BrowserSession) -> ActionResult: + url = await browser_session.get_current_page_url() + with open("url.txt", "w") as f: + f.write(url) + return ActionResult(extracted_content=f"saved {url}") + +agent = Agent(task="...", llm=llm, tools=tools) +``` + +**After — option A: plain code (preferred for deterministic side-effects)** +```typescript +import { writeFile } from "node:fs/promises"; + +const page = stagehand.context.pages()[0]; +const url = page.url(); +await writeFile("url.txt", url); +``` + +**After — option B: a tool the agent can call** (uses the Vercel AI SDK `tool()`; **requires +`experimental: true`** — see the experimental callout below). **`ai` must be v5** (`"ai": "^5.0.0"`): +Stagehand 3.6.x types `tools` as the v5 `ToolSet` (schema field **`inputSchema`**); the v4 `tool()` +helper emits `parameters` and fails to type-check. If you can't pin v5, drop the `tool()` helper and +use a plain object `{ description, inputSchema: zodSchema, execute }` (satisfies the v5 `ToolSet` +regardless of the resolved `ai` major). +```typescript +import { tool } from "ai"; // add the "ai" package (Vercel AI SDK) to your deps +import { z } from "zod"; +import { writeFile } from "node:fs/promises"; + +const stagehand = new Stagehand({ + env: "BROWSERBASE", + experimental: true, // REQUIRED for agent `tools` (throws ExperimentalNotConfiguredError otherwise) + model: "anthropic/claude-sonnet-4-6", +}); +await stagehand.init(); + +const page = stagehand.context.pages()[0]; + +const agent = stagehand.agent({ + tools: { + saveUrl: tool({ + description: "Save the current page URL to a file", + // The original reads the URL from `browser_session`, NOT the LLM. So take + // no model args and close over `page` — otherwise the model can pass a + // guessed/hallucinated URL. 
(Only put a field in inputSchema when the value + // genuinely has to come from the agent's reasoning.) + inputSchema: z.object({}), + execute: async () => { + const url = page.url(); + await writeFile("url.txt", url); + return `saved ${url}`; + }, + }), + }, +}); +await agent.execute("..."); +``` + +**Custom-action gaps (flag for human review — no clean 1:1):** +- **Injected special params.** browser-use auto-wires `browser_session`, `cdp_client`, + `page_extraction_llm`, `available_file_paths`, etc. into an action by signature. Stagehand tools + get none of that — instead **close over** what you need in the `execute` body (capture the + `stagehand` instance / `page`, call `stagehand.act/extract` inside the tool). +- **`domains=[...]` per-action filtering** — no equivalent; gate inside `execute` with a `page.url()` + host check (same pattern as the `allowed_domains` gap in §5). +- **`terminates_sequence=True`** — no equivalent control-flow flag. + +### 3.6 Secondary models + +| browser-use | Stagehand v3 | +|---|---| +| `page_extraction_llm=ChatOpenAI(model="gpt-5-mini")` | `extract("...", schema, { model: "openai/gpt-5-mini" })` | +| `planner_llm=...` + main `llm=...` | `agent({ model: "", executionModel: "" })` | + +### 3.7 MCP integration + +browser-use has MCP in **two directions** — map only the *client* direction. + +**(a) browser-use as MCP _client_ (consumes an external MCP server)** → Stagehand +`agent({ integrations })` (**requires `experimental: true`**). 
Stagehand accepts either a URL string +or an MCP `Client` from `connectToMCPServer({ command, args, env })`, so both browser-use transports map: + +**Before — browser-use (`MCPClient`, stdio/local server)** +```python +from browser_use.mcp.client import MCPClient +from browser_use import Tools, Agent, ChatOpenAI + +tools = Tools() +mcp = MCPClient(server_name="filesystem", command="npx", + args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]) +await mcp.connect() +await mcp.register_to_tools(tools) # external MCP tools become agent actions +agent = Agent(task="...", llm=ChatOpenAI(model="gpt-4o"), tools=tools) +``` +**After — Stagehand** +```typescript +import { Stagehand, connectToMCPServer } from "@browserbasehq/stagehand"; + +const stagehand = new Stagehand({ + env: "BROWSERBASE", + experimental: true, // REQUIRED for `integrations` + model: "openai/gpt-4.1-mini", +}); +await stagehand.init(); + +// local/stdio MCP server -> Client instance: +const fsClient = await connectToMCPServer({ + command: "npx", + args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"], +}); +const agent = stagehand.agent({ integrations: [fsClient] }); +// remote MCP server -> just the URL: integrations: ["https://mcp.example.com/mcp?key=..."] +``` +- `register_to_tools(..., tool_filter=[...], prefix="...")` has **no equivalent** — Stagehand attaches + all of the server's tools; flag if the source filtered/renamed them. + +**(b) browser-use as MCP _server_** (`uvx browser-use --mcp`, exposing the browser as MCP tools to +Claude Desktop etc.) → **no Stagehand equivalent.** This is not a script migration — flag it as out of +scope rather than converting it. + +For both MCP-client transports, dispose the connection in `finally`: an MCP `Client` exposes +`.close()` (the analog of browser-use's `await mcp.disconnect()`). + +> ⚠️ **Known runtime bug (Stagehand 3.6.0, verified live):** passing a **`Client` instance** (i.e. 
a +> local/stdio server via `connectToMCPServer`) into `integrations` throws +> `TypeError: Converting circular structure to JSON` *before the agent runs* — `agent()` does +> `JSON.stringify(options.integrations)` for event logging and the Client object is circular +> (reproduced with two different MCP servers). A plain **URL-string** integration is unaffected, so +> prefer **remote/URL MCP** until this is fixed upstream; for a stdio-only server, flag that the +> migration is correct but blocked by this Stagehand bug. + +### 3.8 Real-world patterns (embedded code, vision, legacy result handling) + +Real browser-use lives inside apps, not clean `main()` scripts. Handle these explicitly: + +- **Embedded / wrapped code** (a LangChain `BaseTool`, FastAPI route, Celery/queue task, an + `Executor` class). Convert **only the browser-use surface** — `Agent(...)` construction, + `agent.run()`, result handling, custom tools — and **preserve the surrounding app glue** (the class, + decorators, routing, logging) as-is. The `init()`/`close()`-in-`main()` skeleton and "page via + `context.pages()[0]`" checklist items don't apply verbatim to a wrapper; adapt them to the wrapper's + lifecycle. +- **Sync-over-async.** Legacy embedded code often drives the async `agent.run()` from a sync method via + `asyncio.new_event_loop()`. There's no faithful Node equivalent — the converted method becomes + `async` and callers must `await` it. Note this in the migration summary. +- **Long-lived stateful executors.** A persistent mutable `self.agent` (`add_new_task`, `pause/resume`, + mid-run controller swaps, EventBus re-init) has **no Stagehand analog** — Stagehand agents aren't + long-lived mutable objects. Model continuity with the stateless `messages` array (`AgentResult.messages` + → next `execute({ messages })`) and rebuild tools per `agent()` call. `pause/resume` ≈ `AbortSignal`; + `rerun_history(plan)` ≈ observe→act caching, **not** a 1:1. Flag these. 
+- **Vision intent.** If the source LLM is a vision model (`vl_`/`ChatVL`-style, `use_vision=True`) but + you map it to a text model string, you silently drop vision. Preserve the intent with agent + `mode: "hybrid"` or `"cua"` (both need `experimental: true`), or flag it. +- **Legacy result API.** Besides `history.final_result()` (method), legacy code uses the attribute form + `result.final_result` and `isinstance(result, AgentHistoryList)` coercion → all map to + `result.message` / `result.output` from the Stagehand agent result. + +--- + +## 4. Browserbase platform features + +Everything you could set on a raw Browserbase session is reachable through +`browserbaseSessionCreateParams` (it is literally Browserbase's `SessionCreateParams`). + +| Need | browser-use today | Stagehand v3 | +|---|---|---| +| Persistent auth / cookies | `storage_state` / `user_data_dir` | **Context**: `browserSettings.context: { id, persist: true }` | +| Proxies | profile proxy / BB session proxies | `proxies: true` (or array form for geo/domain rules) | +| Stealth / fingerprinting | (mostly via BB session) | `browserSettings.advancedStealth: true` (Scale plan); Verified Sessions | +| Captcha solving | BB session | `browserSettings.solveCaptchas: true` (on by default) | +| Ad blocking | — | `browserSettings.blockAds: true` | +| Region | — | `region: "us-east-1" \| "us-west-2" \| "eu-central-1" \| "ap-southeast-1"` | +| Keep session alive | `keep_alive=True` | `keepAlive: true` | +| Downloads | — | `bb.sessions.downloads.list(id)` via `@browserbasehq/sdk` | + +```typescript +const stagehand = new Stagehand({ + env: "BROWSERBASE", + browserbaseSessionCreateParams: { + projectId: process.env.BROWSERBASE_PROJECT_ID!, + region: "us-east-1", + proxies: true, + keepAlive: true, + browserSettings: { + blockAds: true, + solveCaptchas: true, + context: { id: process.env.BB_CONTEXT_ID!, persist: true }, + }, + }, +}); +``` + +--- + +## 5. 
Gaps (no clean equivalent) + +Call these out explicitly in the migration summary — they are guardrails or behaviors that do +not transfer 1:1. + +- **`allowed_domains`** — browser-use can hard-block navigation off an allow-list. Stagehand has + no built-in domain firewall. Mitigations: (a) constrain the autonomous agent via `systemPrompt` + ("only operate within example.com"); (b) use Browserbase proxy **domain rules**; (c) enforce in + your own code — check `page.url()` before acting. If the source relied on `allowed_domains` as a + security boundary (it pairs with `sensitive_data`), flag it as **needs human review**. +- **Per-step thinking / `use_thinking` / `flash_mode`** — browser-use exposes loop-level + reasoning knobs. In Stagehand, decomposed `act`/`extract` calls are already targeted; for speed, + use a fast model (`google/gemini-2.5-flash`) and decomposition rather than a "flash mode" flag. +- **`max_actions_per_step`** — no equivalent; decomposition makes each action explicit instead. +- **`initial_actions`** — there is no special field; it simply becomes ordinary code that runs + before your first AI call. + +--- + +## 6. Legacy browser-use patterns → current (normalize before translating) + +| Legacy | Current browser-use | +|---|---| +| `Browser(config=BrowserConfig(...))` | `Browser(browser_profile=BrowserProfile(...))` or kwargs on `Browser(...)` | +| `BrowserContext` | gone — folded into `BrowserSession` | +| `Controller()` + `@controller.action` ; `controller=` | `Tools()` + `@tools.action` ; `tools=` | +| custom-action param `browser: Browser` | `browser_session: BrowserSession` | +| `cdp_url` inside `BrowserConfig` | `cdp_url` on `Browser` / `BrowserProfile` | + +--- + +## Version notes (read before translating) + +- **Stagehand v3 moved the AI methods onto the instance.** It is `stagehand.act(...)`, + `stagehand.extract(...)`, `stagehand.observe(...)` — **not** `page.act(...)`. The page object + (for `goto`, locators, etc.) 
is `stagehand.context.pages()[0]`. Most v2 examples online still + show `page.act()` / `stagehand.page` — do not copy them verbatim. +- **Models are `"provider/model"` strings** (e.g. `"anthropic/claude-sonnet-4-6"`), set via the + `model` constructor field. Prefer the string form; the object form + `{ modelName, ...clientOptions }` still exists for passing client options, but the bare string is + the idiomatic v3 path. +- **Page settling:** use `page.waitForLoadState("domcontentloaded")` / `"load"`, **not + `"networkidle"`** (it times out on Google/analytics/long-poll pages). +- **Agent `output` schema requires `experimental: true`** on the constructor and must be a zod + *object* (not a top-level array). See the "Agent `output` gotchas" callout in §2; prefer the + agent-then-`extract` pattern to avoid experimental mode. +- **Caching:** `cacheDir` (local) and `serverCache` (Browserbase, default on) replace v2's + `enableCaching`. `domSettleTimeoutMs` → `domSettleTimeout`. +- **zod is a peer dependency** (v3 or v4): the consuming project must `npm install zod`. +- Always confirm exact signatures against the installed version: . +- browser-use model strings move fast; the **class names and `model=` parameter are stable**, the + exact model ids are not. Pin them per the team's installed version. diff --git a/skills/browser-use-to-stagehand/references/determinism.md b/skills/browser-use-to-stagehand/references/determinism.md new file mode 100644 index 0000000..93796db --- /dev/null +++ b/skills/browser-use-to-stagehand/references/determinism.md @@ -0,0 +1,139 @@ +# Choosing the right determinism level + +The single most important judgment in a browser-use → Stagehand migration is **how much AI to +keep at each step**. browser-use is agentic-by-default: the LLM decides every action, every run. +Stagehand lets you dial that down wherever the flow is known. This file is the decision framework +the `/browser-use-to-stagehand` skill applies. 
+ +--- + +## The spectrum (most → least agentic) + +| Level | Stagehand surface | Use when | Cost / reliability | +|---|---|---|---| +| **1. Autonomous** | `stagehand.agent().execute("…")` | Task is open-ended; the path isn't known ahead of time; exploration. | Highest token cost, lowest determinism, hardest to debug. Closest to browser-use. | +| **2. Per-step AI** | `stagehand.act("…")`, `stagehand.extract("…", schema)` | You know the *steps* but not the exact selectors; pages change often. | One LLM call per action. Moderate cost. Each step is inspectable. | +| **3. Observe → act (cached)** | `observe("…")` → replay `act(action)` | You know the steps and want to resolve + reuse concrete selectors. | `act(action)` makes **no LLM call**. Fast, repeatable. | +| **4. Self-heal + cache** | `selfHeal: true`, `cacheDir`, `serverCache` | Production runs that should replay deterministically but recover when the DOM drifts. | Cheapest steady-state; AI only re-engages on a cache miss/break. | +| **5. Navigation (no AI)** | `page.goto(url)`, `page.url()` on the Stagehand page | Loading a known URL or reading the current location. | No AI, no cost. Element interactions live at Levels 2–4, not here. | + +A good migration is usually a **mix**: `page.goto` for navigation, cached `observe`→`act` for the +repeatable skeleton (known forms, nav), per-step `act`/`extract` for the parts that genuinely vary, +and `agent()` reserved for the one open-ended stretch (if any). Determinism comes from Stagehand's +own caching — not from hand-written selectors. + +--- + +## Decision tree (apply per browser-use step / sub-task) + +``` +Navigating to a known URL? +├─ YES → page.goto(url) (Level 5, no AI) +└─ NO → Reading structured data off the page? + ├─ YES → extract("…", schema) (Level 2) + └─ NO → It's an on-page action (click / type / select). + Will this exact step repeat / need deterministic replay? 
+ ├─ YES → observe("…") once, persist it, replay with act(action) (Level 3/4, no LLM on replay) + └─ NO → Is the route genuinely open-ended (LLM must decide)? + ├─ YES → stagehand.agent().execute(...) (Level 1) + └─ NO → act("natural-language instruction") (Level 2) +``` + +Reading data off a page is always `extract("…", schema)` (Level 2) — there is no reason to use a +full agent just to read structured data. + +--- + +## How to read a browser-use script + +A single `Agent(task="…")` usually hides several sub-tasks inside one natural-language prompt. +Split the task string into its implied steps, then place each step on the spectrum. + +> **browser-use:** `Agent(task="Go to the store, search for 'wireless mouse', add the cheapest to cart, and checkout with my saved card")` + +Decomposes to: +1. "Go to the store" → known URL → `page.goto(...)` (Level 5) +2. "search for 'wireless mouse'" → known field, varying markup → `act("search for 'wireless mouse'")` (Level 2) +3. "add the cheapest to cart" → needs reading + a decision → `extract` the prices, pick min in code, then `act` (Level 2 + plain code) +4. "checkout with saved card" → sensitive, repeatable → `observe`→`act` + `variables`, or a Browserbase Context (Level 3/4) + +This is the core value of the migration: what was one opaque agent run becomes an inspectable, +mostly-deterministic pipeline — with AI used only where the page is actually unpredictable. + +--- + +## When to KEEP `agent()` + +Don't force decomposition where it doesn't fit. Keep an autonomous agent when: +- The task is **exploratory** ("find the contact email anywhere on this site"). +- The **number/order of steps is unknown** at authoring time. +- The script is a **one-off** where authoring a deterministic pipeline isn't worth it. +- You're doing a **first-pass lift-and-shift** and want behavior parity before optimizing. 
+ +Even then, tighten it: set `maxSteps`, pass a `systemPrompt` with constraints, use `output` for a +typed result (**requires `experimental: true` on the constructor**, and `output` must be a zod +*object*; to avoid experimental mode, prefer running the agent then a separate `extract`), and +consider `executionModel` (a cheaper model for the agent's inner act/observe calls). For +computer-use-style visual tasks, `mode: "cua"`; otherwise the default `mode: "dom"`. + +--- + +## The observe → act caching pattern (Level 3) + +`observe()` turns a natural-language instruction into concrete `Action` objects (selector + method ++ args). Feeding an `Action` back into `act()` **executes it without another LLM call** — that's +how you get repeatability. + +```typescript +// Resolve once (one LLM call) +const [loginButton] = await stagehand.observe("the login button"); + +// Replay deterministically (no LLM call) — persist `loginButton` to reuse across runs +if (loginButton) { + await stagehand.act(loginButton); +} +``` + +Combine with caching for production: +```typescript +const stagehand = new Stagehand({ + env: "BROWSERBASE", + selfHeal: true, // re-resolve with AI only if a cached selector breaks + cacheDir: "./stagehand-cache", + // serverCache: true, // default on under BROWSERBASE; key = instruction + page + options +}); +// inspect with result.cacheStatus === "HIT" | "MISS" +``` + +--- + +## Best practices for deterministic runs + +From Stagehand's own guidance — bake these into rewrites: +- **Wait for the page to settle** before an AI snapshot: `await page.waitForLoadState("domcontentloaded")` + (or `"load"`). **Avoid `"networkidle"`** — it never fires on sites with continuous background + traffic (Google, analytics, long-poll/websocket apps) and will throw a 15s timeout. When a specific + element matters, wait for it explicitly instead of a global load state. 
+- **Scope** extractions/observations with `selector` (CSS or xpath) to cut noise and cost: + `extract("…", schema, { selector: "//main" })`. +- **Lock the viewport** so cached selectors stay valid: `await page.setViewportSize(1280, 720)` + (v3 takes **positional** args `setViewportSize(width, height)`, not Playwright's `{ width, height }` object). +- **Use `variables`** so different inputs share one cache entry (and keep secrets out of prompts). +- **Anchor prompts to visible UI labels** ("click the *Sign in* button"), not internal structure. +- **Iterating an extracted list** (browser-use's "open the links one by one" / loop-an-action + patterns): `extract` the list first, then loop in plain TypeScript — there is no AI in the loop. + Resolve relative hrefs to absolute before navigating: `new URL(href, page.url()).toString()` (a + bare `page.goto("/foo")` or `goto("")` throws `Cannot navigate to invalid URL`). Wrap each + iteration in try/catch so one dead link doesn't abort the run. + +--- + +## What this buys the team (put this in the migration summary) + +- **Determinism & debuggability** — each step is explicit and inspectable, vs one opaque agent loop. +- **Cost** — fewer/cheaper LLM calls; cached steps cost nothing. +- **Reliability** — self-heal recovers from DOM drift without a full re-plan. +- **Control** — secrets via `variables`, auth via Contexts, network controls via Browserbase. + +The honest trade-off: decomposition is **more upfront authoring** than a single `task=` string. +Recommend it where the flow is known and the script runs repeatedly; keep `agent()` where it isn't. 
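The relative-href step from the best-practices list above can be sketched in plain TypeScript. This is a minimal illustration, not part of Stagehand: `resolveHref` and `resolveAll` are hypothetical helper names, and the only API assumed is the standard `URL` constructor. It resolves each extracted href against the current page URL and skips values that would make `goto` fail.

```typescript
// Hypothetical helper (not a Stagehand API): resolve an extracted href
// against the current page URL before navigating to it.
function resolveHref(href: string, baseUrl: string): string | null {
  const trimmed = href.trim();
  if (trimmed === "") return null; // an empty href cannot be navigated to
  try {
    return new URL(trimmed, baseUrl).toString();
  } catch {
    return null; // malformed href or malformed base URL
  }
}

// Hypothetical helper: resolve a batch of hrefs, dropping unusable
// entries and duplicates while keeping first-seen order.
function resolveAll(hrefs: string[], baseUrl: string): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    const absolute = resolveHref(href, baseUrl);
    if (absolute !== null) seen.add(absolute);
  }
  return Array.from(seen);
}
```

In a rewrite, `baseUrl` would typically be `page.url()` and each resolved URL would be passed to `page.goto(...)` inside its own try/catch, so one dead link does not abort the loop.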
diff --git a/skills/browser-use-to-stagehand/references/guide.md b/skills/browser-use-to-stagehand/references/guide.md new file mode 100644 index 0000000..14f5e9f --- /dev/null +++ b/skills/browser-use-to-stagehand/references/guide.md @@ -0,0 +1,242 @@ +# Migrating from browser-use to Stagehand on Browserbase + +A guide for teams moving browser automation from **browser-use** (Python) to **Stagehand** +(TypeScript) running on **Browserbase**. It focuses on mapping features and choosing the right +level of determinism — not a rigid line-by-line transpile, because real migrations differ by team, +tooling, and how much autonomy each workflow actually needs. + +If you want an agent to do the rewriting for you, pair this guide with the **`/browser-use-to-stagehand` skill**, +which applies the mappings and determinism framework below to your actual scripts. + +--- + +## TL;DR + +- **browser-use** is *agentic by default*: an LLM decides every action on every run. Great for + exploration; harder to make deterministic, cheap, and debuggable. +- **Stagehand** gives you a *spectrum*: cached `observe`→`act` for stable, repeatable steps, + `act`/`extract` for the parts that vary, and a full `agent()` only when the path is open-ended. +- **Browserbase** is the cloud runtime under both. Stagehand in `env: "BROWSERBASE"` manages the + session for you, and unlocks Contexts (persistent auth), proxies, stealth, and observability. +- The migration is a **refactor with judgment**: decide, per step, how much AI you actually need. + The payoff is determinism, lower cost, and debuggability — at the price of more upfront authoring. + +--- + +## 1. 
The philosophy shift + +| | browser-use | Stagehand | +|---|---|---| +| Default control model | Fully agentic — LLM picks each action | You choose, per step, how much AI to use | +| Unit of work | A natural-language `task` | Primitives: `act`, `extract`, `observe`, `agent` | +| Page perception | Indexed DOM/accessibility tree (vision optional) | Targeted per-call; DOM by default, vision in agent `hybrid`/`cua` modes | +| Determinism | Hard — every run re-reasons | A dial: from cached, replayable actions to full agent | +| Best at | Open-ended exploration, prototyping | Production flows you want repeatable and cheap | +| Language | Python | TypeScript (also Python) | + +Teams usually migrate because an agentic script that was perfect for a demo becomes expensive, +flaky, or impossible to debug in production. The fix isn't "a better agent" — it's *using AI only +where the page is actually unpredictable* and making everything else deterministic. + +--- + +## 2. Mental model: the loop vs. the primitives + +**browser-use** runs a perceive → decide → act loop. Each step it snapshots the page into an +indexed element tree, asks the LLM which action to take, executes it, and repeats until the task is +done or `max_steps` is hit. One `Agent(task="…")` can hide a dozen decisions. + +**Stagehand** exposes four primitives you compose yourself: + +- **`act(instruction)`** — perform one action ("click the login button"). One LLM call. +- **`extract(instruction, schema)`** — pull structured (zod-typed) data off the page. +- **`observe(instruction)`** — resolve an instruction into concrete actions *without doing them*; + replay them later with no LLM call (the determinism trick). +- **`agent().execute(instruction)`** — the full autonomous loop, when you genuinely need it. + +Navigation to known URLs is just `page.goto()` on the Stagehand page (no AI), and you lock in +repeatable steps by caching an `observe()` result and replaying it with `act()`. 
The migration is +mostly deciding which primitive each browser-use step becomes. + +--- + +## 3. Feature mapping + +The most-used mappings (see the skill's +[`api-mapping.md`](api-mapping.md) for the exhaustive table and +before/after code): + +| browser-use | Stagehand v3 / Browserbase | +|---|---| +| `Agent(task=…)` + `agent.run()` | Decompose into `act`/`extract`/`observe` when the flow is known; else `stagehand.agent().execute(…)` | +| `llm=ChatAnthropic(model="claude-sonnet-4-6")` | `new Stagehand({ model: "anthropic/claude-sonnet-4-6" })` | +| `output_model_schema=PydanticModel` | `extract("…", zodSchema)` | +| `history.final_result()` / `.structured_output` | `extract(...)` return / `result.output` | +| `Browser()` (local) | `new Stagehand({ env: "LOCAL", localBrowserLaunchOptions })` | +| `Browser(cdp_url=session.connect_url)` | `new Stagehand({ env: "BROWSERBASE" })` | +| `sensitive_data={…}` | `act("…%key%…", { variables: { key } })` | +| `storage_state` / `user_data_dir` | Browserbase **Context** (`browserSettings.context: { id, persist: true }`) | +| proxies / stealth / captcha | `browserbaseSessionCreateParams` (`proxies`, `browserSettings.advancedStealth`, `solveCaptchas`) | +| `@tools.action` custom action | plain TS (deterministic), or `agent({ tools })` (agent capability) | +| `max_steps` | `agent().execute({ maxSteps })` | +| `allowed_domains` | ⚠️ no direct equivalent — see Gotchas | + +--- + +## 4. The determinism spectrum (the important part) + +Every browser-use step lands somewhere on this spectrum. Choosing well is the whole game. (The +skill's [`determinism.md`](determinism.md) has the decision tree.) + +| Level | Stagehand | Use when | Cost / determinism | +|---|---|---|---| +| **1. Autonomous** | `agent().execute("…")` | Path is open-ended / unknown | Highest cost, lowest determinism (≈ browser-use) | +| **2. Per-step AI** | `act("…")`, `extract("…", schema)` | Steps known, markup varies | One call/step; inspectable | +| **3. 
Observe → act** | `observe()` then `act(action)` | Known steps you want to replay | `act(action)` makes **no** LLM call | +| **4. Self-heal + cache** | `selfHeal`, `cacheDir`, `serverCache` | Production replay that tolerates DOM drift | Cheapest steady state | +| **5. Navigation (no AI)** | `page.goto(url)`, `page.url()` | Loading a known URL (element interactions are Levels 2–4) | No AI, no cost | + +A healthy rewrite is a **mix**: `page.goto` for navigation + cached `observe`→`act` for the +repeatable skeleton + per-step `act`/`extract` for the variable bits + a Context for auth. Reserve +`agent()` for the one genuinely open-ended stretch, if any. + +> **Example decomposition.** browser-use: +> `Agent(task="Go to the store, search 'wireless mouse', add the cheapest to cart, checkout with my saved card")` +> becomes: `page.goto(store)` (L5) → `act("search 'wireless mouse'")` (L2) → `extract` prices, pick +> the min in code, `act("add to cart")` (L2 + code) → checkout via a Browserbase **Context** so the +> card/session is already authenticated (L3/4). One opaque agent run → an inspectable pipeline. + +--- + +## 5. A recommended migration path + +You don't have to rewrite everything at once. A low-risk sequence: + +1. **Lift-and-shift onto Browserbase (no rewrite).** Point the existing browser-use script at a + Browserbase session via `cdp_url=session.connect_url`. You immediately gain observability, + proxies, and stealth, and you establish a behavior baseline. *(Optional but recommended first + step — it de-risks everything after.)* +2. **Observe (optional, for opaque scripts).** Run it once on Browserbase and read the **Session + Logs API** (`sessions.logs.list`) to see the real navigations and network calls. Use the video + recording for human QA. See the skill's + [`trace-assisted.md`](trace-assisted.md). +3. 
**Rewrite incrementally in Stagehand.** Translate the skeleton (navigation to `page.goto`, + repeatable steps to cached `observe`→`act`), then the variable steps to `act`/`extract`, and only the + open-ended parts to `agent()`. Move auth to a **Context**, secrets to `variables`. +4. **Validate against the baseline.** Run the Stagehand version on Browserbase and compare logs and + end state to step 1. Reuse the same Context so you're comparing like with like. +5. **Harden for production.** Turn on `selfHeal` + caching, pin models, scope extracts with + `selector`, lock the viewport, wait for `domcontentloaded` (never `networkidle`) before AI snapshots. + +--- + +## 6. A worked example + +**Before — browser-use (Python)** +```python +from pydantic import BaseModel +from browser_use import Agent, ChatOpenAI + +class Story(BaseModel): + title: str + points: int + +class Stories(BaseModel): + stories: list[Story] + +agent = Agent( + task="Go to Hacker News and return the top 5 stories with their title and points", + llm=ChatOpenAI(model="gpt-5"), + output_model_schema=Stories, +) +history = await agent.run() +print(history.structured_output) +``` + +**After — Stagehand v3 (TypeScript) on Browserbase** +```typescript +import "dotenv/config"; +import { Stagehand } from "@browserbasehq/stagehand"; +import { z } from "zod"; + +async function main() { + const stagehand = new Stagehand({ env: "BROWSERBASE", model: "openai/gpt-5" }); + await stagehand.init(); + try { + const page = stagehand.context.pages()[0]; + await page.goto("https://news.ycombinator.com"); // deterministic + + const stories = await stagehand.extract( // structured read + "extract the top 5 stories with their title and points", + z.array(z.object({ title: z.string(), points: z.number() })), + ); + console.log(stories); + } finally { + await stagehand.close(); + } +} + +main().catch((err) => { console.error(err); process.exit(1); }); +``` + +The navigation that the agent used to "decide" is now an explicit 
`page.goto`, and the data read is +a single typed `extract` instead of a full agent loop — faster, cheaper, and deterministic. See +[`EXAMPLES.md`](../EXAMPLES.md) for more before/after pairs (simple task, structured extraction, login). + +--- + +## 7. What you gain from Browserbase + +These are often the real reason to migrate, beyond determinism: + +- **Contexts** — persist auth/cookies across runs; log in once, reuse everywhere. The biggest + reliability win in most migrations (no more brittle login flows every run). +- **Proxies** — residential/datacenter, with geo and domain rules. +- **Stealth & Verified Sessions** — maintained fingerprints for sites that fight automation. +- **Captcha solving** — on by default. +- **Observability** — Session Replay (video), Session Inspector, and the Logs API (CDP + network/console/lifecycle events) for debugging and validation. + +All reachable from Stagehand via `browserbaseSessionCreateParams`. + +--- + +## 8. Gotchas & version notes + +- **Stagehand v3 is a rewrite.** AI methods are on the **instance** (`stagehand.act/extract/observe`), + not the page; get the page via `stagehand.context.pages()[0]`; models are `"provider/model"` + strings. Most examples online are v2 (`page.act()`, `stagehand.page`) — don't copy them. See the + [v2→v3 migration guide](https://docs.stagehand.dev/v3/migrations/v2). +- **browser-use has three API shapes** — legacy (pre-0.12), stable (0.12.x), and the 0.13 Rust beta + (`browser_use.beta`). Identify which the source uses before translating; class names differ + (`Browser` ≡ `BrowserSession`, `Controller` → `Tools`). +- **`allowed_domains` has no Stagehand equivalent.** browser-use can hard-block off-domain + navigation (and pairs it with `sensitive_data` as a security boundary). In Stagehand, constrain + the agent via `systemPrompt`, use Browserbase proxy domain rules, or check `page.url()` in code — + and treat it as a deliberate review item, not an automatic drop. 
+- **Don't translate everything into `agent()`.** That just reproduces browser-use's + non-determinism. Decompose where the flow is known. +- **Secrets** — use `variables` + environment variables; never hardcode. Prefer Contexts for auth. + +--- + +## 9. Resources + +**Stagehand** +- Docs: +- act / extract / observe / agent: +- Configuration (browser, models): +- Caching & determinism: +- v2 → v3 migration: + +**Browserbase** +- Docs: +- Stagehand on Browserbase: +- Sessions & CDP: +- Contexts: +- Observability (logs/replay): + +**browser-use** +- Docs: +- Remote browser / CDP: +- Browserbase integration: diff --git a/skills/browser-use-to-stagehand/references/prompt.md b/skills/browser-use-to-stagehand/references/prompt.md new file mode 100644 index 0000000..3b4df89 --- /dev/null +++ b/skills/browser-use-to-stagehand/references/prompt.md @@ -0,0 +1,161 @@ + + +# Migrate browser-use → Stagehand on Browserbase (AI prompt) + +A copy-pasteable prompt that turns a **browser-use** (Python) script into a **Stagehand v3** +(TypeScript) script on **Browserbase** — choosing the right level of determinism per step instead +of producing a one-to-one agentic copy. + +**How to use** +1. Copy everything below the line (or share this raw file's URL). +2. Paste it into your AI coding assistant. +3. Add your browser-use script(s). +4. Review the generated Stagehand code and migration summary against your real site — tighten the + `act(...)` prompts to the actual on-page labels, and confirm any flagged items. + +> Prefer it as a one-command tool inside Claude Code? The same logic ships as the `/browser-use-to-stagehand` +> skill. This prompt is the universal, tool-agnostic form. + +--- + +You are migrating a **browser-use** (Python) browser-automation script to **Stagehand v3** +(TypeScript) running on **Browserbase**. Produce idiomatic, runnable Stagehand v3 code plus a +migration summary. This is a refactor with judgment, not a line-by-line transpile. 
+ +## Core principle + +browser-use is agentic-by-default: an LLM decides every action on every run. Stagehand lets you +choose how much AI to use at each step. A good migration replaces opaque agent loops with an +inspectable, mostly-deterministic pipeline — using AI only where the page is genuinely +unpredictable. The payoff is determinism, lower cost, and debuggability. + +## Stagehand v3 — get these exactly right + +Most training data and blog posts show Stagehand **v2**. Use **v3**: + +- **Construct & lifecycle:** + ```typescript + import "dotenv/config"; + import { Stagehand } from "@browserbasehq/stagehand"; + import { z } from "zod"; + + const stagehand = new Stagehand({ env: "BROWSERBASE", model: "anthropic/claude-sonnet-4-6" }); + await stagehand.init(); + try { + // ... work ... + } finally { + await stagehand.close(); + } + ``` +- **Page object:** `const page = stagehand.context.pages()[0];` — **not** `stagehand.page`. +- **AI methods are on the instance:** `stagehand.act(...)`, `stagehand.extract(...)`, + `stagehand.observe(...)` — **not** `page.act(...)`. +- **Models are `"provider/model"` strings:** e.g. `"anthropic/claude-sonnet-4-6"`, `"openai/gpt-5"`, + `"google/gemini-2.5-flash"`. (Pin to whatever the team uses; ids move fast.) +- **`extract` uses a zod schema**; v3 supports a top-level `z.array(...)` with no wrapper object. +- **Secrets via `variables`** — the `%token%` is sent to the LLM, the real value is substituted + locally and never leaves the machine: + ```typescript + await stagehand.act("type %username% into the email field", { + variables: { username: process.env.APP_USER! }, + }); + ``` +- **Default to `env: "BROWSERBASE"`**; show `env: "LOCAL"` (with `localBrowserLaunchOptions`) only as + the dev option. +- **Determinism/caching options:** `selfHeal: true`, `cacheDir`, `serverCache` (on by default under + BROWSERBASE). + +## The determinism spectrum — choose per step + +| Level | Stagehand | Use when | +|---|---|---| +| 1. 
Autonomous | `stagehand.agent().execute("…")` | Path is genuinely open-ended/unknown. | +| 2. Per-step AI | `act("…")`, `extract("…", schema)` | Step is known but markup varies. | +| 3. Observe → act | `const [a] = await stagehand.observe("…"); if (a) await stagehand.act(a);` | Known step you want to resolve once and replay (the `act(a)` call makes no LLM call). | +| 4. Self-heal + cache | `selfHeal`, `cacheDir`, `serverCache` | Production replay that should recover from DOM drift. | +| 5. Navigation (no AI) | `page.goto(url)`, `page.url()` | Loading a known URL or reading the current location. No LLM call. (Element interactions are Levels 2–4.) | + +**Decision rule:** split each browser-use `task="…"` string into its implied ordered steps, then +place each on the spectrum. **Default to decomposition (levels 2–5)** when the flow is known; keep +`agent()` (level 1) only for genuinely open-ended tasks (tighten it with `maxSteps`, a `systemPrompt`, +and `output` for typed results). **Reading data is always `extract`**, never a full agent. + +Example: `task="Go to the store, search 'wireless mouse', add the cheapest to cart, checkout with +my saved card"` → `page.goto(store)` (L5) → `act("search 'wireless mouse'")` (L2) → `extract` the +prices + pick the min in code + `act("add to cart")` (L2) → checkout via a Browserbase **Context** +so auth is already present (L3/4). 
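The "pick the min in code" step in the example above needs no LLM call once `extract` has returned the prices. A minimal sketch, assuming the extracted rows have the shape `{ name, price }`; the `Product` type, the `pickCheapest` helper, and the sample data are hypothetical and stand in for a real `extract` result:

```typescript
// Hypothetical shape for rows returned by an `extract` call; field names are assumptions.
interface Product {
  name: string;
  price: number;
}

// Pure selection logic: the same input always yields the same output, with no model involved.
function pickCheapest(products: Product[]): Product | undefined {
  // Ignore rows whose price did not parse into a finite, non-negative number.
  const valid = products.filter((p) => Number.isFinite(p.price) && p.price >= 0);
  if (valid.length === 0) return undefined;
  // Strict `<` keeps the first row when two prices tie.
  return valid.reduce((best, p) => (p.price < best.price ? p : best));
}

// Sample data standing in for an extract result.
const sample: Product[] = [
  { name: "Mouse A", price: 24.99 },
  { name: "Mouse B", price: 12.5 },
  { name: "Mouse C", price: Number.NaN },
];

const cheapest = pickCheapest(sample);
console.log(cheapest?.name);
```

The selected name can then be interpolated into the follow-up `act(...)` prompt, which keeps the comparison itself out of the model.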
+ +## Feature mapping + +| browser-use | Stagehand v3 / Browserbase | +|---|---| +| `Agent(task=…)` + `agent.run()` | Decompose into `act`/`extract`/`observe` when the flow is known; else `stagehand.agent().execute(…)` | +| `llm=ChatAnthropic(model="claude-sonnet-4-6")` | `new Stagehand({ model: "anthropic/claude-sonnet-4-6" })` | +| `llm=ChatOpenAI(model="gpt-5")` / `ChatGoogle(...)` | `model: "openai/gpt-5"` / `"google/gemini-2.5-flash"` | +| `output_model_schema=PydanticModel` | `stagehand.extract("…", zodSchema)` (preferred); or `agent().execute({ output: zodObjectSchema })` — needs `experimental: true`, zod **object** only (see ⚠️ below) | +| `history.final_result()` / `.structured_output` | `extract(...)` return / `result.output` | +| `Browser()` (local) | `new Stagehand({ env: "LOCAL", localBrowserLaunchOptions })` | +| `Browser(cdp_url=session.connect_url)` (Browserbase) | `new Stagehand({ env: "BROWSERBASE" })` (Stagehand manages the session) | +| `sensitive_data={…}` | `act("…%key%…", { variables: { key } })` | +| `storage_state` / `user_data_dir` | Browserbase **Context**: `browserbaseSessionCreateParams.browserSettings.context: { id, persist: true }` | +| proxies / stealth / captcha / region | `browserbaseSessionCreateParams` (`proxies`, `browserSettings.advancedStealth`, `solveCaptchas`, `region`) | +| `@tools.action` (deterministic side-effect) | plain TypeScript | +| `@tools.action` (capability the agent must choose) | `stagehand.agent({ tools: { name: tool({ description, inputSchema: z.object({…}), execute }) } })` — `tool` from the **`ai`** package (pin **`ai@^5`**); needs `experimental: true` (see ⚠️ below) | +| `page_extraction_llm=…` | `extract("…", schema, { model })` | +| `planner_llm=…` + main `llm=…` | `agent({ model, executionModel })` | +| `max_steps` | `agent().execute({ maxSteps })` | + +> ⚠️ **Experimental gate:** agent `output`, custom `tools`, and MCP `integrations` each require +> `experimental: true` on the `Stagehand` constructor 
(it bypasses the managed API path). For a typed +> result from an agentic run, prefer running the agent then a separate `stagehand.extract(...)`. + +## No clean equivalent — flag these in the summary + +- **`allowed_domains`** — Stagehand has no domain firewall, and it's often a security boundary + (it pairs with `sensitive_data`). Mitigate with a `page.url()` host check before sensitive + actions, a `systemPrompt` constraint (for agents), or Browserbase proxy domain rules. **Never drop + it silently** — flag it as needs-review. +- **`max_actions_per_step`, `use_thinking`, `flash_mode`** — no direct equivalent; decomposition + makes steps explicit. For speed, use a fast model (`google/gemini-2.5-flash`) + decomposition. +- **`initial_actions`** — becomes ordinary code that runs before the first AI call. + +## browser-use variants to recognize (translate the same way) + +- **Legacy (pre-0.12):** `Browser(config=BrowserConfig(...))`, `BrowserContext`, `Controller` / + `@controller.action`. Normalize names first. +- **Stable (0.12.x):** `Browser(browser_profile=BrowserProfile(...))`, `Tools()` / `@tools.action`; + `Browser` ≡ `BrowserSession`. (Most scripts.) +- **Rust beta (0.13.x):** imports from `browser_use.beta`. Same public surface. + +## Output + +1. **The Stagehand v3 TypeScript** — runnable, with a `package.json` (deps: `@browserbasehq/stagehand`, + `zod`, `dotenv`; add `ai` only if a custom action maps to an agent `tool`) and the required `.env` + keys (`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, the provider key matching the model, plus + any app secrets). +2. **A migration summary:** + - **Variant detected** and the import/class tells. + - **Determinism choices per step** (a short table: each browser-use step → Stagehand surface → + level → one-line reasoning). + - **Needs human review** — lost `allowed_domains` guardrails, custom-action logic, placeholder + URLs/labels, anything ambiguous. 
+ - **Recommended next step** — usually a Browserbase **Context** for auth reuse, then `selfHeal` + + caching for production. + +## Self-check before finishing + +- AI methods on the **instance** (`stagehand.act/extract/observe`), page via + `stagehand.context.pages()[0]`. +- `model` is a `"provider/model"` string; the matching provider key is in `.env`. +- `extract` uses a zod schema; secrets use `variables` + `process.env`; nothing hardcoded. +- `init()` / `close()` present (`close()` in a `finally`). +- Every browser-use step is placed deliberately on the determinism spectrum; `allowed_domains` is + not silently dropped. + +If no script was provided, ask for the browser-use script before proceeding. Now migrate the +browser-use script provided by the user. diff --git a/skills/browser-use-to-stagehand/references/trace-assisted.md b/skills/browser-use-to-stagehand/references/trace-assisted.md new file mode 100644 index 0000000..df343c9 --- /dev/null +++ b/skills/browser-use-to-stagehand/references/trace-assisted.md @@ -0,0 +1,124 @@ +# Trace-assisted migration (optional, advanced) + +A static read of a browser-use script tells you *what it was asked to do*. Running it once on +Browserbase tells you *what it actually did* — the real URLs, network calls, and page transitions. +For opaque or flaky scripts, that observed behavior makes the Stagehand rewrite far more accurate. + +This is the optional path. Skip it for simple scripts; reach for it when the source is a large +`agent(task="…")` blob whose actual flow is unclear, or when a migration's first rewrite doesn't +reproduce the original behavior. + +> **Sibling skill:** this repo's [`browser-trace`](../../browser-trace/SKILL.md) productizes CDP +> trace capture — the full DevTools firehose, screenshots, and DOM dumps, bisected into per-page +> searchable buckets, and able to attach to a live Browserbase session. Use it to capture the +> browser-use run, then feed the per-page summaries into the rewrite. 
The Session Logs API below is +> the lighter-weight alternative when you just need navigations + network calls. + +--- + +## Important reframe: what the "trace" actually is + +The original idea was "attach a browser-trace as a CDP listener." On Browserbase, the +machine-readable trace is the **Session Logs API**, not the session recording: + +- **Session Logs API** — `bb.sessions.logs.list(sessionId)` returns **CDP events**: network + activity, console logs, and page lifecycle (navigations). **This is the structured trace** you + parse to reconstruct the flow. +- **Session Recording / Replay** — now H.264 video (HLS segments), *not* an rrweb/DOM event + stream. Great for **human QA** ("did it do the right thing?"), not for programmatic diffing. + +So: **logs to drive the rewrite, video to eyeball it.** Don't promise a DOM-event stream from the +recording — it isn't that anymore. + +--- + +## The workflow + +### Step 1 — Run the existing browser-use script on Browserbase, unmodified + +browser-use connects to any remote browser via `cdp_url`, and Browserbase is officially supported. +Point the existing script at a Browserbase session; record it and tag it for later retrieval. 
+ +```python +import os +from browserbase import Browserbase +from browser_use import Agent, Browser, BrowserProfile, ChatAnthropic + +bb = Browserbase(api_key=os.environ["BROWSERBASE_API_KEY"]) +session = bb.sessions.create( + project_id=os.environ["BROWSERBASE_PROJECT_ID"], + browser_settings={"record_session": True}, # keep recording on (default) + user_metadata={"framework": "browser-use", "migration": "true"}, +) +print(f"Session: https://www.browserbase.com/sessions/{session.id}") + +# The ONLY change to the original script: hand it the Browserbase CDP endpoint +browser = Browser(browser_profile=BrowserProfile(cdp_url=session.connect_url)) + +agent = Agent(task="", llm=ChatAnthropic(model="claude-sonnet-4-6"), browser=browser) +await agent.run() +await browser.stop() +``` + +Notes: +- You must connect promptly — a new session terminates if nothing connects within ~5 minutes. +- Configure proxies/stealth/region on the **Browserbase session**, not on browser-use. +- Keep `session.id` — it's how you pull logs and the recording next. + +### Step 2 — Pull the trace after the run + +```python +logs = bb.sessions.logs.list(session.id) # CDP network / console / lifecycle events +recording = bb.sessions.recording.retrieve(session.id) # video (for human QA) +``` +TypeScript equivalent with `@browserbasehq/sdk`: +```typescript +const logs = await bb.sessions.logs.list(sessionId); +``` + +From the logs, extract the spine of the flow: +- **Navigations** (page lifecycle events) → the `page.goto(...)` calls in the rewrite, and the URLs + to anchor each phase. +- **Network calls** → which requests actually mattered (and whether some data could be read from an + API response instead of scraped — sometimes a cleaner rewrite). +- **Console errors** → fragile spots to handle explicitly. + +You can also open the **Session Inspector** in the dashboard for an Events/Pages timeline and, if +the run used Stagehand, act/extract results and token usage. 
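A rough sketch of pulling that spine out of the logs. The `LogEvent` shape below is a simplified assumption, not the SDK's actual type, so adapt the field access to whatever `bb.sessions.logs.list(...)` returns. `Page.frameNavigated`, `Network.requestWillBeSent`, and `Runtime.consoleAPICalled` are standard CDP event names; `summarizeFlow` is a hypothetical helper:

```typescript
// Simplified, assumed log record: a CDP method name plus its params payload.
interface LogEvent {
  method: string;
  params?: Record<string, any>;
}

interface FlowSummary {
  navigations: string[];  // main-frame URLs in order, consecutive duplicates removed
  requestHosts: string[]; // unique hosts contacted, in first-seen order
  consoleErrors: number;  // number of console.error calls
}

function summarizeFlow(events: LogEvent[]): FlowSummary {
  const navigations: string[] = [];
  const hosts = new Set<string>();
  let consoleErrors = 0;

  for (const e of events) {
    if (e.method === "Page.frameNavigated") {
      const frame = e.params?.frame;
      // In CDP, only child frames carry a parentId; skip them.
      const isMainFrame = frame && !frame.parentId;
      if (isMainFrame && typeof frame.url === "string" &&
          navigations[navigations.length - 1] !== frame.url) {
        navigations.push(frame.url);
      }
    } else if (e.method === "Network.requestWillBeSent") {
      const url = e.params?.request?.url;
      if (typeof url === "string") {
        try {
          const host = new URL(url).host;
          if (host) hosts.add(host); // data: and blob: URLs have no host
        } catch {
          // Unparseable URL: ignore it rather than fail the whole summary.
        }
      }
    } else if (e.method === "Runtime.consoleAPICalled" && e.params?.type === "error") {
      consoleErrors += 1;
    }
  }
  return { navigations, requestHosts: [...hosts], consoleErrors };
}

// Hand-written sample events standing in for a real log pull.
const events: LogEvent[] = [
  { method: "Page.frameNavigated", params: { frame: { id: "1", url: "https://example.com/" } } },
  { method: "Network.requestWillBeSent", params: { request: { url: "https://api.example.com/items" } } },
  { method: "Page.frameNavigated", params: { frame: { id: "2", parentId: "1", url: "https://ads.example.net/" } } },
  { method: "Runtime.consoleAPICalled", params: { type: "error" } },
  { method: "Page.frameNavigated", params: { frame: { id: "1", url: "https://example.com/cart" } } },
];

const summary = summarizeFlow(events);
console.log(summary);
```

Each entry in `navigations` is a candidate `page.goto(...)`, and `requestHosts` hints at APIs that might be read directly instead of scraped.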
+ +### Step 3 — Author the Stagehand rewrite from observed behavior + +Map what you observed onto the determinism spectrum (see [determinism.md](determinism.md)): +- Observed fixed navigations → deterministic `page.goto(...)`. +- Observed interactions on varying markup → `act("…")`, or `observe()`→`act()` for repeatable steps. +- Observed data reads → `extract("…", schema)`. +- Reuse the same `browserbaseSessionCreateParams` (region/proxies/context) so behavior matches the + baseline run. + +### Step 4 — Validate against the baseline + +Run the Stagehand rewrite under `env: "BROWSERBASE"` and compare its logs/recording to the +browser-use baseline: same navigations, same end state, same extracted data. Reuse the same +**Context** so auth carries over and you're comparing like with like. + +--- + +## When to use this vs. skip it + +| Use trace-assisted | Skip (static rewrite is enough) | +|---|---| +| One giant `agent(task="…")` whose real flow is unclear | Script already reads as clear, ordered steps | +| Script is flaky / behavior varies run to run | Deterministic, well-understood script | +| First static rewrite doesn't reproduce the original | Simple task / extraction | +| Auth, redirects, or heavy network make the path opaque | No login, single domain | + +--- + +## Caveats + +- The recording is **video**, not DOM events — use the **Logs API** for anything programmatic. +- Logs are retrieved **after** the run completes. +- A live CDP listener attaches on *your* side of the connection and suits JS/TS automation; for a + Python browser-use run, prefer the post-run Logs API (or the `browser-trace` skill) over wiring up + live listeners. +- Running the script incurs real LLM + browser cost — it's a deliberate, opt-in step.
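For the Step 4 comparison, a small diff over the ordered navigation URLs is often enough to spot where the rewrite diverges from the baseline. A minimal sketch, assuming both runs have already been reduced to arrays of URLs; `normalize`, `firstDivergence`, and the sample URLs are hypothetical:

```typescript
// Normalize so trivial differences (trailing slash, fragment) do not count as divergence.
function normalize(url: string): string {
  const u = new URL(url);
  u.hash = "";
  const path = u.pathname.replace(/\/+$/, "");
  return `${u.origin}${path}${u.search}`;
}

// Returns the index of the first step where the two runs differ, or -1 if they match.
// When one run is a strict prefix of the other, the shorter length is returned.
function firstDivergence(baseline: string[], rewrite: string[]): number {
  const a = baseline.map(normalize);
  const b = rewrite.map(normalize);
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : shared;
}

// Sample navigation lists standing in for the two runs.
const baselineNavs = [
  "https://example.com/",
  "https://example.com/login",
  "https://example.com/dashboard",
];
const rewriteNavs = [
  "https://example.com",
  "https://example.com/login/",
  "https://example.com/settings",
];

const divergedAt = firstDivergence(baselineNavs, rewriteNavs);
console.log(divergedAt === -1 ? "runs match" : `diverged at step ${divergedAt}`);
```

A match on navigations is necessary but not sufficient; the end state and extracted data still need their own comparison.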