Add browser-use-to-stagehand skill: migrate browser-use to Stagehand on Browserbase by rcbrowder · Pull Request #130 · browserbase/skills

rcbrowder · 2026-06-09T02:16:54Z

Converts browser-use (Python) automation to Stagehand v3 (TypeScript) on Browserbase, choosing the right level of determinism per step rather than a one-to-one agentic copy.

- SKILL.md: detect the browser-use variant, decompose the task across the determinism spectrum, emit Stagehand v3 + a migration summary
- references/: full API mapping, determinism decision framework, and an optional trace-assisted path (pairs with the browser-trace skill)
- guide.md: human migration guide (philosophy, feature mapping, determinism)
- prompt.md: tool-agnostic docs prompt (works in any AI assistant)
- EXAMPLES.md: before/after pairs
- README.md (root): add to the skills table

Targets Stagehand v3, validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.

Hardening + validation during review

The original skill was validated by a live end-to-end eval and hardened across several commits. The eval converts real browser-use scripts through the skill (subagents reading only the skill), then runs the generated Stagehand on live Browserbase and compares against the originals + ground truth.

Tier-1 (9 browser-use example scripts): 9/9 compile; live runs 3 PASS / 3 PARTIAL / 3 FAIL → 6 PASS / 2 PARTIAL / 0 FAIL after fixes:

- networkidle → domcontentloaded (networkidle times out on Google/SPA pages)
- agent output needs experimental: true + zod object (recommend agent-then-extract)
- setViewportSize(w, h) is positional, not { width, height }
- variant table: ChatBrowserUse is not a Rust-beta tell (only a browser_use.beta import is)
- "iterate an extracted list / resolve relative URLs" pattern

Tier-2 (real-world: agent-tools, MCP, embedded apps — LangManus / BLAST): 5/5 handled correctly (4 convert + tsc-clean, MCP-server correctly flagged out-of-scope). Also caught:

- ai must be pinned to v5: Stagehand 3.6.x types agent tools as the v5 ToolSet (inputSchema); v4's tool() emits parameters and won't compile
- §3.7 MCP mapping (client → integrations via connectToMCPServer/URL; server-direction = out of scope)
- §3.8 real-world patterns (embedded/wrapped code, sync→async, stateful executors → messages, vision → mode: hybrid/cua, legacy result.final_result)
- "when NOT to migrate" scope gate in SKILL.md

Doc-resilience (keep-but-demote): the API specifics drift every release (the variant table was already stale), so the tables are kept but demoted under a version-provenance header + "live docs supersede this snapshot" + a "verify signatures against the installed package" workflow step.

Naming: renamed bu-to-bb → browser-use-to-stagehand (verified against the Anthropic Agent Skills naming/discovery docs).

All 5 Cursor Bugbot findings addressed + threads resolved. validate CI green. The eval harness doubles as a drift detector to re-run on each Stagehand / browser-use release.

Known limitations (by design, not defects)

- 2 tier-1 partials: case03 relative-URL loop; case04 HF "likes" reading as downloads (extraction fidelity, not API).
- The skill's own trace-assisted path and prompt.md were not run-tested.
- MCP stdio is mapped correctly but hits an upstream Stagehand 3.6.0 serialization bug (fixed by stagehand#2278, "[STG-2405] fix: agent({ integrations }) with an MCP Client throws circular-structure JSON"); remote/URL MCP is unaffected.

Note

Low Risk
Documentation and agent skill content only; no production code, auth, or runtime behavior changes in the repo.

Overview
Adds a new browser-use-to-stagehand agent skill and registers it in the root README skills table.

The skill guides converting browser-use (Python) scripts to Stagehand v3 (TypeScript) on Browserbase by decomposing flows across a determinism spectrum (page.goto, act/extract/observe, cached replay, agent() only when needed) instead of a one-to-one agent port. SKILL.md defines scope gates (e.g. MCP server mode out of scope), variant detection, inventory, output templates, and a migration summary checklist.

Supporting docs include references/api-mapping.md (feature table, gaps like allowed_domains, v3 gotchas), determinism.md, trace-assisted.md (Session Logs / optional browser-trace), human guide.md, pasteable prompt.md, EXAMPLES.md before/after pairs, and MIT LICENSE.txt.

Reviewed by Cursor Bugbot for commit 828211d.

Converts browser-use (Python) automation to Stagehand v3 (TypeScript) on Browserbase, choosing the right level of determinism per step rather than a one-to-one agentic copy.

- SKILL.md: detect the browser-use variant, decompose the task across the determinism spectrum, emit Stagehand v3 + a migration summary
- references/: full API mapping, determinism decision framework, and an optional trace-assisted path (pairs with the browser-trace skill)
- GUIDE.md: human migration guide (philosophy, feature mapping, determinism)
- PROMPT.md: tool-agnostic docs prompt (works in any AI assistant)
- EXAMPLES.md: before/after pairs
- README.md (skill): index; root README.md: add to the skills table

Targets Stagehand v3 (verified against live docs); validated by running the skill on fresh scripts a clean agent had never seen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Express determinism purely through Stagehand primitives — cached observe()→act() plus selfHeal/cacheDir/serverCache — instead of falling back to Playwright selectors. Drops all "Playwright" references and page.locator()/page.fill() calls across SKILL.md, GUIDE.md, PROMPT.md, README.md, and the determinism / trace-assisted references. page.goto/page.url stay (Stagehand's own page methods). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

shrey150 · 2026-06-11T18:33:28Z

Standards pass (see #36 / the conventions all merged skills follow):

- Missing LICENSE.txt: every skill ships MIT with Copyright (c) 2026 Browserbase, Inc.; copy skills/browser/LICENSE.txt verbatim.
- Extra top-level docs (README.md, GUIDE.md, PROMPT.md) in the skill dir: agents only auto-load SKILL.md, so the convention is SKILL.md + REFERENCE.md/EXAMPLES.md or a references/ dir. Suggest folding GUIDE/PROMPT into references/ and dropping the skill-local README so content isn't stranded.

A CONTRIBUTING.md + CI validator for these checks is landing shortly.

Per @shrey150's standards pass:

- add LICENSE.txt (MIT, verbatim from skills/browser)
- move GUIDE.md -> references/guide.md and PROMPT.md -> references/prompt.md, both linked from SKILL.md so the content isn't stranded
- drop the skill-local README.md
- fix relative links in references/guide.md and EXAMPLES.md

Skill dir now follows the convention: SKILL.md + EXAMPLES.md + references/ + LICENSE.txt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Validated by converting 9 real browser-use examples through the skill and running the generated Stagehand on live Browserbase. Fixes 3 runtime failures that typecheck clean but broke at runtime:

- Page settling: networkidle -> domcontentloaded. networkidle never fires on Google/analytics/SPA pages and throws a 15s timeout (broke 2 of 9 cases).
- Agent output schema: agent().execute({ output }) throws ExperimentalNotConfiguredError unless experimental:true, and output must be a zod object (not a top-level array). Document both, and recommend the agent-then-extract pattern to stay on the managed API path (broke 1 case).
- setViewportSize takes positional (width, height), not Playwright's { width, height } object (would not compile).
- Variant table: ChatBrowserUse is browser-use's hosted-model class and appears in stable 0.13.x code -- it is NOT a Rust-beta tell. Only a literal browser_use.beta import is. Prevents variant mis-detection.
- Add an "iterate an extracted list / resolve relative URLs" pattern (new URL(href, page.url())) to avoid "Cannot navigate to invalid URL".

Doc-resilience (keep the API tables, demote them): add a version-provenance header, state that the live docs supersede this snapshot on any conflict, and make "verify signatures against the installed package" an explicit workflow step -- so the skill fails safe as Stagehand / browser-use drift.

Validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.

Eval: 9/9 compile; live Browserbase 3 PASS/3 PARTIAL/3 FAIL -> 6 PASS/2 PARTIAL/0 FAIL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

shrey150 · 2026-06-25T18:26:15Z

Pushed 5b3c33e to this branch with fixes validated by a live end-to-end eval. Sharing the evidence since these are runtime bugs that tsc couldn't catch.

What I did

Converted 9 real browser-use/examples scripts through this skill (subagents reading only the skill — no prior Stagehand knowledge), then ran the generated Stagehand on live Browserbase and compared outcomes against the original browser-use runs + ground truth.

E2E Test Matrix (before → after the fixes)

| browser-use example | exercises | before (this PR's skill) | after (`5b3c33e`) |
| --- | --- | --- | --- |
| getting_started/03_data_extraction | extract, stable ground truth | ✅ PASS | ✅ |
| features/custom_output | Pydantic→zod, top-level array | ✅ PASS | ✅ |
| features/small_model_for_extraction | `page_extraction_llm`→`extract({model})` | ⚠️ partial | ⚠️ partial (founders correct; relative-URL loop) |
| custom-functions/save_to_file_hugging_face | custom `@tools.action` | ⚠️ unsorted | ⚠️ sorted now |
| features/sensitive_data | `sensitive_data`→`variables` | ✅ PASS (httpbin echo) | ✅ |
| features/restrict_urls | `allowed_domains` gap | ❌ `networkidle` timeout | ✅ runs clean |
| getting_started/04_multi_step_task | decompose vs agent | ❌ `ExperimentalNotConfiguredError` | ✅ PASS (Beautiful Soup/Scrapy/Selenium) |
| models/gpt-5-mini | faithful `agent()` | ⚠️ no verdict | ✅ real verdict |
| getting_started/01_basic_search | search | ❌ `networkidle` timeout | ✅ PASS |

9/9 compile both times; live runs went 3 PASS / 3 PARTIAL / 3 FAIL → 6 PASS / 2 PARTIAL / 0 FAIL. Validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.

Fixes in `5b3c33e`

- networkidle → domcontentloaded (4 spots): networkidle never fires on Google/analytics/SPA pages → 15s timeout. (fixed restrict_urls, basic_search)
- Agent output schema: throws ExperimentalNotConfiguredError without experimental: true, and must be a zod object (not a top-level array). Documented both + recommend agent-then-extract to stay on the managed API path. (fixed multi_step)
- setViewportSize(w, h) is positional, not Playwright's { width, height } (the object form won't compile).
- Variant table: ChatBrowserUse is browser-use's hosted-model class and appears in stable 0.13.x, so it's not a Rust-beta tell (only a literal browser_use.beta import is). Prevents variant mis-detection.
- New pattern: iterating an extracted list + resolving relative URLs (new URL(href, page.url())) to avoid "Cannot navigate to invalid URL".
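
The relative-URL pattern in the last item can be sketched in isolation. This is a minimal sketch, assuming the extracted `href` values are plain strings and the current page URL is available as a string (in the skill's pattern it comes from `page.url()`). `resolveLinks` and the sample URLs are hypothetical names for illustration, not part of the skill.

```typescript
// Hypothetical helper: resolve extracted hrefs against the current page URL
// so that relative links become absolute before navigating to them.
function resolveLinks(hrefs: string[], currentUrl: string): string[] {
  const resolved: string[] = [];
  for (const href of hrefs) {
    try {
      // new URL(relative, base) performs standard WHATWG URL resolution.
      resolved.push(new URL(href, currentUrl).toString());
    } catch {
      // Skip values that cannot be parsed even with a base URL.
    }
  }
  return resolved;
}

// Made-up inputs: a root-relative link, a path-relative link, an absolute link.
const links = resolveLinks(
  ["/companies/acme", "founders?id=1", "https://example.org/about"],
  "https://example.com/list/page",
);
```

Each entry in `links` is absolute, so passing it to a navigation call no longer depends on how the site wrote its `href` attributes.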

Doc-resilience (the durable part)

The skill's API specifics drift every Stagehand/browser-use release (the variant table was already stale). Kept the API tables but demoted them: added a version-provenance header, "live docs supersede this snapshot on conflict," and made "verify signatures against the installed package" an explicit workflow step — so the skill fails safe rather than silently emitting a dead API. The eval harness itself doubles as a drift detector to re-run on each release.

Cryptic abbreviation -> descriptive, discoverable name:

- "bu-to-bb" was opaque, and "bb" collided with the (now-deprecated) bb CLI.
- "bu" is also a loaded token in browser-use land -- it's the prefix for their hosted models (bu-2-0, bu-30b), not the library -- so "bu-to-*" misreads.
- The name is loaded into the system prompt as discovery metadata, so a name carrying "browser-use" + "stagehand" aids triggering, not just human scanning.
- Matches the repo's descriptive naming convention (competitor-analysis, browser-trace, cookie-sync).

Renames the directory, frontmatter name, SKILL.md heading, README row, and all internal /bu-to-bb references. Validator passes (node scripts/validate-skills.mjs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…er-2 eval)

Patches the holes a tier-2 eval found by converting real-world browser-use code (custom agent-tools, MCP client/server, and BU embedded in real app code: LangManus, BLAST) through the skill and type-checking against installed 3.6.0.

Correctness:

- Agent custom `tools` require `experimental: true` (same gate as agent `output` and MCP `integrations`) -- the prior tools example omitted it and would throw ExperimentalNotConfiguredError at runtime. Consolidated callout added.
- `ai` must be pinned to v5: Stagehand 3.6.x types `tools` as the v5 ToolSet (schema field `inputSchema`); the v4 `tool()` helper emits `parameters` and fails to type-check. Template now pins `^5.0.0` and documents the plain-object `{ description, inputSchema, execute }` alternative.

New mappings:

- §3.7 MCP: browser-use MCPClient (stdio + remote) -> Stagehand `agent({ integrations })` via `connectToMCPServer(...)` / URL string, with the experimental requirement and the tool_filter/prefix + server-direction gaps.
- §3.8 Real-world patterns: embedded/wrapped code (convert the browser-use surface, keep app glue), sync-over-async -> async, long-lived stateful executors -> stateless `messages`, vision intent -> `mode: hybrid/cua`, legacy `result.final_result` attribute form.
- Custom-action gaps: injected special params (browser_session etc.) -> closures, `domains=` -> in-tool host check, `terminates_sequence` -> no equivalent.

SKILL.md: added a "when NOT to migrate" scope gate (MCP-server / non-Agent / embedded) so the workflow doesn't assume every input is a convertible script.

Tier-2: 5/5 handled correctly (4 convert+compile clean; MCP-server correctly flagged out-of-scope without fabricating). Validated against stagehand 3.6.0 / browser-use 0.13.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

shrey150 · 2026-06-25T18:59:16Z

Pushed f24de18 — a second validation round targeting the real-world surface the first eval didn't cover (custom agent-tools, MCP, and browser-use embedded in real app code). Method: converted these through the skill (skill-only subagents) and type-checked against installed @browserbasehq/stagehand 3.6.0.

Tier-2 matrix

| Source | Dimension | Result |
| --- | --- | --- |
| `custom-functions/advanced_search.py` | agent-callable `@tools.action` | ✅ converts + tsc-clean |
| synthetic `MCPClient` (stdio filesystem) | MCP client | ✅ → `connectToMCPServer()` + `integrations` + `experimental` |
| `browser-use --mcp` (server) | MCP server | ✅ correctly flagged out-of-scope, no fabricated conversion |
| `LangManus/src/tools/browser.py` | embedded, legacy API | ✅ detected legacy, preserved wrapper, tsc-clean |
| `BLAST/blastai/executor.py` (449L) | embedded + `Controller` | ✅ scoped BU surface, tsc-clean |

5/5 handled correctly (4 convert+compile, 1 correctly-refused).

Bugs/holes this round caught → fixed in `f24de18`

- ai v4 vs v5 (the big one): Stagehand 3.6.x types agent tools as the v5 ToolSet (inputSchema), but the template pinned "ai": "^4.0.0", whose tool() emits parameters → won't compile. Template now pins ^5.0.0 + documents the plain-object {description, inputSchema, execute} fallback.
- Agent tools need experimental: true (same gate as output / integrations); the prior tools example omitted it → runtime ExperimentalNotConfiguredError. Added a consolidated callout.
- New §3.7 MCP mapping (client→integrations via connectToMCPServer/URL; server-direction = out of scope; tool_filter/prefix gaps).
- New §3.8 real-world patterns (embedded/wrapped code, sync→async, stateful executors→messages, vision→mode: hybrid/cua, legacy result.final_result) + custom-action gaps (injected params→closures, domains=, terminates_sequence).
- SKILL.md "when NOT to migrate" gate so the workflow doesn't assume every input is a convertible Agent(task=) script.
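
The `parameters` versus `inputSchema` mismatch in the first item is easy to spot mechanically. The following is a rough sketch of such a check, not code from the skill or from the `ai` package: `ToolLike` is a deliberately simplified, hypothetical shape, `findV4StyleTools` is a made-up helper, and the schema values are placeholders standing in for real schema objects.

```typescript
// Hypothetical, simplified tool shape for illustration only; the real types
// come from the `ai` package and are richer than this.
type ToolLike = {
  description?: string;
  inputSchema?: unknown; // field name the PR reports for the v5 ToolSet
  parameters?: unknown;  // field name the PR reports for v4's tool()
  execute?: (...args: unknown[]) => unknown;
};

// Return the names of tools that still carry the v4-style field only.
function findV4StyleTools(tools: Record<string, ToolLike>): string[] {
  return Object.entries(tools)
    .filter(([, t]) => t.parameters !== undefined && t.inputSchema === undefined)
    .map(([name]) => name);
}

// Placeholder schema objects; a real project would pass its actual schemas.
const emptySchema = { type: "object", properties: {} };

const stale = findV4StyleTools({
  saveToFile: { description: "v4-style", parameters: emptySchema, execute: () => "ok" },
  currentUrl: { description: "v5-style", inputSchema: emptySchema, execute: () => "ok" },
});
```

A check like this only flags the field name; it says nothing about whether the schema itself is valid for the installed version, so type-checking against the installed package remains the real test.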

Note: anti-bot/captcha/stealth deliberately not in the eval — that's session/infra-layer (Browserbase), not framework-layer, so it isn't attributable to the conversion. The skill maps the config (api-mapping §4); the outcome is Browserbase's. validate CI green.

Live e2e smoke of the converted MCP-client script surfaced a Stagehand bug: passing a Client instance (local/stdio server via connectToMCPServer) into agent({ integrations }) throws "Converting circular structure to JSON" before the agent runs -- agent() does JSON.stringify(options.integrations) (v3.ts:1992) and the Client object is circular. Reproduced with two different MCP servers (filesystem, everything). URL-string integrations are unaffected. The skill's mapping is correct (it connects to the server); this is a framework bug, so §3.7 now flags it and recommends remote/URL MCP until fixed upstream. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
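
The failure mode described above is ordinary `JSON.stringify` behavior and can be reproduced without Stagehand. A minimal sketch, assuming only that the client object contains a reference cycle: `FakeClient` and `FakeTransport` are hypothetical stand-ins rather than the MCP SDK's classes, `trySerialize` is not Stagehand's code, and the URL is made up.

```typescript
// Hypothetical stand-ins for a client whose transport points back at it.
interface FakeTransport { owner?: FakeClient }
interface FakeClient { name: string; transport: FakeTransport }

function makeCircularClient(): FakeClient {
  const client: FakeClient = { name: "filesystem", transport: {} };
  client.transport.owner = client; // creates the reference cycle
  return client;
}

// Attempt to serialize an integrations-like array and report the outcome.
function trySerialize(integrations: unknown[]): { ok: boolean; error?: string } {
  try {
    JSON.stringify(integrations);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const urlOnly = trySerialize(["https://mcp.example.com/sse"]); // plain string
const withClient = trySerialize([makeCircularClient()]);       // circular object
```

Here `urlOnly.ok` is true and `withClient.ok` is false, which mirrors why URL-string integrations are unaffected while a connected client instance fails before the agent runs.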

…in example

- prompt.md + determinism.md recommended agent().execute({ output }) (and prompt.md the agent `tools` path) without the experimental:true requirement that api-mapping already documents -> would throw ExperimentalNotConfiguredError. Added the experimental gate (output/tools/integrations) + prefer agent-then-extract, and pinned ai@^5 for the tools row.
- EXAMPLES.md login: the browser-use task is "log in then open the dashboard" but the Stagehand after-script stopped at the allow-list check. Added the dashboard step so the migration completes the same task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

prompt.md numbered the spectrum 1=Navigation..5=Autonomous, inverted vs determinism.md/guide.md/workflow (1=Autonomous..5=Navigation), so "Level N" citations could contradict across docs. Flipped prompt.md's table + the "decomposition (levels 2-5)" rule + the example's page.goto (L5) cite to match the canonical scale. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The agent-tool example took `url` from the model via inputSchema, but the original browser-use action reads the current URL from browser_session. That let the model pass a guessed URL and contradicted the section's own "close over page/stagehand in execute" guidance. The tool now takes no model args and reads page.url() directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 649ef83.

…ample The host check ran once after sign-in, but the added dashboard step navigated past it — and browser-use's allowed_domains enforces across the whole run. Extracted the check into assertAllowedHost() and call it after each navigation, with a comment that even this is best-effort (real continuous enforcement = Browserbase proxy domain rules, api-mapping §5). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
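
The per-navigation host check described above could look roughly like the following. This is a sketch, not the code in EXAMPLES.md: the signature of `assertAllowedHost`, the subdomain-matching rule, and the sample hosts are all assumptions, and as the commit message notes, a check like this is best-effort rather than continuous enforcement.

```typescript
// Hypothetical helper: throw if the URL's host is not on the allow-list.
// Assumption: an entry "example.com" also permits its subdomains.
function assertAllowedHost(currentUrl: string, allowedHosts: string[]): void {
  const host = new URL(currentUrl).hostname.toLowerCase();
  const allowed = allowedHosts.some((entry) => {
    const h = entry.toLowerCase();
    return host === h || host.endsWith("." + h);
  });
  if (!allowed) {
    throw new Error(`Navigation left the allow-list: ${host}`);
  }
}

// Small wrapper so the outcome can be inspected without a try/catch at the call site.
function isAllowed(currentUrl: string, allowedHosts: string[]): boolean {
  try {
    assertAllowedHost(currentUrl, allowedHosts);
    return true;
  } catch {
    return false;
  }
}

const ALLOWED = ["example.com"]; // made-up allow-list
const dashboardOk = isAllowed("https://app.example.com/dashboard", ALLOWED);
const lookalikeOk = isAllowed("https://notexample.com/login", ALLOWED);
```

In a converted script the call would sit directly after each navigation, passing the value of `page.url()`, so a redirect off the allow-list fails the run at the step where it happened.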

Chris Browder and others added 2 commits June 8, 2026 19:14

rcbrowder marked this pull request as ready for review June 10, 2026 22:58

rcbrowder requested a review from shrey150 June 11, 2026 21:08

rcbrowder and others added 2 commits June 24, 2026 09:23

Merge branch 'main' into bu-to-bb

e8d714a

cursor Bot reviewed Jun 25, 2026

Comment thread skills/browser-use-to-stagehand/references/prompt.md Outdated

Comment thread skills/browser-use-to-stagehand/EXAMPLES.md

shrey150 changed the title ~~Add bu-to-bb skill: migrate browser-use to Stagehand on Browserbase~~ Add browser-use-to-stagehand skill: migrate browser-use to Stagehand on Browserbase Jun 25, 2026

shrey150 and others added 2 commits June 25, 2026 12:15

cursor Bot reviewed Jun 25, 2026

Comment thread skills/browser-use-to-stagehand/references/prompt.md

cursor Bot reviewed Jun 25, 2026

Comment thread skills/browser-use-to-stagehand/references/api-mapping.md Outdated

cursor Bot reviewed Jun 25, 2026

Comment thread skills/browser-use-to-stagehand/EXAMPLES.md

shrey150 approved these changes Jun 25, 2026

shrey150 merged commit e841873 into main Jun 25, 2026
2 checks passed

shrey150 merged 12 commits into main from bu-to-bb
