Add browser-use-to-stagehand skill: migrate browser-use to Stagehand on Browserbase#130

Merged
shrey150 merged 12 commits into main from bu-to-bb
Jun 25, 2026

Conversation

@rcbrowder rcbrowder commented Jun 9, 2026

Contributor

Converts browser-use (Python) automation to Stagehand v3 (TypeScript) on Browserbase, choosing the right level of determinism per step rather than a one-to-one agentic copy.

  • SKILL.md: detect the browser-use variant, decompose the task across the determinism spectrum, emit Stagehand v3 + a migration summary
  • references/: full API mapping, determinism decision framework, and an optional trace-assisted path (pairs with the browser-trace skill)
  • guide.md: human migration guide (philosophy, feature mapping, determinism)
  • prompt.md: tool-agnostic docs prompt (works in any AI assistant)
  • EXAMPLES.md: before/after pairs
  • README.md (root): add to the skills table

Targets Stagehand v3, validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.


Hardening + validation during review

The original skill was validated by a live end-to-end eval and hardened across several commits. The eval converts real browser-use scripts through the skill (subagents reading only the skill), then runs the generated Stagehand on live Browserbase and compares against the originals + ground truth.

Tier-1 (9 browser-use example scripts): 9/9 compile; live runs 3 PASS / 3 PARTIAL / 3 FAIL → 6 PASS / 2 PARTIAL / 0 FAIL after fixes:

  • networkidle → domcontentloaded (networkidle times out on Google/SPA pages)
  • agent output needs experimental: true + zod object (recommend agent-then-extract)
  • setViewportSize(w, h) is positional, not { width, height }
  • variant table: ChatBrowserUse is not a Rust-beta tell (only a browser_use.beta import is)
  • "iterate an extracted list / resolve relative URLs" pattern

Tier-2 (real-world: agent-tools, MCP, embedded apps — LangManus / BLAST): 5/5 handled correctly (4 convert + tsc-clean, MCP-server correctly flagged out-of-scope). Also caught:

  • ai must be pinned to v5 — Stagehand 3.6.x types agent tools as the v5 ToolSet (inputSchema); v4's tool() emits parameters and won't compile
  • §3.7 MCP mapping (client → integrations via connectToMCPServer/URL; server-direction = out of scope)
  • §3.8 real-world patterns (embedded/wrapped code, sync→async, stateful executors → messages, vision → mode: hybrid/cua, legacy result.final_result)
  • "when NOT to migrate" scope gate in SKILL.md

Doc-resilience (keep-but-demote): the API specifics drift every release (the variant table was already stale), so the tables are kept but demoted under a version-provenance header + "live docs supersede this snapshot" + a "verify signatures against the installed package" workflow step.

Naming: renamed bu-to-bb → browser-use-to-stagehand (verified against the Anthropic Agent Skills naming/discovery docs).

All 5 Cursor Bugbot findings addressed + threads resolved. validate CI green. The eval harness doubles as a drift detector to re-run on each Stagehand / browser-use release.

Known limitations (by design, not defects)


Note

Low Risk
Documentation and agent skill content only; no production code, auth, or runtime behavior changes in the repo.

Overview
Adds a new browser-use-to-stagehand agent skill and registers it in the root README skills table.

The skill guides converting browser-use (Python) scripts to Stagehand v3 (TypeScript) on Browserbase by decomposing flows across a determinism spectrum (page.goto, act/extract/observe, cached replay, agent() only when needed) instead of a one-to-one agent port. SKILL.md defines scope gates (e.g. MCP server mode out of scope), variant detection, inventory, output templates, and a migration summary checklist.

Supporting docs include references/api-mapping.md (feature table, gaps like allowed_domains, v3 gotchas), determinism.md, trace-assisted.md (Session Logs / optional browser-trace), human guide.md, pasteable prompt.md, EXAMPLES.md before/after pairs, and MIT LICENSE.txt.

Reviewed by Cursor Bugbot for commit 828211d.

Chris Browder and others added 2 commits June 8, 2026 19:14
Converts browser-use (Python) automation to Stagehand v3 (TypeScript) on
Browserbase, choosing the right level of determinism per step rather than a
one-to-one agentic copy.

- SKILL.md: detect the browser-use variant, decompose the task across the
  determinism spectrum, emit Stagehand v3 + a migration summary
- references/: full API mapping, determinism decision framework, and an
  optional trace-assisted path (pairs with the browser-trace skill)
- GUIDE.md: human migration guide (philosophy, feature mapping, determinism)
- PROMPT.md: tool-agnostic docs prompt (works in any AI assistant)
- EXAMPLES.md: before/after pairs
- README.md (skill): index; root README.md: add to the skills table

Targets Stagehand v3 (verified against live docs); validated by running the
skill on fresh scripts a clean agent had never seen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Express determinism purely through Stagehand primitives — cached observe()→act()
plus selfHeal/cacheDir/serverCache — instead of falling back to Playwright
selectors. Drops all "Playwright" references and page.locator()/page.fill()
calls across SKILL.md, GUIDE.md, PROMPT.md, README.md, and the determinism /
trace-assisted references. page.goto/page.url stay (Stagehand's own page methods).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@rcbrowder rcbrowder marked this pull request as ready for review June 10, 2026 22:58
@shrey150

Contributor

Standards pass (see #36 / the conventions all merged skills follow):

  • Missing LICENSE.txt — every skill ships MIT with Copyright (c) 2026 Browserbase, Inc.; copy skills/browser/LICENSE.txt verbatim.
  • Extra top-level docs (README.md, GUIDE.md, PROMPT.md) in the skill dir: agents only auto-load SKILL.md, so the convention is SKILL.md + REFERENCE.md/EXAMPLES.md or a references/ dir. Suggest folding GUIDE/PROMPT into references/ and dropping the skill-local README so content isn't stranded.

A CONTRIBUTING.md + CI validator for these checks is landing shortly.

Per @shrey150's standards pass:
- add LICENSE.txt (MIT, verbatim from skills/browser)
- move GUIDE.md -> references/guide.md and PROMPT.md -> references/prompt.md,
  both linked from SKILL.md so the content isn't stranded
- drop the skill-local README.md
- fix relative links in references/guide.md and EXAMPLES.md

Skill dir now follows the convention: SKILL.md + EXAMPLES.md + references/ + LICENSE.txt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@rcbrowder rcbrowder requested a review from shrey150 June 11, 2026 21:08
rcbrowder and others added 2 commits June 24, 2026 09:23
Validated by converting 9 real browser-use examples through the skill and
running the generated Stagehand on live Browserbase. Fixes 3 runtime failures
that typecheck clean but broke at runtime:

- Page settling: networkidle -> domcontentloaded. networkidle never fires on
  Google/analytics/SPA pages and throws a 15s timeout (broke 2 of 9 cases).
- Agent output schema: agent().execute({ output }) throws
  ExperimentalNotConfiguredError unless experimental:true, and output must be a
  zod object (not a top-level array). Document both, and recommend the
  agent-then-extract pattern to stay on the managed API path (broke 1 case).
- setViewportSize takes positional (width, height), not Playwright's
  { width, height } object (would not compile).
- Variant table: ChatBrowserUse is browser-use's hosted-model class and appears
  in stable 0.13.x code -- it is NOT a Rust-beta tell. Only a literal
  browser_use.beta import is. Prevents variant mis-detection.
- Add an "iterate an extracted list / resolve relative URLs" pattern
  (new URL(href, page.url())) to avoid "Cannot navigate to invalid URL".

Doc-resilience (keep the API tables, demote them): add a version-provenance
header, state that the live docs supersede this snapshot on any conflict, and
make "verify signatures against the installed package" an explicit workflow
step -- so the skill fails safe as Stagehand / browser-use drift.

Validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.
Eval: 9/9 compile; live Browserbase 3 PASS/3 PARTIAL/3 FAIL -> 6 PASS/2 PARTIAL/0 FAIL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
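
The variant rule in the commit above (only a literal browser_use.beta import marks the Rust beta; ChatBrowserUse on its own does not) could be checked mechanically. The following is a minimal sketch, assuming the Python source is available as a string. `isRustBetaVariant` is a hypothetical helper name, not part of the skill, the sample scripts are illustrative strings only, and the regex covers just the common `import` / `from ... import` forms.

```typescript
// Hypothetical helper: not part of the skill or of browser-use.
// Returns true only when the Python source literally imports browser_use.beta.
function isRustBetaVariant(pythonSource: string): boolean {
  // Matches "import browser_use.beta", "import browser_use.beta.x",
  // "from browser_use.beta import ..." and "from browser_use.beta.x import ...".
  // Commented-out lines do not match because "#" precedes the keyword.
  const betaImport = /^\s*(?:from|import)\s+browser_use\.beta(?:[\s.]|$)/m;
  return betaImport.test(pythonSource);
}

// Illustrative stable-variant script: uses ChatBrowserUse but no beta import.
const stableScript = [
  "from browser_use import Agent, ChatBrowserUse",
  "agent = Agent(task='search', llm=ChatBrowserUse())",
].join("\n");

// Illustrative beta-variant script (the imported name is an assumption).
const betaScript = "from browser_use.beta import Agent";

console.log(isRustBetaVariant(stableScript));
console.log(isRustBetaVariant(betaScript));
```

Keying on the import line rather than on class names is what keeps stable 0.13.x scripts that use ChatBrowserUse from being misclassified.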
@shrey150

Contributor

Pushed 5b3c33e to this branch with fixes validated by a live end-to-end eval. Sharing the evidence since these are runtime bugs that tsc couldn't catch.

What I did

Converted 9 real browser-use/examples scripts through this skill (subagents reading only the skill — no prior Stagehand knowledge), then ran the generated Stagehand on live Browserbase and compared outcomes against the original browser-use runs + ground truth.

E2E Test Matrix (before → after the fixes)

| browser-use example | exercises | before (this PR's skill) | after (5b3c33e) |
| --- | --- | --- | --- |
| getting_started/03_data_extraction | extract, stable ground truth | | ✅ PASS |
| features/custom_output | Pydantic → zod, top-level array | | ✅ PASS |
| features/small_model_for_extraction | page_extraction_llm → extract({model}) | ⚠️ partial | ⚠️ partial (founders correct; relative-URL loop) |
| custom-functions/save_to_file_hugging_face | custom @tools.action | ⚠️ unsorted | ⚠️ sorted now |
| features/sensitive_data | sensitive_data → variables | | ✅ PASS (httpbin echo) |
| features/restrict_urls | allowed_domains gap | networkidle timeout | ✅ runs clean |
| getting_started/04_multi_step_task | decompose vs agent | ExperimentalNotConfiguredError | ✅ PASS (Beautiful Soup/Scrapy/Selenium) |
| models/gpt-5-mini | faithful agent() | ⚠️ no verdict | ✅ real verdict |
| getting_started/01_basic_search | search | networkidle timeout | ✅ PASS |

9/9 compile both times; live runs went 3 PASS / 3 PARTIAL / 3 FAIL → 6 PASS / 2 PARTIAL / 0 FAIL. Validated against @browserbasehq/stagehand 3.6.0 / browser-use 0.13.1.

Fixes in 5b3c33e

  • networkidle → domcontentloaded (4 spots): networkidle never fires on Google/analytics/SPA pages → 15s timeout. (fixed restrict_urls, basic_search)
  • Agent output schema: throws ExperimentalNotConfiguredError without experimental: true, and must be a zod object (not a top-level array). Documented both + recommend agent-then-extract to stay on the managed API path. (fixed multi_step)
  • setViewportSize(w, h) is positional, not Playwright's { width, height } (the object form won't compile).
  • Variant table: ChatBrowserUse is browser-use's hosted-model class and appears in stable 0.13.x — it's not a Rust-beta tell (only a literal browser_use.beta import is). Prevents variant mis-detection.
  • New pattern: iterating an extracted list + resolving relative URLs (new URL(href, page.url())) to avoid Cannot navigate to invalid URL.
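
The relative-URL pattern from the last bullet can be sketched in isolation. This is a minimal illustration, not the skill's actual code: `resolveLinks` is a hypothetical helper, the hrefs and base URL are made up, and in a real Stagehand script the base would come from page.url() rather than a literal string.

```typescript
// Resolve hrefs extracted from a page against the page's current URL.
// Hypothetical helper; in a Stagehand script baseUrl would be page.url().
function resolveLinks(hrefs: string[], baseUrl: string): string[] {
  const resolved: string[] = [];
  for (const href of hrefs) {
    try {
      // new URL(relative, base) yields an absolute URL;
      // hrefs that are already absolute pass through unchanged.
      resolved.push(new URL(href, baseUrl).toString());
    } catch {
      // Skip hrefs that cannot be parsed even with a base,
      // instead of attempting to navigate to them.
    }
  }
  return resolved;
}

// Made-up sample data for illustration.
const links = resolveLinks(
  ["/companies/acme", "jobs?page=2", "https://example.org/about"],
  "https://example.com/companies/list",
);
console.log(links);
```

Each resolved entry is absolute, so passing it to a navigation call avoids the "Cannot navigate to invalid URL" failure that a bare `/companies/acme` would trigger.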

Doc-resilience (the durable part)

The skill's API specifics drift every Stagehand/browser-use release (the variant table was already stale). Kept the API tables but demoted them: added a version-provenance header, "live docs supersede this snapshot on conflict," and made "verify signatures against the installed package" an explicit workflow step — so the skill fails safe rather than silently emitting a dead API. The eval harness itself doubles as a drift detector to re-run on each release.

Comment thread skills/browser-use-to-stagehand/references/prompt.md Outdated
Comment thread skills/browser-use-to-stagehand/EXAMPLES.md
Cryptic abbreviation -> descriptive, discoverable name:
- "bu-to-bb" was opaque, and "bb" collided with the (now-deprecated) bb CLI.
- "bu" is also a loaded token in browser-use land -- it's the prefix for their
  hosted models (bu-2-0, bu-30b), not the library -- so "bu-to-*" misreads.
- The name is loaded into the system prompt as discovery metadata, so a name
  carrying "browser-use" + "stagehand" aids triggering, not just human scanning.
- Matches the repo's descriptive naming convention (competitor-analysis,
  browser-trace, cookie-sync).

Renames the directory, frontmatter name, SKILL.md heading, README row, and all
internal /bu-to-bb references. Validator passes (node scripts/validate-skills.mjs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shrey150 shrey150 changed the title Add bu-to-bb skill: migrate browser-use to Stagehand on Browserbase Add browser-use-to-stagehand skill: migrate browser-use to Stagehand on Browserbase Jun 25, 2026
…er-2 eval)

Patches the holes a tier-2 eval found by converting real-world browser-use code
(custom agent-tools, MCP client/server, and BU embedded in real app code:
LangManus, BLAST) through the skill and type-checking against installed 3.6.0.

Correctness:
- Agent custom `tools` require `experimental: true` (same gate as agent `output`
  and MCP `integrations`) -- the prior tools example omitted it and would throw
  ExperimentalNotConfiguredError at runtime. Consolidated callout added.
- `ai` must be pinned to v5: Stagehand 3.6.x types `tools` as the v5 ToolSet
  (schema field `inputSchema`); the v4 `tool()` helper emits `parameters` and
  fails to type-check. Template now pins `^5.0.0` and documents the plain-object
  `{ description, inputSchema, execute }` alternative.

New mappings:
- §3.7 MCP: browser-use MCPClient (stdio + remote) -> Stagehand
  `agent({ integrations })` via `connectToMCPServer(...)` / URL string, with the
  experimental requirement and the tool_filter/prefix + server-direction gaps.
- §3.8 Real-world patterns: embedded/wrapped code (convert the browser-use
  surface, keep app glue), sync-over-async -> async, long-lived stateful
  executors -> stateless `messages`, vision intent -> `mode: hybrid/cua`, legacy
  `result.final_result` attribute form.
- Custom-action gaps: injected special params (browser_session etc.) -> closures,
  `domains=` -> in-tool host check, `terminates_sequence` -> no equivalent.

SKILL.md: added a "when NOT to migrate" scope gate (MCP-server / non-Agent /
embedded) so the workflow doesn't assume every input is a convertible script.

Tier-2: 5/5 handled correctly (4 convert+compile clean; MCP-server correctly
flagged out-of-scope without fabricating). Validated against stagehand 3.6.0 /
browser-use 0.13.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shrey150

Contributor

Pushed f24de18 — a second validation round targeting the real-world surface the first eval didn't cover (custom agent-tools, MCP, and browser-use embedded in real app code). Method: converted these through the skill (skill-only subagents) and type-checked against installed @browserbasehq/stagehand 3.6.0.

Tier-2 matrix

| Source | Dimension | Result |
| --- | --- | --- |
| custom-functions/advanced_search.py | agent-callable @tools.action | ✅ converts + tsc-clean |
| synthetic MCPClient (stdio filesystem) | MCP client | ✅ → connectToMCPServer() + integrations + experimental |
| browser-use --mcp (server) | MCP server | correctly flagged out-of-scope, no fabricated conversion |
| LangManus/src/tools/browser.py | embedded, legacy API | ✅ detected legacy, preserved wrapper, tsc-clean |
| BLAST/blastai/executor.py (449L) | embedded + Controller | ✅ scoped BU surface, tsc-clean |

5/5 handled correctly (4 convert+compile, 1 correctly-refused).

Bugs/holes this round caught → fixed in f24de18

  • ai v4 vs v5 (the big one): Stagehand 3.6.x types agent tools as the v5 ToolSet (inputSchema), but the template pinned "ai": "^4.0.0", whose tool() emits parameters → won't compile. Template now pins ^5.0.0 + documents the plain-object {description, inputSchema, execute} fallback.
  • Agent tools need experimental: true (same gate as output / integrations) — the prior tools example omitted it → runtime ExperimentalNotConfiguredError. Added a consolidated callout.
  • New §3.7 MCP mapping (client→integrations via connectToMCPServer/URL; server-direction = out of scope; tool_filter/prefix gaps).
  • New §3.8 real-world patterns (embedded/wrapped code, sync→async, stateful executors→messages, vision→mode: hybrid/cua, legacy result.final_result) + custom-action gaps (injected params→closures, domains=, terminates_sequence).
  • SKILL.md "when NOT to migrate" gate so the workflow doesn't assume every input is a convertible Agent(task=) script.

Note: anti-bot/captcha/stealth deliberately not in the eval — that's session/infra-layer (Browserbase), not framework-layer, so it isn't attributable to the conversion. The skill maps the config (api-mapping §4); the outcome is Browserbase's. validate CI green.

shrey150 and others added 2 commits June 25, 2026 12:15
Live e2e smoke of the converted MCP-client script surfaced a Stagehand bug:
passing a Client instance (local/stdio server via connectToMCPServer) into
agent({ integrations }) throws "Converting circular structure to JSON" before
the agent runs -- agent() does JSON.stringify(options.integrations) (v3.ts:1992)
and the Client object is circular. Reproduced with two different MCP servers
(filesystem, everything). URL-string integrations are unaffected. The skill's
mapping is correct (it connects to the server); this is a framework bug, so §3.7
now flags it and recommends remote/URL MCP until fixed upstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
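
The failure mode described in this commit is ordinary JSON.stringify behavior on circular objects, and it can be reproduced without Stagehand. The sketch below is an illustration under assumptions: `isJsonSerializable` is a hypothetical guard, the circular object is a stand-in rather than a real MCP Client, and the URL is made up.

```typescript
// Returns true when JSON.stringify can serialize the value without throwing.
function isJsonSerializable(value: unknown): boolean {
  try {
    JSON.stringify(value);
    return true;
  } catch {
    // Circular references land here
    // (TypeError: Converting circular structure to JSON).
    return false;
  }
}

// Stand-in for an object that references itself (not a real MCP Client).
const circularClient: { label: string; self?: unknown } = { label: "stdio-client" };
circularClient.self = circularClient;

// Hypothetical remote MCP URL; a plain string serializes without issue.
const urlIntegration = "https://mcp.example.com/mcp";

console.log(isJsonSerializable([circularClient]));
console.log(isJsonSerializable([urlIntegration]));
```

A check like this before calling agent({ integrations }) would surface the problem early, which is consistent with the commit's advice to prefer URL-string integrations until the upstream fix lands.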
…in example

- prompt.md + determinism.md recommended agent().execute({ output }) (and prompt.md
  the agent `tools` path) without the experimental:true requirement that api-mapping
  already documents -> would throw ExperimentalNotConfiguredError. Added the
  experimental gate (output/tools/integrations) + prefer agent-then-extract, and
  pinned ai@^5 for the tools row.
- EXAMPLES.md login: the browser-use task is "log in then open the dashboard" but
  the Stagehand after-script stopped at the allow-list check. Added the dashboard
  step so the migration completes the same task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread skills/browser-use-to-stagehand/references/prompt.md
prompt.md numbered the spectrum 1=Navigation..5=Autonomous, inverted vs
determinism.md/guide.md/workflow (1=Autonomous..5=Navigation), so "Level N"
citations could contradict across docs. Flipped prompt.md's table + the
"decomposition (levels 2-5)" rule + the example's page.goto (L5) cite to match
the canonical scale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread skills/browser-use-to-stagehand/references/api-mapping.md Outdated
The agent-tool example took `url` from the model via inputSchema, but the
original browser-use action reads the current URL from browser_session. That
let the model pass a guessed URL and contradicted the section's own "close over
page/stagehand in execute" guidance. The tool now takes no model args and reads
page.url() directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
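
The closure pattern this commit describes can be sketched without any dependencies. Everything here is an assumption made for illustration: `PageLike` and `SketchTool` are simplified stand-ins, not Stagehand's types, and the real plain-object form named earlier in this PR also carries an inputSchema, which is omitted to keep the sketch self-contained.

```typescript
// Minimal structural stand-in for the page object; only url() is assumed.
interface PageLike {
  url(): string;
}

// Simplified tool shape for illustration only. The PR describes the real
// plain-object form as { description, inputSchema, execute }; inputSchema
// is left out here, so this is not Stagehand's actual tool type.
interface SketchTool {
  description: string;
  execute: () => Promise<string>;
}

// Hypothetical factory: the tool closes over `page` and accepts no
// model-supplied arguments, so the model cannot pass a guessed URL.
function makeCurrentUrlTool(page: PageLike): SketchTool {
  return {
    description: "Return the URL of the page the browser is currently on.",
    execute: async () => page.url(),
  };
}

// Usage with a fake page object.
const fakePage: PageLike = { url: () => "https://example.com/results?q=stagehand" };
const currentUrlTool = makeCurrentUrlTool(fakePage);
currentUrlTool.execute().then((currentUrl) => console.log(currentUrl));
```

Because the URL is read inside execute at call time, the tool always reports where the browser actually is, which mirrors how the original browser-use action read it from browser_session.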

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 649ef83.

Comment thread skills/browser-use-to-stagehand/EXAMPLES.md
…ample

The host check ran once after sign-in, but the added dashboard step navigated
past it — and browser-use's allowed_domains enforces across the whole run.
Extracted the check into assertAllowedHost() and call it after each navigation,
with a comment that even this is best-effort (real continuous enforcement =
Browserbase proxy domain rules, api-mapping §5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
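
A host check of the kind this commit describes might look like the sketch below. The body of the real assertAllowedHost() in EXAMPLES.md is not shown in this thread, so the allow-list values, the subdomain rule, and the error message are all assumptions made for illustration.

```typescript
// Hypothetical allow-list; the values used in EXAMPLES.md are not shown here.
const ALLOWED_HOSTS = ["example.com"];

// Throws if the URL's host is neither an allowed host nor a subdomain of one.
// Accepting subdomains is an assumption of this sketch.
function assertAllowedHost(currentUrl: string, allowedHosts: string[] = ALLOWED_HOSTS): void {
  const host = new URL(currentUrl).hostname;
  const ok = allowedHosts.some(
    (allowed) => host === allowed || host.endsWith("." + allowed),
  );
  if (!ok) {
    throw new Error(`Navigation left the allow-list: ${host}`);
  }
}

// In a script this would run after each navigation,
// for example assertAllowedHost(page.url()).
assertAllowedHost("https://app.example.com/login");
assertAllowedHost("https://example.com/dashboard");
```

As the commit notes, a post-navigation check like this is best-effort: it only notices an off-list host after the page has already loaded, so continuous enforcement still belongs at the proxy layer.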
@shrey150 shrey150 merged commit e841873 into main Jun 25, 2026
2 checks passed