Add live browser action tools for direct Selenium control #5

Merged: meenurani1 merged 1 commit into `main` from `feature/browser-action-tools` on Jun 2, 2026

@meenurani1 (Collaborator)

Summary

  • Adds 8 new MCP tools that let an AI agent drive a real browser step by step, without needing to generate or execute a Robot Framework `.robot` file
  • All tools share a module-level WebDriver session and support Robot Framework-style selector prefixes (`id=`, `css=`, `xpath=`, `name=`, `class=`, `tag=`, `link=`, `partial_link=`), with plain CSS as the default fallback
  • Fixes a dead-code bug in `create_extended_selenium_keywords` where an unreferenced `return template` statement followed the actual return
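
The selector-prefix behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from `mcp_server.py`: `PREFIX_TO_STRATEGY` and `parse_selector` are hypothetical names, and the strategy strings simply mirror the values of Selenium's `By` constants so the result can be unpacked into `driver.find_element(*locator)`.

```python
# Illustrative sketch only; these names are hypothetical, not from mcp_server.py.
# The strategy strings match the values of selenium.webdriver.common.by.By.
PREFIX_TO_STRATEGY = {
    "id": "id",
    "css": "css selector",
    "xpath": "xpath",
    "name": "name",
    "class": "class name",
    "tag": "tag name",
    "link": "link text",
    "partial_link": "partial link text",
}


def parse_selector(selector: str) -> tuple[str, str]:
    """Split 'prefix=value' into (strategy, value); fall back to plain CSS."""
    # Split on the first '=' only, so XPath or CSS values may contain '='.
    prefix, sep, value = selector.partition("=")
    key = prefix.strip().lower()
    if sep and key in PREFIX_TO_STRATEGY:
        return PREFIX_TO_STRATEGY[key], value
    # No recognised prefix: treat the whole string as a CSS selector.
    return "css selector", selector
```

Because only a recognised prefix triggers the split, an unprefixed selector such as `button[type='submit']` is passed through unchanged as CSS even though it contains `=`.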

New Tools

| Tool | Description |
| --- | --- |
| `browser_launch` | Open Chrome or Firefox (headless optional), navigate to URL |
| `browser_navigate` | Go to a new URL in the active session |
| `browser_click` | Click an element, waits for it to be clickable first |
| `browser_send_keys` | Type into an input, clears existing text by default |
| `browser_get_text` | Read visible text from an element |
| `browser_wait_for_element` | Wait for `visible`, `present`, `clickable`, or `hidden` state |
| `browser_screenshot` | Save a timestamped PNG and return its absolute path |
| `browser_close` | Quit the browser and clean up the session |
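
One plausible way to back the four `browser_wait_for_element` states is to map each onto a condition from Selenium's `expected_conditions` module. The sketch below assumes that approach; `STATE_TO_CONDITION`, `wait_for_state`, and the 10-second default timeout are hypothetical and not taken from the PR.

```python
# Hypothetical mapping from the tool's state names to condition factories in
# selenium.webdriver.support.expected_conditions.
STATE_TO_CONDITION = {
    "visible": "visibility_of_element_located",
    "present": "presence_of_element_located",
    "clickable": "element_to_be_clickable",
    "hidden": "invisibility_of_element_located",
}


def wait_for_state(driver, locator, state="visible", timeout=10):
    """Block until the element at `locator` reaches `state`, or time out."""
    key = state.lower()
    if key not in STATE_TO_CONDITION:
        raise ValueError(
            f"Unknown state '{state}'; expected one of {sorted(STATE_TO_CONDITION)}"
        )
    # Imported lazily so the mapping can be inspected without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    condition = getattr(EC, STATE_TO_CONDITION[key])(locator)
    # Raises selenium's TimeoutException if the state is not reached in time.
    return WebDriverWait(driver, timeout).until(condition)
```

Here `locator` is a `(strategy, value)` tuple such as `("id", "username")`. Validating the state name before touching the driver keeps the error for a typo distinct from a genuine timeout.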

Example Usage

```python
browser_launch("https://example.com", browser="Chrome", headless=False)
browser_send_keys("id=username", "admin")
browser_send_keys("id=password", "secret")
browser_click("css=button[type='submit']")
browser_screenshot("results/after_login.png")
browser_close()
```
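
The PR does not show how `browser_screenshot` combines a caller-supplied name with a timestamp, so the following is only a guess at one way to build a timestamped absolute path. The helper name `timestamped_png_path` and its parameters are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def timestamped_png_path(directory="screenshots", stem="screenshot", now=None):
    """Return an absolute path like <directory>/<stem>_YYYYMMDD_HHMMSS.png."""
    now = now or datetime.now()
    folder = Path(directory).resolve()
    folder.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    return folder / f"{stem}_{now:%Y%m%d_%H%M%S}.png"


# With a live Selenium session the path could then be used as:
#     driver.save_screenshot(str(timestamped_png_path("results", "after_login")))
```

Accepting `now` as a parameter keeps the helper deterministic in tests while defaulting to the current time in normal use.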

Test Plan

  • `browser_launch` opens Chrome/Firefox and navigates to the URL
  • `browser_send_keys` types into input fields correctly
  • `browser_click` clicks buttons and links
  • `browser_get_text` returns correct element text
  • `browser_wait_for_element` respects all 4 states (visible, present, clickable, hidden)
  • `browser_screenshot` saves PNG and returns correct path
  • `browser_close` closes session cleanly
  • All tools return clear error messages when no session is open
  • Selector prefixes `id=`, `css=`, `xpath=`, `name=` all resolve correctly

Note: ChromeDriver must match the installed Chrome version. Update via `brew upgrade chromedriver` if versions diverge.

Adds 8 new MCP tools that let an AI agent drive a real browser step by
step without generating or running a Robot Framework file:

- browser_launch   — open Chrome/Firefox (headless supported), navigate to URL
- browser_navigate — go to a new URL in the active session
- browser_click    — click an element, waits for it to be clickable
- browser_send_keys — type into an input, clears existing text by default
- browser_get_text — read visible text from an element
- browser_wait_for_element — wait for visible/present/clickable/hidden state
- browser_screenshot — save a timestamped PNG and return its path
- browser_close    — quit the browser and clean up the session

All tools share a module-level WebDriver session and support Robot
Framework-style selector prefixes (id=, css=, xpath=, name=, class=,
tag=, link=, partial_link=) with plain CSS as the default fallback.

Also fixes a dead-code bug in create_extended_selenium_keywords where
an unreferenced return template statement followed the actual return.
Copilot AI review requested due to automatic review settings June 2, 2026 18:54
@meenurani1 meenurani1 merged commit 17e9ef8 into main Jun 2, 2026
1 check passed

Copilot AI left a comment

Pull request overview

This PR extends the MCP server beyond Robot Framework code generation by adding “live” Selenium-driven browser action tools that operate against a shared module-level WebDriver session, enabling step-by-step browser control via MCP tool calls. It also removes a dead/unreachable return template in create_extended_selenium_keywords.

Changes:

  • Added 8 new browser_* MCP tools for launching, navigating, clicking, typing, reading text, waiting for elements, taking screenshots, and closing a shared Selenium session.
  • Implemented Robot Framework-style selector prefixes (id=, css=, xpath=, etc.) with CSS as the fallback.
  • Removed unreachable dead code in create_extended_selenium_keywords.

Comment thread mcp_server.py
Comment on lines +205 to +206

    service = webdriver.ChromeService()
    _driver = webdriver.Chrome(options=opts, service=service)

Comment thread mcp_server.py

    service = webdriver.ChromeService()
    _driver = webdriver.Chrome(options=opts, service=service)

    _driver.maximize_window()

Comment thread mcp_server.py

    if clear_first:
        element.clear()
    element.send_keys(text)
    return f"Typed into {selector}: '{text}'"

Comment thread mcp_server.py
Comment on lines +191 to +197

    if browser_lower == "firefox":
        opts = FirefoxOptions()
        if headless:
            opts.add_argument("--headless")
        _driver = webdriver.Firefox(options=opts)
    else:
        opts = ChromeOptions()