Spider MCP v2: Browser Automation for AI Agents

Spider's MCP server now ships 22 tools, including 9 browser automation tools that give AI agents direct control of cloud browsers with anti-bot bypass, proxy rotation, and session management.

Jeff Mendez · 9 min read

AI agents are good at reasoning. They’re bad at interacting with the web. They can’t click buttons, fill forms, or navigate through login flows, at least not without a browser. Spider MCP v2 fixes that. It gives any MCP-compatible AI agent (Claude, Cursor, Windsurf) 22 tools for web interaction, including 9 new browser automation tools that connect to Spider’s cloud browsers.

No local Chrome install. No Selenium setup. No CAPTCHA walls. Just npx spider-cloud-mcp and your agent can browse the web.

What Changed From v1 to v2

v1 shipped 13 tools: 8 core REST API tools and 5 AI tools. Every tool was a one-shot HTTP call: send a URL, get content back. That covers most scraping and crawling use cases, but it falls apart when a task requires multiple steps.

Consider what it takes to extract data from a dashboard that requires authentication:

  1. Navigate to the login page
  2. Fill in the email field
  3. Fill in the password field
  4. Click the submit button
  5. Wait for the redirect
  6. Navigate to the reports page
  7. Extract the data

v1 couldn’t do this. v2 can. The 9 new browser tools give agents stateful, multi-step control over a remote browser session.
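
With v2, that same flow maps directly onto the browser tools. Here is a sketch in the same shape as the workflows later in this post; the URLs and selectors are illustrative, not from a real site:

1. spider_browser_open: { browser: "chrome", mode: "scraping" }
2. spider_browser_navigate: { url: "https://app.example.com/login" }
3. spider_browser_fill: { selector: "input[name='email']", value: "user@example.com" }
4. spider_browser_fill: { selector: "input[name='password']", value: "your-password" }
5. spider_browser_click: { selector: "button[type='submit']" }
6. spider_browser_wait_for: { selector: ".dashboard" }
7. spider_browser_navigate: { url: "https://app.example.com/reports" }
8. spider_browser_content: { format: "text" }
9. spider_browser_close: {}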

Tool Inventory

  • Core (8 tools): spider_crawl, spider_scrape, spider_search, spider_links, spider_screenshot, spider_unblocker, spider_transform, spider_get_credits
  • AI (5 tools): spider_ai_crawl, spider_ai_scrape, spider_ai_search, spider_ai_browser, spider_ai_links
  • Browser (9 tools): spider_browser_open, spider_browser_navigate, spider_browser_click, spider_browser_fill, spider_browser_screenshot, spider_browser_content, spider_browser_evaluate, spider_browser_wait_for, spider_browser_close

The core and AI tools are unchanged from v1. Every existing workflow keeps working.

The Browser Tools

Each browser tool operates on a session. You open a session, perform actions, then close it. Sessions run in Spider’s cloud, so the agent never needs a local browser.

Opening a Session

spider_browser_open: {
  browser: "chrome",
  stealth: 2,
  country: "us",
  mode: "scraping"
}

This returns a session_id that you pass to every subsequent browser tool call. The stealth parameter controls proxy quality on a 0-3 scale, where higher levels use stronger proxies for harder-to-access sites. The mode parameter chooses between scraping (headless, fast, cheap) and cua (full rendering with screenshot/video support).

Sessions auto-close after 5 minutes of inactivity. You can have up to 5 concurrent sessions.

Navigating and Reading Content

spider_browser_navigate: {
  session_id: "abc-123",
  url: "https://example.com",
  wait_until: "networkidle0"
}

The wait_until parameter supports four modes: load (default), domcontentloaded (faster), networkidle0 (waits for all requests to finish), and networkidle2 (allows up to 2 in-flight requests). Pick networkidle0 for SPAs that load data asynchronously.

To read the page after navigating:

spider_browser_content: {
  session_id: "abc-123",
  format: "text"
}

Returns the visible text content. Use format: "html" for the full DOM.
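
For example, to get the rendered DOM instead of just the text:

spider_browser_content: {
  session_id: "abc-123",
  format: "html"
}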

Interacting With Elements

Click a button:

spider_browser_click: {
  session_id: "abc-123",
  selector: "button[type='submit']"
}

Fill a form field (clears existing text first):

spider_browser_fill: {
  session_id: "abc-123",
  selector: "input[name='email']",
  value: "user@example.com"
}

Both tools wait for the element to appear before acting, with a configurable timeout. If the selector doesn’t match anything within 10 seconds (default), the tool returns an error instead of hanging.

Screenshots

spider_browser_screenshot: {
  session_id: "abc-123",
  full_page: true
}

Returns a base64-encoded PNG directly as an MCP image content block. The agent sees the actual screenshot, not a text description of it. This is useful for visual verification (“did the form submit correctly?”) and for debugging when a workflow doesn’t behave as expected.

You can also screenshot a specific element:

spider_browser_screenshot: {
  session_id: "abc-123",
  selector: "#chart-container"
}

JavaScript Execution

For anything the other tools don’t cover, there’s spider_browser_evaluate:

spider_browser_evaluate: {
  session_id: "abc-123",
  expression: "document.querySelectorAll('.product-card').length"
}

The expression runs in the page context with full DOM access. Use it for scrolling (window.scrollBy(0, 1000)), complex data extraction, or triggering custom events.
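
For example, a one-call extraction that returns a JSON string (the .product-card selector is illustrative):

spider_browser_evaluate: {
  session_id: "abc-123",
  expression: "JSON.stringify(Array.from(document.querySelectorAll('.product-card h3')).map(el => el.textContent.trim()))"
}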

Waiting for Dynamic Content

SPAs and dynamic pages need explicit waits after interactions:

spider_browser_wait_for: {
  session_id: "abc-123",
  selector: ".results-loaded"
}

Three wait modes: selector (element appears in DOM), navigation (page navigates), or neither (defaults to network idle, meaning no requests for 500ms).

Closing Sessions

spider_browser_close: {
  session_id: "abc-123"
}

Always close sessions when done. Open sessions consume resources and credits. The MCP server also closes all sessions on shutdown and cleans up idle sessions automatically, but explicit cleanup is the right pattern.

How It Works

The browser tools use spider-browser (our TypeScript CDP client) to connect to browser.spider.cloud via the Chrome DevTools Protocol (CDP) over WebSocket. No Chrome binary is installed locally. spider-browser is a protocol client only.

AI Agent ─── MCP ──→ Spider MCP Server ─── CDP/WebSocket ──→ browser.spider.cloud

The same endpoint also accepts standard Puppeteer and Playwright connections. The MCP server wraps it into discrete tool calls so AI agents don’t need to manage WebSocket connections or CDP commands.

Each session gets its own isolated browser context. Cookies, storage, and state don’t leak between sessions.

Session Lifecycle

  1. spider_browser_open: Connects via CDP WebSocket, stores the session.
  2. Tool calls: Each call looks up the session by ID, refreshes the idle timer, executes the action, returns the result.
  3. spider_browser_close: Disconnects and removes the session.
  4. Idle timeout: Sessions idle for 5+ minutes are closed automatically.
  5. Server shutdown: All sessions are closed on SIGINT/SIGTERM.
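
Put together, a minimal end-to-end session looks like this:

1. spider_browser_open: { browser: "chrome", mode: "scraping" }
2. spider_browser_navigate: { url: "https://example.com", wait_until: "load" }
3. spider_browser_content: { format: "text" }
4. spider_browser_close: {}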

Speed, Cost, Reliability

These are the three things we optimize for across all 22 tools.

Speed

The REST API tools inherit Spider’s crawling speed: 100K+ pages per second for spider_crawl, sub-second responses for spider_scrape. The default request: "smart" setting auto-detects whether a page needs JavaScript rendering and picks HTTP or Chrome accordingly. Most pages don’t need Chrome, so most requests take the fast path.

For browser sessions, spider_browser_open connects to a cloud browser without a cold-start penalty. Navigation speed depends on the target site, not on Spider.

Cost

Core tools are pay-per-use credits. No subscription, no monthly minimum. Check your balance anytime with spider_get_credits. Credit costs scale with page complexity and whether JavaScript rendering is needed. HTTP-only requests are cheapest, Chrome rendering costs more, and premium proxies multiply the base cost. See spider.cloud/credits/new for exact pricing.

Browser sessions are metered per-second based on bandwidth. The stealth parameter controls proxy quality and cost. Level 1 is cheapest, and level 3 uses premium mobile proxies for the hardest-to-access sites.

AI tools require a subscription but eliminate the need to write extraction logic, CSS selectors, or automation scripts. For one-off tasks, the time savings alone justify the cost.

Reliability

Every browser session includes anti-bot protection. The stealth parameter controls the proxy quality tier, where higher levels use stronger proxies for harder-to-access sites.

The spider_unblocker REST tool is the heavy-duty option for one-shot access to bot-protected pages. It handles fingerprinting, proxy rotation, and retries in a single call.
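
A call looks like the other one-shot REST tools; return_format here is an assumption carried over from spider_scrape, so check the tool schema for the exact options:

spider_unblocker: {
  url: "https://protected.example.com",
  return_format: "markdown" // assumed: mirrors spider_scrape
}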

Browser sessions support Chrome and Firefox. You pick the engine in spider_browser_open, and the session runs on Spider’s managed infrastructure.

Getting Started

Install

claude mcp add spider -- npx -y spider-cloud-mcp

Or for Claude Desktop, add to your config:

{
  "mcpServers": {
    "spider": {
      "command": "npx",
      "args": ["-y", "spider-cloud-mcp"],
      "env": {
        "SPIDER_API_KEY": "your-api-key"
      }
    }
  }
}

Get your API key at spider.cloud/api-keys.

Try It

Once connected, ask your AI agent:

  • “Scrape spider.cloud and give me the pricing details” → uses spider_scrape
  • “Search for recent papers on retrieval-augmented generation” → uses spider_search
  • “Open a browser, go to Hacker News, and get the top 10 story titles” → uses spider_browser_*

The agent picks the right tool based on the task. One-shot content retrieval uses the REST tools. Multi-step interaction uses the browser tools. You don’t have to specify which.

Real-World Workflows

Monitoring a Competitor’s Pricing Page

1. spider_scrape: {
     url: "https://competitor.com/pricing",
     return_format: "markdown",
     cache: { maxAge: 3600 }
   }

One call. Returns clean markdown with all plan names, prices, and feature lists. The cache parameter avoids redundant requests if you check multiple times per hour.

Filling Out a Web Form

1. spider_browser_open: { mode: "cua" }
2. spider_browser_navigate: { url: "https://forms.example.com/apply" }
3. spider_browser_fill: { selector: "#name", value: "Acme Corp" }
4. spider_browser_fill: { selector: "#email", value: "contact@acme.com" }
5. spider_browser_click: { selector: "select#industry" }
6. spider_browser_evaluate: {
     expression: "document.querySelector('select#industry').value = 'technology'"
   }
7. spider_browser_click: { selector: "button[type='submit']" }
8. spider_browser_wait_for: { selector: ".confirmation-message" }
9. spider_browser_content: { format: "text" }
10. spider_browser_close: {}

The agent drives the form step by step. If something goes wrong (element not found, navigation timeout), the tool returns a descriptive error and the agent can adapt: try a different selector, wait longer, or take a screenshot to understand the page state.

Building a RAG Pipeline

1. spider_crawl: {
     url: "https://docs.example.com",
     limit: 200,
     return_format: "markdown",
     filter_output_main_only: true,
     readability: true
   }

Returns up to 200 pages of clean markdown with navigation, ads, and boilerplate stripped. Feed the output directly into your vector database or RAG pipeline.

Extracting Structured Data Without Selectors

1. spider_ai_scrape: {
     url: "https://news.ycombinator.com",
     prompt: "Extract the top 30 stories as JSON with fields: rank, title, url, points, author, comment_count"
   }

No CSS selectors. No DOM inspection. The AI figures out the page structure and returns clean JSON. Works on pages where the HTML structure is messy, inconsistent, or obfuscated.

Architecture

The MCP server is 4 TypeScript files totaling ~870 lines:

  • src/index.ts (25 lines): entry point, stdio transport, graceful shutdown
  • src/api.ts (120 lines): REST API client, JSONL streaming parser, response truncation
  • src/browser.ts (130 lines): browser session pool, idle cleanup, connection lifecycle
  • src/server.ts (600 lines): all 22 tool registrations with Zod schemas and error handling

Three runtime dependencies: @modelcontextprotocol/sdk, spider-browser, and zod. No bundled browser binary. The package is 21KB compressed.

Every tool has the same error handling pattern: try the operation, catch any error, return it as a structured MCP error response with isError: true. The server never crashes on a failed tool call. JSONL responses are truncated at 200K characters to prevent context window overflow. If you hit the limit, use the limit parameter to reduce the result set.
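
For example, scoping a crawl down so the response stays well under the cap:

spider_crawl: {
  url: "https://docs.example.com",
  limit: 25,
  return_format: "markdown"
}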

When to Use Which Tool

  • Get content from one URL → spider_scrape. Fastest, cheapest. One HTTP call.
  • Get content from many pages → spider_crawl. Follows links automatically. Set limit to control scope.
  • Search the web → spider_search. Returns URLs or full content. Time filtering with tbs.
  • Access a bot-protected page → spider_unblocker. Heavy-duty anti-bot bypass.
  • Extract structured data → spider_ai_scrape. Natural language → JSON. No selectors needed.
  • Multi-step workflow (login, navigate, interact) → spider_browser_*. Stateful sessions with full browser control.
  • Convert HTML you already have → spider_transform. No web requests. Pure transformation.
  • Check your bill → spider_get_credits. Returns remaining credits.

The REST tools (spider_crawl, spider_scrape, etc.) should be your default. They’re faster and cheaper because they don’t maintain a persistent browser session. Use the browser tools when you need statefulness: login flows, multi-page forms, interactions that depend on previous actions.

The AI tools sit between the two: they handle complex extraction and automation through natural language but still operate as one-shot HTTP calls. Use spider_ai_scrape when you need structured data but don’t want to write selectors. Use spider_ai_browser when you need browser automation but don’t want to manage individual click/fill/navigate steps.
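
As a sketch, a spider_ai_browser call shaped like spider_ai_scrape (the parameter names are an assumption; check the tool schema):

spider_ai_browser: {
  url: "https://forms.example.com/apply", // assumed: same shape as spider_ai_scrape
  prompt: "Fill out the application form for Acme Corp with contact@acme.com and submit it"
}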

Source Code

The full source is on GitHub: spider-rs/spider-cloud-mcp-v2

Install with:

npx -y spider-cloud-mcp

Or add as an MCP server to Claude Code:

claude mcp add spider -- npx -y spider-cloud-mcp
