The Spider Open Source Ecosystem

A guide to all open source Spider projects — the core crawler, browser client, HTML transformer, TLS fingerprinting, and more. Quick-start examples for each.

Jeff Mendez · 5 min read

Spider’s cloud API is backed by a collection of open source libraries. Each one is independently useful, and you can pull in just the parts you need for your own projects.

This guide covers every OSS project in the ecosystem, what it does, and how to get started with it.

Overview

Project                  Language                   License   What it does
spider                   Rust                       MIT       Core async web crawler
spider-browser           TypeScript, Python, Rust   MIT       Browser automation client
spider-clients           Python, JS, Rust, Go       MIT       API client SDKs
spider_transformations   Rust                       MIT       HTML to markdown/text conversion
spider_fingerprint       Rust                       MIT       TLS/HTTP fingerprinting
spider_firewall          Rust                       MIT       URL blocking and filtering rules

spider — Core Crawler

GitHub: spider-rs/spider · Crate: crates.io/crates/spider

The core engine. A high-performance async web crawler built on tokio with zero-copy HTML parsing.

Quick start

cargo add spider
use spider::website::Website;
use spider::configuration::Configuration;

#[tokio::main]
async fn main() {
    let mut config = Configuration::new();
    config.with_limit(50);
    config.with_respect_robots_txt(true);

    let mut website = Website::new("https://example.com")
        .with_configuration(config)
        .build()
        .unwrap();

    website.scrape().await; // scrape() stores the page HTML; crawl() only collects links

    for page in website.get_pages().unwrap().iter() {
        println!("{}: {} bytes", page.get_url(), page.get_html_bytes_u8().len());
    }
}

Key features

  • Async concurrent crawling with configurable limits
  • Link following with depth control
  • Robots.txt compliance
  • Domain blacklisting/whitelisting
  • External domain support
  • Custom user agents
  • Configurable delays between requests
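
Several of these options map onto builder methods on Website. A minimal sketch, assuming the with_depth, with_user_agent, and with_delay builders present in recent spider releases (check your version if the names differ):

use spider::website::Website;

#[tokio::main]
async fn main() {
    // Builder names (with_depth / with_user_agent / with_delay) assume a recent spider release.
    let mut website = Website::new("https://example.com")
        .with_depth(3)                      // follow links at most 3 hops deep
        .with_user_agent(Some("MyBot/1.0")) // identify your crawler
        .with_delay(250)                    // wait 250 ms between requests
        .build()
        .unwrap();

    website.crawl().await;
    println!("visited {} links", website.get_links().len());
}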

See the self-hosting guide for detailed configuration and Docker setup.

spider-browser — Browser Automation Client

GitHub: spider-rs/spider-browser · npm: spider-browser

A WebSocket client for Spider’s browser automation service. Provides CDP (Chrome DevTools Protocol) access plus AI-powered methods for natural language interaction.

Quick start (TypeScript)

npm install spider-browser
import { SpiderBrowser } from "spider-browser";

const spider = new SpiderBrowser({
  apiKey: process.env.SPIDER_API_KEY,
  stealth: 0, // auto-escalates when blocked
});

await spider.init();
await spider.page.goto("https://example.com");

// Standard CDP methods
const title = await spider.page.title();
console.log("Title:", title);

// AI methods
const data = await spider.page.extract(
  "Get the main heading and first paragraph"
);
console.log("Extracted:", data);

await spider.close();

Key features

  • Full CDP protocol access (navigate, click, type, scroll, screenshot)
  • AI extract(): describe what data you want in plain English, get structured JSON
  • AI act(): describe actions in plain English (“click the login button”)
  • AI agent(): multi-step autonomous workflows
  • AI observe(): describe what to watch for, get notified when it appears
  • Stealth mode with auto-escalation when blocked
  • Available in TypeScript, Python, and Rust

spider-clients — API Client SDKs

GitHub: spider-rs/spider-clients

Official client libraries for the Spider cloud API. Available in Python, JavaScript/TypeScript, Rust, and Go.

Python

pip install spider-client
from spider import Spider

spider = Spider()  # reads SPIDER_API_KEY from the environment

# Crawl a site
pages = spider.crawl_url(
    "https://example.com",
    params={"limit": 10, "return_format": "markdown"},
)
for page in pages:
    print(f"{page['url']}: {len(page['content'])} chars")

# AI extraction
result = spider.ai_scrape(
    "https://example.com",
    "Extract the main content and any contact information",
)
print(result)

JavaScript/TypeScript

npm install @spider-cloud/spider-client
import { Spider } from "@spider-cloud/spider-client";

const spider = new Spider({ apiKey: process.env.SPIDER_API_KEY });

// Crawl with streaming
await spider.crawlUrl(
  "https://example.com",
  { limit: 10, return_format: "markdown" },
  true,
  (page) => {
    console.log(`${page.url}: ${page.content?.length} chars`);
  }
);

Go

go get github.com/spider-rs/spider-clients/go
package main

import (
    "context"
    "fmt"
    spider "github.com/spider-rs/spider-clients/go"
)

func main() {
    client := spider.New("")  // Uses SPIDER_API_KEY env var

    pages, err := client.CrawlURL(context.Background(), "https://example.com", &spider.SpiderParams{
        Limit:        10,
        ReturnFormat: spider.FormatMarkdown,
    })
    if err != nil {
        panic(err)
    }

    for _, page := range pages {
        fmt.Printf("%s: %d chars\n", page.URL, len(page.Content))
    }
}

Rust

cargo add spider-client
use spider_client::Spider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spider = Spider::new(None)?; // None: read SPIDER_API_KEY from the environment

    let pages = spider.crawl_url(
        "https://example.com",
        Default::default(),
        false,
        None,
    ).await?;

    for page in &pages {
        println!("{}: {} chars", page.url, page.content.as_ref().map(|c| c.len()).unwrap_or(0));
    }
    Ok(())
}

spider_transformations — HTML to Markdown

Part of the spider monorepo. Converts raw HTML into clean markdown optimized for LLM consumption.

Quick start

cargo add spider_transformations
use spider_transformations::transformation::content::{transform_content, TransformConfig};

fn main() {
    let html = r#"
        <html>
        <head><title>Example</title></head>
        <body>
            <nav>Home | About | Contact</nav>
            <main>
                <h1>Welcome</h1>
                <p>This is the main content of the page.</p>
                <ul>
                    <li>Feature one</li>
                    <li>Feature two</li>
                </ul>
            </main>
            <footer>Copyright 2026</footer>
        </body>
        </html>
    "#;

    let config = TransformConfig::default();
    let markdown = transform_content(html, "https://example.com", &config);
    println!("{}", markdown);
    // Output:
    // # Welcome
    //
    // This is the main content of the page.
    //
    // - Feature one
    // - Feature two
}

What it does

  • Strips navigation, sidebars, footers, cookie banners
  • Preserves semantic structure (headings, lists, tables, code blocks)
  • Converts links to markdown format
  • Handles nested HTML structures
  • Configurable boilerplate removal

spider_fingerprint — TLS Fingerprinting

Part of the spider monorepo. Manages TLS/HTTP fingerprints to avoid detection by anti-bot systems.

What it does

  • Generates realistic TLS client hello fingerprints
  • Rotates HTTP/2 settings and header order
  • Mimics real browser fingerprint patterns
  • Helps avoid bot detection at the TLS layer

This crate is used internally by the spider crawler when the fingerprint feature is enabled.
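
If you build the crawler from source, you opt in through Cargo features rather than calling this crate directly. A minimal sketch, assuming the feature flag is literally named fingerprint as described above:

cargo add spider --features fingerprint
# feature name taken from the sentence above; verify it against the crate's feature list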

spider_firewall — URL Blocking Rules

Part of the spider monorepo. Implements URL filtering and blocking rules for crawlers.

What it does

  • Pattern-based URL blocking (regex and glob)
  • Domain-level allow/deny lists
  • Path-based filtering
  • Resource type filtering (images, scripts, stylesheets)

Used internally to implement blacklist/whitelist behavior and resource optimization during crawls.
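
From the crawler's side, the same behavior is exposed as allow/deny lists on the builder. A hedged sketch, assuming the with_blacklist_url and with_whitelist_url builders found in recent spider releases (verify the names against your version):

use spider::website::Website;

#[tokio::main]
async fn main() {
    // Skip anything under /admin and only follow URLs under /docs.
    // with_blacklist_url / with_whitelist_url are assumed from recent spider versions.
    let mut website = Website::new("https://example.com")
        .with_blacklist_url(Some(Vec::from(["https://example.com/admin".into()])))
        .with_whitelist_url(Some(Vec::from(["https://example.com/docs".into()])))
        .build()
        .unwrap();

    website.crawl().await;

    for link in website.get_links() {
        println!("- {:?}", link.as_ref());
    }
}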

How the pieces fit together

For a self-hosted crawling pipeline:

spider (crawl) → spider_transformations (convert to markdown) → your pipeline

For cloud API usage:

spider-clients (API call) → Spider Cloud → results

For browser automation:

spider-browser (WebSocket) → Spider Browser Cloud → live browser session
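
As a sketch of the first, self-hosted pipeline, here are the two quick starts above stitched together; the transform_content call mirrors the earlier example, so adjust it if your version of the crate takes different arguments:

use spider::website::Website;
use spider_transformations::transformation::content::{transform_content, TransformConfig};

#[tokio::main]
async fn main() {
    // 1. Crawl and keep the HTML (scrape stores page bodies, crawl only collects links).
    let mut website = Website::new("https://example.com").build().unwrap();
    website.scrape().await;

    // 2. Convert each page to markdown and hand the result to your own pipeline.
    //    transform_content arguments follow the quick start above; adjust if your version differs.
    let config = TransformConfig::default();
    if let Some(pages) = website.get_pages() {
        for page in pages.iter() {
            let markdown = transform_content(&page.get_html(), page.get_url(), &config);
            println!("{}\n{}", page.get_url(), markdown);
        }
    }
}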

The OSS ecosystem gives you building blocks. The cloud API assembles them into a managed service with additional capabilities (proxies, anti-bot, AI, scaling) that aren’t practical to self-host.

Contributing

All Spider OSS projects accept contributions. The main repos:

  • spider-rs/spider: the core crawler monorepo, which also hosts spider_transformations, spider_fingerprint, and spider_firewall
  • spider-rs/spider-browser: the browser automation client
  • spider-rs/spider-clients: the API client SDKs

Issues and PRs welcome. The projects use MIT licensing, so no CLA is required.

Try the cloud API. Free credits to start, no card required.
