NEW AI Studio is now available Try it now
POST /pipeline/*

AI Data Extraction

Deprecated

Go beyond raw content extraction. Spider's AI pipelines understand web pages semantically, pulling out contacts and leads, generating question-answer pairs, categorizing websites, and filtering links based on relevance. Crawl + AI in one integrated workflow.

Four AI Pipelines

Extract Contacts

POST /pipeline/extract-contacts

Crawl websites and use AI to identify and extract contact information including email addresses, phone numbers, social profiles, and business details. Results are stored and queryable via the contacts data API.

Email discovery Phone numbers Social profiles Company data

Questions & Answers

POST /pipeline/extract-qa

Crawl a website and generate structured Q&A pairs from its content. Provide an inquiry or topic and Spider produces relevant questions with answers grounded in the actual page content.

FAQ generation Topic-focused Training data Knowledge bases

Label Website

POST /pipeline/label

Crawl a website and have AI categorize it into topics, industries, or custom labels. Useful for building directories, classifying leads, or organizing large collections of URLs.

Auto-categorization Industry detection Custom labels Topic tagging

Filter Links

POST /pipeline/filter-links

Crawl a website's links and use AI to filter them based on relevance, content type, or custom criteria. Keep only the URLs that match your data collection goals, eliminating noise.

AI relevance scoring Content-type filtering Custom criteria Noise elimination

Bonus: Crawl from Text

POST /pipeline/crawl-text

Paste raw text or markdown containing URLs, and Spider will automatically extract every link and crawl them. Skip the step of parsing URLs yourself. Just send the document, email body, or notes and let Spider handle discovery. Supports up to 10 MB of input text.

Code Examples

Extract contacts from a website Python
from spider import Spider

client = Spider()

# Extract contacts from a company website
contacts = client.extract_contacts(
    "https://example.com",
    params={
        "limit": 50,
    }
)

for contact in contacts:
    print(contact)
Generate Q&A pairs from a knowledge base cURL
curl -X POST https://api.spider.cloud/pipeline/extract-qa \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 25,
    "return_format": "markdown"
  }'
Label a website by category JavaScript
import Spider from "@spider-cloud/spider-client";

const client = new Spider();

const result = await client.label(
  "https://example.com",
  { limit: 10 }
);

console.log(result);
// [{ url: "...", labels: ["Technology", "SaaS", "Developer Tools"] }]

Popular AI Extraction Use Cases

Sales Lead Generation

Crawl target company websites to extract emails, phone numbers, and team member details. Build prospect lists automatically instead of manual research.

Fine-Tuning Datasets

Generate Q&A pairs from documentation sites and knowledge bases to create training data for domain-specific language models and chatbots.

Website Directories

Label and categorize large collections of URLs for building topical directories, industry databases, or content recommendation systems.

Smart Link Discovery

Filter a website's links to find only product pages, blog posts, or documentation. Skip navigation, legal pages, and irrelevant content.

Related Resources

Extract intelligence from any website

Combine web crawling with AI understanding. Pull structured data that goes beyond raw HTML.