
Integration

LangChain + Spider

Use Spider as a document loader in your LangChain applications. Crawl websites, search the web, and feed clean markdown directly into your RAG chains, agents, and retrieval pipelines.

from langchain_community.document_loaders import SpiderLoader

# Crawl a website and load as LangChain documents
# (the loader reads your API key from the SPIDER_API_KEY environment variable)
loader = SpiderLoader(
    url="https://docs.example.com",
    mode="crawl",  # whole-site crawl; "scrape" fetches a single page
    params={
        "return_format": "markdown",  # clean markdown for embedding
        "limit": 50,  # stop after 50 pages
    },
)

docs = loader.load()
# Each doc has .page_content (markdown) and .metadata (url, title, etc.)

# Feed into a RAG chain
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

Document Loader

SpiderLoader returns LangChain Document objects with page_content and metadata. Drop it into any existing chain.

Crawl, Scrape, or Search

Set mode to "crawl" for full-site indexing, "scrape" for specific pages, or use the search endpoint for web-wide discovery.
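One way to keep the mode switch tidy is a small helper that builds the loader's keyword arguments. This is only a sketch: the `url`, `mode`, and `params` argument names follow the crawl example above, and the helper itself is hypothetical.

```python
def spider_loader_kwargs(url, mode="crawl", limit=None):
    """Build keyword arguments for SpiderLoader (sketch; names assumed)."""
    params = {"return_format": "markdown"}
    if limit is not None:
        params["limit"] = limit  # only crawls need a page cap
    return {"url": url, "mode": mode, "params": params}

# Full-site crawl, capped at 50 pages
crawl_kwargs = spider_loader_kwargs("https://docs.example.com", "crawl", limit=50)

# One specific page
scrape_kwargs = spider_loader_kwargs("https://docs.example.com/quickstart", "scrape")
```

Then instantiate with `SpiderLoader(**crawl_kwargs)` or `SpiderLoader(**scrape_kwargs)`.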

RAG-Ready Markdown

Navigation, ads, and boilerplate are stripped automatically, leaving clean markdown that embedding models handle well.
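Because the output is markdown, you can chunk along heading boundaries before embedding instead of splitting at arbitrary character offsets. Here is a minimal, self-contained sketch (LangChain's own text splitters are an alternative); the `max_chars` fallback is an assumption, not a Spider feature.

```python
import re

def split_on_headings(markdown, max_chars=2000):
    """Split markdown into heading-aligned chunks (sketch)."""
    # Split just before each level 1-3 heading
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Fall back to fixed-size slices for oversized sections
        while len(section) > max_chars:
            chunks.append(section[:max_chars])
            section = section[max_chars:]
        chunks.append(section)
    return chunks
```

Each chunk then starts at a natural topic boundary, which tends to help retrieval relevance.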

Streaming Support

Use lazy_load() to stream documents as they are crawled. Start embedding while Spider is still fetching.
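To embed while the crawl is in flight, you can drain `lazy_load()` in batches rather than waiting for the full document list. The batching helper below is generic Python; the usage lines assume the `loader` and `vectorstore` objects from the example above.

```python
def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf  # final partial batch

# Usage sketch, assuming the loader and vectorstore defined earlier:
# for batch in batched(loader.lazy_load(), 10):
#     vectorstore.add_documents(batch)
```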

All Crawl Parameters

Pass any Spider parameter through the loader: proxy mode, browser rendering, readability, custom selectors, and more.
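When several jobs share most settings, one pattern is to keep a baseline `params` dict and merge per-job overrides onto it. The specific override keys below (`request`, `limit`) are assumptions for illustration; check Spider's API reference for the exact parameter names your plan supports.

```python
# Baseline params used for every crawl
DEFAULT_PARAMS = {
    "return_format": "markdown",
    "limit": 50,
}

def crawl_params(**overrides):
    """Merge per-job overrides onto the defaults (later keys win)."""
    return {**DEFAULT_PARAMS, **overrides}

# e.g. a deeper crawl with browser rendering for a JS-heavy site
# ("request" is an assumed parameter name; verify against Spider's docs)
js_params = crawl_params(request="chrome", limit=100)
```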

Source Attribution

Every document includes its URL, crawl timestamp, and page metadata for citation grounding in your RAG responses.
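That metadata can be rendered into citation lines for your prompt or your answer footer. A minimal sketch, assuming `url` and `title` keys in `doc.metadata` (inspect a loaded document to confirm the exact keys SpiderLoader emits):

```python
def format_citation(metadata):
    """Render one document's metadata as a citation line.

    Assumes "url" and "title" keys; falls back to the URL when the
    page has no title.
    """
    url = metadata.get("url", "unknown source")
    title = metadata.get("title") or url
    return f"{title} ({url})"

# e.g. sources = "\n".join(format_citation(d.metadata) for d in docs)
```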

Search + LangChain for Live RAG

Combine Spider's Search API with LangChain to answer questions using real-time web data.

from spider import Spider
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

spider = Spider()
llm = ChatOpenAI(model="gpt-4o")

# Search the web and get content in one call
results = spider.search(
    "latest changes to GDPR enforcement",
    params={
        "search_limit": 5,
        "fetch_page_content": True,
        "return_format": "markdown",
    }
)

context = "\n---\n".join(
    [f"[{r['url']}]\n{r['content'][:3000]}" for r in results if r.get("content")]
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using the sources. Cite URLs."),
    ("user", "Sources:\n{context}\n\nQuestion: {question}"),
])

chain = prompt | llm
answer = chain.invoke({"context": context, "question": "What changed?"})
print(answer.content)

Start building with LangChain + Spider

Free credits on signup. No subscription required.