POST /transform

HTML Transform API

Already have the HTML? Transform it into clean markdown, plain text, or sanitized HTML without re-crawling. The most cost-efficient way to process web content you've already collected.


Input HTML

</div>

<h1>Hello World</h1>

<p>Content here</p>

</article>

→

Output Markdown

# Hello World

Content here

nav, footer, classes stripped

0.1 credits per page

No browser rendering, no proxy costs. The cheapest way to convert web content you already have.

Why Use Transform Instead of Scrape?

Already Have the HTML

If you already have pages saved from your own crawlers, browser extensions, or a cache, Transform converts them without paying for another network request.

Cost Efficient

At just 0.1 credits per HTML document (up to 10 credits for PDFs), Transform is the cheapest endpoint. No browser rendering or proxy costs involved.
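
To make the pricing concrete, here is a small sketch that estimates an upper bound on credits for a job. The helper name and its parameters are hypothetical, and it assumes every PDF item is billed at the stated 10-credit ceiling, so actual usage may come in lower.

```python
from decimal import Decimal

HTML_CREDITS = Decimal("0.1")    # per HTML document, as stated above
PDF_MAX_CREDITS = Decimal("10")  # stated ceiling for PDFs; real cost may be lower


def estimate_max_credits(html_docs: int, pdf_items: int = 0) -> Decimal:
    """Upper bound on credits for a job (hypothetical helper, not part of any SDK)."""
    if html_docs < 0 or pdf_items < 0:
        raise ValueError("counts must be non-negative")
    return html_docs * HTML_CREDITS + pdf_items * PDF_MAX_CREDITS


print(estimate_max_credits(2500))    # 250.0
print(estimate_max_credits(100, 3))  # 40.0
```

Decimal is used instead of float so that repeated 0.1 increments do not accumulate rounding error.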

Batch Processing

Send an array of HTML documents in one request. Process entire collections of saved pages in a single API call, up to 10 MB total.

Three Cleaning Levels

Light

Standard

Basic HTML-to-format conversion. Preserves all content structure including navigation, footers, and sidebars.

No flags needed

Recommended

AI Clean

Removes navigation, footers, ads, and boilerplate. Keeps main article content, optimized for feeding into language models.

"clean": true

Deep

Full Clean

Strips all non-essential HTML attributes: classes, IDs, inline styles. Produces minimal, semantic markup.

"clean_full": true

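The three levels map onto request parameters roughly as sketched below. The flag names come from the cards above; the level keys and the helper function are labels invented for this illustration, not part of the API.

```python
# Flags as listed above. The dictionary keys are labels chosen for this sketch.
CLEANING_FLAGS = {
    "standard": {},                      # no flags needed
    "ai_clean": {"clean": True},         # recommended level
    "full_clean": {"clean_full": True},  # deepest level
}


def transform_params(level: str = "ai_clean", return_format: str = "markdown") -> dict:
    """Build a params dict for one cleaning level (hypothetical helper)."""
    if level not in CLEANING_FLAGS:
        raise ValueError(f"unknown cleaning level: {level!r}")
    return {"return_format": return_format, **CLEANING_FLAGS[level]}


print(transform_params("standard"))
print(transform_params("ai_clean"))
print(transform_params("full_clean", return_format="text"))
```

The resulting dict could be passed as the params argument shown in the Python example under Code Examples.
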
Key Capabilities

Readability Extraction

Enable readability to extract just the main content using Mozilla's readability algorithm. Perfect for articles and blog posts.

Multiple Output Formats

Convert to markdown, text, or sanitized html. Markdown for LLMs, text for NLP, clean HTML for re-rendering.

URL Context

Pass the source URL alongside HTML so relative links resolve to absolute URLs. Ensures links in markdown output work correctly.
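
To see what resolving relative links against a source URL means in practice, the sketch below uses Python's standard urllib.parse.urljoin. It illustrates standard URL resolution only and is not a statement about how the service implements it; the helper name is hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(source_url: str, hrefs: list) -> dict:
    """Map each href to the absolute URL it resolves to against source_url."""
    return {href: urljoin(source_url, href) for href in hrefs}


resolved = resolve_links(
    "https://example.com/blog/post-1",
    ["/about", "images/chart.png", "../pricing", "https://other.example/x"],
)
for href, absolute in resolved.items():
    print(f"{href} -> {absolute}")
# /about -> https://example.com/about
# images/chart.png -> https://example.com/blog/images/chart.png
# ../pricing -> https://example.com/pricing
# https://other.example/x -> https://other.example/x
```

Without the source URL, the first three links would have no base to resolve against and would stay relative.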

Batch Input

Send an array of {html, url} objects. Transform dozens of pages in a single request to minimize round-trips.
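
As a minimal sketch of assembling a batch, the helper below turns a mapping of saved pages into a JSON body shaped like the cURL example under Code Examples (a "data" array plus top-level options). The function name is hypothetical, and the body shape is taken from that example rather than from a full schema.

```python
import json


def build_transform_body(pages: dict, **options) -> str:
    """Serialize saved pages into a JSON body shaped like the cURL example."""
    # One {html, url} object per saved page, keyed by its source URL.
    data = [{"html": html, "url": url} for url, html in pages.items()]
    return json.dumps({"data": data, **options})


saved_pages = {
    "https://example.com/1": "<h1>Page One</h1><p>Content...</p>",
    "https://example.com/2": "<h1>Page Two</h1><p>Content...</p>",
}
body = build_transform_body(saved_pages, return_format="markdown", readability=True)
print(body)
```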

PDF Support

Also handles PDF content extraction. Convert PDF documents to markdown or text at up to 10 credits per page.

10 MB Payload

Process up to 10 MB of HTML per request. Large pages, long articles, and complex documents handled without truncation.
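
For collections larger than one request allows, documents can be grouped into batches that each stay under the limit. The sketch below is one way to do that. It assumes "10 MB" means 10 × 1024 × 1024 bytes and approximates request size by the JSON-encoded length of each document plus a fixed allowance for the wrapper; both are assumptions, and the function name is hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def chunk_documents(docs, max_bytes=MAX_PAYLOAD_BYTES, overhead=1024):
    """Yield lists of {html, url} dicts whose estimated JSON size fits max_bytes.

    overhead reserves room for the surrounding request keys and options.
    """
    batch, size = [], overhead
    for doc in docs:
        # +2 approximates the ", " separator between array items.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 2
        if doc_size + overhead > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], overhead
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


docs = [{"html": "x" * 100, "url": f"https://example.com/{i}"} for i in range(5)]
for batch in chunk_documents(docs, max_bytes=400, overhead=0):
    print(len(batch), "documents,", len(json.dumps(batch)), "bytes")
```

The tiny max_bytes in the usage lines is only there to force several batches; with the default limit all five documents would go out in one request.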

Code Examples

Python cURL JavaScript

from spider import Spider

client = Spider()

html_content = "<html><body><h1>Hello</h1><p>World</p></body></html>"

result = client.transform(
    [{ "html": html_content, "url": "https://example.com" }],
    params={
        "return_format": "markdown",
        "clean": True,
    }
)

print(result[0]["content"])
# Output:
# # Hello
#
# World

curl -X POST https://api.spider.cloud/transform \
  -H "Authorization: Bearer $SPIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      {"html": "<h1>Page One</h1><p>Content...</p>", "url": "https://example.com/1"},
      {"html": "<h1>Page Two</h1><p>Content...</p>", "url": "https://example.com/2"}
    ],
    "return_format": "markdown",
    "readability": true
  }'

import Spider from "@spider-cloud/spider-client";

const client = new Spider();

const result = await client.transform(
  [{ html: "<h1>Hello</h1><p>World</p>", url: "https://example.com" }],
  {
    return_format: "markdown",
    readability: true,
  }
);

console.log(result[0].content);

Popular Use Cases

Post-Processing Cached Content

You've already saved HTML from your own crawlers or a CDN cache. Transform converts it to clean markdown without consuming browser or proxy credits.

Email & Newsletter Parsing

Convert HTML emails into readable text or markdown for indexing, summarization, or feeding into language models.
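
Before an email can be transformed, its HTML body has to be pulled out of the raw message. A minimal sketch using Python's standard email package follows; the helper name is hypothetical, and the extracted string would then be sent as the html field of a transform request.

```python
from email import message_from_string, policy
from email.message import EmailMessage
from typing import Optional


def html_part(raw_email: str) -> Optional[str]:
    """Return the HTML body of a raw email, or None if it has no HTML part."""
    msg = message_from_string(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    return body.get_content() if body is not None else None


# Build a sample multipart message to exercise the helper.
sample = EmailMessage()
sample["Subject"] = "Weekly update"
sample.set_content("Plain-text fallback")
sample.add_alternative("<h1>Weekly update</h1><p>Hello</p>", subtype="html")

print(html_part(sample.as_string()))
```

Parsing with policy.default matters here: it yields an EmailMessage, which provides get_body and get_content.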

CMS Content Migration

Export HTML from one CMS and transform it to markdown for import into a static site generator, wiki, or headless CMS.

Document Preprocessing

Clean and normalize HTML documents before embedding or indexing. Strip formatting artifacts and extract pure semantic content.

Related Resources

Scrape API: Fetch and extract in one step

Crawl API: Crawl with built-in transformation

Full API Reference: All transform parameters