JSON Scraping

Many websites embed structured data directly in their HTML: product details in JSON-LD, page data in Next.js __NEXT_DATA__ scripts, or API payloads in SSR hydration state. Spider can extract this structured data directly, so you do not have to parse the HTML yourself. Set return_json_data: true to get the embedded JSON from a page.
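
To make "embedded" concrete, here is a minimal sketch of what pulling such JSON out of raw HTML by hand might look like, using only the Python standard library. It is an illustration of the manual alternative, not a description of how Spider works internally; the class name EmbeddedJSONParser and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collect JSON from <script type="application/ld+json"> and
    <script id="__NEXT_DATA__"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._capture = False
        self._buffer = []
        self.blobs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and (
            attrs.get("type") == "application/ld+json"
            or attrs.get("id") == "__NEXT_DATA__"
        ):
            self._capture = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text while capturing.
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capture:
            self._capture = False
            try:
                self.blobs.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip script bodies that are not valid JSON.


# Hypothetical page fragment used only for this illustration.
sample_html = """
<html><head>
<script type="application/ld+json">{"@type": "Recipe", "name": "Nutella Cookies"}</script>
</head><body><p>Hello</p></body></html>
"""

parser = EmbeddedJSONParser()
parser.feed(sample_html)
embedded = parser.blobs
print(embedded)
```

With return_json_data enabled, this kind of tag hunting should not be needed on the client side, since the embedded JSON arrives already parsed in the API response.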

Extracting JSON-LD from a Page

Here is a practical example: extracting recipe data from a page that embeds JSON-LD structured data. The return_format: "empty" parameter tells Spider to skip the HTML content and only return the extracted JSON.

Scraping JSON from HTML Using API in Python

import requests
import os

headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}

params = {
    "url": "https://www.allrecipes.com/recipe/223312/nutella-hazelnut-cookies",
    "return_format": "empty",
    "return_json_data": True  # Return the JSON data embedded in the HTML
}

response = requests.post(
    'https://api.spider.cloud/scrape',
    headers=headers,
    json=params
)

print(response.json())

Example Response

The extracted JSON appears in the response under json_data, in the other_scripts array:

Example Response JSON

{
  "costs": {
    "ai_cost": 0,
    "file_cost": 0.0005,
    "total_cost": 0.0005
  },
  "error": null,
  "json_data": {
    "other_scripts": [
      {
        "@context": "http://schema.org",
        "@type": ["Recipe"],
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingCount": "102",
          "ratingValue": "4.7"
        },
        "author": [
          {
            "@type": "Person",
            "name": "Carmella DiNardo"
          }
        ],
        "cookTime": "PT10M",
        "datePublished": "2020-06-18T23:50:15.000-04:00",
        "description": "Nutella cookies made with chocolate-hazelnut spread...",
        "headline": "Nutella Cookies",
        "name": "Nutella Cookies"
      }
    ]
  }
}
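
Once the response is parsed, the JSON-LD objects can be filtered by their @type. The following is a minimal sketch that assumes the response shape shown above; the helper name find_by_type is hypothetical and the sample dictionary is a trimmed copy of the example response. Note that @type may be a string or a list, so the helper handles both.

```python
def find_by_type(response_body, wanted_type):
    """Return JSON-LD objects in json_data.other_scripts whose @type matches.

    Hypothetical helper; assumes the response shape shown in the example.
    """
    json_data = (response_body or {}).get("json_data") or {}
    matches = []
    for item in json_data.get("other_scripts") or []:
        if not isinstance(item, dict):
            continue
        item_type = item.get("@type")
        # JSON-LD allows @type to be a single string or a list of strings.
        types = item_type if isinstance(item_type, list) else [item_type]
        if wanted_type in types:
            matches.append(item)
    return matches


# Trimmed copy of the example response above.
sample_response = {
    "error": None,
    "json_data": {
        "other_scripts": [
            {
                "@context": "http://schema.org",
                "@type": ["Recipe"],
                "cookTime": "PT10M",
                "name": "Nutella Cookies",
            }
        ]
    },
}

recipes = find_by_type(sample_response, "Recipe")
for recipe in recipes:
    print(recipe["name"], recipe.get("cookTime"))
```

In a real script, sample_response would be replaced by response.json() from the request above.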

Scrape Next.js SSR data embedded in the HTML

The same return_json_data parameter also works for SSR data on Next.js pages and on similar JavaScript frontend frameworks. In the example SSR response below, the JSON object is found under the NEXT_DATA property of json_data:

Scraping Next.js SSR Data

[
  {
    "content": null,
    "costs": {
      "ai_cost": 0,
      "bytes_transferred_cost": 0,
      "compute_cost": 0,
      "file_cost": 0.0005,
      "total_cost": 0.0005,
      "transform_cost": 0
    },
    "error": null,
    "json_data": {
      "NEXT_DATA": {
        "props": {
          "pageProps": {
            "geo": {
              "_id": "city:ca_san-jose",
              "area_type": "city",
              "city": "San Jose",
              "state_code": "CA",
              "country": "USA"
            },
            "pageType": "forSale",
            "page": 1,
            "properties": [
              {
                "property_id": "1143655170",
                "list_price": 6498000,
                "primary_photo": {
                  "href": "https://ap.rdcpix.com/..."
                }
              }
            ]
          }
        }
      }
    }
  }
]
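
Reading the page data out of such a response is a matter of walking json_data.NEXT_DATA.props.pageProps. The sketch below assumes the shape of the example above; the helper name get_page_props is hypothetical. Because this example response is a list while the earlier JSON-LD example is a single object, the helper accepts either form.

```python
def get_page_props(response_body):
    """Return the first pageProps object found under json_data.NEXT_DATA.

    Hypothetical helper; assumes the response shape shown in the example.
    """
    entries = response_body if isinstance(response_body, list) else [response_body]
    for entry in entries:
        json_data = (entry or {}).get("json_data") or {}
        next_data = json_data.get("NEXT_DATA") or {}
        page_props = (next_data.get("props") or {}).get("pageProps")
        if page_props is not None:
            return page_props
    return None


# Trimmed copy of the example response above.
sample_response = [
    {
        "error": None,
        "json_data": {
            "NEXT_DATA": {
                "props": {
                    "pageProps": {
                        "geo": {"city": "San Jose", "state_code": "CA"},
                        "pageType": "forSale",
                        "properties": [
                            {"property_id": "1143655170", "list_price": 6498000}
                        ],
                    }
                }
            }
        },
    }
]

page_props = get_page_props(sample_response)
list_prices = [p["list_price"] for p in page_props["properties"]]
print(page_props["geo"]["city"], list_prices)
```

The keys inside pageProps are defined by each site, so fields such as geo and properties are specific to this example and will differ from page to page.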