JSON Scraping
Many websites embed structured data directly in their HTML: product details in JSON-LD, page data in Next.js __NEXT_DATA__ scripts, or API payloads serialized for SSR hydration. Spider can extract this embedded data directly, so you do not have to parse the HTML yourself. Set return_json_data: true to return the JSON embedded in a page.
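To make "embedded JSON" concrete, the sketch below shows where this data typically sits in a page and what extracting it by hand would involve. It uses only the Python standard library on a made-up HTML string. The class name EmbeddedJSONParser and the sample markup are hypothetical, and this is not a description of how Spider implements extraction; it only illustrates the parsing work that return_json_data saves you.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collect JSON found in JSON-LD and __NEXT_DATA__ script tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._label = None   # set while we are inside a script tag of interest
        self._chunks = []    # text pieces of the current script body
        self.found = []      # list of (label, parsed JSON) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr = dict(attrs)
        if attr.get("type") == "application/ld+json":
            self._label = "json-ld"
        elif attr.get("id") == "__NEXT_DATA__":
            self._label = "next-data"

    def handle_data(self, data):
        if self._label is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._label is not None:
            try:
                self.found.append((self._label, json.loads("".join(self._chunks))))
            except json.JSONDecodeError:
                pass  # skip script bodies that are not valid JSON
            self._label = None
            self._chunks = []


# Hypothetical page content, reduced to the two script tags of interest.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Recipe", "name": "Nutella Cookies"}</script>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"page": 1}}}</script>
</head><body><p>Hello</p></body></html>
"""

parser = EmbeddedJSONParser()
parser.feed(SAMPLE_HTML)
parser.close()
for label, data in parser.found:
    print(label, data)
```

Real pages often contain several such script tags, malformed JSON, or data injected only after JavaScript runs, which is why delegating this step to the API is usually simpler.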
Example JSON-LD Microformat
Extracting JSON-LD from a Page
Here is a practical example: extracting recipe data from a page that embeds JSON-LD structured data. The return_format: "empty" parameter tells Spider to skip the HTML content and only return the extracted JSON.
Scraping JSON from HTML Using API in Python
import os
import requests
# The API key is read from the SPIDER_API_KEY environment variable
headers = {
    'Authorization': f'Bearer {os.getenv("SPIDER_API_KEY")}',
    'Content-Type': 'application/json',
}
params = {
    "url": "https://www.allrecipes.com/recipe/223312/nutella-hazelnut-cookies",
    "return_format": "empty",  # Skip the HTML content
    "return_json_data": True,  # Return the JSON data embedded in the HTML
}
response = requests.post(
    'https://api.spider.cloud/scrape',
    headers=headers,
    json=params,
)
print(response.json())
Example Response
The JSON will be in the response under the other_scripts array:
Example Response JSON
{
  "costs": {
    "ai_cost": 0,
    "file_cost": 0.0005,
    "total_cost": 0.0005
  },
  "error": null,
  "json_data": {
    "other_scripts": [
      {
        "@context": "http://schema.org",
        "@type": ["Recipe"],
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingCount": "102",
          "ratingValue": "4.7"
        },
        "author": [
          {
            "@type": "Person",
            "name": "Carmella DiNardo"
          }
        ],
        "cookTime": "PT10M",
        "datePublished": "2020-06-18T23:50:15.000-04:00",
        "description": "Nutella cookies made with chocolate-hazelnut spread...",
        "headline": "Nutella Cookies",
        "name": "Nutella Cookies"
      }
    ]
  }
}
Scrape Next.js SSR data embedded in the HTML
Using the same return_json_data parameter, we can also scrape the SSR data embedded in Next.js pages and in pages built with similar JavaScript frontend frameworks. In the example response for a Next.js page, the JSON object is found under the NEXT_DATA property:
Scraping Next.js SSR Data
[
  {
    "content": null,
    "costs": {
      "ai_cost": 0,
      "bytes_transferred_cost": 0,
      "compute_cost": 0,
      "file_cost": 0.0005,
      "total_cost": 0.0005,
      "transform_cost": 0
    },
    "error": null,
    "json_data": {
      "NEXT_DATA": {
        "props": {
          "pageProps": {
            "geo": {
              "_id": "city:ca_san-jose",
              "area_type": "city",
              "city": "San Jose",
              "state_code": "CA",
              "country": "USA"
            },
            "pageType": "forSale",
            "page": 1,
            "properties": [
              {
                "property_id": "1143655170",
                "list_price": 6498000,
                "primary_photo": {
                  "href": "https://ap.rdcpix.com/..."
                }
              }
            ]
          }
        }
      }
    }
  }
]
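Once a response is parsed, the embedded data can be read with a few dictionary lookups. The sketch below is one possible way to do that, assuming the two response shapes shown in this guide: a single result object or a list of result objects, each with a json_data field holding other_scripts and/or NEXT_DATA. The helper names (iter_results, json_ld_items, next_page_props) are hypothetical and not part of the Spider API, and the sample payloads are trimmed copies of the examples above rather than live output.

```python
from typing import Any, Dict, Iterator, List


def iter_results(payload: Any) -> Iterator[Dict[str, Any]]:
    """Yield result objects whether the payload is one object or a list of them."""
    if isinstance(payload, dict):
        yield payload
    elif isinstance(payload, list):
        for item in payload:
            if isinstance(item, dict):
                yield item


def json_ld_items(payload: Any, type_name: str) -> List[Dict[str, Any]]:
    """Collect JSON-LD objects with the given @type from json_data.other_scripts."""
    found = []
    for result in iter_results(payload):
        scripts = (result.get("json_data") or {}).get("other_scripts") or []
        for script in scripts:
            if not isinstance(script, dict):
                continue
            types = script.get("@type", [])
            if isinstance(types, str):  # @type may be a string or a list
                types = [types]
            if type_name in types:
                found.append(script)
    return found


def next_page_props(payload: Any) -> List[Dict[str, Any]]:
    """Collect props.pageProps from json_data.NEXT_DATA for each result."""
    found = []
    for result in iter_results(payload):
        next_data = (result.get("json_data") or {}).get("NEXT_DATA") or {}
        page_props = (next_data.get("props") or {}).get("pageProps")
        if isinstance(page_props, dict):
            found.append(page_props)
    return found


# Trimmed sample payloads based on the example responses in this guide.
recipe_payload = {
    "error": None,
    "json_data": {
        "other_scripts": [
            {"@type": ["Recipe"], "name": "Nutella Cookies", "cookTime": "PT10M"}
        ]
    },
}
listing_payload = [
    {
        "error": None,
        "json_data": {
            "NEXT_DATA": {
                "props": {
                    "pageProps": {
                        "pageType": "forSale",
                        "properties": [
                            {"property_id": "1143655170", "list_price": 6498000}
                        ],
                    }
                }
            }
        },
    }
]

for recipe in json_ld_items(recipe_payload, "Recipe"):
    print(recipe["name"], recipe.get("cookTime"))

for props in next_page_props(listing_payload):
    for prop in props.get("properties", []):
        print(prop["property_id"], prop["list_price"])
```

The lookups use .get with fallbacks so that a page without JSON-LD or without Next.js data yields an empty list rather than raising a KeyError.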