POST/v1/scrape
Scrape
Scrape a single URL and return the content in your desired format. Uses a 5-tier parallel scraping engine with automatic strategy selection and domain-level strategy caching.
Target Latency
1.2s - 4.5s
Credits
1 cr/req
Parameters
| Name | Type | Requirement | Description |
|---|---|---|---|
| url | string | REQUIRED | The URL to scrape. Protocol is auto-prepended if missing. |
| formats | string[] | optional | Output formats: "markdown", "html", "raw_html", "links", "screenshot", "structured_data", "headings", "images". |
| only_main_content | boolean | optional | Extract only the main content, removing navs, footers, sidebars. |
| mobile | boolean | optional | Emulate a mobile device viewport. |
| mobile_device | string | optional | Device preset name: "iphone_14", "pixel_7", "ipad_pro". |
| timeout | number | optional | Request timeout in milliseconds. |
| wait_for | number | optional | Wait this many ms after page load before extracting. |
| css_selector | string | optional | Only extract content matching this CSS selector. |
| xpath | string | optional | XPath expression for targeted extraction. |
| include_tags | string[] | optional | Only include these HTML tags in extraction. |
| exclude_tags | string[] | optional | Exclude these HTML tags from extraction. |
| use_proxy | boolean | optional | Route request through a configured proxy for anti-bot bypass. |
| headers | object | optional | Custom HTTP headers to send (e.g. { "Cookie": "session=abc" }). |
| cookies | object | optional | Custom cookies to send as name/value pairs. |
| actions | ActionStep[] | optional | Browser actions to perform before extraction: click, wait, scroll, type, screenshot, hover, press, select, fill_form, evaluate. |
| extract | object | optional | LLM extraction config: { prompt: string, schema: JSONSchema }. |
| webhook_url | string | optional | Webhook URL for job completion notification. |
| webhook_secret | string | optional | HMAC secret for webhook signature verification. |
| capture_network | boolean | optional | Capture browser network requests/responses. |
cURL Example
curl -X POST "https://api.datablue.dev/v1/scrape" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"formats": [
"markdown",
"links"
],
"only_main_content": true
}'System Responses
200 OK
Request processed successfully.
401 UNAUTHORIZED
Missing or invalid API key.
429 RATE LIMIT
System capacity exceeded.
500 SYSTEM FAILURE
Internal core exception.
EXAMPLE RESPONSE
{
"success": true,
"data": {
"markdown": "# Hacker News\n\n1. Show HN: I built an open-source web scraper with strategy caching\n2. Why Rust is eating the world...",
"links": [
"https://news.ycombinator.com/item?id=39912345",
"https://news.ycombinator.com/item?id=39912346",
"https://news.ycombinator.com/newest"
],
"metadata": {
"title": "Hacker News",
"description": null,
"language": "en",
"source_url": "https://news.ycombinator.com",
"status_code": 200,
"word_count": 1847,
"reading_time_seconds": 7,
"content_length": 12340
}
}
}