POST/v1/scrape

Scrape

Scrape a single URL and return the content in your desired format. Uses a 5-tier parallel scraping engine with automatic strategy selection and domain-level strategy caching.

Target Latency

1.2s - 4.5s

Credits

1 cr/req

Parameters

Name	Type	Requirement	Description
url	string	REQUIRED	The URL to scrape. Protocol is auto-prepended if missing.
formats	string[]	optional	Output formats: "markdown", "html", "raw_html", "links", "screenshot", "structured_data", "headings", "images".
only_main_content	boolean	optional	Extract only the main content, removing navs, footers, sidebars.
mobile	boolean	optional	Emulate a mobile device viewport.
mobile_device	string	optional	Device preset name: "iphone_14", "pixel_7", "ipad_pro".
timeout	number	optional	Request timeout in milliseconds.
wait_for	number	optional	Wait this many ms after page load before extracting.
css_selector	string	optional	Only extract content matching this CSS selector.
xpath	string	optional	XPath expression for targeted extraction.
include_tags	string[]	optional	Only include these HTML tags in extraction.
exclude_tags	string[]	optional	Exclude these HTML tags from extraction.
use_proxy	boolean	optional	Route request through a configured proxy for anti-bot bypass.
headers	object	optional	Custom HTTP headers to send (e.g. { "Cookie": "session=abc" }).
cookies	object	optional	Custom cookies to send as name/value pairs.
actions	ActionStep[]	optional	Browser actions to perform before extraction: click, wait, scroll, type, screenshot, hover, press, select, fill_form, evaluate.
extract	object	optional	LLM extraction config: { prompt: string, schema: JSONSchema }.
webhook_url	string	optional	Webhook URL for job completion notification.
webhook_secret	string	optional	HMAC secret for webhook signature verification.
capture_network	boolean	optional	Capture browser network requests/responses.

cURL Example

curl -X POST "https://api.datablue.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://news.ycombinator.com",
  "formats": [
    "markdown",
    "links"
  ],
  "only_main_content": true
}'

System Responses

200 OK

Request processed successfully.

401 UNAUTHORIZED

Missing or invalid API key.

429 RATE LIMIT

System capacity exceeded.

500 SYSTEM FAILURE

Internal core exception.

EXAMPLE RESPONSE

{
  "success": true,
  "data": {
    "markdown": "# Hacker News\n\n1. Show HN: I built an open-source web scraper with strategy caching\n2. Why Rust is eating the world...",
    "links": [
      "https://news.ycombinator.com/item?id=39912345",
      "https://news.ycombinator.com/item?id=39912346",
      "https://news.ycombinator.com/newest"
    ],
    "metadata": {
      "title": "Hacker News",
      "description": null,
      "language": "en",
      "source_url": "https://news.ycombinator.com",
      "status_code": 200,
      "word_count": 1847,
      "reading_time_seconds": 7,
      "content_length": 12340
    }
  }
}