POST/v1/crawl

Crawl

Start a crawl job that discovers and scrapes pages starting from a seed URL using BFS, DFS, or best-first strategy. Returns a job ID for tracking progress via SSE or polling.

Target Latency

1.2s - 4.5s

Credits

1 cr/req

Parameters

Name	Type	Requirement	Description
url	string	REQUIRED	The starting URL to crawl.
max_pages	number	optional	Maximum pages to crawl (1-1000).
max_depth	number	optional	Maximum link depth from the starting URL (1-10).
concurrency	number	optional	Number of parallel scrape workers (1-10).
crawl_strategy	string	optional	Strategy: "bfs" (breadth-first), "dfs" (depth-first), or "bff" (best-first).
allow_external_links	boolean	optional	Follow links to external domains.
respect_robots_txt	boolean	optional	Obey the target site's robots.txt rules.
include_paths	string[]	optional	Only crawl URLs matching these glob path patterns.
exclude_paths	string[]	optional	Skip URLs matching these glob path patterns.
scrape_options	object	optional	Options passed to each page scrape: { formats, only_main_content, wait_for, timeout, include_tags, exclude_tags, mobile, extract }.
use_proxy	boolean	optional	Route all requests through a configured proxy.
filter_faceted_urls	boolean	optional	Deduplicate faceted/navigation URL variations.
webhook_url	string	optional	Webhook URL for crawl completion notification.
webhook_secret	string	optional	HMAC secret for webhook signature verification.

cURL Example

curl -X POST "https://api.datablue.dev/v1/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://docs.python.org/3/",
  "max_pages": 50,
  "max_depth": 2,
  "crawl_strategy": "bfs",
  "include_paths": [
    "/3/library/*"
  ]
}'

System Responses

200 OK

Request processed successfully.

401 UNAUTHORIZED

Missing or invalid API key.

429 RATE LIMIT

System capacity exceeded.

500 SYSTEM FAILURE

Internal core exception.

EXAMPLE RESPONSE

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "started",
  "message": "Crawl job started"
}