Introduction
DataBlue is a self-hosted web scraping and structured data API platform. It provides a Firecrawl-compatible REST API for scraping, crawling, searching, and extracting content from any website, plus 22+ Data APIs for Google, YouTube, Twitter, and Reddit.
Key Features
- 5-tier parallel scraping engine: HTTP, stealth browser, Playwright, headless Chrome, and anti-bot bypass run in a staggered race, and the first valid result wins. A strategy cache remembers the winning strategy for each domain.
- Self-learning strategy cache: Domains that require stealth are detected automatically and upgraded to hard mode on subsequent requests, with no manual configuration.
- 22+ structured Data APIs: Google (Search, Maps, News, Finance), YouTube (channels, videos, comments), Twitter (profiles, tweets), and Reddit (subreddits, posts, users). All return clean JSON.
- 100% self-hosted: Deploy on your own infrastructure with Docker Compose. There are no third-party SaaS dependencies, and your data never leaves your servers.
- Firecrawl-compatible API: A drop-in replacement for Firecrawl, with the same endpoint paths and the same request/response shapes, so existing integrations can migrate with zero code changes.
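To make the staggered race and strategy cache from the list above more concrete, here is a minimal Python sketch. It is not DataBlue's implementation: the tier names are taken from the feature list, while the stagger interval, the `fetch_with` stand-in, and the in-memory `strategy_cache` dictionary are assumptions made for illustration only.

```python
import asyncio

# Tier names come from the feature list; the stagger interval and the
# fake fetcher below are hypothetical stand-ins, not DataBlue internals.
TIERS = ["http", "stealth_browser", "playwright", "headless_chrome", "anti_bot_bypass"]
STAGGER_SECONDS = 0.05

strategy_cache = {}  # domain -> tier that produced the last valid result


async def fetch_with(tier, domain):
    """Stand-in fetcher: pretend only the stealth browser gets a valid page."""
    await asyncio.sleep(0.01)
    return f"<html>{domain}</html>" if tier == "stealth_browser" else None


async def run_tier(tier, delay, domain):
    await asyncio.sleep(delay)  # staggered start: later tiers wait longer
    return tier, await fetch_with(tier, domain)


async def staggered_race(domain):
    # Put the cached winner (if any) first so it starts with no delay.
    cached = strategy_cache.get(domain)
    order = sorted(TIERS, key=lambda tier: tier != cached)
    tasks = [
        asyncio.create_task(run_tier(tier, i * STAGGER_SECONDS, domain))
        for i, tier in enumerate(order)
    ]
    try:
        for finished in asyncio.as_completed(tasks):
            tier, content = await finished
            if content is not None:  # first valid result wins
                strategy_cache[domain] = tier
                return tier, content
        return None, None  # every tier failed
    finally:
        for task in tasks:
            task.cancel()  # stop tiers that are still running


if __name__ == "__main__":
    print(asyncio.run(staggered_race("example.com")))
    print(strategy_cache)
```

In this sketch, a second call for the same domain starts the cached tier immediately instead of waiting for its turn in the default order, which is the effect the strategy cache is described as having.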
Search
Search the web with Google, DuckDuckGo, or Brave and get scraped content from each result page. Returns markdown, HTML, links, screenshots, and structured data.
Scrape
Scrape any URL with automatic anti-bot bypass. Returns markdown, HTML, raw HTML, links, screenshots, headings, images, structured data, and LLM-extracted fields.
Extract
Extract structured data from any page using natural language prompts and JSON Schema. Powered by LLMs (OpenAI, Anthropic, Groq). Returns typed JSON matching your schema.
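As a rough illustration of combining a natural language prompt with a JSON Schema, the sketch below builds a request body in Python. The field names `urls`, `prompt`, and `schema` are assumed from Firecrawl's extract endpoint, and the schema itself is a made-up example; check both against the DataBlue API reference before relying on them.

```python
import json

# Hypothetical JSON Schema describing the fields to pull from a page.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "points": {"type": "integer"},
    },
    "required": ["title"],
}

# Field names ("urls", "prompt", "schema") are assumed from Firecrawl's
# extract endpoint; confirm them against the DataBlue API reference.
payload = {
    "urls": ["https://news.ycombinator.com"],
    "prompt": "Extract the title and point count of the top story.",
    "schema": schema,
}

body = json.dumps(payload, indent=2)
print(body)
```

The schema tells the LLM which fields and types to return, while the prompt describes where on the page to find them.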
Data APIs
22+ structured data endpoints for Google (Search, Maps, News, Finance), YouTube (channels, videos, comments), Twitter (profiles, tweets), and Reddit (subreddits, posts, users).
Make Your First Request
curl -X POST "https://api.datablue.dev/v1/scrape" \
  -H "Authorization: Bearer wh_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "formats": ["markdown", "links"]
  }'
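The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl command; the helper names `build_scrape_request` and `scrape` are hypothetical, and the response is returned as parsed JSON without assuming any particular shape.

```python
import json
import urllib.request

API_URL = "https://api.datablue.dev/v1/scrape"  # from the curl example


def build_scrape_request(api_key, url, formats):
    """Build (but do not send) the POST request shown in the curl command."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(api_key, url, formats):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(api_key, url, formats)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `scrape("wh_your_api_key", "https://news.ycombinator.com", ["markdown", "links"])` performs a real HTTP call, so replace the placeholder key with your own first.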