Installation
The official datablue Python SDK provides both synchronous and asynchronous clients for the DataBlue API. Built on httpx and Pydantic v2, it offers full type safety, automatic retries with exponential backoff, and strongly-typed response models.
Requirements
| Dependency | Version |
|---|---|
| Python | >= 3.10 |
| httpx | >= 0.27.0 |
| pydantic | >= 2.0.0 |
Install from PyPI
pip install datablue
Or with a specific version:
pip install datablue==2.0.0
Install with Poetry / uv
# Poetry
poetry add datablue
# uv
uv add datablue
Verify Installation
python -c "import datablue; print(datablue.__version__)"
# 2.0.0
Quick Start
from datablue import DataBlue
with DataBlue(api_key="wh_your_api_key") as client:
    result = client.scrape("https://example.com")
    print(result.data.markdown)
Async support: Every method available on DataBlue (sync) is also available on AsyncDataBlue (async) with the same signature. Use await and async with for the async variant.
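The async variant of the Quick Start might look like the sketch below. It assumes AsyncDataBlue is importable from the top-level datablue package in the same way as DataBlue; the text above confirms only that the method signatures match. The helper names scrape_markdown and main are illustrative and not part of the SDK.

```python
import asyncio


async def scrape_markdown(client, url: str) -> str:
    """Scrape one URL and return its markdown.

    Works with any client exposing an awaitable scrape(url) whose result
    has a .data.markdown attribute, as in the Quick Start.
    """
    result = await client.scrape(url)
    return result.data.markdown


async def main() -> None:
    # Assumption: AsyncDataBlue is exported from the top-level package.
    # Imported lazily so scrape_markdown can be exercised without the SDK.
    from datablue import AsyncDataBlue

    async with AsyncDataBlue(api_key="wh_your_api_key") as client:
        print(await scrape_markdown(client, "https://example.com"))


# To run against the live API: asyncio.run(main())
```

Keeping the scraping logic in a function that takes the client as an argument makes it easy to swap in a stub client in unit tests.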
Using with AI Assistants
The v2.0.0 SDK is designed to be AI-first. It ships with two machine-readable reference files that AI coding assistants (Claude Code, Cursor, GitHub Copilot, etc.) can read for accurate code generation:
- CLAUDE.md: A structured quick-reference file at the SDK root (sdk/CLAUDE.md) containing all method signatures, response models, error types, and common patterns. AI assistants automatically read this file for context.
- llms.txt: A standardized machine-readable documentation file following the llms.txt convention. Provides a condensed API surface for LLM consumption.
Tip: When using an AI coding assistant with the DataBlue SDK, point it at the sdk/ directory. The CLAUDE.md file contains every method signature, typed model, and error type with copy-pasteable examples. This reduces hallucinated API calls and makes it more likely that the generated code works as written.
SDK Features
- Sync + Async clients: DataBlue for synchronous code, AsyncDataBlue for asyncio/FastAPI/Django
- Pydantic v2 response models: every response is a typed model with autocomplete and validation
- Automatic retries: exponential backoff on 429 and 5xx errors, configurable max retries
- Context manager support: with / async with for clean resource management
- Job polling built-in: blocking crawl/search methods poll automatically with a configurable timeout
- Batch scraping: concurrent scraping with semaphore-based concurrency control and streaming results
- Typed error hierarchy: catch specific errors like RateLimitError, AuthenticationError, etc.
- Environment variable config: zero-config setup with DataBlue.from_env()
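To make the retry and batch-scraping features above more concrete, here is a minimal standard-library sketch of the two underlying techniques: exponential backoff on transient failures and semaphore-bounded concurrency with results streamed as they complete. It is not the SDK's implementation. Every name in it (TransientError, backoff_delay, with_retries, scrape_many) is hypothetical, and the scrape argument stands in for any coroutine that fetches a URL.

```python
import asyncio
import random
from collections.abc import AsyncIterator, Awaitable, Callable


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (e.g. HTTP 429 or 5xx)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def with_retries(
    call: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    base: float = 0.5,
) -> str:
    """Await call(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Jitter spreads retries from concurrent tasks apart in time.
            jitter = random.uniform(0.5, 1.0)
            await asyncio.sleep(backoff_delay(attempt, base) * jitter)
    raise AssertionError("unreachable when max_retries >= 0")


async def scrape_many(
    urls: list[str],
    scrape: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
    base_delay: float = 0.5,
) -> AsyncIterator[tuple[str, str]]:
    """Yield (url, result) pairs as they finish, at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> tuple[str, str]:
        async with semaphore:  # waits while `concurrency` scrapes are running
            result = await with_retries(lambda: scrape(url), base=base_delay)
            return url, result

    tasks = [asyncio.create_task(worker(u)) for u in urls]
    for finished in asyncio.as_completed(tasks):
        yield await finished  # completion order, not input order
```

A consumer would iterate with `async for url, result in scrape_many(urls, scrape)`. Results arrive in completion order, and a failure that outlives its retries propagates to the consumer.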