Installation

The official datablue Python SDK provides both synchronous and asynchronous clients for the DataBlue API. Built on httpx and Pydantic v2, it offers type-annotated methods, automatic retries with exponential backoff, and strongly typed response models.

Requirements

Dependency	Version
Python	>= 3.10
httpx	>= 0.27.0
pydantic	>= 2.0.0

Install from PyPI

pip install datablue

Or with a specific version:

pip install datablue==2.0.0

Install with Poetry / uv

# Poetry
poetry add datablue

# uv
uv add datablue

Verify Installation

python -c "import datablue; print(datablue.__version__)"
# 2.0.0
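If the import fails or behaves unexpectedly, it can help to check the environment against the Requirements table. The following is a minimal sketch using only the standard library; parse_version and check_requirements are illustrative helpers, not part of the SDK, and the version parser is deliberately simplified (it ignores pre-release suffixes).

```python
import sys
from importlib import metadata

# Minimum versions copied from the Requirements table above.
MIN_PYTHON = (3, 10)
MINIMUMS = {"httpx": (0, 27, 0), "pydantic": (2, 0, 0)}


def parse_version(text: str) -> tuple:
    """Turn '2.10.1' into (2, 10, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '2.0' compares equal to (2, 0, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements() -> list:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} is older than 3.10")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements() or ["All requirements satisfied"]:
        print(problem)
```

For production tooling, the third-party packaging library handles the full version-comparison rules; the tuple comparison here is only meant for a quick sanity check.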

Quick Start

from datablue import DataBlue

with DataBlue(api_key="wh_your_api_key") as client:
    result = client.scrape("https://example.com")
    print(result.data.markdown)

Async support: Every method available on DataBlue (sync) is also available on AsyncDataBlue (async) with the same signature. Use await and async with for the async variant.
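As a sketch of the async variant, the Quick Start snippet could be written as follows. It assumes AsyncDataBlue accepts the same api_key argument as DataBlue; scrape_markdown and main are illustrative names, not part of the SDK.

```python
import asyncio


async def scrape_markdown(url: str, api_key: str) -> str:
    """Async counterpart of the Quick Start snippet."""
    # Imported inside the function so this module can still be loaded
    # (for example by a test runner) where the SDK is not installed.
    from datablue import AsyncDataBlue

    # `async with` replaces `with`, and the method call gains an `await`.
    # Assumption: the constructor takes api_key, as the sync client does.
    async with AsyncDataBlue(api_key=api_key) as client:
        result = await client.scrape(url)
        return result.data.markdown


def main() -> None:
    # asyncio.run starts an event loop and runs the coroutine to completion.
    print(asyncio.run(scrape_markdown("https://example.com", "wh_your_api_key")))
```

Inside a framework that already runs an event loop, such as FastAPI, the coroutine would be awaited directly instead of going through asyncio.run.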

Using with AI Assistants

The v2.0.0 SDK is designed to be AI-first. It ships with two machine-readable reference files that AI coding assistants (Claude Code, Cursor, GitHub Copilot, etc.) can read for accurate code generation:

CLAUDE.md: A structured quick-reference file at the SDK root (sdk/CLAUDE.md) containing all method signatures, response models, error types, and common patterns. Assistants that support this convention, such as Claude Code, can load it automatically as context.
llms.txt: A machine-readable documentation file that follows the llms.txt convention and provides a condensed view of the API surface for LLM consumption.

Tip: When using an AI coding assistant with the DataBlue SDK, point it at the sdk/ directory. The CLAUDE.md file contains every method signature, typed model, and error type, with copy-pasteable examples. Giving the assistant this context reduces hallucinated API calls and makes it more likely that the generated code runs as written.

SDK Features

Sync + Async clients: DataBlue for synchronous code, AsyncDataBlue for asyncio, FastAPI, and Django
Pydantic v2 response models: every response is a typed Pydantic model with editor autocomplete and validation
Automatic retries: exponential backoff on 429 and 5xx errors, with a configurable maximum number of retries
Context manager support: with / async with for clean resource management
Built-in job polling: blocking crawl and search methods poll automatically, with a configurable timeout
Batch scraping: concurrent scraping with semaphore-based concurrency control and streaming results
Typed error hierarchy: catch specific errors such as RateLimitError and AuthenticationError
Environment variable config: zero-config setup with DataBlue.from_env()
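To make two of the features above concrete, here is a generic sketch of exponential backoff and semaphore-based concurrency control. It is not the SDK's implementation: TransientError, backoff_delay, with_retries, and gather_bounded are hypothetical names, the delay parameters are arbitrary, and results are returned all at once rather than streamed.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for a retryable failure such as a 429 or 5xx response."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def with_retries(operation, max_retries: int = 3, base: float = 0.5):
    """Await operation(); on TransientError, sleep and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            # Jitter keeps concurrent callers from retrying in lockstep.
            delay = backoff_delay(attempt, base) * random.uniform(0.5, 1.0)
            await asyncio.sleep(delay)


async def gather_bounded(items, worker, limit: int = 5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(item) for item in items))
```

The two helpers compose: passing a worker that wraps its request in with_retries gives bounded concurrency where each item is retried independently.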