Webhooks

Webhooks allow you to receive real-time notifications when jobs complete instead of polling. Pass webhook_url and optionally webhook_secret when starting any async job (crawl, search, extract).
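As a minimal sketch of the paragraph above, the helper below attaches the two webhook fields to a job's parameters. Only the field names webhook_url and webhook_secret come from this page; the function name with_webhook and the shape of job_params are hypothetical, and how the resulting body is sent to the API is not shown here.

```python
from typing import Any, Dict, Optional


def with_webhook(
    job_params: Dict[str, Any],
    webhook_url: str,
    webhook_secret: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of job_params with the webhook fields added.

    Hypothetical helper: job_params stands in for whatever parameters
    the async job itself takes, which this sketch does not assume.
    """
    body = dict(job_params)  # copy, so the caller's dict is not mutated
    body["webhook_url"] = webhook_url
    # webhook_secret is optional; include it only when one is supplied.
    if webhook_secret is not None:
        body["webhook_secret"] = webhook_secret
    return body
```

Leaving out webhook_secret means, per the Signature Verification section, that there is no secret to sign payloads with.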

Webhook Events

Event              Description
scrape.completed   A scrape job has finished (success or failure)
crawl.completed    A crawl job has finished all pages
crawl.page         A single page within a crawl has been scraped
search.completed   A search job has finished all results
extract.completed  An extraction job has finished

Payload Format

POST https://your-server.com/webhook

Headers:
  Content-Type: application/json
  X-Webhook-Signature: sha256=a1b2c3d4e5f6...
  X-Webhook-Event: crawl.completed

Body:
{
  "event": "crawl.completed",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "timestamp": "2026-04-03T12:00:00Z",
  "data": {
    "total_pages": 47,
    "completed_pages": 47,
    "url": "https://docs.python.org"
  }
}
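One way to consume this payload is to decode it into a small typed object and branch on the event name. The sketch below assumes every event carries the same top-level fields as the crawl.completed sample; the contents of data for other events are not documented here, so only the crawl.completed branch reads them. WebhookEvent, parse_event, and describe are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WebhookEvent:
    """Hypothetical container mirroring the sample payload's top-level fields."""
    event: str
    job_id: str
    status: str
    timestamp: str
    data: Dict[str, Any] = field(default_factory=dict)


def parse_event(raw: bytes) -> WebhookEvent:
    """Decode a webhook body; raises KeyError if a top-level field is missing."""
    doc = json.loads(raw)
    return WebhookEvent(
        event=doc["event"],
        job_id=doc["job_id"],
        status=doc["status"],
        timestamp=doc["timestamp"],
        data=doc.get("data", {}),
    )


def describe(evt: WebhookEvent) -> str:
    """Summarize an event, branching on its name."""
    if evt.event == "crawl.completed":
        done = evt.data.get("completed_pages")
        total = evt.data.get("total_pages")
        return f"crawl {evt.job_id}: {done}/{total} pages"
    if evt.event == "crawl.page":
        return f"page scraped for crawl {evt.job_id}"
    if evt.event.endswith(".completed"):
        return f"{evt.event} for job {evt.job_id}: {evt.status}"
    return f"unhandled event {evt.event}"
```

Reporting unknown event names instead of raising keeps the handler tolerant of event types it was not written for.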

Signature Verification

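If you provide a webhook_secret, DataBlue signs each payload with HMAC-SHA256, using the secret as the key. The signature is sent in the X-Webhook-Signature header as sha256= followed by the hex digest. Verify it against the raw request body, before parsing the JSON.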

import hmac
import hashlib

def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Verify the HMAC-SHA256 signature from DataBlue webhooks."""
    expected = "sha256=" + hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# Usage in a FastAPI handler:
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

@app.post("/webhook")
async def handle_webhook(request: Request):
    # Verify against the raw bytes exactly as received.
    body = await request.body()
    signature = request.headers.get("X-Webhook-Signature", "")
    if not verify_webhook(body, signature, "your_webhook_secret"):
        raise HTTPException(status_code=401, detail="Invalid signature")
    data = await request.json()
    print(f"Event: {data['event']}, Job: {data['job_id']}")
    # Returning normally produces a 200 response.
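To exercise a handler like the one above without waiting for a real job, you can build a signed delivery locally. This is a sketch that assumes the signature covers the exact raw body bytes, as the verification function implies; sign_payload and build_test_delivery are hypothetical helpers, not part of DataBlue.

```python
import hashlib
import hmac
import json
from typing import Dict, Tuple


def sign_payload(payload: bytes, secret: str) -> str:
    """Build a header value in the documented form: 'sha256=' + hex HMAC-SHA256."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def build_test_delivery(event: dict, secret: str) -> Tuple[bytes, Dict[str, str]]:
    """Return (body, headers) shaped like the delivery shown under Payload Format."""
    body = json.dumps(event).encode()
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": sign_payload(body, secret),
        "X-Webhook-Event": event["event"],
    }
    return body, headers
```

Send the returned body unchanged: re-serializing the JSON can alter the bytes, and the signature would then no longer match.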

Retry Policy

  • 3 attempts total (1 initial + 2 retries)
  • Exponential backoff: 10s, 60s, 300s between retries
  • Retries triggered on: connection errors, 5xx responses, timeouts
  • Successful delivery requires a 2xx response within 30 seconds
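Because a delivery that times out is retried, the same event may reach your server more than once even if the first attempt was fully processed. A minimal sketch of duplicate suppression follows. It assumes a retried delivery carries a byte-identical body, which this page does not confirm, and it keeps state in memory only; DeliveryDeduplicator is a hypothetical helper.

```python
import hashlib
from typing import Callable, Set


class DeliveryDeduplicator:
    """Hypothetical helper: skips bodies that were already processed.

    Assumes retries resend the same bytes. State is held in memory,
    so it is lost on restart and not shared between processes.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def process_once(self, body: bytes, handler: Callable[[bytes], None]) -> bool:
        """Run handler for a new body; return False if it was a duplicate."""
        key = hashlib.sha256(body).hexdigest()
        if key in self._seen:
            return False
        handler(body)
        # Record only after the handler succeeds, so a failed attempt
        # can be processed again when the delivery is retried.
        self._seen.add(key)
        return True
```

Keying on a hash of the body rather than on job_id keeps separate crawl.page events for the same job distinct, provided their bodies differ.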