Maintaining legacy PHP scrapers is a nightmare that most developers know all too well. If you've spent countless hours patching broken scripts that rely on DOMDocument, preg_match regex patterns, or raw cURL requests to scrape Google's HTML, you understand the frustration. Every time Google tweaks its page layout, your carefully crafted selectors break, leaving you scrambling to update XPath queries and DOM traversal logic.
The modern solution is remarkably simple: stop parsing HTML altogether. Instead of fighting Google's frontend, you send a query to DataBlue and get back clean, structured JSON — both the organic search results and the full content of each result page as ready-to-store markdown. By letting a PHP application consume Google Search results as JSON, you eliminate the fragility of HTML scraping and gain reliable, consistent data structures that won't break with Google's next design update.
This guide walks you through the complete process of running a search with the DataBlue Google SERP API, polling for the results, decoding the JSON response in PHP, and persisting that data into your existing MySQL database using modern best practices — all from PHP 7 or 8 codebases.
Why DataBlue for Legacy PHP
- No HTML parsing. You never touch DOMDocument, XPath, or regex again. DataBlue returns structured JSON.
- Results and content. Each organic result includes its title, URL, and snippet, plus the full page rendered as clean markdown, ready for storage, search indexing, or feeding an LLM.
- No CAPTCHAs or IP blocks to manage. DataBlue handles proxies, rendering, and anti-bot challenges server-side.
- Works with old PHP. Everything here uses the standard cURL extension and PDO: no Composer dependencies, no PHP 8-only syntax required.
How the DataBlue Search API Works
DataBlue's search endpoint is asynchronous. Unlike a single blocking request, the flow has two steps:
1. Start a job: POST https://api.datablue.dev/v1/search with your query. You immediately get back a job_id.
2. Poll for results: GET https://api.datablue.dev/v1/search/{job_id} until its status becomes completed, then read the data array.
This design lets DataBlue search Google and scrape every result page in parallel without your script timing out. The PHP below handles both steps end to end.
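One way to picture the two-step flow is as a small state machine. The sketch below maps each job status to what the caller should do next. The status names are the ones this guide polls for; `nextStepForStatus()` is a hypothetical helper name, not part of DataBlue, and treating an unrecognized status as an error is a design choice of this sketch.

```php
<?php
// Hypothetical helper: decide what to do after each poll of a search job.
// Status names (pending, running, completed, failed) are the ones used in this guide.
function nextStepForStatus($status)
{
    switch ($status) {
        case 'pending':
        case 'running':
            return 'wait';   // job still in progress: sleep, then poll again
        case 'completed':
            return 'read';   // results are ready in the data array
        case 'failed':
            return 'abort';  // job failed: surface the error
        default:
            return 'abort';  // unrecognized status: stop rather than loop forever
    }
}

foreach (['pending', 'running', 'completed', 'failed'] as $status) {
    echo $status . ' => ' . nextStepForStatus($status) . "\n";
}
```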
Prerequisites
Before diving into the implementation, make sure you have:
- PHP 7.x or 8.x installed on your server. This guide is fully compatible with older PHP versions — important for legacy codebases that haven't migrated to the latest releases.
- cURL extension enabled. Most hosts include it by default; confirm with php -m or your phpinfo() output.
- A DataBlue API key. Create one in your DataBlue dashboard at app.datablue.dev under API Keys. Keys start with wh_ and are sent as a Bearer token. New accounts include free credits, so you can test the integration before committing to a paid plan.
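The checklist above can be turned into a quick preflight script. This is a minimal sketch: `checkPrerequisites()` is a hypothetical helper, and reading the key from an environment variable named `DATABLUE_API_KEY` is an assumption made for illustration, not something DataBlue requires.

```php
<?php
// Hypothetical preflight check: returns a list of human-readable problems.
// An empty list means the prerequisites from this guide look satisfied.
function checkPrerequisites($apiKey)
{
    $problems = [];

    if (version_compare(PHP_VERSION, '7.0.0', '<')) {
        $problems[] = 'PHP 7.0 or newer is required, found ' . PHP_VERSION;
    }
    if (!extension_loaded('curl')) {
        $problems[] = 'The cURL extension is not loaded';
    }
    // The guide states that keys start with "wh_"; only the prefix is checked here.
    if (!is_string($apiKey) || strpos($apiKey, 'wh_') !== 0) {
        $problems[] = 'The API key should start with "wh_"';
    }

    return $problems;
}

// Assumed variable name; getenv() returns false when it is not set.
$apiKey = getenv('DATABLUE_API_KEY');
foreach (checkPrerequisites($apiKey) as $problem) {
    echo "Problem: $problem\n";
}
```

Keeping the key in the environment rather than in source also keeps it out of version control.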
Step 1: Starting a Search Job in PHP
The foundation of the integration is the request that kicks off a search. You send a POST request with a JSON body and your API key in the Authorization header. Below is a minimal example you can copy into your codebase; it stops with die() on any error, so replace that with your own error handling before relying on it in production:
```php
<?php
$apiKey = 'wh_YOUR_API_KEY_HERE';
$query  = 'web scraping tools';

$payload = [
    'query'       => $query,
    'num_results' => 10,           // how many results to fetch (default: 5)
    'engine'      => 'google',     // "google", "duckduckgo", or "brave"
    'formats'     => ['markdown'], // return each page as clean markdown
];

$ch = curl_init('https://api.datablue.dev/v1/search');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Authorization: Bearer ' . $apiKey,
    'Content-Type: application/json',
]);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));

$response  = curl_exec($ch);
$httpCode  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$curlError = curl_error($ch);
curl_close($ch);

// curl_exec() returns false on network-level failures (DNS, timeout, TLS).
if ($response === false) {
    die("cURL error while starting search: $curlError");
}
if ($httpCode !== 200) {
    die("Failed to start search (HTTP $httpCode): $response");
}

$start = json_decode($response, true);
$jobId = $start['job_id'] ?? null;
if (!$jobId) {
    die('No job_id returned: ' . $response);
}

echo "Search job started: $jobId\n";
?>
```

Two settings matter most here. CURLOPT_RETURNTRANSFER set to true captures the API response into a variable instead of printing it, giving you full control over processing. The Authorization: Bearer header authenticates the request. DataBlue uses a header rather than a key in the URL, which keeps your credentials out of access logs and referrer strings.
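The example above stops the script with die() whenever something goes wrong. In a longer-running application you may prefer exceptions that the caller can catch. The following is a minimal sketch of that idea: `extractJobId()` is a hypothetical helper, and accepting any 2xx status (rather than exactly 200) is an assumption of this sketch, not documented DataBlue behavior.

```php
<?php
// Hypothetical helper: validate the response to the "start job" request and
// return the job_id, throwing instead of calling die() so callers can recover.
function extractJobId($httpCode, $body)
{
    if ($body === false) {
        throw new RuntimeException('Request failed before a response was received');
    }
    // Assumption: any 2xx status counts as success.
    if ($httpCode < 200 || $httpCode >= 300) {
        throw new RuntimeException("Failed to start search (HTTP $httpCode): $body");
    }
    $decoded = json_decode($body, true);
    if (!is_array($decoded) || empty($decoded['job_id'])) {
        throw new RuntimeException('No job_id returned: ' . $body);
    }
    return $decoded['job_id'];
}

// Usage with a made-up response body; real job IDs will look different.
try {
    echo extractJobId(200, '{"job_id":"example-job-id"}') . "\n";
} catch (RuntimeException $e) {
    echo 'Error: ' . $e->getMessage() . "\n";
}
```

In the script above, this would be called as `extractJobId($httpCode, $response)` right after `curl_close($ch)`.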
Step 2: Polling for the Results
Because the search runs asynchronously, you poll the job until DataBlue reports it as completed. A simple loop with a short sleep and a maximum number of attempts keeps this safe and predictable:
```php
<?php
function fetchSearchResults($apiKey, $jobId, $maxAttempts = 30, $delaySeconds = 2)
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $ch = curl_init("https://api.datablue.dev/v1/search/$jobId");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Authorization: Bearer ' . $apiKey,
        ]);
        $response = curl_exec($ch);
        curl_close($ch);

        // curl_exec() returns false on a network-level failure; retry on the next pass.
        if ($response === false) {
            sleep($delaySeconds);
            continue;
        }

        $data = json_decode($response, true);
        if (json_last_error() !== JSON_ERROR_NONE) {
            die('JSON decode error: ' . json_last_error_msg());
        }

        $status = $data['status'] ?? 'unknown';
        $done   = $data['completed_results'] ?? '?';
        $total  = $data['total_results'] ?? '?';
        echo "Status: $status ($done/$total)\n";

        if ($status === 'completed') {
            return $data;
        }
        if ($status === 'failed') {
            die('Search failed: ' . ($data['error'] ?? 'unknown error'));
        }

        sleep($delaySeconds); // still pending/running: wait and retry
    }

    die('Timed out waiting for search to complete.');
}

$results = fetchSearchResults($apiKey, $jobId);
?>
```

The loop checks status on each pass: pending and running mean keep waiting, completed returns the payload, and failed surfaces the error immediately. A failed cURL call is treated as transient and retried on the next pass. Always validate with json_last_error() before accessing array elements; this keeps an unexpected response from surfacing later as undefined-index warnings or silently empty results.
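The polling loop waits a fixed number of seconds between attempts. A common alternative is a capped exponential backoff, which polls quickly at first and then eases off for slower jobs. The sketch below is one way to compute such a delay; `pollDelaySeconds()` and its default values are hypothetical and not something DataBlue prescribes.

```php
<?php
// Hypothetical helper: delay before poll number $attempt (0-based).
// Doubles from $baseSeconds on each attempt and never exceeds $maxSeconds.
function pollDelaySeconds($attempt, $baseSeconds = 1, $maxSeconds = 10)
{
    $delay = $baseSeconds * pow(2, max(0, (int) $attempt));
    return (int) min($maxSeconds, $delay);
}

for ($attempt = 0; $attempt < 6; $attempt++) {
    echo "attempt $attempt: wait " . pollDelaySeconds($attempt) . "s\n";
}
```

With the defaults, the waits are 1, 2, 4, and 8 seconds, then 10 seconds for every later attempt. To try it, replace `sleep($delaySeconds)` in the polling loop with `sleep(pollDelaySeconds($attempt))`.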
Step 3: Extracting the SERP Data
This is where structured JSON shines. Unlike HTML parsing — where you constantly fight changing class names and nested divs — DataBlue gives you consistent, predictable paths to every data point. The completed response contains a top-level data array, with one object per organic result.
Each result object includes:
- title: the result's title
- url: the result's link
- snippet: the search-engine snippet
- markdown: the full page content as clean markdown (when markdown is requested in formats)
- metadata: page details such as status_code, word_count, language, and description

Here's how to loop through and extract them:
```php
<?php
foreach ($results['data'] as $index => $result) {
    $position = $index + 1;
    $title    = $result['title'] ?? 'No title';
    $url      = $result['url'] ?? '';
    $snippet  = $result['snippet'] ?? '';
    $content  = $result['markdown'] ?? '';
    $words    = $result['metadata']['word_count'] ?? 0;

    echo "Position $position: $title\n";
    echo "URL: $url\n";
    echo "Snippet: $snippet\n";
    echo "Content: $words words of markdown captured\n\n";
}
?>
```

Beyond Links: Capturing Page Content
The markdown field is what sets DataBlue apart from a plain SERP scraper. Instead of returning only a list of links you'd have to fetch and parse separately, DataBlue scrapes each result page for you and hands back the cleaned, readable content. That single field powers content indexing, change monitoring, summarization, and LLM pipelines — with zero additional requests and no HTML cleanup on your side.
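As an illustration of the change-monitoring use case, you could fingerprint each page's markdown and compare it with the fingerprint from an earlier run. This is a minimal sketch: `contentFingerprint()` is a hypothetical helper, and collapsing whitespace before hashing is a design choice of this example so that purely cosmetic differences do not register as changes.

```php
<?php
// Hypothetical helper for change monitoring: fingerprint a page's markdown so
// two fetches can be compared without storing or diffing the full text.
function contentFingerprint($markdown)
{
    // Collapse every run of whitespace (including line breaks) into one space.
    $normalized = trim(preg_replace('/\s+/', ' ', (string) $markdown));
    return hash('sha256', $normalized);
}

// Sample strings invented for this example.
$previous = contentFingerprint("# Pricing\n\nPlans start at  10 USD.");
$current  = contentFingerprint("# Pricing\r\n\r\nPlans start at 10 USD.");

echo $previous === $current ? "unchanged\n" : "changed\n";
```

Storing the 64-character fingerprint next to each row makes it cheap to spot pages whose content differs between runs.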
If a particular page couldn't be scraped, its result object sets success to false and includes an error message, while still giving you the title, url, and snippet from the search results. Guard for it before relying on content:
```php
<?php
foreach ($results['data'] as $result) {
    if (empty($result['success'])) {
        echo "Skipped {$result['url']}: {$result['error']}\n";
        continue;
    }
    // safe to use $result['markdown'] here
}
?>
```

Step 4: Integrating with a Legacy Database
Most legacy applications need to persist SERP data for historical tracking, reporting, or competitive analysis. Using PDO (PHP Data Objects) with prepared statements keeps your database operations both secure and compatible with modern PHP standards:
```php
<?php
try {
    $pdo = new PDO('mysql:host=localhost;dbname=your_database', 'username', 'password');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare(
        "INSERT INTO search_results
            (keyword, position, title, url, snippet, content, word_count, created_at)
         VALUES (?, ?, ?, ?, ?, ?, ?, NOW())"
    );

    foreach ($results['data'] as $index => $result) {
        $stmt->execute([
            $query,
            $index + 1,
            $result['title'] ?? '',
            $result['url'] ?? '',
            $result['snippet'] ?? '',
            $result['markdown'] ?? '',
            $result['metadata']['word_count'] ?? 0,
        ]);
    }

    echo "Stored " . count($results['data']) . " results.\n";
} catch (PDOException $e) {
    die('Database error: ' . $e->getMessage());
}
?>
```

A matching table definition:
```sql
CREATE TABLE search_results (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    keyword    VARCHAR(255) NOT NULL,
    position   INT NOT NULL,
    title      VARCHAR(512),
    url        TEXT,
    snippet    TEXT,
    content    LONGTEXT,
    word_count INT DEFAULT 0,
    created_at DATETIME NOT NULL
);
```

Prepared statements protect against SQL injection while remaining compatible with the older MySQL versions common in legacy environments. Because DataBlue already returns the page content as markdown, you can store it directly in a LONGTEXT column, with no extra scraping step and no HTML stripping.
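If you want the row-building logic in one testable place, it can be pulled out of the insert loop. The sketch below assumes the column order of the INSERT statement above and the success flag described earlier in this guide; `resultToRow()` is a hypothetical helper, and trimming titles to 512 characters is an assumption made to match the VARCHAR(512) column.

```php
<?php
// Hypothetical helper: turn one result object into the parameter list for the
// INSERT in this guide: keyword, position, title, url, snippet, content, word_count.
function resultToRow($keyword, $position, array $result)
{
    $title = isset($result['title']) ? (string) $result['title'] : '';
    // Trim to the column width up front instead of depending on the SQL mode.
    $title = function_exists('mb_substr')
        ? mb_substr($title, 0, 512, 'UTF-8')
        : substr($title, 0, 512);

    // Results whose page could not be scraped carry no usable content.
    $scraped = !empty($result['success']);

    return [
        (string) $keyword,
        (int) $position,
        $title,
        isset($result['url']) ? (string) $result['url'] : '',
        isset($result['snippet']) ? (string) $result['snippet'] : '',
        ($scraped && isset($result['markdown'])) ? (string) $result['markdown'] : '',
        ($scraped && isset($result['metadata']['word_count']))
            ? (int) $result['metadata']['word_count']
            : 0,
    ];
}

// Made-up result object for a page that failed to scrape.
$row = resultToRow('web scraping tools', 1, [
    'title'   => 'Example',
    'url'     => 'https://example.com',
    'snippet' => 'Sample snippet',
    'success' => false,
    'error'   => 'timeout',
]);
var_dump($row[5] === '' && $row[6] === 0);
```

Inside the loop, the call would then become `$stmt->execute(resultToRow($query, $index + 1, $result));`.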
Conclusion
By letting a PHP application consume Google Search results as JSON through DataBlue, you've eliminated the endless cycle of fixing broken HTML scrapers. DataBlue's JSON responses provide stable, well-documented data structures that stay consistent regardless of Google's frontend changes — saving you countless hours of maintenance and debugging.
The steps in this guide give you everything needed to modernize a legacy PHP application:
- Starting an asynchronous search job with a single authenticated cURL request
- Polling the job safely until the results are ready
- Extracting organic results and full page content from the data array
- Storing everything in MySQL with secure prepared statements
You now have a production-ready foundation for SEO tools, competitive-analysis dashboards, content-monitoring jobs, or LLM data pipelines — without worrying about CAPTCHAs, IP blocks, or parsing failures.
Stop fighting CAPTCHAs and HTML layout changes. Grab your free API key at app.datablue.dev and run the PHP script you just learned. Within minutes you'll have clean, structured search data — and the content behind it — flowing into your applications, freeing you to build features instead of maintaining fragile scrapers.


