Google gathers triple OpenAI's AI data through its search monopoly
Key Points
- Cloudflare data shows that Google, by linking its search and AI crawlers, accesses 3–5 times more web content than AI rivals like OpenAI, Anthropic, and Microsoft.
- Website owners cannot block Google’s AI data collection without also disappearing from search results, forcing them to choose between visibility and control over their content.
- Cloudflare’s CEO argues this practice entrenches Google’s dominance and leaves publishers unable to negotiate fair terms for AI training unless Google separates its crawlers.
Cloudflare data underscores how Google's combined search and AI crawling gives it a massive data advantage over OpenAI and Anthropic.
Cloudflare CEO Matthew Prince argues that Google benefits from an unusually privileged level of access to the web, driven by the way it links its search crawler with its AI data collection systems.
Prince says internal Cloudflare measurements show that Google currently sees 3.2 times more pages than OpenAI. The gap widens even further with other competitors: Google captures 4.6 times more content than Microsoft and 4.8 times more than Anthropic or Meta. According to Prince, this imbalance stems from Google's decision to bundle its search crawler with its AI crawler. Site owners cannot block AI training without also disappearing from Google Search, creating a dilemma that effectively gives Google exclusive access to vast amounts of data.
Prince frames this as a misuse of long-standing market dominance, suggesting that Google's behavior lets it extend its historical monopoly into the emerging AI landscape.
How search lock-in limits publishers' ability to block AI scraping
The scale of the imbalance becomes clearer when looking at how aggressively site owners are trying to push back. Since July 1, Cloudflare has already blocked 416 billion AI requests for its customers. These blocks mainly affect companies that follow standards or identify their crawlers separately. Google, however, avoids that barrier through the tight coupling of its search and AI systems.
Publishers face a binary choice: allow their content to be used to train Google's AI models or lose visibility in Search, a trade-off that could be financially ruinous for many.
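The trade-off can be illustrated with a small robots.txt check. This is a minimal sketch, not a description of how Cloudflare or Google actually enforce anything: the robots.txt contents and the example.com URL are hypothetical, `GPTBot` and `Googlebot` are the crawler names OpenAI and Google publish, and the sketch assumes, as Prince describes, that Google's search crawl also feeds its AI systems. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it blocks a separately
# identified AI crawler while leaving the search crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Hypothetical page URL used only for the check.
URL = "https://example.com/article"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself separately can be blocked on its own.
gptbot_allowed = can_crawl(ROBOTS_TXT, "GPTBot", URL)

# The search crawler stays allowed, so the site remains in search results.
# Under the article's premise, the same crawl can also supply AI systems,
# so the only way to stop that here would be to disallow Googlebot too.
googlebot_allowed = can_crawl(ROBOTS_TXT, "Googlebot", URL)

print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

In this sketch the publisher has one independent switch for the separately identified crawler and only a single, shared switch for Google, which is the dilemma Prince describes.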
Prince told WIRED that Google will remain the central obstacle to progress unless it is pressured or persuaded to separate its search and AI crawlers. Without that split, publishers have almost no practical way to protect their content or negotiate the licensing models that will be critical in the era of generative AI.