Content
summary Summary

Cloudflare data underscores how Google's combined search and AI crawling gives it a massive data advantage over OpenAI and Anthropic.

Ad

Cloudflare CEO Matthew Prince argues that Google benefits from an unusually privileged level of access to the web, driven by the way it links its search crawler with its AI data collection systems.

Prince says internal Cloudflare measurements show that Google currently sees 3.2 times more pages than OpenAI. The gap widens even further with other competitors: Google captures 4.6 times more content than Microsoft and 4.8 times more than Anthropic or Meta. According to Prince, this imbalance stems from Google's decision to bundle its search crawler with its AI crawler. Site owners cannot block AI training without also disappearing from Google Search, creating a dilemma that effectively gives Google exclusive access to vast amounts of data.

Prince frames this as a misuse of long-standing market dominance, suggesting that Google's behavior lets it extend its historical monopoly into the emerging AI landscape.

Ad
Ad

How search lock-in limits publishers' ability to block AI scraping

The scale of the imbalance becomes clearer when looking at how aggressively site owners are trying to push back. Since July 1, Cloudflare has already blocked 416 billion AI requests for its customers. These blocks mainly affect companies that follow standards or identify their crawlers separately. Google, however, avoids that barrier through the tight coupling of its search and AI systems.

Publishers face a binary choice: allow their content to be used to train Google's AI models or lose visibility in Search, a trade-off that could be financially ruinous for many.

Prince told WIRED that Google is the central obstacle to progress unless pressured or persuaded to separate its search and AI crawlers. Without that split, publishers have almost no practical way to protect their content or negotiate licensing models that will be critical in the era of generative AI.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Cloudflare data shows Google accesses 3–5 times more web content than AI rivals like OpenAI, Anthropic, and Microsoft by linking its search and AI crawlers.
  • Website owners cannot block Google’s AI data collection without also disappearing from search results, forcing them to choose between visibility and control over their content.
  • Cloudflare’s CEO argues this practice entrenches Google’s dominance and leaves publishers unable to negotiate fair terms for AI training unless Google separates its crawlers.
Sources
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.