Cloudflare data underscores how Google's combined search and AI crawling gives it a massive data advantage over OpenAI and Anthropic.
Cloudflare CEO Matthew Prince argues that Google benefits from an unusually privileged level of access to the web, driven by the way it links its search crawler with its AI data collection systems.
Prince says internal Cloudflare measurements show that Google currently sees 3.2 times more pages than OpenAI. The gap widens even further with other competitors: Google captures 4.6 times more content than Microsoft and 4.8 times more than Anthropic or Meta. According to Prince, this imbalance stems from Google's decision to bundle its search crawler with its AI crawler. Site owners cannot block AI training without also disappearing from Google Search, creating a dilemma that effectively gives Google exclusive access to vast amounts of data.
Prince frames this as a misuse of long-standing market dominance, suggesting that Google's behavior lets it extend its historical monopoly into the emerging AI landscape.
How search lock-in limits publishers' ability to block AI scraping
The scale of the imbalance becomes clearer when looking at how aggressively site owners are trying to push back. Since July 1, Cloudflare has already blocked 416 billion AI requests for its customers. These blocks mainly affect companies that follow standards or identify their crawlers separately. Google, however, avoids that barrier through the tight coupling of its search and AI systems.
Publishers face a binary choice: allow their content to be used to train Google's AI models or lose visibility in Search, a trade-off that could be financially ruinous for many.
Prince told WIRED that Google is the central obstacle to progress unless pressured or persuaded to separate its search and AI crawlers. Without that split, publishers have almost no practical way to protect their content or negotiate licensing models that will be critical in the era of generative AI.