Cloudflare is accusing the AI search engine Perplexity of covertly crawling websites, even when site owners have explicitly forbidden access through robots.txt files or firewall rules. According to the company, Perplexity disguises its identity to sidestep restrictions and is violating established internet norms.
In a recent blog post, Cloudflare claims that Perplexity switches to stealth crawling tactics whenever its official bot is blocked. The company says it has removed Perplexity from its verified bots list and taken steps to block what it calls "stealth crawling."
Cloudflare's investigation began after customers reported that Perplexity continued to access their content despite explicit blocks in robots.txt and custom firewall rules.
Cloudflare exposes evasion tactics
To verify the allegations, Cloudflare ran its own experiment. The team set up new, unlisted domains and blocked all bots in the robots.txt file. They also used firewall rules to block Perplexity's declared crawlers, "PerplexityBot" and "Perplexity-User."
Despite these measures, Perplexity was still able to provide detailed information about the restricted domains, Cloudflare says. The tests showed that Perplexity either ignored robots.txt or didn't check it at all.
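To make the setup concrete, here is a minimal sketch of how a compliant crawler would read a block-everything robots.txt, using Python's standard `urllib.robotparser`. The file contents, the `example.test` URL, and the `is_allowed` helper are assumptions for illustration, not Cloudflare's actual test configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every crawler, similar in spirit
# to the test setup described above (not Cloudflare's actual file).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = BLOCK_ALL) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A crawler that honors robots.txt would stop here for any path.
    print(is_allowed("PerplexityBot", "https://example.test/article"))
```

A crawler that runs a check like this and gets a negative answer should not request the page. The file is only a published request, though, and nothing in it technically prevents access.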
Disguised crawlers with rotating identities
Cloudflare's findings point to a two-step process. First, Perplexity attempts to access content with its declared user agent. If blocked, a second, undeclared crawler takes over, using a generic user agent that mimics Google Chrome on macOS.
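The sketch below illustrates why a rule keyed only to declared user agents misses the second step. The crawler names come from the article. The `blocked_by_user_agent` helper and both sample header strings are hypothetical, and the Chrome string is only an example of the general format, not necessarily the one Cloudflare observed.

```python
# Declared crawler names as reported in the article.
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")

# Illustrative header values (assumptions, not captured traffic).
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def blocked_by_user_agent(user_agent: str) -> bool:
    """Simplified firewall rule: block if a declared crawler name appears."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in DECLARED_AGENTS)


if __name__ == "__main__":
    print(blocked_by_user_agent(DECLARED_UA))        # declared bot is caught
    print(blocked_by_user_agent(GENERIC_CHROME_UA))  # browser-like string is not
```

Because the second request looks like ordinary browser traffic, a user-agent rule alone cannot tell it apart from a human visitor.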
This disguised crawler not only uses undeclared IP addresses but also rotates both IPs and ASNs (Autonomous System Numbers) to bypass blocks. An ASN is a unique identifier assigned to an autonomous system, meaning a network or group of networks run by a single operator under a common routing policy. By switching ASNs, the crawler can appear to originate from entirely different networks.
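A rough sketch of why rotation defeats network-level blocks: once a site blocks the ASN it has already seen, requests arriving from other ASNs still pass. All values below are made up for illustration. The IPs and ASNs are drawn from ranges reserved for documentation, and the `fingerprint` field is a hypothetical stand-in for whatever signals a defender uses to link requests to one client.

```python
from collections import defaultdict

# ASN the site owner has already blocked (documentation-range value).
BLOCKED_ASNS = {64496}

# Made-up request log: one client, three networks.
REQUESTS = [
    {"ip": "192.0.2.10", "asn": 64496, "fingerprint": "crawler-x"},
    {"ip": "198.51.100.7", "asn": 64497, "fingerprint": "crawler-x"},
    {"ip": "203.0.113.42", "asn": 64498, "fingerprint": "crawler-x"},
]


def passes_asn_block(request: dict, blocked: set = BLOCKED_ASNS) -> bool:
    """Return True if the request is NOT stopped by the ASN blocklist."""
    return request["asn"] not in blocked


def asns_per_fingerprint(requests: list) -> dict:
    """Group the ASNs seen for each client fingerprint."""
    seen = defaultdict(set)
    for request in requests:
        seen[request["fingerprint"]].add(request["asn"])
    return dict(seen)


if __name__ == "__main__":
    allowed = [r["ip"] for r in REQUESTS if passes_asn_block(r)]
    print(allowed)
    print(asns_per_fingerprint(REQUESTS))
```

Grouping by a client-level signal rather than by network is one way to notice that a single actor is spread across several ASNs. The article does not detail the signals Cloudflare uses.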
Cloudflare says it observed this activity across tens of thousands of domains, amounting to millions of requests each day. When the disguised crawler was blocked as well, Perplexity returned only vague answers, which Cloudflare takes as confirmation that the block was effective.
OpenAI as a positive example
Cloudflare contrasts Perplexity's behavior with that of more transparent crawlers that respect site owners' rules. OpenAI is highlighted as a positive example: the company clearly declares its crawlers and their purpose, and it honors both robots.txt directives and network blocks.
In the same test, Cloudflare says, ChatGPT fetched the robots.txt file, stopped crawling once it found the block, and did not attempt to access content using alternate user agents.
Cloudflare rolls out new protections
In response, Cloudflare has added the disguised crawler's signatures to its managed rules for blocking AI crawlers, which are available to all customers, including those on free plans. Customers who already use bot management rules to block or challenge requests are protected without further action.
Cloudflare expects bot operators' tactics to keep evolving and says it is working with experts to standardize crawler behavior, including proposed updates for robots.txt.