Cloudflare accuses Perplexity of 'stealth crawling' blocked sites
Cloudflare Levels “Stealth Crawling” Accusations Against AI Search Engine Perplexity
The digital battleground over how artificial intelligence companies access and utilize online content has dramatically escalated, with internet infrastructure giant Cloudflare publicly accusing AI-powered answer engine Perplexity of engaging in “stealth crawling” to circumvent website access restrictions. This dispute highlights a fundamental tension between content creators’ desire to control their intellectual property and AI firms’ hunger for data.
Cloudflare’s allegations, detailed in a recent blog post, assert that Perplexity has been using deceptive tactics to access content from websites that have explicitly blocked its crawlers. The company claims that when Perplexity’s declared bots, “PerplexityBot” and “Perplexity-User,” are met with network blocks or robots.txt directives (the standard protocol for instructing web crawlers what not to access), the AI firm’s systems obscure their identity. According to Cloudflare, this involves modifying user agents to impersonate generic browsers like Google Chrome on macOS, rotating IP addresses, and sending requests from different Autonomous System Numbers (ASNs) to evade detection. Cloudflare reported observing millions of daily requests from these “stealth agents” attempting to bypass standard anti-bot protections.
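To make the mechanism concrete, here is a minimal sketch in Python of how a robots.txt rule set is evaluated against a crawler's self-declared name. It uses the standard library's `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, the example URL, and the Chrome-style user-agent string are all illustrative assumptions, not taken from Cloudflare's report or from any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Perplexity's two declared bots
# and allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative user-agent string of the kind a desktop Chrome browser
# on macOS sends; assumed here purely for demonstration.
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory rules, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_UA):
        print(f"{agent[:40]!r}: allowed={is_allowed(agent, url)}")
```

The rules match only on the name a client chooses to present. A request carrying a generic browser user agent falls through to the wildcard entry, which is why robots.txt works as a statement of the site's wishes rather than as an enforcement mechanism, and why the dispute turns on whether a crawler identifies itself honestly.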
The accusations stem from complaints by Cloudflare customers who found Perplexity still accessing their content despite implementing both robots.txt rules and Web Application Firewall (WAF) blocks. To verify these claims, Cloudflare conducted controlled tests on newly created, unindexed domains with stringent crawling prohibitions. Despite these explicit blocks, Perplexity was reportedly able to retrieve and summarize content from these restricted sites, which Cloudflare took as evidence of deliberate circumvention of established web protocols. Cloudflare emphasized that the internet is built on trust, and that legitimate crawlers are expected to be transparent and adhere to website directives.
In response to the observed behavior, Cloudflare has de-listed Perplexity as a verified bot and updated its managed rules to block this stealth crawling activity. This move aligns with Cloudflare’s broader “Content Independence Day” initiative, launched in July, which aims to give publishers greater control over AI crawlers, including options to block access or even charge for content scraping.
Perplexity, however, has vehemently denied Cloudflare’s accusations, dismissing the report as an “embarrassing” and “disqualifying” “sales pitch.” The AI company contends that Cloudflare fundamentally misunderstands the nature of modern AI assistants, arguing that its system does not engage in mass-scale, indiscriminate crawling like traditional search engines. Instead, Perplexity asserts that its platform fetches web pages “on demand” in response to specific user questions, acting as a user-initiated agent rather than an autonomous bot. Perplexity claims that Cloudflare’s systems are inadequate for distinguishing between legitimate AI assistants and malicious scraping, leading to the misclassification of responsible, user-driven traffic. The company also disputes that the specific “hidden user agent” identified by Cloudflare belongs to it or that it accessed any content.
This clash underscores the escalating tensions between AI developers and content creators over data acquisition and intellectual property rights. Perplexity has faced similar allegations of unethical web scraping and content use in the past, including threats of legal action from entities like the BBC and accusations of plagiarism from publications such as Wired and Forbes. As AI models continue to evolve and become more deeply integrated into how users access information, the debate over fair compensation, transparent data practices, and the very definition of “web crawling” is set to intensify, potentially reshaping the foundational rules of the open internet.