Date: 2025-08-06

AI search startup Perplexity is facing strong criticism after web infrastructure company Cloudflare published a blog post accusing it of bypassing site restrictions to collect web data.

This comes shortly after Perplexity launched its Comet browser, which focuses on agent-based internet browsing powered by AI.



In its blog post, Cloudflare claims that Perplexity has been scraping websites that had asked not to be crawled by bots. The scraping reportedly continued even after site owners blocked access through standard rules and tools.

Perplexity accused of disguising its bot activity

Cloudflare ran its own tests to investigate these claims. According to the blog post, the company created new domains that explicitly requested not to be scraped, then asked Perplexity AI questions about those domains. Perplexity was still able to return information about the pages, despite access being denied.

Cloudflare said that Perplexity may be changing the "user agent" of its crawler. This means the bot presents itself as a normal web visitor rather than an AI tool, allowing it to access sites that have blocked AI crawlers.
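To illustrate why a changed user agent matters, here is a minimal Python sketch. It assumes the block is expressed in a robots.txt file, which is one common way for a site to ask crawlers to stay away; the article does not say which mechanism the test domains used. The crawler name ExampleAIBot, the file contents, and the URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it blocks one named
# AI crawler and leaves every other visitor unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # placeholder address

# A crawler that identifies itself honestly is matched by the block rule.
declared = is_allowed("ExampleAIBot/1.0", URL)

# The same crawler sending a browser-like user agent is not matched by name.
disguised = is_allowed("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", URL)

print(declared, disguised)
```

Rules of this kind are keyed on the name a crawler declares and rely on the crawler choosing to obey them, so a bot that reports a different user agent is no longer covered by a name-based block.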

“This unusual activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare said. The company added that it used machine learning and network signals to identify the scraping pattern.

Perplexity responds to the allegations

In response, Perplexity spokesperson Jesse Dwyer told TechCrunch that the blog post was more of a marketing effort by Cloudflare than an objective report. Dwyer said, “The screenshots in the blog show that no content was accessed.” Dwyer also claimed that the crawler named in the post “isn’t even ours”. Perplexity has denied any wrongdoing.

The bigger debate: How AI gathers information

AI tools like Perplexity, Claude, and ChatGPT rely on vast amounts of web content to train their models and answer user queries. These tools often scan forums, articles, and blogs to improve performance.



This is done to improve the performance of both the bots and agentic AI.

However, concerns are growing around data transparency and the ethics of content scraping. Some AI companies, such as OpenAI and Anthropic, have started offering opt-out options for websites. There is now increasing pressure on the AI industry to adopt clear guidelines on data collection.