An Internet Built For AI
A few weeks ago, we published a piece called An Internet Built For AI. In it, we unpacked the ongoing evolution of the internet from an open knowledge commons to a privatized, AI-intermediated information ecosystem. At the end of the piece, we came to this conclusion:
“[The open web will become] a training ground for AI companies, increasingly synthetic and AI-generated, less economically viable for human creators, and accessible only through AI intermediaries for most users. The open web is becoming a raw material source, driving a shift away from foundational values of openness and accessibility and toward closed systems, paywalls, and machine-to-machine content loops. The speed and scale of this shift mean the next few years will determine whether the internet preserves its role as an open platform for independent creators, diverse voices, and knowledge-sharing or becomes primarily infrastructure for machine learning.”
What was, just a month ago, a theoretical overview of how the web is evolving has, in the past week or so, erupted into a full-blown war of words.
Cloudflare vs. Perplexity
On August 4th, 2025, Cloudflare published a report with as direct a title as you can get: Perplexity is using stealthy, undeclared crawlers to evade website no-crawl directives.
If you’re wondering what stake Cloudflare has in the web crawling game, then you don’t know Cloudflare. Founded in 2009, Cloudflare is a $69 billion public company with a stated mission to “help build a better internet”. Its core business is content delivery, DDoS mitigation, and distributed domain name system (DNS) services. About 20% of all websites on the internet use Cloudflare to make their sites more secure and stable.
Cloudflare’s Concerns
So what’s Cloudflare’s issue with Perplexity? According to the report, Perplexity was modifying its user agent (UA) to hide its crawling activity, ignoring robots.txt directives meant to stop crawlers, and even disguising its UA to “impersonate Google Chrome on macOS when their declared crawler was blocked.”
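To make the alleged behavior concrete, here is a minimal Python sketch of the evasion pattern Cloudflare describes. This is our illustration, not Perplexity’s actual code, and both UA strings are hypothetical stand-ins:

```python
import requests  # third-party HTTP client: pip install requests

URL = "https://example.com/article"

# A well-behaved crawler announces itself honestly in its User-Agent.
# (Hypothetical string; a real declared crawler UA may differ.)
DECLARED_UA = {"User-Agent": "ExampleBot/1.0 (+https://example.com/bot)"}

# The behavior Cloudflare alleges: when the declared UA gets blocked,
# retry while impersonating an ordinary Chrome-on-macOS browser.
BROWSER_UA = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    )
}

resp = requests.get(URL, headers=DECLARED_UA, timeout=10)
if resp.status_code == 403:  # blocked by a no-crawl rule
    resp = requests.get(URL, headers=BROWSER_UA, timeout=10)  # sails through
```

To the server, that second request is indistinguishable from a human using Chrome, which is exactly why the tactic defeats UA-based blocking.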
In other words, Perplexity is the poster child for exactly the kind of villain Cloudflare has recently held up as the greatest threat to the open internet. Back in July 2025, Cloudflare declared its celebration of Content Independence Day. In that announcement, Cloudflare outlined the fundamental “business model for the web” that Google established 30 years ago:
“The deal that Google made with content creators was simple: let us copy your content for search, and we'll send you traffic. You, as a content creator, could then derive value from that traffic in one of three ways: running ads against it, selling subscriptions for it, or just getting the pleasure of knowing that someone was consuming your stuff.”
Today, Cloudflare says this deal has been broken. Google’s search market share has dropped below 90% for the first time in a decade as more of that traffic goes to AI chatbots. For those who still turn to Google, 75% of questions are answered without the user leaving Google.
Software may be eating the world, but AI is eating the web.
Content creators are staring down the barrel of a dramatically different information ecosystem. Measured by how many pages get crawled for every visitor referred back, it’s 750x harder for websites to get traffic from OpenAI than from Google. From Anthropic, it’s 30,000x harder! As Cloudflare puts it, “increasingly we aren't consuming originals, we're consuming derivatives.” That’s why Cloudflare declared July 1st, 2025 Content Independence Day:
“Cloudflare, along with a majority of the world's leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content. That content is the fuel that powers AI engines, and so it's only fair that content creators are compensated directly for it.”
Meanwhile, if the allegations against Perplexity are true, then Perplexity isn’t in favor of Content Independence. It is, instead, going to conquer content whether publishers like it or not.
Perplexity Punches Back
The same day Cloudflare made its accusation, Perplexity shot back a response. It claimed that, rather than Perplexity being a marauding pirate of internet content, Cloudflare was simply bad at its job, and implied that Cloudflare didn’t understand what Perplexity was actually doing in the first place.
First, Perplexity argued that “user-driven fetching” is fundamentally different from “automated crawling.” Traditional crawling is mass-scale, proactive information indexing; user-driven agents, in contrast, “only fetch content when a real person requests something specific, and they use that content immediately to answer the user's question.”
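To make the distinction concrete, here is a minimal Python sketch of the two models as Perplexity frames them. The function names are ours, and summarize() is a stand-in for whatever LLM call actually produces the answer:

```python
import requests  # third-party HTTP client: pip install requests

def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call.
    return text[:280]

def index_crawl(seed_urls: list[str]) -> dict[str, str]:
    """Traditional crawling: proactively fetch pages at scale and keep
    them, e.g. for a search index or a training corpus."""
    corpus = {}
    for url in seed_urls:
        corpus[url] = requests.get(url, timeout=10).text
    return corpus  # the content is retained long-term

def user_driven_fetch(url: str) -> str:
    """The model Perplexity describes: fetch one page because a real
    person asked about it, answer immediately, then discard the page."""
    page = requests.get(url, timeout=10).text
    return summarize(page)  # only the answer survives
```

Note that from a publisher’s server logs, both functions look the same: an HTTP GET arrives either way. That is part of why this dispute is so hard to adjudicate technically.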
Next, Perplexity directly took aim at Cloudflare:
“This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic.”
Strong words.
The crux of Perplexity’s argument was that its agents don’t represent “malicious bots” but genuine servants of the common people, acting on behalf of their users. And choosing to call those agents malicious would also “criminalize email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like.” Perplexity went on to say that letting Cloudflare act as that gatekeeper would create “a two-tiered internet where your access depends not on your needs, but on whether your chosen tools have been blessed by infrastructure controllers.”
Perplexity seems to believe that Cloudflare’s fundamental issue with bot traffic is whether data is being scraped and stored for training. By stating that the Perplexity agent only uses that information to answer queries in real time, discarding it afterward, Perplexity implied that the issue was moot and that Cloudflare was mainly interested in maintaining its “gatekeeper” or “infrastructure controller” role.
Beyond the disagreement on terms, Perplexity also fired back by suggesting that Cloudflare is bad at its core job: identifying and understanding web traffic.
Perplexity claimed that Cloudflare “confused Perplexity with 3-6M daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity only occasionally uses for highly specialized tasks.” Why would Cloudflare make that mistake? Perplexity believes it was either (a) because “Cloudflare needed a clever publicity moment,” or (b) because Cloudflare committed a “basic traffic analysis failure that’s particularly embarrassing for a company whose core business is understanding and categorizing web traffic.”
Litigating The Argument
Many people were quick to point out that Perplexity’s response was an aggressive straw man, deflecting the actual accusation about Perplexity’s underlying violation and steering the discussion toward a more philosophical debate about first- and second-class citizens of internet traffic. It felt like a “have your cake and eat it too” position, with Perplexity claiming: “We’re not malicious bots, we’re user-driven agents scraping on behalf of the common man. And besides, what you thought was scraping wasn’t even us, it was a third party, so...”
The reality is that Perplexity is unlikely to be the spotless white knight it would frame itself as, for two reasons.
First, Perplexity initially claimed the bot named in the Cloudflare blog “isn’t even ours.” Later, in its formal response, Perplexity clarified that the bot was in fact acting on its behalf, but that the traffic came from BrowserBase, a third-party service that “Perplexity only occasionally uses.” So which is it?
Second, this wasn’t the first time Perplexity has run afoul of publishers’ explicit terms of service. In June 2024, Wired accused Perplexity of plagiarizing its content. In October 2024, Dow Jones and the New York Post sued Perplexity, accusing it of what they called “content kleptocracy.”
In fact, Perplexity was accused of this exact kind of circumvention of robots.txt instructions over a year ago. For context, websites use a machine-readable file, called robots.txt, to specify what content they want crawlers to leave alone. In June 2024, one user deliberately set up his robots.txt to block Perplexity, yet Perplexity was still able to summarize his article. The user even asked Perplexity how it could crawl a website whose robots.txt forbade it. Perplexity’s response:
“If the content from the website is restricted by its robots.txt, I cannot ethically access or summarize that content.”
When pressed, Perplexity acknowledged: “you make a fair point, I should not have provided a summary of the [website].” Reviewing the website’s logs revealed that Perplexity was “using headless browsers to scrape content, ignoring robots.txt, and not sending their user agent string.” That is exactly what Cloudflare is accusing Perplexity of now.
The conclusion seems to be that Perplexity is doing what Cloudflare accused it of. The deeper question, the one that carries implications for how the internet is governed, is this: is what Perplexity did wrong? That’s harder to answer.
The fundamental bargain of the internet was struck between publishers, aggregators, and consumers several decades ago. As Cloudflare explained, that bargain was something to the effect of “let us copy your content for search, and we'll send you traffic.” Under this agreement, aggregators get an index, publishers get their content seen, and consumers get information. Everyone wins.
However, while that agreement is now supported by decades of precedent, it’s a social contract, not a legal one. Other digital standards are enforced by law, whether through terms of service or through the Computer Fraud and Abuse Act’s (CFAA) prohibition on “unauthorized access,” but there is no legal requirement for Perplexity to adhere to robots.txt instructions or to refrain from using disguised browsers to pull information from sites that are trying to stop it.
The Robots Exclusion Protocol is a standard established in 1994 and based on voluntary compliance. The original motivation for the robots.txt file was a poorly designed web crawler that accidentally caused a denial-of-service (DoS) attack, so everyone agreed it was in their collective interest to lay down some ground rules. Ever since, nearly every above-board participant in the internet economy has chosen to abide by the standard to build trust and order.
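For a sense of how lightweight the standard is, here is a minimal sketch using Python’s standard-library robotparser. The robots.txt contents are illustrative: a site owner blocking one named crawler while leaving the door open to everyone else.

```python
from urllib import robotparser

# An illustrative robots.txt: block one named crawler, allow all others.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Compliance is voluntary: a polite client checks before fetching...
print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False

# ...but a browser-style UA matches the permissive wildcard rule, and
# nothing in the protocol forces the check to happen at all.
print(rp.can_fetch("Mozilla/5.0", "https://example.com/article"))  # True
```

That enforcement gap is the whole story: the file expresses a preference, and honoring it is a choice.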
But that social contract is starting to break. Perplexity CEO Aravind Srinivas has said explicitly that “our belief is that facts need to be universally distributed to everybody.” The would-be liberators of the internet’s information have decided that the old deal doesn’t work anymore. Perplexity, as an aggregator, has decided to optimize for the consumer’s experience at the cost of the publisher’s business model.
And many people agree with this approach. Users of these types of AI tools see them as extensions of their own browsing activity. As one user put it: “When retrieving a web page for the user, it’s appropriate to use a UA string that looks like a browser client.” In other words, if Perplexity is browsing on my behalf, then it should be allowed to do whatever is necessary to successfully browse, including going against the norms set by robots.txt files.
Users have expressed this sentiment over and over again. “I want Perplexity to visit public content on my behalf when I give it a request / task.” “Why would the LLM accessing the website on my behalf be in a different legal category as my Firefox web browser?”
However, what's in the best short-term interest of the users (and the AI platforms that serve them) may not be in the best interest of the internet ecosystem as a whole.
Breaking The Business Model Of The Internet
Where consumers see an increasingly optimized experience, publishers see doom.
People often forget, but advertising is the fundamental business model of the internet. In any given year since 2000, advertising has accounted for 10-20% of all the revenue made online. Whenever we don’t pay for a digital experience directly, our attention is the engine subsidizing it. That engine runs on the fundamental deal Google helped broker: publishers make content, aggregators get that content in front of people, and consumers get information. This is what Cloudflare CEO Matthew Prince has described as a “fair value exchange.”
“With Search Engines, the fair value exchange was you let them have your content in exchange for them sending you traffic you could derive value [from]. With Answer Engines, they take your content and send you…? If you’re not getting anything in return, why would you give up your content?”
When publishers look at what companies like Perplexity are doing to the online social contract, they see their business model going up in smoke. As one publisher put it: “what Perplexity is doing when they crawl my content in response to a user question is that they are decreasing the probability that this user would come to my content.” Another explained the fundamental difference between AI agents and humans:
“While agents act on behalf of the user, they won't see nor click any ads; they won't sign-up to any newsletter; they won't buy the website owner a coffee. They don't act as humans just because humans triggered them. They simply take what they need and walk away.”
And, unfortunately, that’s becoming the norm. Today, for the first time, bot activity online exceeds human activity. And these aren’t all the friendly, in-service-of-humans agents that Perplexity likes to champion: 37% of all internet traffic comes from malicious bots. At a time when bot traffic is at an all-time high, we’re fracturing the very systems that could have maintained some semblance of order in the face of an automated onslaught.
What’s An Internet To Do?
Regardless of what people may think the internet should do, it seems clear what it will do: march to the beat of consumer preferences. Just ask Betamax, LaserDiscs, and the Concorde. What the consumer wants, the consumer tends to get, consequences be damned.
And today, the consumer is compelled by agentic internet consumption. Many people believe the future of the internet is zero-click. Those seeking to bring that future to life see Cloudflare’s concern as the worries of a bygone era. As one AI search optimization platform put it: “Cloudflare positions itself as defending publishers' interests, but what they're really attempting is to become the toll collector on the information superhighway.”
Platforms like ChatGPT and Perplexity obviously want agents treated as first-class citizens. The idea is that “AI agents are extensions of users… when you charge these agents, you’re charging users, not AI companies.” But this misses a fundamental element of the internet’s original deal: users have been paying for access to this content all along. Not with dollars, but with attention. If agents are going to “simply take what they need and walk away,” nothing replaces that payment. If there’s no ad to sell, then something else has to give, or the entire internet becomes unsustainable.
In defense of the “new” business model of the internet, these platforms claim that “despite AI traffic representing a small fraction of organic traffic, it drives significantly higher-value conversions and represents substantial growth in attributable business outcomes year-over-year.” If that’s true, then, while there may be short-term adjustment pain, there will be higher-quality outcomes in the long run. Let’s hope so, for the sake of anyone trying to distribute anything on the internet.