Ravenous AI Crawlers Are Breaking the Web—and Driving It Behind Paywalls

Added Sep 2, 2025

AI data-harvesting bots are hammering websites with aggressive, noncompliant crawls that degrade performance and raise costs without sending traffic back. Even big sites struggle with spikes and must scale up, while small sites can be knocked offline. Defensive measures are emerging, but the author predicts they will lead to a fragmented, paywalled web.

Key Points

  • AI crawlers now drive a significant share of web traffic and are far more aggressive than traditional search bots, often ignoring crawl rules and overloading servers.
  • Traffic spikes from AI bots can hit 10–20x normal levels; even large sites must add capacity to keep slow load times from driving users away.
  • Publishers receive little or no referral traffic from AI crawlers, unlike classic search indexing, so costs rise without corresponding revenue.
  • Robots.txt is frequently disregarded (a sample of the rules being flouted follows this list); defenses like paywalls, CAPTCHAs, and DDoS protections are imperfect, prompting new measures (e.g., llms.txt, Cloudflare blocking/tolls, Anubis proof-of-work challenges).
  • The likely outcome is an arms race that pushes the web toward fragmentation and paywalls, undermining the open web.
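
To make the robots.txt point concrete: the crawl rules being ignored are plain-text directives. A minimal file that disallows several published AI-crawler user agents and asks everyone else to slow down might look like the sketch below; the tokens are ones the vendors document, and Crawl-delay is a nonstandard, purely advisory extension:

```
# Block published AI training crawlers (RFC 9309 allows grouping
# several User-agent lines under one rule set)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Advisory pacing for everyone else; nonstandard and widely ignored
User-agent: *
Crawl-delay: 10
```

As the discussion stresses, compliance is entirely voluntary: a crawler that ignores robots.txt pays no protocol-level penalty, which is what pushes operators toward the enforcement measures described later in the thread.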

Sentiment

Predominantly negative toward AI crawlers and supportive of the article’s thesis; most participants report real harm and view current AI crawling practices as abusive and misaligned with publishers’ interests, with some dissenters attributing the problems to site inefficiency or hosting economics.

In Agreement

  • AI crawlers are causing Slashdot-effect-like load spikes, hammering dynamic pages, exploding DB load, and degrading UX without sending traffic back.
  • Bots routinely ignore robots.txt, crawl-delay, and bandwidth etiquette, and fan out across massive IP pools (including cloud and residential), making rate limiting and UA blocks insufficient.
  • Caching doesn’t save sites because bots traverse deep, rarely-hit pages and dynamic endpoints, blowing past caches and forcing expensive page renders.
  • The incentives are misaligned: AI companies externalize crawl costs onto publishers, compete on freshness, and have the capital to over-crawl; unlike search engines, they deliver little or no monetizable traffic.
  • Common Crawl and open datasets are positive models because they reduce redundant crawling and democratize access.
  • Small sites are being forced into harsher defenses (logins, paywalls, CAPTCHAs, Cloudflare/Anubis), accelerating centralization and making the web worse for humans and accessibility.
  • Stronger, layered mitigations (ASN/IP intelligence, Spamhaus lists, nginx/front-door filtering, throttling, honeypots) are necessary just to stay online; a minimal nginx sketch of the front-door piece follows this list.
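
As an illustration of the front-door filtering and throttling commenters describe, a minimal nginx sketch might combine a user-agent blocklist with per-IP rate limiting. This is one plausible setup, not a configuration from the thread; the bot list, rates, and upstream address are placeholders:

```
# http context: flag known AI-crawler user agents (case-insensitive)
map $http_user_agent $is_ai_bot {
    default                                                0;
    "~*(GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot)"  1;
}

# One token bucket per client IP: ~2 requests/second sustained
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    listen 80;
    server_name example.com;   # placeholder

    location / {
        # Refuse flagged crawlers outright
        if ($is_ai_bot) { return 403; }

        # Throttle everyone else: absorb short bursts, then shed load
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;

        proxy_pass http://127.0.0.1:8080;   # placeholder upstream
    }
}
```

The catch, as the same commenters note, is that UA matching and per-IP buckets only stop honest or lazy crawlers; fleets that rotate residential IPs and spoof browser user agents evade both, which is why ASN intelligence, Spamhaus lists, and honeypots get stacked on top.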

Opposed

  • Some blame inefficient stacks (especially WordPress’s DB patterns and dynamic rendering) and poor hosting choices; with better caching, object stores, and more resources, sites should tolerate 1 QPS per crawler (see the microcaching sketch after this list).
  • A few argue bandwidth pricing (e.g., Netlify overages) is the bigger problem than bots; switching to cheaper hosts/CDNs would mitigate costs.
  • Centralized solutions like Cloudflare’s Signed Agents or default bot blocking are seen by some as power grabs that further centralize the web and create surveillance/attestation risks.
  • PoW/micropayment gates, cryptographic identity, or bot whitelists are criticized as user-hostile, accessibility-unfriendly, or impractical social/market solutions that won’t stop determined bad actors.
  • Serving dumps or standardized feeds may not help because aggressive crawlers won’t use them, or sites shouldn’t have to redesign around feeding AI at all.
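
The “fix your stack” argument can be made concrete with microcaching: even a one-second cache in front of a dynamic app collapses a burst of identical requests into roughly one backend render per second. A hypothetical nginx/PHP-FPM sketch, with socket path and cache location as placeholders:

```
# http context: small on-disk cache for rendered pages
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=micro:10m
                   max_size=1g inactive=10m;

server {
    listen 80;
    server_name blog.example.com;   # placeholder
    root /var/www/html;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # placeholder socket

        # Serve identical requests from cache for 1 second, so a
        # 100-requests/second spike costs ~1 PHP/DB render per second
        fastcgi_cache micro;
        fastcgi_cache_key $scheme$request_method$host$request_uri;
        fastcgi_cache_valid 200 301 1s;
        fastcgi_cache_lock on;                     # dedupe concurrent misses
        fastcgi_cache_use_stale updating error timeout;
    }
}
```

The agreeing camp’s rebuttal still applies, though: microcaching only helps on hot URLs, and crawlers that walk every archive page and query-string variant produce a near-100% miss rate, so each request is still a full render.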