The Bot Crisis: Why Internet Traffic is 70% Automated

Article: Positive
Community: Negative, Divisive

By running 'digital tar pits' that attracted millions of requests, Glade Art found that modern bots hide behind residential IPs but rarely execute JavaScript. The research suggests that bots account for a far larger share of internet traffic than previously thought, possibly over 70%. The study also indicates that even simple Proof of Work challenges can stop these automated scrapers almost entirely.
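The summary does not describe how Glade Art's tar pits are built, so the following is only a minimal sketch of the general idea: every URL returns cheap, generated filler text plus links to more generated URLs, so a crawler that follows links never runs out of pages. The function name `tarpit_page`, the `/maze/` path prefix, and the word list are all hypothetical choices made for illustration.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real tar pit would likely use richer text.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def tarpit_page(path: str, n_links: int = 5, n_words: int = 40) -> str:
    """Return a junk HTML page derived deterministically from the request path."""
    # Seed the generator from the path so the same URL always yields the same
    # page, without storing anything on the server.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    # Meaningless body text.
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))

    # Links to further generated pages; following them leads deeper into the maze.
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(64):016x}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"
```

Seeding from the path keeps the server stateless: each page costs one hash and a few random draws to produce, while the crawler pays for a full request and whatever it does with the response.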

Key Points

  • Bots are increasingly using residential and mobile IPs to bypass datacenter-based detection and appear human.
  • The majority of web scrapers do not execute JavaScript, making JS-based Proof of Work challenges a highly effective deterrent.
  • Current estimates of 51% bot traffic on the internet are likely undercounts, with the true figure potentially exceeding 70%.
  • Digital tar pits can successfully waste bot resources by serving massive amounts of useless, generated data.
  • Bots targeting these traps are likely scraping data for AI training, given the scale and funding required for their infrastructure.
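To make the Proof of Work point concrete, here is a minimal sketch of a hash-based challenge, assuming a common scheme in which the client must find a nonce whose SHA-256 digest starts with a given number of zero bits. This is not the implementation used by the article or by Anubis; the function names and the `challenge:nonce` encoding are assumptions. In a real deployment the solving loop would run as JavaScript in the visitor's browser, which is exactly the step that scrapers without a JS engine never perform.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

The asymmetry is the point of the design: solving takes about 2^difficulty_bits hashes on average, while verifying takes one. At 12 bits that is roughly 4,096 hashes per page view, negligible for one human visit but a recurring cost for a crawler requesting millions of pages.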

Sentiment

The community overwhelmingly agrees that bot scraping is a serious and worsening problem, but is notably divided on solutions. There is significant frustration that the article's own site demonstrated the worst-case scenario of anti-bot measures by blocking human visitors. The prevailing mood is pessimistic: commenters see a losing battle in which defenses harm humans while well-funded scrapers easily circumvent them.

In Agreement

  • Multiple site operators report massive distributed scraping attacks that use hundreds of thousands of residential IPs, supporting the article's claims about bot traffic volume.
  • The finding that most bots don't execute JavaScript is supported by Anubis deployment data: scraping dropped from hundreds of thousands of daily requests to about 11 once JS-based PoW was enabled.
  • Commenters agree that bots using residential and mobile IPs make traditional detection methods like fail2ban completely ineffective at scale.
  • Site operators confirm the economic damage: when licensed data is scraped, the site earns no ad revenue, which threatens the viability of independent websites.
  • The consensus supports the claim that AI scrapers universally ignore robots.txt, with verified reports of named bots like Amazonbot disregarding disallow directives.

Opposed

  • The PoW mechanism in Anubis is not what actually stops bots. Simply requiring JavaScript execution filters them out, which makes the 'proof of work' aspect unnecessary overhead that punishes humans.
  • Publishing to the public internet is a binary decision, and anti-bot measures are ultimately futile against adversaries with TSMC's wafer budget and Microsoft's cloud infrastructure.
  • The article's claim that scraping is for AI training is hand-wavy: nobody can definitively prove who is behind residential proxy scraping operations or what the data marketplace looks like.
  • Anti-bot measures cause significant collateral damage to legitimate users, accessibility tools, screen readers, and AI agents doing legitimate tasks on behalf of users.
  • Some argue that site operators are hypocritical: they want to serve ads to humans while blocking automated access, effectively demanding manual labor from users to generate revenue.