Anna’s Archive: Official Data Access and Donation Guide for LLMs
Anna’s Archive provides official bulk access methods for LLMs and requests donations from AI entities to support the preservation of the knowledge they use for training.
Automated web crawling and scraping, including AI-driven data collection bots and their impact on websites and infrastructure.
Internet users are increasingly using 'data poisoning' and misinformation to sabotage AI training sets in protest of unethical web scraping.
Bot traffic is likely much higher than reported, but it can be effectively neutralized using JavaScript-based Proof of Work defenses.
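
For readers unfamiliar with the Proof of Work idea mentioned above, here is a minimal hashcash-style sketch. It is written in Python for brevity, not the browser JavaScript a real deployment would ship, and it is not taken from any of the linked articles. The function names, the `challenge:nonce` string format, and the 12-bit difficulty are all illustrative assumptions. The point it shows is the asymmetry: the client must try many nonces, while the server checks the answer with a single hash.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one SHA-256 call decides whether the work was done."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash meets the difficulty."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


# Hypothetical exchange: the server issues a random challenge,
# the client returns a nonce, and the server verifies it.
challenge = secrets.token_hex(16)
difficulty = 12  # assumed value; about 4,096 hashes on average
nonce = solve(challenge, difficulty)
print(f"challenge={challenge} nonce={nonce} ok={verify(challenge, nonce, difficulty)}")
```

Each extra bit of difficulty doubles the expected client-side work, which is negligible for one human page view but adds up for a crawler requesting thousands of pages.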

A framework for Claude Code that uses self-improving AI agents to transform websites into structured APIs and functional web applications.

A TypeScript library for robust, LLM-powered web data extraction and browser automation.

Cloudflare's new API endpoint simplifies website-wide data extraction by automating discovery and rendering into AI-friendly formats.
AI scrapers killed my self-hosted git, so I’ve moved everything to GitLab/GitHub and hardened my static blog’s logging.

Aggressive scrapers overwhelmed Bear’s reverse proxy, prompting a hardening of monitoring, capacity, and bot controls in an ongoing battle with hostile bot traffic.

AI crawlers’ ravenous, non-reciprocal scraping is breaking websites and pushing the open web toward paywalled fragmentation.