TD Stuff

Cloudflare Simplifies AI Data Ingestion with New Site-Wide Crawling API

Mar 11, 2026

Cloudflare's new API endpoint simplifies website-wide data extraction by automating page discovery and rendering content into AI-friendly formats.

Web Scraping, Retrieval-Augmented Generation, Browser Automation, AI Training Data, Cloud Infrastructure

Damage Control

Publishers Block Internet Archive to Stop AI Scraping 'Backdoor'

Feb 15, 2026

News publishers are blocking the Internet Archive to prevent AI companies from using it as a free source of training data.

AI Training Data, Intellectual Property, Digital Preservation, Corporate Accountability

Damage Control

Shutting Down My Self-Hosted Git After AI Scraper Overload

Feb 11, 2026

AI scrapers killed my self-hosted Git server, so I’ve moved everything to GitLab/GitHub and hardened my static blog’s logging.

Web Scraping, Self-Hosting, Open Source, AI Training Data

Damage Control

Insiders Rally Data-Poisoning Campaign to Cripple AI

Jan 11, 2026

Industry insiders are rallying a crowdsourced data-poisoning campaign to sabotage AI models, arguing it’s a faster check on AI than regulation.

AI Training Data, AI Safety, AI Ethics, AI Regulation

Products & Announcements

AI Is Commoditizing Specs—Operations Are the New Moat

Jan 10, 2026

AI turns specs into commodities, so the real business value has shifted from code and components to running, securing, and scaling software operations.

Open Source, Technology Economics, AI Training Data, AI Business Models

Under the Hood

Ranke-4B: Time-Locked Historical LLMs as Windows into the Past

Dec 19, 2025

A set of strictly time-locked historical LLMs (Ranke-4B) offers faithful, era-bound perspectives for research, avoiding modern hindsight while managing sensitive content responsibly.

AI Training Data, AI Ethics, Digital Humanities, Open Source

Damage Control

Stop Force‑Feeding AI: Adopt It Only When It Works

Nov 30, 2025

Use AI only when it clearly helps, not because investors need it deployed.

AI Hype, Corporate Accountability, Technology Economics, AI Ethics, AI Training Data

Products & Announcements

DeepSeek-OCR: LLM-Centric Visual-Text Compression for Fast, Flexible OCR

Oct 20, 2025

An LLM-focused, high-throughput OCR system that compresses visual context for efficient document and image understanding.

Computer Vision, Multimodal AI, Open Source, AI Training Data

Programming

AI Isn’t Software You Can Patch

Oct 15, 2025

AI isn’t regular software: its failures come from data and emergent behavior, so you can’t just inspect code and patch away the risks.

AI Safety, AI Hype, Software Craftsmanship, AI Training Data

Products & Announcements

Why Authors Will Pay to Be in AI Training

Oct 12, 2025

To stay culturally visible and influential, authors will pay and write for AIs, not just humans.

AI Training Data, Writing & AI, Intellectual Property, Technology Economics

Under the Hood

Unlocking AI’s Data: ABC and an ARPANET-Style Plan

Sep 24, 2025

Shift from data scarcity to data access by implementing ABC—owner- and user-controlled, privacy-preserving attribution—and catalyze it with an ARPANET-style federal program.

AI Training Data, Data Privacy, Technology Economics, Public Policy

Products & Announcements

From Search to Answers: Paying Creators in the AI Era

Sep 22, 2025

As AI answer engines disrupt traffic-based monetization, Cloudflare champions a new model where AI companies pay creators for the unique content that improves their models.

AI Business Models, Creator Economy, AI Training Data, Open Web, Technology Economics