Applying Distributed Systems Principles to LLM Teams
The research advocates for using distributed systems theory as a formal framework to design and evaluate multi-agent LLM teams more effectively.

A comprehensive technical reference gallery documenting the architectural evolution and specifications of modern open-weight large language models.

Claude is doubling usage limits during off-peak hours for most plan types from March 13 to March 27, 2026.

A hardware compatibility tool that grades the local performance of AI models based on a user's specific GPU and VRAM configuration.

An open-source MCP tool that automates Anthropic prompt caching to reduce token costs by 90% and provide deep usage observability.

The reported $5,000 loss per Claude Code user is based on retail markups rather than actual compute costs, masking the fact that Anthropic's inference is likely profitable.

Use efficient sampling plus grammar constraints to guarantee format today, but expect models to natively emit structured outputs tomorrow—especially when you let them think first, then constrain.
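The "grammar constraints" mentioned above boil down to a simple idea: before picking the next token, mask out every candidate that would take the output outside the grammar, then sample or pick greedily among the survivors. A minimal sketch follows; the vocabulary, scores, and digits-only "grammar" are hypothetical toys for illustration, not code from any of the linked articles:

```python
def constrained_greedy(logits_fn, vocab, is_valid, max_len=10):
    """Greedy decoding that masks any token whose addition would
    violate the output constraint (the 'grammar')."""
    out = ""
    for _ in range(max_len):
        scores = logits_fn(out)
        # Keep only tokens that leave the output valid under the grammar.
        candidates = [(s, t) for s, t in zip(scores, vocab) if is_valid(out + t)]
        if not candidates:
            break  # no legal continuation
        _, best = max(candidates)
        out += best
    return out

# Hypothetical toy setup: a real model would score tokens given the prefix.
VOCAB = ["a", "1", "2", "}"]

def fake_logits(prefix):
    return [3.0, 2.0, 1.0, 0.5]  # "a" scores highest but is illegal below

def digits_only(s):
    return s.isdigit()  # the toy "grammar": output must be all digits

print(constrained_greedy(fake_logits, VOCAB, digits_only, max_len=3))  # 111
```

Production libraries do the same thing against a real tokenizer and a full context-free grammar or JSON schema, but the masking step shown here is the core mechanism.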

Faster LLMs will reshape coding workflows and productivity, but escalating demand, hardware limits, and pricing pressures mean a bumpy, fast-changing road ahead.

Three infrastructure bugs—not load or demand—degraded Claude; rollbacks and a shift to exact top‑k fixed them, and Anthropic is upgrading evaluations and debugging while asking for user feedback.

Qwen3-Next matches larger models while slashing training cost and delivering order-of-magnitude faster long-context inference via a hybrid attention + ultra-sparse MoE design with native MTP.

A pragmatic, privacy-first guide to running and choosing small local LLMs on macOS—what to use, how to pick, and how to stay safe and sane.

A visual, end-to-end demo of a tiny GPT that turns tokens into embeddings, runs them through transformers, and autoregressively predicts the next token to solve a simple sorting task.