Postmortem: Three Overlapping Infra Bugs Degraded Claude—Fixes Shipped, Evals and Tooling Upgraded

Anthropic diagnosed and fixed three overlapping infrastructure bugs—context window misrouting, TPU output corruption, and an XLA approximate top‑k miscompile—that intermittently degraded Claude’s responses. Amplified by an Aug 29 load-balancing change, the issues varied by model and platform; fixes were deployed between Sept 2 and Sept 16, with Bedrock routing remediation still in progress. The company is instituting more sensitive, continuous production evaluations and better privacy-preserving debugging tools, and it emphasizes that it never reduces model quality in response to demand.
Key Points
- Three overlapping infrastructure bugs (routing error, TPU output corruption, XLA approximate top‑k miscompilation) caused intermittent quality degradation; no intentional quality reductions occurred.
- A load-balancing change on Aug 29 dramatically amplified the routing bug’s user impact, peaking at 16% of Sonnet 4 requests in one hour; sticky routing repeatedly sent affected users back to the same misrouted servers, compounding per-session degradation.
- Output corruption on TPU servers sporadically boosted the probability of improbable tokens (e.g., Thai characters in English responses); the change was rolled back Sept 2 and new detection tests were added.
- A latent XLA:TPU compiler issue in approximate top‑k caused incorrect token selection under certain conditions; Anthropic switched to exact top‑k with enhanced precision and is working with XLA on a fix.
- Detection was hindered by noisy evals, models self-recovering in ways that masked errors, and privacy constraints; Anthropic is rolling out more sensitive, continuous production evals and better debugging tools, and is soliciting user feedback.
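
The approximate top‑k point above can be illustrated with a minimal, hypothetical sketch in plain Python (the function names, logits, and "buggy" result are invented for illustration; this is not Anthropic's or XLA's actual code). Exact top‑k keeps the k highest-scoring tokens before sampling; if a miscompiled approximate top‑k returns the wrong candidate set, generation can emit a token the model itself scored as highly improbable:

```python
def top_k_filter(logits, k):
    """Return indices of the k largest logits (exact top-k)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def pick(logits, allowed):
    """Greedy selection among the allowed candidate set (deterministic for the sketch)."""
    return max(allowed, key=lambda i: logits[i])

# Hypothetical next-token scores; index 0 is the model's clear favorite.
logits = [4.0, 3.5, 0.1, -2.0, -5.0]

exact = top_k_filter(logits, k=2)   # [0, 1]: the two genuinely best tokens
buggy = [2, 3]                      # hypothetical miscompiled approximate top-k output

assert pick(logits, exact) == 0     # correct: most probable token is chosen
assert pick(logits, buggy) == 2     # wrong candidate set: an improbable token "wins"
```

This is why the failure looked like occasional bizarre tokens rather than uniformly worse output: selection is only wrong when the approximate candidate set happens to miss the true top tokens, which matches the intermittent, condition-dependent behavior described in the postmortem.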
Sentiment
Mixed but leaning skeptical. Many commenters appreciate the transparency and technical depth of the postmortem, but a significant contingent doubts the disclosure is complete and is frustrated that no compensation was offered. There is tension between those who defend Anthropic's openness as exceeding industry norms and those who read the postmortem as carefully worded minimization of the issues' scope and impact.
In Agreement
- Appreciation for Anthropic's willingness to share detailed technical infrastructure information publicly, something they normally do not do
- Recognition that these are genuinely difficult problems to detect and fix, especially given the non-deterministic nature of LLM outputs and the intermittent nature of the bugs
- Agreement that the bugs were infrastructure-level issues (routing, token sampling, compiler bugs) rather than intentional model degradation to save costs
- Acknowledgment that Claude remains a valuable enough product that users continue paying despite the issues, suggesting the core model quality is strong
- Praise for Anthropic's privacy practices, including the explicit consent mechanism for the thumbs-down feedback button and internal data access controls
- Recognition that all major AI providers are facing similar scaling challenges and reliability issues, not just Anthropic
Opposed
- Skepticism that response quality could drop due to infrastructure bugs and remain undetected for weeks, with some commenters calling it implausible
- Criticism that the postmortem lacked key details such as actual quality impact metrics and the rate of impact for the XLA bug, suggesting the omitted numbers were unfavorable
- Frustration that no credits or compensation were offered to paying users who received degraded service, especially those paying premium subscription prices
- Argument that the postmortem's remediation plan amounts to vague promises of better testing rather than concrete systemic solutions
- Concern that there are no real SLAs or accountability mechanisms for LLM output quality, making it impossible for customers to independently verify they are getting what they paid for
- Suspicion that the sticky routing issue meant affected users experienced far more degradation than the headline statistics suggest, with wording designed to minimize the apparent impact
- Criticism that the postmortem reveals a lack of basic unit testing for deterministic infrastructure components, suggesting insufficient SRE discipline at Anthropic