The Truth Behind Claude Code Quota Exhaustion

A Claude Code user reported that their Pro Max quota was exhausted in 1.5 hours, leading to a community investigation into token accounting. Analysis revealed that while prompt caching successfully saves quota, background processes and large context spikes are the true causes of rapid depletion. The investigation also identified potential peak-hour surcharges that further accelerate quota usage during busy times.

Key Points

A user reported rapid quota exhaustion, suspecting that prompt caching provided no relief for rate limits.
Statistical modeling suggests that 'cache_read' tokens have a 0.0x weight, meaning they do not actually count against the 5-hour quota.
Background sessions and idle terminal windows can silently drain quota through automated tasks like retros and hook processing.
Large context windows (up to 1M tokens) cause massive token spikes during auto-compaction events that quickly deplete limits.
Preliminary data indicates that quota consumption may be 25-35% more expensive during peak weekday hours (13:00–19:00 UTC).

Sentiment

Predominantly negative. While there is genuine appreciation for Boris Cherny personally engaging on HN and responding to bug reports in real-time, the broader community sentiment is one of frustration and eroding trust. Users feel Anthropic is deflecting blame, making silent changes that degrade service quality, and failing to provide the transparency that paying customers deserve. The discussion has a strong undertone of customers feeling gaslit by contradictory explanations about cache behavior.

In Agreement

Reducing the default context window from 1M to 400k would help most users avoid expensive cache misses without meaningful quality loss
Users loading many third-party skills, MCP servers, and background automations are contributing to unexpectedly high token consumption
The subscription pricing at $100-200/month is genuinely a good deal compared to raw API costs, and users should recognize the economics
Boris engaging directly on HN on a Sunday demonstrates good faith, and the team is actively investigating and shipping fixes
Compacting sessions early and keeping them under 200-300k tokens produces better results and more predictable quota usage

Opposed

Anthropic is blaming users for problems caused by their own architectural decisions, particularly the forced 1M context rollout without adequate testing
Cache TTL appears to have been silently reduced from 1 hour to 5 minutes, contradicting official statements and causing massive hidden costs
The lack of any SLA or transparency around quota mechanics for a $200/month service is unacceptable and reflects poorly on Anthropic's maturity
Model quality has measurably degraded since early March, with documented evidence of reduced code-reading, more file rewrites, and increased task abandonment
Anthropic is engaging in classic VC-subsidized enshittification: hook users with generous limits, then silently degrade the service while adding features nobody asked for
OpenAI Codex offers more generous limits, more transparent quota resets when issues occur, and comparable code quality for less money
Users should not need to manage hidden environment variables, manually compact sessions, or avoid leaving their computer to get reasonable value from a premium subscription