The High Cost of Shallow Thinking: Claude's Engineering Regression

A detailed analysis of thousands of Claude Code sessions reveals a major quality regression in engineering tasks following a reduction in the model's thinking depth. This decline has led to lazy coding behaviors, such as editing files without reading them and failing to adhere to project conventions. The author argues that restoring extended thinking is essential for complex workflows and would actually reduce long-term compute costs by preventing model thrashing.

Key Points

Quantitative data shows that a 70 percent reduction in thinking depth correlates with a significant drop in engineering quality and reliability.
The model shifted from a research-first approach with 6.6 reads per edit to a lazy edit-first approach with only 2.0 reads per edit.
Reduced thinking depth causes thrashing, in which the model makes repeated incorrect edits, leading to an 80x increase in API requests and higher operational costs.
Behavioral regressions include ignoring instructions, dodging ownership of bugs, and seeking unnecessary permission to continue tasks.
The author proposes a 'max thinking' tier for power users who require deep reasoning for autonomous multi-agent workflows.
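
The reads-per-edit figure cited in the key points is a simple ratio of file reads to file edits within a session. The following is a minimal sketch of how such a ratio could be computed, assuming each session is available as a flat list of tool names. The tool names "Read" and "Edit", the function name, and the sample sessions are hypothetical illustrations, not the article's data or method.

```python
from collections import Counter

def reads_per_edit(tool_calls, read_name="Read", edit_name="Edit"):
    """Return reads divided by edits for one session, or None if no edits occurred.

    tool_calls is a list of tool names in call order (hypothetical format).
    """
    counts = Counter(tool_calls)
    edits = counts[edit_name]
    if edits == 0:
        # The ratio is undefined for sessions without any edit.
        return None
    return counts[read_name] / edits

# Hypothetical sessions, invented for illustration only.
research_first = ["Read", "Read", "Read", "Edit", "Read", "Read", "Read", "Edit"]
edit_first = ["Read", "Edit", "Edit", "Read", "Edit"]

print(reads_per_edit(research_first))  # 6 reads / 2 edits = 3.0
print(reads_per_edit(edit_first))      # 2 reads / 3 edits, below 1.0
```

A lower ratio means the model is editing with less prior inspection of the code, which is the shift the article describes as edit-first behavior.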

Sentiment

The community overwhelmingly agrees with the article's core claim that Claude Code has regressed for complex engineering tasks. While some appreciate Anthropic's direct engagement, the prevailing sentiment is frustration, both at the quality degradation and at Anthropic's initial dismissal of the issue by closing the GitHub ticket. Anthropic's acknowledgment that adaptive thinking under-allocates reasoning was made on HN rather than on the issue tracker, which reinforced perceptions that the company is more responsive to public pressure than to individual user reports.

In Agreement

Multiple users confirm significant quality regression even on high effort settings, reporting 'rush to completion' behavior, laziness, and avoidant tendencies that waste tokens through correction cycles.
Examination of Claude Code's system prompt reveals heavy emphasis on simplicity and minimal changes, which users argue pushes the model too far toward lazy, incorrect solutions rather than thorough ones.
Users report that disabling adaptive thinking (CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1) produces noticeably better results, supporting the article's claim that reduced thinking depth causes the regression.
The Anthropic engineer's own analysis of user sessions confirms the article's core thesis: turns with zero reasoning emitted produced hallucinations, while turns with deep reasoning were correct.
Several users report switching to alternative models like Qwen because Claude has become unreliable for complex engineering tasks.

Opposed

The Anthropic engineer clarifies that thinking redaction is purely a UI change and does not affect actual reasoning depth, suggesting the article's analysis of thinking token reduction may be measuring the wrong thing.
The default effort change to medium (85) was presented as an optimization for the majority of users, who benefit from reduced latency and cost on typical tasks.
Some users report being satisfied with Claude's output for simpler tasks like React frontends, suggesting the regression primarily affects power users with complex workflows.
Anthropic's research shows chain-of-thought is not faithful to internal model reasoning, complicating the article's assumption that visible thinking tokens directly correlate with reasoning quality.
One commenter notes that more thinking tokens can compound errors, because the probability of selecting at least one wrong token increases with sequence length.
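
The compounding-error argument in the last point can be made concrete with a toy calculation. The sketch below assumes each token is wrong independently with a fixed probability, which is a simplification introduced here and not something the commenter or the article states. The function name and the error rate are hypothetical.

```python
def prob_any_error(per_token_error, num_tokens):
    """Probability that at least one of num_tokens tokens is wrong.

    Assumes independent errors with a fixed per-token rate (a toy model).
    """
    return 1.0 - (1.0 - per_token_error) ** num_tokens

# Hypothetical per-token error rate, chosen only to show the trend.
rate = 0.001
for length in (100, 1_000, 10_000):
    print(length, round(prob_any_error(rate, length), 3))
```

Under this toy model the chance of at least one error grows with sequence length. It does not account for a model detecting and correcting its own earlier mistakes, which is the benefit the article attributes to deeper thinking.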