
The High Cost of Shallow Thinking: Claude's Engineering Regression
Claude's engineering capabilities have collapsed due to a significant reduction in thinking depth, leading to error-prone behavior and massive efficiency losses.

Evolving plain-English instructions with multi-agent test-time search outperforms code-based solutions on ARC. The result highlights that RL-driven, transferable reasoning is key to AGI.