Claude Sonnet 4.5 Launches: SOTA Coding & Agent Model With SDK and Major Product Upgrades

Anthropic launched Claude Sonnet 4.5, its most capable and most aligned model to date, which leads coding and computer-use benchmarks and sustains long-horizon work. The release adds major product upgrades (Claude Code checkpoints, a VS Code extension, API memory tools, and in-app code execution and file creation) and the Claude Agent SDK for building advanced agents. The model ships under ASL-3 with stronger safeguards, and the launch includes a brief research preview, Imagine with Claude, for interactive software generation.
Key Points
- Claude Sonnet 4.5 is Anthropic’s most capable and most aligned model, leading coding and computer-use benchmarks (e.g., SWE-bench Verified, OSWorld).
- Major product upgrades ship alongside the model: Claude Code checkpoints, terminal refresh, VS Code extension; API context editing and memory; in-app code execution and file creation; Chrome extension rollout.
- Anthropic releases the Claude Agent SDK—the same infrastructure behind Claude Code—to help developers build long-running, tool-using agents.
- Safety advances include reduced misaligned behaviors, stronger defenses against prompt injection, and ASL-3 protections with improved CBRN classifiers.
- A short-term research preview, Imagine with Claude, showcases real-time, interactive software generation for Max subscribers.
Sentiment
The overall sentiment is cautiously positive, tempered by significant real-world skepticism. The community respects the benchmark improvements and appreciates the unchanged pricing, but a vocal contingent questions whether any frontier model has crossed the reliability threshold needed for professional trust. Many developers describe hedging strategies that spread work across multiple models rather than committing to a single provider, reflecting a pragmatic rather than enthusiastic reception.
In Agreement
- The code interpreter mode and database refactoring demonstrations show genuinely impressive capability improvements
- The same price point as Sonnet 4 makes the upgrade compelling value
- Claude Code's subagent architecture enables powerful parallel workflows that competitors like Codex cannot match
- The SWE-bench improvement from 72.7% to 77.2% and perfect AIME score represent solid progress in roughly four months
- Reduced sycophancy metrics from the system card suggest meaningful personality improvements
- The Agent SDK and the model's long-horizon focus open new possibilities for complex multi-step automation
Opposed
- Benchmark improvements do not translate to real-world performance; models still fail at trivially simple tasks like basic lint fixes
- Preview access for bloggers creates biased first impressions, and shipped models may be degraded from preview versions
- AI agents perform destructive actions like `git reset --hard` and revert user changes, undermining trust in autonomous operation
- The pace of improvement appears to be slowing from exponential to sublinear when measured across consecutive releases
- Content moderation is too aggressive, making Claude frustrating for legitimate creative and professional use cases
- Multiple users report ChatGPT or Codex outperforming Claude in practice despite lower benchmark scores
- The sycophantic 'You're absolutely right!' pattern persists despite Anthropic's claims of improvement