Claude Sonnet 4.5 Launches: SOTA Coding & Agent Model With SDK and Major Product Upgrades

Anthropic launched Claude Sonnet 4.5, its most capable and aligned model, leading coding and computer-use benchmarks and sustaining long-horizon work. The release adds major product upgrades (Claude Code checkpoints, VS Code extension, API memory tools, in-app code execution and file creation) and the Claude Agent SDK for building advanced agents. It ships under ASL-3 with stronger safeguards and includes Imagine with Claude, a brief research preview of interactive software generation.
Key Points
- Claude Sonnet 4.5 is Anthropic’s most capable and most aligned model, leading coding and computer-use benchmarks (e.g., SWE-bench Verified, OSWorld).
- Major product upgrades ship alongside the model: Claude Code checkpoints, terminal refresh, VS Code extension; API context editing and memory; in-app code execution and file creation; Chrome extension rollout.
- Anthropic releases the Claude Agent SDK—the same infrastructure behind Claude Code—to help developers build long-running, tool-using agents.
- Safety advances include reduced misaligned behaviors, stronger defenses against prompt injection, and ASL-3 protections with improved CBRN classifiers.
- A short-term research preview, Imagine with Claude, showcases real-time, interactive software generation for Max subscribers.
Sentiment
Overall, sentiment in the Hacker News discussion is mixed, leaning toward cautious skepticism of the article's claims. Some users acknowledge productivity gains and benchmark improvements, but many raise concerns about real-world performance gaps, high costs, usage limits, and the perceived post-release 'nerfing' of models. The thread broadly accepts that the progress is real while doubting its practical value and cost-effectiveness.
In Agreement
- Users report significant productivity gains (e.g., 3x output increase, rapid SaaS growth) when incorporating AI, including Claude, into their professional work.
- Claude models are praised for being excellent collaborators, especially when compared to GPT-5's perceived aggressive or uncooperative tendencies (e.g., unexpected `git reset --hard` actions).
- Claude Code is favored for its simpler command execution and safer handling of version control (less likely to revert unintended changes) compared to GPT Codex's complex pipelines and frequent `git` usage.
- Many users find Claude models perform superbly and consistently across various programming languages and project structures, though better guidance is needed for larger codebases.
- The new Sonnet 4.5 is acknowledged for its benchmark improvements (e.g., SWE-bench, AIME) and the fact that it maintains the same price point as Sonnet 4.
- Many users expect, and hope, that Sonnet 4.5 is now capable enough for planning tasks, potentially surpassing Opus in this role.
- The progress in 'computer use' benchmarks is seen as a significant step towards more generic and scalable AI interfaces, promising economic disruption.
Opposed
- Users report that Claude models, including Sonnet 4.5, can still struggle with specific tasks such as stitching SwiftUI screens or simple lint error substitutions, indicating limitations in practical application despite benchmark claims.
- There is widespread skepticism about benchmark reliability and whether they accurately reflect real-world performance, with some users perceiving a degradation in Claude's subjective quality (e.g., from 3.7 to 4) despite benchmark improvements.
- Concerns are raised about 'model nerfing': the suspicion that models are tuned up for launch-day benchmarks and then silently degraded afterward, prompting calls for ongoing performance monitoring.
- Many developers find Anthropic's models to be prohibitively expensive compared to competitors like GPT-5 Codex or Grok, limiting their adoption for individual or small-scale projects.
- Usage limits and token consumption are a significant frustration for paying Claude Max users, leading to frequent lockouts and hindering in-depth coding sessions.
- Some users subjectively find competitor models, particularly GPT-5 Codex, superior for complex coding problems and overall implementation. Initial tests suggest Sonnet 4.5 may not clearly beat previous Opus versions and can make worse decisions than Codex.
- The 'pelican on a bicycle' benchmark is criticized as a test of memorization rather than true reasoning, casting doubt on the interpretability of such benchmark results.
- Overzealous safety guardrails (raised in the context of other LLMs but relevant to Claude's 'alignment' focus) are criticized as unacceptable in a tool, limiting its utility for sensitive topics.