Anthropic's new Claude Sonnet 4.5 continues the trend in large language model development: better coding and the ability to tackle tasks for much longer stretches.


Anthropic says Sonnet 4.5 is now its most capable model, outperforming previous versions in software development, computer operation, and task automation. It even surpasses Claude Opus 4.1, which launched in August. Sonnet 4 has been out for only about four months, a sign that release cycles are speeding up while each update brings more incremental gains.

The rapid release schedule seems fueled by the rivalry with OpenAI, as both companies compete to offer the best coding model. Anthropic appears to be targeting GPT-5 with each update. Opus 4.1, for example, launched just days before GPT-5 hit the market.

Sonnet 4.5 writes better code and works longer without interruption

In the SWE-bench Verified benchmark, which tests models on real-world programming problems, Anthropic says Claude Sonnet 4.5 delivered the best results of any model so far. The company also reports that, in internal tests, Sonnet 4.5 was able to stay focused on complex tasks for more than 30 hours straight.

Sonnet 4.5 can stay focused on complex programming tasks for over 30 hours, according to Anthropic. | Image: Anthropic

OpenAI is moving in a similar direction, betting that reasoning-focused language models will be able to handle longer, more complex tasks.

In the OSWorld benchmark, which tests how well models can operate real computer systems, Sonnet 4.5 scored 61.4 percent, up from 42.2 percent for Sonnet 4 just four months ago. An accompanying demo video shows the Claude extension for Chrome in action, with Sonnet 4.5 filling out forms and handling browser tasks.
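To put the OSWorld jump in perspective, it helps to separate the gain in percentage points from the relative improvement. The short sketch below does that arithmetic with the two scores reported above; the function name `improvement` is just an illustrative helper, not part of any benchmark tooling.

```python
def improvement(old_score: float, new_score: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    absolute = new_score - old_score
    relative = absolute / old_score * 100
    return absolute, relative


if __name__ == "__main__":
    # OSWorld scores reported by Anthropic: Sonnet 4 vs. Sonnet 4.5.
    points, percent = improvement(42.2, 61.4)
    print(f"+{points:.1f} points, +{percent:.1f}% relative")
```

In other words, the reported scores correspond to a gain of roughly 19 percentage points, or about 45 percent relative to Sonnet 4.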

Along with improvements in programming and computer skills, Sonnet 4.5 also shows gains in math, logical reasoning, and subject-specific knowledge. According to Anthropic, tests with professionals in finance, law, medicine, and STEM found that Sonnet 4.5 performed significantly better than earlier Claude models. Anthropic recommends Sonnet 4.5 for any use case.

Sonnet 4.5 delivers stronger performance in code, logic, and financial knowledge, according to Anthropic. | Image: Anthropic

Claude Sonnet 4.5 is available now through the Claude API. Pricing stays the same at $3 per million input tokens and $15 per million output tokens, keeping Claude among the most expensive options on the market.
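For a rough sense of what these prices mean in practice, here is a minimal cost estimate, assuming $3 applies per million input tokens and $15 per million output tokens. The function name `estimate_cost` and the example workload are hypothetical, and the sketch ignores any discounts or pricing tiers that may apply.

```python
# Prices in USD per million tokens, as quoted for Sonnet 4.5.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    print(f"${estimate_cost(200_000, 50_000):.2f}")
```

Because output tokens cost five times as much as input tokens, long generated responses tend to dominate the bill even when prompts are large.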

Along with the model update, the Claude Code development tool is getting new features, including checkpoints to save and reset task status, a redesigned terminal interface, and a native VS Code extension for smoother integration with development environments.

Claude Agent SDK for building your own AI agents

With the Claude Agent SDK, Anthropic is making the infrastructure it uses to build its own AI agents public for the first time. The SDK is designed to help developers manage long-running tasks, set up permission systems, and coordinate multiple sub-agents. New tools for memory management and context processing are also available through the Claude API, making it easier to control long-running agent processes.

Alongside the release, Anthropic is launching "Imagine with Claude", a limited-time experiment where Sonnet 4.5 generates software in real time. The demo is available to Max subscribers for five days.

Summary
  • Anthropic has introduced Claude Sonnet 4.5, its most advanced language model so far, which outperforms earlier versions in areas like software development, computer operation, and automation.
  • The company states that Claude Sonnet 4.5 achieves the top score among all tested models on the SWE-bench Verified benchmark and, based on internal tests, can stay focused on complex tasks for more than 30 hours.
  • Anthropic is also rolling out new features for developers, including checkpoints, a VS Code extension, improved context management, a Chrome extension for Max subscribers, and an Agent SDK that allows developers to build their own AI agents using Anthropic's infrastructure.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.