
Anthropic is reportedly preparing the next generation of its Claude models, aiming for greater autonomy and the ability to self-correct during complex tasks.

According to The Information, the company plans to release new versions of Claude Opus and Sonnet in the coming weeks. Testers say these models can operate much more independently than earlier versions.

The biggest change is in how the models blend independent reasoning with external tool use, smoothly switching between the two as needed. If the model gets stuck while using a tool, it moves into a "thinking" mode to analyze what happened and fix the problem. This back-and-forth is meant to help the models tackle complex challenges with less help from users.
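In pseudocode, a loop of that shape might look like the sketch below. This is purely illustrative and not Anthropic's implementation: call_model, run_tool, and the message format are stand-ins invented for the example.

```python
# Illustrative agent loop that interleaves tool use with a "thinking"
# fallback when a tool call fails. Hypothetical sketch, not Anthropic's
# actual implementation; call_model and run_tool are stand-in stubs.

def call_model(messages):
    """Stub: query the model and return its next action."""
    raise NotImplementedError

def run_tool(name, args):
    """Stub: execute an external tool and return its output."""
    raise NotImplementedError

def agent_loop(task, max_steps=20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(messages)
        if action["type"] == "answer":  # model has finished reasoning
            return action["content"]
        try:  # model requested a tool call
            result = run_tool(action["tool"], action["args"])
            messages.append({"role": "tool", "content": result})
        except Exception as err:
            # Tool failed: feed the error back so the model can drop into
            # "thinking" mode, analyze what went wrong, and adjust its plan.
            messages.append(
                {"role": "tool", "content": f"Tool error: {err}. Rethink the approach."}
            )
    return None  # gave up after max_steps
```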

One example from The Information is a market analysis for a Manhattan café. The model starts by looking at national trends but quickly figures out those aren't useful. It then shifts to analyzing demographic data from the East Village, aiming to produce more relevant recommendations.

The new Claude models also take a more active role in coding tasks. They automatically test the code they generate, and if something goes wrong, they pause to figure out and fix the error on their own. Early testers say this self-correcting process even works with broad prompts like "make the app faster," where the model will independently try out different optimization strategies.
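The generate-test-fix cycle described above could be sketched along these lines. Again, this is a hypothetical illustration rather than the actual system: generate_patch stands in for the model, and pytest is just one plausible way the tests might be run.

```python
# Hypothetical generate-test-fix loop, assuming tests run via pytest.
# generate_patch is a stand-in for the model producing or revising code.
import subprocess

def generate_patch(goal, feedback=None):
    """Stub: ask the model for code, optionally passing prior test output."""
    raise NotImplementedError

def self_correct(goal, path="app.py", max_attempts=5):
    feedback = None
    for _ in range(max_attempts):
        with open(path, "w") as f:
            f.write(generate_patch(goal, feedback))
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # all tests pass, patch accepted
        # Hand the failure output back so the model can analyze and fix it.
        feedback = result.stdout + result.stderr
    return False  # could not converge within the attempt budget
```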

Less guidance, more initiative

Anthropic's approach lines up with a wider trend: building AI systems that can keep working with minimal input and solve problems on their own. The updated Claude models are designed to combine reasoning and tool use, switching between the two modes as needed for the task.

OpenAI's new o3 and o4-mini models work in much the same way. While the earlier o1 models could only "think through" extra steps by generating text, the latest generation can also bring in tools such as web search, code execution, and image analysis as part of their reasoning. This should make them more flexible and robust, though initial tests show that o3, for example, still makes mistakes on complex tasks more often than previous OpenAI models.

Summary
  • Anthropic is preparing to launch new versions of its Claude Opus and Sonnet models that can operate more autonomously, switching flexibly between reasoning and using external tools, according to The Information.
  • These models are built to identify and fix their own mistakes, such as running tests during programming, analyzing errors, and updating code—even from vague instructions like "make the app faster."
  • This move reflects a broader industry trend toward AI systems that can handle complex tasks with minimal input. OpenAI is pursuing this direction as well.