Anthropic is reportedly preparing the next generation of its Claude models, aiming for greater autonomy and the ability to self-correct during complex tasks.
According to The Information, the company plans to release new versions of Claude Opus and Sonnet in the coming weeks. Testers say these models can operate much more independently than earlier versions.
The biggest change is in how the models blend independent reasoning with external tool use, smoothly switching between the two as needed. If the model gets stuck while using a tool, it moves into a "thinking" mode to analyze what happened and fix the problem. This back-and-forth is meant to help the models tackle complex challenges with less help from users.
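The Information's report stays at the level of behavior, but the loop it describes resembles a now-common agentic pattern. The following Python sketch is purely illustrative; call_model, run_tool, and ToolError are invented placeholders, not part of any Anthropic API:

```python
class ToolError(Exception):
    """Raised when a tool call fails or returns an unusable result."""

def call_model(transcript: list[str]) -> dict:
    """Placeholder for an LLM call: returns either a tool request
    or a final answer, based on the transcript so far."""
    return {"type": "answer", "text": "stub answer"}

def run_tool(name: str, args: dict) -> str:
    """Placeholder for executing an external tool (search, code, ...)."""
    return "stub result"

def agent_loop(task: str, max_steps: int = 10) -> str:
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        step = call_model(transcript)
        if step["type"] == "answer":
            # The model chose to keep reasoning and produced a final answer.
            return step["text"]
        try:
            # The model requested a tool; run it and feed the result back.
            result = run_tool(step["name"], step["args"])
            transcript.append(f"TOOL RESULT: {result}")
        except ToolError as err:
            # A failed tool call triggers a "thinking" step: the error is
            # fed back so the model can analyze what went wrong and retry.
            transcript.append(f"TOOL FAILED: {err}. Reconsidering approach.")
    return "step budget exhausted"
```

The key design point is that a tool failure doesn't end the run; it becomes more context for the next reasoning step.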
One example from The Information involves a market analysis for a Manhattan café. The model starts with national trends, quickly recognizes they are too broad to be useful for a single location, and shifts to demographic data from the East Village to produce more relevant recommendations.
The new Claude models also take a more active role in coding tasks. They automatically test the code they generate, and if something goes wrong, they pause to figure out and fix the error on their own. Early testers say this self-correcting process even works with broad prompts like "make the app faster," where the model will independently try out different optimization strategies.
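What testers describe amounts to a generate-test-repair cycle. Again purely as a sketch, with generate_patch as an invented stand-in rather than any documented interface, and pytest standing in for whatever test runner a given project uses:

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run the project's test suite and capture the output so the
    # model can read the failure log on the next attempt.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def generate_patch(goal: str, feedback: str) -> None:
    # Placeholder: ask the model for a code change toward `goal`,
    # conditioned on the latest test output, and apply it to the repo.
    ...

def self_correct(goal: str, max_attempts: int = 5) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        generate_patch(goal, feedback)   # e.g. goal = "make the app faster"
        passed, feedback = run_tests()   # the model checks its own work
        if passed:
            return True                  # tests pass: stop iterating
        # Otherwise loop: the failure log becomes context for the next patch.
    return False
```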
Less guidance, more initiative
Anthropic's approach lines up with a wider trend: building AI systems that keep working with minimal input and solve problems on their own. The updated Claude models fit this pattern, treating reasoning and tool use as interchangeable modes rather than as separate phases of a task.
OpenAI's new o3 and o4-mini models work in much the same way. While the earlier o1 models could only "think through" extra steps by generating text, the latest generation can also fold web search, code generation, or image analysis into their reasoning. This should make them more flexible and robust, though initial tests show that o3, for example, still makes mistakes on complex tasks more often than previous OpenAI models.