Anthropic CEO addresses Claude Sonnet rumors and puts DeepSeek progress in perspective

Jan 29, 2025


Key Points

Anthropic CEO Dario Amodei acknowledges that DeepSeek's new model performs similarly to US models that are seven to ten months older, at a lower cost, but says the savings are far smaller than some have suggested.
Amodei believes DeepSeek's real technical innovation lies in its DeepSeek-V3 model, released in late December, rather than in the much-discussed R1 model.
Although DeepSeek has reduced AI model development costs, Amodei estimates that the company's GPU reserves are within a factor of 2-3 of those of major US AI companies, indicating that overall investments remain substantial despite the efficiency gains.

Anthropic CEO Dario Amodei wants to clear up some misconceptions about Claude 3.5 Sonnet. The AI model cost far less to develop than recent rumors suggest, and it wasn't built using more advanced, secret models as some have claimed.

According to Amodei, training Claude 3.5 Sonnet, currently considered the most capable language-only AI model, cost "a few $10M's," not the billions that recent reports have suggested. He also dismisses speculation that Sonnet was developed using synthetic data generated by more sophisticated, unreleased models such as Opus 3.5.

Despite being trained nine to twelve months ago, "Sonnet remains notably ahead in many internal and external evals," says Amodei, noting that this is particularly evident when the model is actually used in practical tasks such as programming and human interaction.

The real technical achievement from DeepSeek isn't the much-discussed R1 model, Amodei says, but rather its DeepSeek-V3 model, released in late December, which introduced key improvements such as an advanced "mixture of experts" approach. The R1 model, released later, mostly builds on existing approaches, according to Amodei.

The real cost of AI development

DeepSeek's cost savings aren't unusual for the industry, Amodei points out. The cost of training a model to a given level of capability typically falls by "maybe ~4x/year," he estimates.

"I think a fair statement is 'DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)'," Amodei writes in his personal blog.

Even with these efficiencies, AI companies still face significant costs. DeepSeek has reportedly invested in about 50,000 Hopper-generation chips, worth about $1 billion. That puts the Chinese company's GPU reserves within a factor of 2-3 of major US AI companies, Amodei estimates.

Like leaders at other AI labs, Amodei sees reinforcement learning (RL) becoming central to scaling AI models. This approach, which powers DeepSeek R1 and OpenAI's latest models, is only beginning to show its potential, Amodei says, suggesting that Anthropic's next release may not be a standard LLM either.

Like other Western AI leaders, Amodei supports expanding computing infrastructure. While the cost of achieving "a given level of model intelligence" is declining, overall spending on AI training continues to rise, he says. He estimates that developing AI systems more capable than most humans will require millions of chips, "and is most likely to happen in 2026-2027."

As for chip export controls on China, Amodei says DeepSeek's progress makes these restrictions more important, not less. The fact that AI technology is becoming more efficient isn't a reason to lift the controls, he argues. The restrictions help prevent China from buying millions of chips, matching U.S. AI capabilities, and gaining what he calls "military dominance."

"China's best AI chips, the Huawei Ascend series, are substantially less capable than the leading chip made by U.S.-based Nvidia," Amodei writes.


Source: Dario Amodei