
Chinese startup Deepseek reports that its new DeepseekMath-V2 model has reached gold-medal level at the International Mathematical Olympiad, keeping the company in tight competition with Western AI labs.


According to Deepseek, its new DeepseekMath-V2 model achieved gold medal-level results at the International Mathematical Olympiad (IMO) 2025 and the Chinese CMO 2024. In the Putnam competition, the AI scored 118 out of 120 points, beating the best human result of 90 points.

Bar chart of human evaluations on IMO-ProofBench: DeepseekMath-V2 (Heavy) scores 99.0 percent on the Basic set (best result) and 61.9 percent on Advanced, second only to Gemini Deep Think (IMO Gold) at 65.7 percent. Other models range from 27.1 to 89.0 percent on Basic and from 3.8 to 37.6 percent on Advanced. | Image: Shao et al.

In its technical documentation, Deepseek explains that previous AIs often produced correct final answers without showing the right work. To fix this, the new model uses a multi-stage process. A "verifier" evaluates the proof, while a "meta-verifier" double-checks if any criticism is actually justified. This setup lets the system check and refine its own solutions in real time.
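As a rough illustration (not Deepseek's actual implementation), the generate-verify-refine loop described above can be sketched as follows. The function names `generate_proof`, `verify`, and `meta_verify` are hypothetical stand-ins for calls to the same underlying model in different roles, stubbed out here so the control flow runs on its own:

```python
# Hypothetical sketch of the generate -> verify -> meta-verify loop.
# All three "roles" would be the same LLM with different prompts; the
# stubs below just make the control flow self-contained and runnable.

def generate_proof(problem, feedback=None):
    # Stub: a real system would prompt the model, conditioning on any
    # verifier feedback gathered in earlier rounds.
    return f"proof({problem}, revision={len(feedback or [])})"

def verify(proof):
    # Stub: the verifier returns a list of criticisms (empty = no issues).
    return [] if "revision=2" in proof else [f"gap in {proof}"]

def meta_verify(proof, criticisms):
    # Stub: the meta-verifier keeps only criticisms it judges justified.
    return [c for c in criticisms if c]

def solve(problem, max_rounds=5):
    feedback = []
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        criticisms = meta_verify(proof, verify(proof))
        if not criticisms:           # no justified criticism remains
            return proof
        feedback.extend(criticisms)  # refine in the next round
    return proof

print(solve("IMO 2025 P1"))
```

In this toy version the proof is "accepted" after two rounds of refinement; the real system would loop until the meta-verifier finds no justified objections or a compute budget is exhausted.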

The paper does not mention external tools such as calculators or code interpreters, and its setup suggests the benchmark results come from natural-language reasoning alone.


In the headline experiments, a single DeepSeekMath‑V2 model is used for both generating proofs and verifying them, with performance coming from the model’s ability to critique and refine its own solutions rather than from external math software.

For harder problems, the system scales up test‑time compute, sampling and checking many candidate proofs in parallel, to reach high confidence in a final solution. Technically, the model is based on Deepseek-V3.2-Exp-Base.

Per-problem results across the three competitions: at IMO 2025, DeepseekMath-V2 fully solved five of six problems; at CMO 2024, it fully solved four problems and received partial credit on one; at Putnam 2024, it fully solved eleven problems and received partial credit on one. | Image: Shao et al.

Closing the gap with US labs

The release comes on the heels of similar news from OpenAI and Google DeepMind, whose unreleased models also achieved gold-medal status at the IMO, accomplishments once thought to be out of reach for LLMs. Notably, these models reportedly succeeded through general reasoning abilities rather than targeted optimizations for math competitions.

If these advances prove genuine, it suggests language models are approaching a point where they can solve complex, abstract problems, traditionally considered a uniquely human skill. Still, little is known about the specifics of these models. An OpenAI researcher recently mentioned that an even stronger version of their math model will be released in the coming months.

Deepseek's decision to publish technical details stands in stark contrast to the secrecy of OpenAI and Google. While the American giants kept their architecture under wraps, Deepseek is laying its cards on the table, demonstrating that it is keeping pace with the industry's leading labs.


This transparency also doubles as a renewed attack on the Western AI economy, a play Deepseek already executed successfully earlier this year. The strategy seems to be working: As the Economist reports, many US AI startups are now bypassing major US providers in favor of Chinese open-source models to cut costs.

Yet this rivalry has another dimension. As these models become more capable, their development becomes an increasingly charged political topic, a shift that could further strengthen US labs. By aggressively pushing the frontier, Deepseek might ultimately be helping OpenAI and its peers justify the speed and scale of their own advances.

Summary
  • Chinese AI start-up Deepseek has introduced DeepseekMath-V2, an AI model that achieves gold medal-level results in the International Mathematical Olympiad (IMO).
  • The model can evaluate and iteratively improve its own proofs, and is incentivized to find and fix as many problems as possible before finalizing a solution.
  • This achievement underscores that Chinese AI start-ups like Deepseek are fully capable of competing with leading US laboratories such as OpenAI and Google on the global stage.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.