Date: 2025-01-20
AI research
Maximilian Schreiner

DeepSeek's latest R1-Zero model matches OpenAI's o1 in reasoning benchmarks

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Chinese AI startup DeepSeek has released two new AI models that they say match OpenAI's o1 in performance. Along with their main models, DeepSeek-R1 and DeepSeek-R1-Zero, they've also launched six smaller open-source versions, with some performing as well as OpenAI's o1-mini.

What sets DeepSeek-R1-Zero apart is how it learns. Instead of studying human examples like most LLMs, it developed its reasoning skills entirely through reinforcement learning (RL). The model taught itself to check its work, think through problems, and break down complex tasks into steps.

According to the research team, the model learned to spend extra time on difficult problems and rethink its approach before providing an answer. This behavior reminded them of DeepMind's AlphaZero system - which likely inspired the model's name.

Building reasoning skills without human examples

Instead of using neural reward models, which the team considered prone to "reward hacking" and more expensive to train, they built a straightforward reward system based on clear rules. It has two parts: an accuracy check that compares math solutions against known answers and runs programming code against tests, and a format check that verifies the model wraps its reasoning in the proper tags, "<think>" and "</think>".
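
A rule-based reward of this kind can be sketched in a few lines of Python. This is a minimal illustration, not DeepSeek's implementation: it assumes the reasoning tags are written as <think> and </think>, that the final answer is whatever follows the closing tag, and that both checks carry equal weight. All function names are hypothetical.

```python
import re

# Assumed response layout: <think>reasoning</think> followed by the final answer.
THINK_PATTERN = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response has a non-empty think block followed by an answer."""
    return 1.0 if THINK_PATTERN.match(response.strip()) else 0.0


def extract_answer(response: str) -> str:
    """Return the text after the think block, or the whole response if there is none."""
    match = THINK_PATTERN.match(response.strip())
    return match.group(2).strip() if match else response.strip()


def accuracy_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer exactly equals the reference answer."""
    return 1.0 if extract_answer(response) == reference.strip() else 0.0


def total_reward(response: str, reference: str) -> float:
    # Equal weighting of the two checks is an assumption made for this sketch.
    return accuracy_reward(response, reference) + format_reward(response)
```

A real accuracy check would need more than string equality, for example normalizing math expressions or running generated code against test cases.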

At the heart of their efficient training process is an algorithm called "Group Relative Policy Optimization" (GRPO), which DeepSeek first described in its earlier DeepSeekMath work. Instead of training a separate critic model to judge each answer on its own, GRPO samples a group of answers to the same prompt and scores each one relative to the rest of the group to determine how to improve the model's performance.
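
The group comparison can be illustrated with the advantage calculation alone: each answer's reward is normalized against the mean and standard deviation of the rewards in its group. The sketch below covers only that step and leaves out the policy update itself. The function name, the use of the population standard deviation, and the small `eps` term are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumption) avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two earned the reward, two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average get a positive advantage and are reinforced, while below-average answers get a negative one. If every answer in a group earns the same reward, all advantages are zero and the group provides no learning signal.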

While DeepSeek-R1-Zero showed promise, it had two main issues: its answers were hard to read, and it would sometimes mix different languages together. To address these problems, the team developed DeepSeek-R1, which starts with a small set of initial training data (what they call "cold start" data) before going through several rounds of reinforcement learning.

Matching OpenAI's o1 performance

DeepSeek-R1 performs as well as OpenAI-o1-1217 across various reasoning benchmarks. It scored 79.8% on AIME 2024 and reached 97.3% on MATH-500. The model particularly excels at coding tasks - outperforming 96.3% of human participants on Codeforces. It also shows strong results in knowledge tests like MMLU and GPQA Diamond, though OpenAI-o1-1217 maintains a slight edge in these areas.

DeepSeek R1 matches OpenAI's models across major benchmarks. While OpenAI o1-1217 performs better on some English language tasks, DeepSeek R1 shows stronger results in math reasoning and coding tests. | Image: DeepSeek

DeepSeek didn't stop with its main 671 billion parameter model. The team also created six smaller versions, ranging from 1.5 to 70 billion parameters. To transfer the reasoning abilities to these smaller models, they used DeepSeek-R1 to generate 800,000 training examples and fine-tuned existing open models from the Qwen and Llama families on them.

The results look promising - their 32B and 70B models match or exceed OpenAI-o1-mini on most tests. Interestingly, even their tiny 1.5B model outperforms some larger models on math tests, though this likely says more about the limitations of benchmarks than the model's overall capabilities.

DeepSeek's distilled models show strong reasoning abilities, with larger versions like R1-Distill-Llama-70B and R1-Distill-Qwen-32B scoring well on math and coding tests. The results indicate they successfully transferred knowledge from the main model to these smaller, more efficient versions. | Image: DeepSeek

DeepSeek says the smaller models perform so well because they successfully captured the reasoning patterns of the larger model. Direct reinforcement learning on smaller models didn't work nearly as well. The team has made all these distilled models available as open source.

Looking ahead

DeepSeek plans to enhance R1's general capabilities, especially in areas like function calling, multi-turn conversations, and complex role-playing. The company acknowledges that the model still lags behind others, including their own DeepSeek-V3, in these areas.

They're also working to fix issues with language mixing and prompt sensitivity. For example, they found that performance drops significantly when using few-shot prompts. Interestingly, many of these limitations match those reported by OpenAI when they launched o1.

The team also plans to improve the model's coding abilities through additional reinforcement learning. According to the paper, they are developing more efficient ways to run this training.

Availability

DeepSeek-R1 uses the MIT license, which allows free use of the model weights and outputs, including for fine-tuning and distillation. All model variants and their documentation can be found on GitHub and HuggingFace.

The model is also available through DeepSeek's API - users can access it with the parameter "model=deepseek-reasoner". Pricing is set at $0.14 per million input tokens for cache hits, $0.55 for cache misses, and $2.19 per million output tokens.
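
To see how these prices add up for a given workload, the arithmetic can be sketched as follows. The function and dictionary names are hypothetical, and the prices are the ones quoted above, which may change.

```python
# Prices in US dollars per million tokens, as quoted in this article.
PRICE_PER_MILLION = {
    "input_cache_hit": 0.14,
    "input_cache_miss": 0.55,
    "output": 2.19,
}


def estimate_cost(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in US dollars for a given number of tokens."""
    total = (
        cache_hit_tokens * PRICE_PER_MILLION["input_cache_hit"]
        + cache_miss_tokens * PRICE_PER_MILLION["input_cache_miss"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# Example: 200,000 uncached input tokens and 50,000 output tokens.
cost = estimate_cost(0, 200_000, 50_000)
```

Because reasoning models generate long chains of thought before answering, output tokens are likely to dominate the bill for most workloads.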

While the benchmark results are impressive, real-world testing in the coming days will show whether DeepSeek-R1 can truly match OpenAI's o1 in practice. Looking ahead, DeepSeek could close the gap with OpenAI's recently introduced o3 model in its next release - but that depends on whether they can maintain their current development pace and whether their training methods can deliver the same kind of improvements that OpenAI achieved with o3 in just three months.

Summary
  • Chinese AI startup DeepSeek has released two new reasoning models - DeepSeek-R1 and DeepSeek-R1-Zero - that perform on par with OpenAI's o1 in benchmarks.
  • DeepSeek-R1-Zero was developed using only reinforcement learning and a rule-based reward system, without any human examples. DeepSeek-R1 built on this approach by adding a small set of initial training data to reach o1-level performance.
  • The company has also released six smaller open-source models, trained using data from DeepSeek-R1. Some of these distilled versions match the capabilities of OpenAI's o1-mini.
Sources
DeepSeek