The Chinese AI company DeepSeek-AI has released the open-source language model DeepSeek-Coder-V2, which aims to compete with leading commercial models such as GPT-4, Claude, and Gemini in code generation.
DeepSeek-Coder-V2 builds on the earlier DeepSeek-V2 model and was further trained on an additional 6 trillion tokens from a high-quality, multi-source corpus. The model now supports 338 programming languages, up from 86, and can process contexts of up to 128,000 tokens, up from 16,000.
The training dataset consists of 60% source code, 10% mathematical data, and 30% natural language. The code portion contains 1.17 trillion tokens from GitHub and CommonCrawl, while the mathematical part includes 221 billion tokens from CommonCrawl.
DeepSeek-Coder-V2 uses a Mixture-of-Experts architecture and comes in two variants: the 16-billion-parameter model activates only 2.4 billion parameters per token, while the 236-billion-parameter model activates just 21 billion. Both versions were trained on a total of 10.2 trillion tokens.
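To give a sense of how the smaller variant could be run locally, here is a minimal sketch using the Hugging Face transformers library. The model identifier, chat-style prompt, and generation settings are assumptions for illustration, not details taken from DeepSeek-AI's announcement.

```python
# Minimal sketch: running the 16B ("Lite") variant locally with Hugging Face transformers.
# The model ID and generation settings are assumptions, not details from the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # MoE: 16B total parameters, roughly 2.4B active per token
    device_map="auto",
    trust_remote_code=True,
)

# Ask the model to write a small piece of code.
messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

Because only a fraction of the parameters is active per token, the Mixture-of-Experts design keeps inference cost closer to that of a much smaller dense model.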
DeepSeek-Coder-V2 breaks the dominance of closed models
In benchmarks such as HumanEval and MBPP, DeepSeek-Coder-V2 keeps up with the best commercial models, according to DeepSeek-AI. The 236-billion-parameter version achieved an average score of 75.3%, slightly below GPT-4o's 76.4% but ahead of GPT-4 and Claude 3 Opus.
In mathematical benchmarks such as GSM8K, MATH, or AIME, DeepSeek-Coder-V2 is on par with the leading commercial models. In language tasks, it performs similarly to its predecessor, DeepSeek-V2.
The DeepSeek-Coder-V2 models are available for download on Hugging Face under an open-source license that permits both research and commercial use. The model is also accessible via an API.
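As a sketch of what API access could look like, the following assumes an OpenAI-compatible endpoint at https://api.deepseek.com and a model named "deepseek-coder"; neither detail is stated in the announcement, so check DeepSeek's API documentation before use.

```python
# Hypothetical sketch of calling the hosted model through an OpenAI-compatible client.
# The base URL and model name are assumptions; consult DeepSeek's API documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder, use your own key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-coder",               # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```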
Despite the impressive results, the developers see room for improvement in the model's ability to follow instructions, which is crucial for handling complex real-world programming scenarios. DeepSeek-AI plans to address this in future work.