
Deepseek has rolled out its experimental language model, Deepseek-V3.2-Exp, building on the recent V3.1-Terminus release.


At the heart of the upgrade is DeepSeek Sparse Attention (DSA), which attends only to the parts of the input it judges relevant instead of scoring every token against every other one. This sharply cuts inference costs for long inputs of up to 128,000 tokens. According to Deepseek's technical report, costs at the 128K token level are about 3.5 times lower for prefilling and 6 to 7 times lower for decoding.
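The article does not include a reference implementation, but the general idea behind sparse attention can be sketched in a few lines: score all keys cheaply, keep only the top-k, and run softmax attention over that small subset. The toy NumPy snippet below illustrates that pattern; the function name, shapes, and choice of k are invented for the example, and this is a simplification, not DeepSeek's actual DSA, which uses a learned indexer to select tokens.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=32):
    """Toy single-query sparse attention: cheap scores for all keys,
    full softmax attention over only the k best-scoring keys."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (n,) similarity of query to every key
    idx = np.argpartition(scores, -k)[-k:]  # indices of the k highest scores
    sel = scores[idx]
    w = np.exp(sel - sel.max())
    w /= w.sum()                            # softmax over the selected keys only
    return w @ V[idx]                       # weighted sum of the selected values

rng = np.random.default_rng(0)
n, d = 1024, 64                             # 1024 keys, 64-dim heads (made-up sizes)
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # → (64,)
```

Because the softmax and value aggregation touch only k keys instead of all n, per-token decode work no longer grows with the full context length, which is where the reported savings at 128K tokens come from.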

Deepseek's new model architecture significantly lowers inference costs for long-context workloads. | Image: Deepseek

The release also includes kernels written in TileLang, a high-level programming framework that targets multiple hardware platforms. As a result, V3.2-Exp runs out of the box on AI chips from Chinese vendors such as Huawei (Ascend) and Cambricon. Deepseek appears to be positioning itself for a future where China reduces its reliance on US chipmakers like Nvidia.

Similar performance, steep price cuts

In benchmarks, Deepseek-V3.2-Exp performs about the same as V3.1-Terminus, with only minor differences. Deepseek notes small gains or losses on individual tasks, mostly due to shorter responses in complex reasoning tests. These gaps disappear in tests with similar token counts.

Benchmark                    DeepSeek-V3.1-Terminus   DeepSeek-V3.2-Exp

Reasoning Mode w/o Tool Use
MMLU-Pro                     85.0                     85.0
GPQA-Diamond                 80.7                     79.9
Humanity's Last Exam         21.7                     19.8
LiveCodeBench                74.9                     74.1
AIME 2025                    88.4                     89.3
HMMT 2025                    86.1                     83.6
Codeforces                   2046                     2121
Aider-Polyglot               76.1                     74.5

Agentic Tool Use
BrowseComp                   38.5                     40.1
BrowseComp_zh                45.0                     47.9
SimpleQA                     96.8                     97.1
SWE Verified                 68.4                     67.8
SWE-bench Multilingual       57.8                     57.9
Terminal-bench               36.7                     37.7
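One way to see how small the gaps are is to average the per-benchmark score changes from the table above. The short script below does exactly that with the published numbers; Codeforces is left out because it is an Elo-style rating rather than a percentage.

```python
# Score pairs from the table above: (V3.1-Terminus, V3.2-Exp).
scores = {
    "MMLU-Pro": (85.0, 85.0),
    "GPQA-Diamond": (80.7, 79.9),
    "Humanity's Last Exam": (21.7, 19.8),
    "LiveCodeBench": (74.9, 74.1),
    "AIME 2025": (88.4, 89.3),
    "HMMT 2025": (86.1, 83.6),
    "Aider-Polyglot": (76.1, 74.5),
    "BrowseComp": (38.5, 40.1),
    "BrowseComp_zh": (45.0, 47.9),
    "SimpleQA": (96.8, 97.1),
    "SWE Verified": (68.4, 67.8),
    "SWE-bench Multilingual": (57.8, 57.9),
    "Terminal-bench": (36.7, 37.7),
}
deltas = {name: round(new - old, 1) for name, (old, new) in scores.items()}
avg = sum(deltas.values()) / len(deltas)
print(f"mean delta: {avg:+.2f} points")  # → mean delta: -0.11 points
```

A mean shift of roughly a tenth of a point across thirteen benchmarks backs up the claim that the two models perform about the same.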

Despite the similar performance, the new model is much cheaper to run: Deepseek has cut API prices by 50 to 75 percent. This could put added pressure on Western providers like Anthropic, which charge more for comparable models, though ongoing skepticism about Chinese AI models may limit the impact for now.

                     New price               Old price               Reduction
Input (cache hit)    $0.028 / 1M tokens      $0.07 / 1M tokens       -60%
Input (cache miss)   $0.28 / 1M tokens       $0.56 / 1M tokens       -50%
Output               $0.42 / 1M tokens       $1.68 / 1M tokens       -75%
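To put the cuts in concrete terms, here is a back-of-the-envelope calculation for a hypothetical request mix of 1 million input tokens (all cache misses) and 250,000 output tokens; the token volumes are invented for illustration.

```python
# Per-million-token prices from the table above, in US dollars.
OLD_INPUT_MISS, NEW_INPUT_MISS = 0.56, 0.28
OLD_OUTPUT, NEW_OUTPUT = 1.68, 0.42

# Hypothetical workload: 1M input tokens (cache miss) + 0.25M output tokens.
old = 1.0 * OLD_INPUT_MISS + 0.25 * OLD_OUTPUT   # cost under the old prices
new = 1.0 * NEW_INPUT_MISS + 0.25 * NEW_OUTPUT   # cost under the new prices
print(f"old: ${old:.3f}, new: ${new:.3f}, saving: {1 - new / old:.0%}")
# → old: $0.980, new: $0.385, saving: 61%
```

Output-heavy workloads benefit the most, since the output price saw the largest cut at 75 percent.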

Deepseek-V3.2-Exp is available through the web interface, iOS and Android apps, the API, and as downloadable checkpoints on Hugging Face, with V3.1-Terminus remaining accessible for comparison testing until October 15, 2025.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Deepseek has launched Deepseek-V3.2-Exp, a language model that builds on the previous V3.1-Terminus release and introduces a new attention architecture, DeepSeek Sparse Attention, for more efficient handling of longer texts.
  • Benchmark results show that V3.2-Exp delivers similar performance to V3.1-Terminus, but at a significantly lower cost.
  • Deepseek is cutting API prices for V3.2-Exp by 50 to 75 percent, and the model is immediately available via app, web, and API.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.