
Deepseek has rolled out its experimental language model, Deepseek-V3.2-Exp, building on the recent V3.1-Terminus release.


At the heart of the upgrade is DeepSeek Sparse Attention (DSA), which attends only to the parts of the input it judges relevant instead of scoring every token against every other one. This sharply cuts inference costs for long inputs of up to 128,000 tokens. According to Deepseek's technical report, costs at the 128K token level are about 3.5 times lower for prefilling and 6 to 7 times lower for decoding.
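The article does not include a reference implementation, but the general idea behind sparse attention can be sketched in a few lines: score all keys cheaply, keep only the top-k, and run softmax attention over that small subset. The toy NumPy snippet below illustrates that pattern; the function name, shapes, and choice of k are invented for the example, and this is a simplification, not DeepSeek's actual DSA, which uses a learned indexer to select tokens.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=32):
    """Toy single-query sparse attention: cheap scores for all keys,
    full softmax attention over only the k best-scoring keys."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (n,) similarity of query to every key
    idx = np.argpartition(scores, -k)[-k:]  # indices of the k highest scores
    sel = scores[idx]
    w = np.exp(sel - sel.max())
    w /= w.sum()                            # softmax over the selected keys only
    return w @ V[idx]                       # weighted sum of the selected values

rng = np.random.default_rng(0)
n, d = 1024, 64                             # 1024 keys, 64-dim heads (made-up sizes)
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # → (64,)
```

Because the softmax and value aggregation touch only k keys instead of all n, per-token decode work no longer grows with the full context length, which is where the reported savings at 128K tokens come from.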

Deepseek's new model architecture significantly lowers inference costs for long-context workloads. | Image: Deepseek

The release also includes kernels written in TileLang, a high-level programming framework that targets multiple hardware platforms. As a result, V3.2-Exp runs out of the box on AI chips from Chinese vendors such as Huawei (Ascend) and Cambricon. Deepseek appears to be positioning itself for a future where China reduces its reliance on US chipmakers like Nvidia.

Similar performance, steep price cuts

In benchmarks, Deepseek-V3.2-Exp performs about the same as V3.1-Terminus, with only minor differences. Deepseek notes small gains or losses on individual tasks, mostly due to shorter responses in complex reasoning tests. These gaps disappear in tests with similar token counts.

Benchmark                    DeepSeek-V3.1-Terminus   DeepSeek-V3.2-Exp

Reasoning Mode w/o Tool Use
MMLU-Pro                     85.0                     85.0
GPQA-Diamond                 80.7                     79.9
Humanity's Last Exam         21.7                     19.8
LiveCodeBench                74.9                     74.1
AIME 2025                    88.4                     89.3
HMMT 2025                    86.1                     83.6
Codeforces                   2046                     2121
Aider-Polyglot               76.1                     74.5

Agentic Tool Use
BrowseComp                   38.5                     40.1
BrowseComp_zh                45.0                     47.9
SimpleQA                     96.8                     97.1
SWE Verified                 68.4                     67.8
SWE-bench Multilingual       57.8                     57.9
Terminal-bench               36.7                     37.7
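One way to see how small the gaps are is to average the per-benchmark score changes from the table above. The short script below does exactly that with the published numbers; Codeforces is left out because it is an Elo-style rating rather than a percentage.

```python
# Score pairs from the table above: (V3.1-Terminus, V3.2-Exp).
scores = {
    "MMLU-Pro": (85.0, 85.0),
    "GPQA-Diamond": (80.7, 79.9),
    "Humanity's Last Exam": (21.7, 19.8),
    "LiveCodeBench": (74.9, 74.1),
    "AIME 2025": (88.4, 89.3),
    "HMMT 2025": (86.1, 83.6),
    "Aider-Polyglot": (76.1, 74.5),
    "BrowseComp": (38.5, 40.1),
    "BrowseComp_zh": (45.0, 47.9),
    "SimpleQA": (96.8, 97.1),
    "SWE Verified": (68.4, 67.8),
    "SWE-bench Multilingual": (57.8, 57.9),
    "Terminal-bench": (36.7, 37.7),
}
deltas = {name: round(new - old, 1) for name, (old, new) in scores.items()}
avg = sum(deltas.values()) / len(deltas)
print(f"mean delta: {avg:+.2f} points")  # → mean delta: -0.11 points
```

A mean shift of roughly a tenth of a point across thirteen benchmarks backs up the claim that the two models perform about the same.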

Despite the similar performance, the new model is much cheaper to run: Deepseek has cut API prices by 50 to 75 percent. This could put added pressure on Western providers like Anthropic, which charge more for comparable models, though ongoing skepticism about Chinese AI models may limit the impact for now.

                     New price               Old price               Reduction
Input (cache hit)    $0.028 / 1M tokens      $0.07 / 1M tokens       -60%
Input (cache miss)   $0.28 / 1M tokens       $0.56 / 1M tokens       -50%
Output               $0.42 / 1M tokens       $1.68 / 1M tokens       -75%
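To put the cuts in concrete terms, here is a back-of-the-envelope calculation for a hypothetical request mix of 1 million input tokens (all cache misses) and 250,000 output tokens; the token volumes are invented for illustration.

```python
# Per-million-token prices from the table above, in US dollars.
OLD_INPUT_MISS, NEW_INPUT_MISS = 0.56, 0.28
OLD_OUTPUT, NEW_OUTPUT = 1.68, 0.42

# Hypothetical workload: 1M input tokens (cache miss) + 0.25M output tokens.
old = 1.0 * OLD_INPUT_MISS + 0.25 * OLD_OUTPUT   # cost under the old prices
new = 1.0 * NEW_INPUT_MISS + 0.25 * NEW_OUTPUT   # cost under the new prices
print(f"old: ${old:.3f}, new: ${new:.3f}, saving: {1 - new / old:.0%}")
# → old: $0.980, new: $0.385, saving: 61%
```

Output-heavy workloads benefit the most, since the output price saw the largest cut at 75 percent.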

Deepseek-V3.2-Exp is available through the web interface, iOS and Android apps, the API, and as downloadable checkpoints on Hugging Face, with V3.1-Terminus remaining accessible for comparison testing until October 15, 2025.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Deepseek has launched Deepseek-V3.2-Exp, a language model that builds on the previous V3.1-Terminus release and introduces a new attention architecture, DeepSeek Sparse Attention, for more efficient handling of longer texts.
  • Benchmark results show that V3.2-Exp delivers similar performance to V3.1-Terminus, but at a significantly lower cost.
  • Deepseek is cutting API prices for V3.2-Exp by 50 to 75 percent, and the model is immediately available via app, web, and API.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.