Content
summary Summary

Deepseek has rolled out V3.1-Terminus, an improved version of its hybrid AI model Deepseek-V3.1.

Ad

V3.1-Terminus now does a better job distinguishing between Chinese and English, and eliminates errors like random special characters. Deepseek has also tweaked its built-in agents, including code and search agents, for more reliable results, the company says.

Benchmark results show the biggest gains in tasks that require tool use. On the BrowseComp benchmark, V3.1-Terminus jumps from 30.0 to 38.5 points. On Terminal-bench, it goes from 31.3 to 36.7.

Deepseek's chart also indicates a tradeoff: performance improves on the English-language BrowseComp, while BrowseComp-ZH on the Chinese web slips slightly. For pure reasoning tasks without tool use, the improvements are more modest.

Ad
Ad
Tabular comparison of DeepSeek V3.1 vs. V3.1 Terminus in reasoning and tool benchmarks; Terminus significantly increases tool scores.
V3.1-Terminus posts larger gains on agent tasks that use external tools. Scores climb on BrowseComp, while BrowseComp-ZH dips a bit, hinting at a tradeoff between English- and Chinese-web performance. BrowseComp measures multi-step live web searches. | Image: Deepseek

The model is available through app, web, and API. Open-source weights can be found on Hugging Face under an MIT license.

Two thinking modes and aggressive pricing

V3.1-Terminus builds on Deepseek-V3.1, first released in August, which introduced two separate modes: a "thinking" mode (Deepseek-reasoner) for complex, tool-based tasks, and a "non-thinking" mode (Deepseek-chat) for straightforward conversations. Both modes support a context window of up to 128,000 tokens.

The model was trained on an additional 840 billion tokens, using a new tokenizer and updated prompt templates. Deepseek-V3.1 has already posted strong results against hybrid models from OpenAI and Anthropic, and outperformed Deepseek's own pure reasoning model R1.

Deepseek has kept its aggressive pricing from the initial release: output tokens still cost $1.68 per million, well below GPT-5 ($10.00) and Claude Opus 4.1 (up to $75.00). The API charges $0.07 per million tokens for cache hits and $0.56 for cache misses.

Like other Chinese AI models, Deepseek's latest release is subject to state censorship, making it a propaganda tool for the Chinese government on political topics. The Trump administration has proposed similar restrictions for US-based models. According to a recent Deepseek code review, these interventions can directly impact model performance.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Recommendation
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Deepseek has released V3.1-Terminus, an updated hybrid AI model that delivers more consistent results across languages and achieves notable gains on tool usage benchmarks like BrowseComp and Terminal-bench.
  • The model maintains its two operating modes—one optimized for complex tasks involving tools and another for straightforward conversations—and can process up to 128,000 tokens in a single context.
  • With a price of $1.68 per million output tokens, V3.1-Terminus is significantly less expensive than similar offerings from OpenAI and Anthropic, and its open-source weights are available on Hugging Face.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.