Content
summary Summary

Alibaba has released Qwen3-Max, the biggest and most capable AI model in its lineup. The new model is built for real-world software development and automation, with major performance upgrades across the board.

Ad

Qwen3-Max comes with more than one trillion parameters and was trained on 36 trillion tokens. A preview version called Qwen3-Max-Instruct, launched earlier this month, already reached third place on the Text Arena Leaderboard, ahead of GPT-5-Chat—OpenAI's GPT-5 variant running with reasoning set to low.

Qwen3-Max sits just behind the largest models from Google, Anthropic, and OpenAI on the Text Arena Leaderboard. | Image: Alibaba

Qwen3-Max uses the same architecture as the Qwen3 series, first announced in April, but scales it up to over a trillion parameters. The model relies on a Mixture of Experts (MoE) approach, which means only a subset of parameters are active during inference.

30 percent jump in training efficiency

Alibaba says the training process for Qwen3-Max was unusually stable, with a smooth loss curve and no sudden spikes, rollbacks, or major adjustments. Optimized parallelization made training Qwen3-Max-Base 30 percent more efficient compared to Qwen2.5-Max-Base.

Ad
Ad

For long-context training, the team used new techniques that tripled throughput and made it possible to handle input sequences up to one million tokens. New tools for automatic monitoring and recovery also cut downtime from hardware failures to just a fifth of what was seen with the previous generation, according to Alibaba.

The Qwen team reports that Qwen3-Max-Instruct posted top scores across a wide range of benchmarks, including knowledge, reasoning, programming, instruction following, human preference alignment, agent tasks, and multilingual understanding.

The biggest improvements are in programming and agent abilities, which keeps the model focused on real software development and automation. On SWE-Bench Verified, a benchmark for fixing real-world software bugs, Qwen3-Max-Instruct scored 69.6. Alibaba says this puts it among the top-performing models available.

Image: Alibaba

On Tau2-Bench, which tests how well models can call external tools and handle complex workflows, Qwen3-Max-Instruct scored 74.8, coming in ahead of Claude 4 Opus and Deepseek V3.1.

A reasoning-focused variant, Qwen3-Max-Thinking, is still in training but has already maxed out the AIME 25 and HMMT math benchmarks, matching results from GPT-5 Pro and Grok 4. The model uses a code interpreter and extra compute during testing.

Recommendation
Balkendiagramm: Qwen3-Max-Thinking erzielt 100 Punkte auf AIME25/HMMT25 und 85,4 Punkte auf GPQA im Vergleich zu Grok4 und GPT-5 Pro.
Image: Alibaba

Test-time compute lets the model run several solutions at once and pick the best one. AIME is a tough student math competition and is often used as a benchmark for logical reasoning. Alibaba plans to release Qwen3-Max-Thinking soon.

Availability and API access

Qwen3-Max-Instruct is available on Qwen Chat, but like many other Qwen models, it is not open-source. Developers can use the API through Alibaba Cloud Model Studio, and the interface is compatible with OpenAI APIs.

Qwen3-Max is part of a wider expansion of Alibaba's AI lineup. The company recently introduced Qwen-3-TTS-Flash for voice generation, Qwen-Image-Edit for image editing, Qwen3-Next for faster text processing, and Qwen3-Omni, a multimodal model for text, image, and audio tasks.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Alibaba has introduced Qwen3-Max, its largest language model to date, with over a trillion parameters and a 30 percent efficiency improvement over its previous version, thanks to a mixture-of-experts architecture and enhanced training methods.
  • Qwen3-Max-Instruct leads in software debugging benchmarks for agent tool calling, outperforming models like Claude Opus 4 and DeepSeek V3.1, while the upcoming Qwen3-Max-Thinking variant achieved perfect results in challenging math tests.
  • The model is accessible through Qwen Chat and an API in Alibaba Cloud Model Studio, supports OpenAI-compatible interfaces, is not open source, and is targeted at software development and automation applications.
Sources
Jonathan writes for THE DECODER about how AI tools can improve both work and creative projects.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.