Alibaba launches Qwen3, an open-source model family designed to compete with leading systems.
Alibaba has released its Qwen3 model series, which achieves benchmark results on par with current top models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro.
The two largest models in the lineup, Qwen3-235B-A22B and Qwen3-30B-A3B, use a Mixture-of-Experts (MoE) architecture; the suffix denotes active parameters, so Qwen3-235B-A22B has 235 billion parameters in total but activates only about 22 billion per token. Both match the performance of leading systems on standard benchmarks for coding, mathematics, and general capabilities, often with fewer total or active parameters. According to the published benchmark data, these results were achieved in reasoning mode, likely using the highest available token budget.
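To illustrate why only a fraction of an MoE model's parameters is used per token, here is a toy top-k routing sketch in PyTorch. It is a simplified illustration of the general technique, not Qwen3's actual routing code; the layer sizes and the softmax-then-top-k gating are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

d_model, n_experts, k = 8, 4, 2          # toy sizes, not Qwen3's real ones
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
gate = nn.Linear(d_model, n_experts)     # router that scores experts per token

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-k experts; the remaining experts stay idle."""
    weights, idx = gate(x).softmax(dim=-1).topk(k, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.size(0)):           # plain loops kept for readability
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])   # only k of n_experts run per token
    return out

print(moe_forward(torch.randn(3, d_model)).shape)  # torch.Size([3, 8])
```

In a full-scale MoE model, the inactive experts still occupy memory but cost no compute for a given token, which is how a 235B-parameter model can run with roughly 22B active parameters.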
Introducing Qwen3!
We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general... pic.twitter.com/JWZkJeHWhC
- Qwen (@Alibaba_Qwen) April 28, 2025
Qwen3 was pretrained on 36 trillion tokens, more than Llama 4 Maverick (22T) but fewer than Llama 4 Scout (40T). The training data comprises web content, documents, and custom-generated mathematics and programming datasets. The Qwen3 models are released under the Apache 2.0 license, making the weights freely available, including for commercial use.
Qwen3 is a hybrid open-source model
A key feature of Qwen3 is its ability to switch between two modes of operation. In "Thinking Mode," the model works through tasks with detailed intermediate reasoning steps; in "Non-Thinking Mode," it returns fast, direct answers. Similar hybrid approaches appear in other models, including Claude 3.7 Sonnet and Grok 3. Complex tasks benefit from the reasoning mode, while the faster mode is designed for routine queries.
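In practice, the toggle is exposed through the chat template. The sketch below follows the Hugging Face transformers usage shown in Qwen's release materials, where an enable_thinking flag switches between the two modes; the model ID and prompt here are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # any Qwen3 checkpoint follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# Thinking Mode: the chat template has the model reason in <think>...</think>
# before answering. Set enable_thinking=False for fast, direct replies.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

Qwen's release notes also describe soft switches (/think and /no_think tags placed inside a prompt) for toggling the behavior turn by turn in multi-turn conversations.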
Qwen3 is a win for open weights & efficiency - hybrid reasoning models that approach DeepSeek R1's GPQA score with 1/3 the total parameters and a range of smaller models suited for compute limited environments
Today, Alibaba announced eight hybrid reasoning models of varying... pic.twitter.com/NMdA64mZjE
- Artificial Analysis (@ArtificialAnlys) April 29, 2025
Alibaba states that the models support 119 languages and dialects, covering widely spoken languages like English, Chinese, and Arabic, as well as numerous minority languages and regional dialects. Actual model performance will depend on the specific application context.
Published benchmark results indicate a high-performance model series that, at comparable parameter counts, currently outpaces open-weight competitors such as Meta's Llama series and DeepSeek. However, this lead may be short-lived: Meta is hosting its first LlamaCon today and is expected to introduce a reasoning model based on Llama 4, while DeepSeek is anticipated to release the successor to R1 in the coming weeks.