Alibaba launches Qwen3, an open-source model family designed to compete with leading systems.
Alibaba has released its Qwen3 model series, which achieves benchmark results on par with current top models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro.
The two largest models in the lineup, Qwen3-235B-A22B and Qwen3-30B-A3B, use a Mixture-of-Experts (MoE) architecture; the suffix denotes active parameters, so Qwen3-235B-A22B has 235 billion parameters in total but activates only about 22 billion per token. Both match the performance of leading systems on standard benchmarks for coding, mathematics, and general capabilities, often with fewer total or active parameters. According to the published benchmark data, these results were achieved in reasoning mode, likely using the highest available token budget.
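To illustrate why only a fraction of an MoE model's parameters is used per token, here is a toy top-k routing sketch in PyTorch. It is a simplified illustration of the general technique, not Qwen3's actual routing code; the layer sizes and the softmax-then-top-k gating are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

d_model, n_experts, k = 8, 4, 2          # toy sizes, not Qwen3's real ones
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
gate = nn.Linear(d_model, n_experts)     # router that scores experts per token

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top-k experts; the remaining experts stay idle."""
    weights, idx = gate(x).softmax(dim=-1).topk(k, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.size(0)):           # plain loops kept for readability
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])   # only k of n_experts run per token
    return out

print(moe_forward(torch.randn(3, d_model)).shape)  # torch.Size([3, 8])
```

In a full-scale MoE model, the inactive experts still occupy memory but cost no compute for a given token, which is how a 235B-parameter model can run with roughly 22B active parameters.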
Introducing Qwen3!
We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general... pic.twitter.com/JWZkJeHWhC
- Qwen (@Alibaba_Qwen) April 28, 2025
Qwen3 was pretrained on 36 trillion tokens, more than Llama 4 Maverick (22T) but fewer than Llama 4 Scout (40T). The training data comprises web content, documents, and custom-generated mathematics and programming datasets. The Qwen3 models are released under the Apache 2.0 license, making the weights freely available, including for commercial use.
Qwen3 is a hybrid open-source model
A key feature of Qwen3 is its ability to switch between two modes of operation. In "Thinking Mode," the model works through tasks with detailed intermediate reasoning steps; in "Non-Thinking Mode," it returns fast, direct answers. Similar hybrid approaches appear in other models, including Claude 3.7 Sonnet and Grok 3. Complex tasks benefit from the reasoning mode, while the faster mode is designed for routine queries.
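In practice, the toggle is exposed through the chat template. The sketch below follows the Hugging Face transformers usage shown in Qwen's release materials, where an enable_thinking flag switches between the two modes; the model ID and prompt here are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B"  # any Qwen3 checkpoint follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# Thinking Mode: the chat template has the model reason in <think>...</think>
# before answering. Set enable_thinking=False for fast, direct replies.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

Qwen's release notes also describe soft switches (/think and /no_think tags placed inside a prompt) for toggling the behavior turn by turn in multi-turn conversations.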
Qwen3 is a win for open weights & efficiency - hybrid reasoning models that approach DeepSeek R1's GPQA score with 1/3 the total parameters and a range of smaller models suited for compute limited environments
Today, Alibaba announced eight hybrid reasoning models of varying... pic.twitter.com/NMdA64mZjE
- Artificial Analysis (@ArtificialAnlys) April 29, 2025
Alibaba states that the models support 119 languages and dialects, covering widely spoken languages like English, Chinese, and Arabic, as well as numerous minority languages and regional dialects. Actual model performance will depend on the specific application context.
Published benchmark results indicate a high-performance model series that, at comparable parameter counts, currently outpaces open-weight competitors such as Meta's Llama series and DeepSeek. However, this lead may be short-lived: Meta is hosting its first LlamaCon today and is expected to introduce a reasoning model based on Llama 4, while DeepSeek is anticipated to release the successor to R1 in the coming weeks.