Alibaba's Qwen 2.5 AI models are gunning for Llama 3's crown in latest benchmark

Alibaba Cloud has launched Qwen 2.5, a new generation of AI models that rival leading open-source alternatives like Llama 3.1 in benchmark tests. The suite includes variants for general language tasks, programming, and mathematics.

The Qwen 2.5 series offers models ranging from 0.5 to 72 billion parameters. Alibaba claims its largest model, Qwen2.5-72B, outperforms competitors such as Llama-3.1-70B and Mistral-Large-V2 on benchmarks like MMLU. Smaller versions, including Qwen2.5-14B and Qwen2.5-32B, reportedly match the performance of larger models like Phi-3.5-MoE-Instruct and Gemma2-27B-IT.

According to Alibaba, the Qwen2.5 models were trained on a dataset of up to 18 trillion tokens and support more than 29 languages. They can process contexts of up to 128,000 tokens and generate up to 8,000 tokens.
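To make the two limits concrete, here is a minimal Python sketch of a pre-flight check for a request. The numbers are the ones Alibaba reports. The function name `fits_limits` is hypothetical, and the sketch assumes that generated tokens count against the same 128,000-token window as the prompt, which a particular deployment may handle differently.

```python
# Limits as reported by Alibaba for Qwen2.5; how a given deployment
# enforces them is an assumption of this sketch.
MAX_CONTEXT_TOKENS = 128_000
MAX_GENERATION_TOKENS = 8_000


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both reported limits.

    Assumption: the generated tokens share the 128,000-token window
    with the prompt.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The output alone may not exceed the generation cap.
    if requested_output_tokens > MAX_GENERATION_TOKENS:
        return False
    # Prompt and output together must fit into the context window.
    return prompt_tokens + requested_output_tokens <= MAX_CONTEXT_TOKENS
```

Under these assumptions, a 100,000-token prompt with 8,000 tokens of output fits, while a 125,000-token prompt with the same output budget does not.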

Qwen2.5-Coder, optimized for programming tasks, reportedly outperforms many larger language models across various programming languages and tasks, despite its smaller size.

Qwen2.5-Math builds on the earlier Qwen2-Math, incorporating additional mathematical data, including synthetic data generated by its predecessor. Alibaba reports that Qwen2.5-Math-72B-Instruct surpasses models like GPT-4o, Claude 3.5 Sonnet, and Llama 3.1 405B on math-focused benchmarks such as GSM8K, MATH, and MMLU-STEM.

Some models open-source

Most Qwen2.5 models are open-source under the Apache 2.0 license, except for the 3B and 72B variants. Alibaba also offers API access to its most powerful models through Qwen-Plus and Qwen-Turbo.

The company highlights improvements in processing structured data, generating structured output, and adapting to various system prompts. These enhancements are meant to make it easier to implement role-play scenarios and to configure chatbots.

Qwen 2.5 follows earlier releases like Qwen2 and Qwen2-VL, a multimodal model capable of analyzing images and videos up to 20 minutes long.

Alibaba plans to develop even larger Qwen models in the future, including more multimodal variants with image and audio capabilities. All models are available on GitHub.
