
Alibaba's AI research unit Qwen has released a new series of AI models designed specifically for software development.

Called Qwen-2.5-Coder, these models help developers write, analyze, and understand code. The new series includes six different model sizes, ranging from 0.5 to 32 billion parameters, to accommodate various use cases and computing requirements.
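For developers who want a hands-on feel for the series, the instruct checkpoints are published on Hugging Face and can be driven through the standard transformers chat API. The following is a minimal sketch, not Qwen's reference setup: the model ID Qwen/Qwen2.5-Coder-7B-Instruct and the prompt are illustrative assumptions, and you need enough GPU memory for the chosen size.

```python
# Minimal sketch: code generation with a Qwen2.5-Coder instruct checkpoint
# via Hugging Face transformers. Model ID, prompt, and token budget are
# illustrative assumptions; swap in any of the six sizes (0.5B-32B).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to every size in the series; only the checkpoint name and the hardware requirements change.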

Qwen tested these models in two practical applications: the AI-powered code editor Cursor and a web-based chatbot with artifact support similar to ChatGPT or Claude. Alibaba plans to integrate the chatbot functionality into its Tongyi cloud platform soon.

Video: Qwen

According to Qwen, its largest model, Qwen-2.5-Coder-32B-Instruct, outperformed other open-source systems such as DeepSeek-Coder and Codestral on code generation benchmarks. The model also performed well on general tasks such as logical reasoning and language comprehension, although GPT-4o still leads on some benchmarks.

Comparison chart: Performance metrics of various code models such as Qwen2.5, DeepSeek, GPT-4o, and Claude 3.5 across twelve benchmark categories.
Qwen2.5-Coder-32B-Instruct achieves top scores in code generation, repair, and reasoning. It outperforms other open-source models in benchmarks such as EvalPlus and LiveCodeBench, and shows comparable performance to GPT-4o. | Image: Qwen

Massive training datasets set token record

The models were trained on roughly 24 trillion tokens from two sources: 18.5 trillion tokens from Qwen 2.5's general data mix, introduced last September, plus 5.5 trillion tokens of public source code and programming-related web content. This makes Qwen-2.5-Coder the first open-source model to exceed 20 trillion training tokens.

The top model, Qwen-2.5-Coder-32B-Instruct, supports over 40 programming languages, from common ones like Python, Java, and JavaScript to specialized languages like Haskell and Racket. All models feature context windows of up to 128,000 tokens.
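Because all supported languages go through the same chat interface, targeting a less common language is just a prompt change. Here is a hedged sketch using the transformers text-generation pipeline (the model ID and prompt are again assumptions, not Qwen's own examples):

```python
# Hedged sketch: the same checkpoint handles any of the 40+ supported
# languages; only the prompt changes. Model ID is an assumption.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Haskell function rle :: Eq a => [a] -> [(a, Int)] "
                "that run-length encodes a list."},
]
result = pipe(messages, max_new_tokens=256)
# With chat-style input, the pipeline returns the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

The 128,000-token context window matters most for repository-scale prompts; short requests like this one fit in any configuration.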

Multi-part bar chart: McEval performance of five AI models across 28 programming languages, with Qwen2.5 as the top performer.
Qwen2.5-Coder-32B-Instruct demonstrates versatility and strong performance across more than 40 programming languages, with particular strengths in functional languages such as Haskell and Racket thanks to its optimized training data. | Image: Qwen

Alibaba has released all models except the three-billion-parameter version under an Apache 2.0 license on GitHub. Developers can test the models through a free demo on Hugging Face.

Qwen researchers found that scaling up both model size and data consistently produced better results across programming tasks. The company says it plans to continue scaling to larger models and improving reasoning capabilities in future releases.

Summary
  • Alibaba's research unit Qwen has introduced a new series of AI models called Qwen-2.5-Coder, designed to help programmers write, analyze, and understand code, with model sizes ranging from 0.5 to 32 billion parameters.
  • The models demonstrated strong performance in real-world tests, both integrated into the Cursor AI editor and as a web chatbot with artifact support, with the largest model, Qwen-2.5-Coder-32B-Instruct, outperforming available open-source systems on code generation benchmarks.
  • Qwen's approach to improving its code models relies on scaling: the models were trained on 5.5 trillion tokens of public source code and programming-related web text on top of Qwen 2.5's general data mix. All models except the 3-billion-parameter version are available under the Apache 2.0 license on GitHub.