Alibaba's Qwen AI unit releases capable new models to help developers write and analyze code

Image: Midjourney prompted by THE DECODER

Alibaba's AI research unit Qwen has released a new series of AI models designed specifically for software development.

Called Qwen2.5-Coder, these models help developers write, analyze, and understand code. The series includes six model sizes, ranging from 0.5 to 32 billion parameters, to accommodate different use cases and computing requirements.

Qwen tested these models in two practical applications: the AI-powered code editor Cursor and a web-based chatbot with artifact support similar to ChatGPT or Claude. Alibaba plans to integrate the chatbot functionality into its Tongyi cloud platform soon.
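
Editor integrations like Cursor typically drive such models with fill-in-the-middle (FIM) prompting, where the model completes code between an existing prefix and suffix. The sketch below shows what such a request could look like with Hugging Face transformers; the FIM token names and the small base-model ID follow Qwen's published Qwen2.5-Coder examples, but treat both as assumptions to verify against the model card.

# Minimal sketch: fill-in-the-middle (FIM) completion, the prompting style
# code editors typically use. FIM token names and the model ID are assumptions
# based on Qwen's published examples; verify them against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-1.5B"  # base (non-instruct) variants handle FIM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Ask the model to fill in the function body between prefix and suffix.
prompt = (
    "<|fim_prefix|>def is_palindrome(s: str) -> bool:\n"
    "<|fim_suffix|>\n\nprint(is_palindrome('racecar'))<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the completion, not the prompt.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))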

Video: Qwen

According to Qwen, its largest model, Qwen2.5-Coder-32B-Instruct, outperformed other open-source systems such as DeepSeek-Coder and Codestral in code generation benchmarks. The model also performed strongly on general tasks such as logical reasoning and language comprehension, though GPT-4o still leads in some benchmark tests.

Comparison chart: Performance metrics of various code models such as Qwen2.5, DeepSeek, GPT-4o, and Claude 3.5 across twelve benchmark categories.
Qwen2.5-Coder-32B-Instruct achieves top scores in code generation, repair, and reasoning. It outperforms other open-source models in benchmarks such as EvalPlus and LiveCodeBench, and shows comparable performance to GPT-4o. | Image: Qwen

Massive training datasets set token record

The models were trained on more than 20 trillion tokens from two sources: 18.5 trillion tokens from the general data mix introduced with Qwen 2.5 last September, plus 5.5 trillion tokens of public source code and programming-related web content. That makes Qwen2.5-Coder the first open-source model to exceed 20 trillion training tokens.

The top model, Qwen2.5-Coder-32B-Instruct, supports over 40 programming languages, from common ones like Python, Java, and JavaScript to specialized languages like Haskell and Racket. All models feature context windows of up to 128,000 tokens.

Multi-part bar chart: Comparison of the McEval performance of five AI models in 28 programming languages, with Qwen2.5 as the top performer.
Qwen2.5-Coder-32B-Instruct shows strong, versatile performance across more than 40 programming languages. It is especially capable in functional languages such as Haskell and Racket, thanks to its optimized training data. | Image: Qwen

Alibaba has released all models except the three-billion-parameter version under an Apache 2.0 license on GitHub. Developers can test the models through a free demo on Hugging Face.
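
For developers who would rather run a model locally than use the demo, the sketch below shows a plausible setup with Hugging Face transformers. The repository ID follows the naming on Qwen's Hugging Face page, and the 7B variant is assumed here so the example fits on a single GPU; check the model card for exact names and hardware requirements.

# Minimal sketch: prompting a Qwen2.5-Coder instruct model via Hugging Face
# transformers. The model ID is assumed from Qwen's release naming.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # smaller sibling of the 32B flagship
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that merges two sorted lists."}
]
# The chat template wraps the request in the model's instruction format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))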

Qwen researchers found that scaling up both model size and data consistently produced better results across programming tasks. The company says it plans to continue scaling to larger models and improving reasoning capabilities in future releases.
