Researchers have found a more efficient way to help AI models learn mathematics. Their new approach, called PRIME, delivers better results while using just a fraction of the training data needed by other methods.
The team applied the method to the existing Qwen2.5-Math-7B model, producing Eurus-2-7B-PRIME. After training with PRIME (Process Reinforcement through Implicit Rewards), the model's average score across mathematical benchmarks jumped from 32.2% to 48.9%, an improvement of 16.7 percentage points.
These results are particularly impressive when compared to larger models: GPT-4o manages 43.3%, while Llama-3.1-70B-Instruct reaches 35.7%. Even the specialized Qwen2.5-Math-7B-Instruct scores lower, at 43.8%.
The biggest improvements showed up in the American Invitational Mathematics Examination (AIME), one of the toughest math competitions for high school students. The PRIME-trained model solved 26.7% of these problems correctly, up from just 3.3%. For comparison, GPT-4o only solved 9.3%, Llama-3.1-70B-Instruct managed 16.7%, and Qwen-2.5-Math-7B-Instruct reached 13.3%.
PRIME is very data-efficient
What makes PRIME different is how it teaches AI models. Instead of only telling the model whether its final answer is right or wrong, PRIME provides dense feedback throughout the problem-solving process using what the researchers call "implicit process rewards": step-level rewards derived from a reward model that is itself trained only on final-answer correctness, so no step-by-step human labels are required.
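To make the idea concrete, here is a minimal sketch of how such token-level implicit rewards can be computed as a scaled log-likelihood ratio between the implicit process reward model and a frozen reference model. The function name, tensor shapes, and the beta coefficient are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def implicit_process_rewards(prm_logprobs: torch.Tensor,
                             ref_logprobs: torch.Tensor,
                             beta: float = 0.05) -> torch.Tensor:
    """Illustrative per-token implicit rewards (not the authors' exact code).

    Both inputs are per-token log-probabilities of the same sampled solution,
    one from the implicit process reward model and one from a frozen
    reference model; the reward is the scaled log-likelihood ratio.
    """
    return beta * (prm_logprobs - ref_logprobs)

# Toy example: dense feedback for every token of a 6-token solution,
# instead of a single right/wrong signal at the very end.
prm_lp = torch.tensor([-0.2, -0.5, -0.1, -0.9, -0.3, -0.4])
ref_lp = torch.tensor([-0.3, -0.4, -0.6, -0.8, -0.7, -0.5])
print(implicit_process_rewards(prm_lp, ref_lp))
```

The point of the dense signal is that every token of a solution gets graded, rather than the whole attempt receiving one pass/fail mark at the end.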
The system is remarkably efficient with its resources. While the Qwen2.5-Math-7B-Instruct model needed 2.5 million training examples, PRIME achieved better results with just 230,000. It's also more efficient during the learning process itself, requiring only four solution attempts per problem, compared to Qwen's 32 attempts to achieve similar results.
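As a rough picture of what that small sample budget looks like in a training loop, the sketch below draws a fixed number of candidate solutions per problem and records an outcome label for each. The `generate` and `verify` callables are hypothetical stand-ins for the policy model and the answer checker; this is not the authors' pipeline, only an illustration of sampling four attempts per problem.

```python
from typing import Callable, List, Tuple

def collect_rollouts(problems: List[str],
                     generate: Callable[[str], str],
                     verify: Callable[[str, str], bool],
                     samples_per_problem: int = 4) -> List[Tuple[str, str, bool]]:
    """Sample a few candidate solutions per problem and label each one by
    final-answer correctness (hypothetical helpers, shown only to illustrate
    the per-problem sample budget described above)."""
    batch = []
    for problem in problems:
        for _ in range(samples_per_problem):
            solution = generate(problem)  # one rollout from the policy model
            batch.append((problem, solution, verify(problem, solution)))
    return batch
```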
The researchers have made all of their data available on GitHub for others to explore and build upon.