Researchers have found a more efficient way to help AI models learn mathematics. Their new approach, called PRIME, delivers better results while using just a fraction of the training data needed by other methods.
The team applied the method to the existing Qwen2.5-Math-7B model, producing Eurus-2-7B-PRIME. After training with PRIME (Process Reinforcement through Implicit Rewards), the model's average score across mathematical benchmarks jumped from 32.2% to 48.9%, an improvement of 16.7 percentage points.
These results are particularly impressive when compared to larger models: GPT-4o manages 43.3%, while Llama-3.1-70B-Instruct reaches 35.7%. Even the specialized Qwen2.5-Math-7B-Instruct scores lower, at 43.8%.
The biggest improvements showed up in the American Invitational Mathematics Examination (AIME), one of the toughest math competitions for high school students. The PRIME-trained model solved 26.7% of these problems correctly, up from just 3.3%. For comparison, GPT-4o only solved 9.3%, Llama-3.1-70B-Instruct managed 16.7%, and Qwen-2.5-Math-7B-Instruct reached 13.3%.
PRIME is very data-efficient
What makes PRIME different is how it teaches AI models. Instead of only telling the model whether its final answer is right or wrong, PRIME provides dense feedback throughout the problem-solving process using what the researchers call "implicit process rewards": step-level rewards derived from a reward model that is itself trained only on final-answer correctness, so no step-by-step human labels are required.
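To make the idea concrete, here is a minimal sketch of how such token-level implicit rewards can be computed as a scaled log-likelihood ratio between the implicit process reward model and a frozen reference model. The function name, tensor shapes, and the beta coefficient are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def implicit_process_rewards(prm_logprobs: torch.Tensor,
                             ref_logprobs: torch.Tensor,
                             beta: float = 0.05) -> torch.Tensor:
    """Illustrative per-token implicit rewards (not the authors' exact code).

    Both inputs are per-token log-probabilities of the same sampled solution,
    one from the implicit process reward model and one from a frozen
    reference model; the reward is the scaled log-likelihood ratio.
    """
    return beta * (prm_logprobs - ref_logprobs)

# Toy example: dense feedback for every token of a 6-token solution,
# instead of a single right/wrong signal at the very end.
prm_lp = torch.tensor([-0.2, -0.5, -0.1, -0.9, -0.3, -0.4])
ref_lp = torch.tensor([-0.3, -0.4, -0.6, -0.8, -0.7, -0.5])
print(implicit_process_rewards(prm_lp, ref_lp))
```

The point of the dense signal is that every token of a solution gets graded, rather than the whole attempt receiving one pass/fail mark at the end.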
The system is remarkably efficient with its resources. While the Qwen2.5-Math-7B-Instruct model needed 2.5 million training examples, PRIME achieved better results with just 230,000. It's also more efficient during the learning process itself, requiring only four solution attempts per problem, compared to Qwen's 32 attempts to achieve similar results.
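As a rough picture of what that small sample budget looks like in a training loop, the sketch below draws a fixed number of candidate solutions per problem and records an outcome label for each. The `generate` and `verify` callables are hypothetical stand-ins for the policy model and the answer checker; this is not the authors' pipeline, only an illustration of sampling four attempts per problem.

```python
from typing import Callable, List, Tuple

def collect_rollouts(problems: List[str],
                     generate: Callable[[str], str],
                     verify: Callable[[str, str], bool],
                     samples_per_problem: int = 4) -> List[Tuple[str, str, bool]]:
    """Sample a few candidate solutions per problem and label each one by
    final-answer correctness (hypothetical helpers, shown only to illustrate
    the per-problem sample budget described above)."""
    batch = []
    for problem in problems:
        for _ in range(samples_per_problem):
            solution = generate(problem)  # one rollout from the policy model
            batch.append((problem, solution, verify(problem, solution)))
    return batch
```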
The researchers have made all of their data available on GitHub for others to explore and build upon.