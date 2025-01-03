AI research
Maximilian Schreiner

AI learns math better with new approach that uses a fraction of the data

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Researchers have found a more efficient way to help AI models learn mathematics. Their new approach, called PRIME, delivers better results while using just a fraction of the training data needed by other methods.


The team tested their method on a model called Eurus-2-7B-PRIME, which builds on the existing Qwen2.5-Math-7B model. After training with PRIME (Process Reinforcement through Implicit Rewards), the model's score across mathematical benchmarks rose from 32.2% to 48.9%, a gain of 16.7 percentage points.

These results hold up well against much larger models. GPT-4o manages 43.3%, while Llama-3.1-70B-Instruct reaches 35.7%. Even the specialized Qwen2.5-Math-7B-Instruct, a model of the same size, scores lower at 43.8%.

The biggest improvement showed up on the American Invitational Mathematics Examination (AIME), one of the toughest math competitions for high school students. The PRIME-trained model solved 26.7% of these problems correctly, up from just 3.3%. For comparison, GPT-4o solved only 9.3%, Llama-3.1-70B-Instruct managed 16.7%, and Qwen2.5-Math-7B-Instruct reached 13.3%.


PRIME is very data efficient

What makes PRIME different is how it teaches AI models. Instead of just telling the model whether its final answer is right or wrong, PRIME provides continuous feedback throughout the problem-solving process using what researchers call "implicit process rewards."
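The difference between the two kinds of feedback can be sketched in a few lines of Python. The article does not spell out how the "implicit process rewards" are computed, so this is only an illustration under a stated assumption: each token is scored by a scaled log-probability ratio between a learned reward model and a frozen reference model, a form commonly used for implicit rewards. The function names, the `beta` value, and the example numbers are all hypothetical and are not taken from the researchers' code.

```python
from typing import List


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Sparse feedback: a single score for the entire solution."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def implicit_process_rewards(
    model_logprobs: List[float],
    reference_logprobs: List[float],
    beta: float = 0.05,  # hypothetical scaling factor, chosen for illustration
) -> List[float]:
    """Dense feedback: one score per generated token.

    ASSUMPTION: the per-token reward is beta * (log p_model - log p_reference).
    The article only says feedback is continuous; this formula is an
    illustrative choice, not confirmed by the source text.
    """
    if len(model_logprobs) != len(reference_logprobs):
        raise ValueError("both sequences must cover the same tokens")
    return [
        beta * (lp_model - lp_ref)
        for lp_model, lp_ref in zip(model_logprobs, reference_logprobs)
    ]


if __name__ == "__main__":
    # Made-up log-probabilities for a three-token solution.
    dense = implicit_process_rewards([-1.0, -2.0, -0.5], [-1.5, -1.0, -0.5], beta=0.5)
    sparse = outcome_reward("42", "42")
    print("per-token rewards:", dense)
    print("final-answer reward:", sparse)
```

With only the outcome reward, a long solution receives one number at the very end; with per-token rewards, every step of the solution gets its own signal, which is the kind of continuous feedback the article describes.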

PRIME is also economical with data. While the Qwen2.5-Math-7B-Instruct model needed 2.5 million training examples, PRIME achieved better results with just 230,000, roughly a tenth as many. It is more efficient during the learning process itself as well, requiring only four solution attempts per problem, compared to Qwen's 32 attempts to achieve similar results.
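To put those figures in proportion, here is a small back-of-the-envelope calculation using only the numbers quoted above. The helper name `reduction_factor` is made up for this illustration.

```python
def reduction_factor(baseline: float, new: float) -> float:
    """Return how many times smaller `new` is than `baseline`."""
    if new <= 0:
        raise ValueError("new must be positive")
    return baseline / new


# Figures as quoted in the article.
data_factor = reduction_factor(2_500_000, 230_000)  # training examples
sample_factor = reduction_factor(32, 4)             # solution attempts per problem

print(f"{data_factor:.1f}x fewer training examples")
print(f"{sample_factor:.0f}x fewer solution attempts per problem")
```

By these numbers, PRIME used about one eleventh of the training examples and one eighth of the attempts per problem.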

Researchers have made all their data available on GitHub for others to explore and build upon.

Summary
  • Researchers have developed a new approach called PRIME that helps AI models learn mathematics more efficiently, delivering better results while using only a fraction of the training data required by other methods.
  • The PRIME-trained model, Eurus-2-7B-PRIME, outperformed larger models like GPT-4o and Llama-3.1-70B-Instruct across mathematical benchmarks, improving by 16.7 percentage points over the model it builds on, Qwen2.5-Math-7B.
  • PRIME provides continuous feedback throughout the problem-solving process using "implicit process rewards," requiring only 230,000 training examples and four solution attempts per problem to achieve better results than the Qwen2.5-Math-7B-Instruct model, which needed 2.5 million examples and 32 attempts.
Sources
Notion