Researchers discover three factors that make AI agents significantly smarter
Key Points
- Researchers found that high-quality, diverse data and well-designed task distribution are key for effective AI training, while algorithm design and reasoning strategy also play a major role.
- New training algorithms using token-based scoring, optimized rewards, and special exploration strategies led to more stable learning and higher success rates compared to traditional methods.
- The DemyAgent-4B model, with 4 billion parameters, matched or outperformed much larger models with up to 32 billion parameters on several benchmarks, and both the models and datasets are available for public use.
Researchers from the National University of Singapore, Princeton, and the University of Illinois Urbana-Champaign have pinpointed three key factors that make AI agents smarter: data quality, algorithm design, and reasoning strategy.
Their work shows that a carefully trained 4-billion-parameter model can match or even beat competitors with up to 32 billion parameters.

Real data beats synthetic every time
The type of data used during training matters most. The researchers compared models trained on authentic learning trajectories against those trained on artificial data, where intermediate steps were replaced by tool outputs.
On AIME math benchmarks, a 4-billion-parameter model trained on real data achieved 29.79 percent accuracy. The same model trained on synthetic data scored under 10 percent.

Real data captures the full reasoning workflow, including pre-tool analysis, guided execution, error correction, and self-reflection. Synthetic data can't replicate these connections, the researchers say.
Data diversity proved equally crucial. A mixed dataset of 30,000 examples from math, science, and programming accelerated learning dramatically. The AI hit 50 percent accuracy after just 150 training steps, while a math-only dataset needed 220 steps to reach the same benchmark.
Token-based scoring makes the difference
The second factor is how the learning process itself is structured. The team tested three algorithm variants, looking for the best way to optimize performance. The winner was a method called GRPO-TCR, which combines token-level scoring (grading each word chunk), broader clipping for more exploration, and a reward setup to discourage overly long answers.

This optimized approach achieved 70.93 percent accuracy on one math benchmark and 68.13 percent on another. Token-based scoring proved particularly powerful, outperforming sentence-based methods by about 4 percent. Unlike with traditional reinforcement learning, agents trained this way improved exploration and precision simultaneously through tool interactions.
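The article doesn't spell out the GRPO-TCR math, but the three ingredients it names — token-level scoring, a wider clipping range for exploration, and a penalty on overly long answers — can be illustrated with a rough sketch. Everything below is an assumption for illustration: the function name, hyperparameter values, and the soft length penalty are not taken from the researchers' implementation.

```python
import numpy as np

def grpo_token_loss(logp_new, logp_old, rewards, lengths,
                    max_len=4096, clip_low=0.2, clip_high=0.28,
                    overlong_penalty=0.5):
    """Illustrative sketch of a GRPO-style token-level objective.

    logp_new / logp_old: one per-token log-prob array per sampled response.
    rewards: scalar reward per response; lengths: token count per response.
    All hyperparameter values here are placeholders, not the paper's.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Soft penalty on overly long answers (assumed form).
    rewards = rewards - overlong_penalty * np.maximum(
        np.asarray(lengths) - max_len, 0) / max_len
    # Group-relative advantage: each rollout is scored against its group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    per_token_objs = []
    for a, lp_new, lp_old in zip(adv, logp_new, logp_old):
        ratio = np.exp(lp_new - lp_old)  # per-token importance ratio
        # Asymmetric clip: a looser upper bound leaves more room to explore.
        clipped = np.clip(ratio, 1 - clip_low, 1 + clip_high)
        per_token_objs.append(np.minimum(ratio * a, clipped * a))

    # Token-level aggregation: average over ALL tokens in the group,
    # so every token gets its own gradient signal instead of one
    # sequence-level score per response.
    all_tokens = np.concatenate(per_token_objs)
    return -all_tokens.mean()  # loss = negative clipped objective
```

The key contrast with sentence-based scoring is the final averaging step: pooling all tokens before averaging weights each token equally, rather than first collapsing every response into a single score.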
Think more, act less
The third finding is about how the AI organizes its reasoning. The researchers found two main styles: reactive (short thinking, frequent tool use) and deliberative (longer thinking, fewer tool calls). Models using the deliberative strategy consistently achieved over 70 percent success rates in tool use. Reactive models performed poorly, as their rapid-fire tool calls were often ineffective or wrong. Quality trumps quantity every time.

Interestingly, current long-chain-of-thought models struggle with tool integration. Despite being optimized for extended thinking, they tend to avoid tool calls entirely, relying only on internal reasoning processes.
Small but mighty
Applying these insights, the researchers built DemyAgent-4B with just 4 billion parameters. The results speak for themselves: 72.6 percent on AIME2024, 70 percent on AIME2025, 58.5 percent on GPQA-Diamond science tests, and 26.8 percent on LiveCodeBench-v6 programming benchmarks. This performance places it firmly among competitors with 14 to 32 billion parameters, proving that smart training beats brute force.

The researchers have released their training data and model weights for others to use and build on.