
A new mini-model called TRM shows that recursive reasoning with tiny networks can outperform large language models on tasks like Sudoku and the ARC-AGI test - using only a fraction of the compute power.

Researchers at Samsung SAIL Montreal introduced the "Tiny Recursive Model" (TRM), a compact design that outperforms large models such as o3-mini and Gemini 2.5 Pro on complex reasoning tasks despite having just seven million parameters. By comparison, even small general-purpose language models typically have 3 to 7 billion parameters.

According to the study "Less is More: Recursive Reasoning with Tiny Networks," TRM reaches 45 percent on ARC-AGI-1 and 8 percent on ARC-AGI-2, outperforming much larger models including o3-mini-high (3.0 percent on ARC-AGI-2), Gemini 2.5 Pro (4.9 percent), DeepSeek R1 (1.3 percent), and Claude 3.7 (0.7 percent). The authors say TRM achieves this with less than 0.01 percent of the parameters used in most large models. More specialized systems such as Grok-4-thinking (16.0 percent) and Grok-4-Heavy (29.4 percent) still lead the pack.

In other benchmarks, TRM boosted test accuracy on Sudoku-Extreme from 55.0 to 87.4 percent and on Maze-Hard from 74.5 to 85.3 percent compared to the "Hierarchical Reasoning Model" that inspired its design.

Small model, big impact

TRM functions like a tight, repeating correction loop. It maintains two pieces of short-term memory: the current solution ("y") and a sort of scratchpad for intermediate steps ("z"). At each stage, the model updates this scratchpad by reviewing the task, its current solution, and its prior notes, then produces an improved output based on that information.

This loop runs multiple times, gradually refining earlier mistakes without requiring a massive model or lengthy chains of reasoning. The researchers say a small network with only a few million parameters is enough to make this process work.
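
To make the mechanics concrete, here is a minimal PyTorch-style sketch of such a loop. The module names (`tiny_net`, `output_head`), sizes, and loop counts are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Minimal sketch of TRM-style recursive refinement (illustrative, not the paper's exact code)."""

    def __init__(self, dim: int = 128, vocab_size: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)        # encodes the task x and the current answer y
        self.tiny_net = nn.Sequential(                    # one small network, reused at every step
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.output_head = nn.Linear(dim, vocab_size)     # maps the refined answer back to tokens

    def forward(self, x_tokens, y_tokens, n_outer: int = 6, n_inner: int = 3):
        x = self.embed(x_tokens)                          # task description (e.g. a Sudoku grid)
        y = self.embed(y_tokens)                          # current solution guess "y"
        z = torch.zeros_like(x)                           # scratchpad "z" for intermediate steps
        for _ in range(n_outer):
            for _ in range(n_inner):                      # update the scratchpad from task, answer, notes
                z = self.tiny_net(torch.cat([x, y, z], dim=-1))
            y = self.tiny_net(torch.cat([x, y, z], dim=-1))  # then refine the answer itself
        return self.output_head(y)                        # logits over answer tokens
```

Because the same small network is reused at every pass, the parameter count stays tiny no matter how many refinement steps run; the effective depth comes from recursion rather than from stacking layers.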

During training, TRM receives step-by-step feedback and learns to estimate a stop probability, preventing unnecessary iterations. Depending on the task, it uses either simple MLPs (for fixed-size grids like Sudoku) or self-attention (for larger structures such as ARC-AGI).
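
As a rough illustration of that training signal, the sketch below adds step-by-step supervision and a learned stop probability to the loop above. The `halt_head`, loss weighting, and stopping criterion are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def training_step(model, halt_head, x_tokens, y_target, n_steps: int = 6):
    """Illustrative deep-supervision step: every refinement pass is supervised, and a small
    head learns a stop probability so inference can halt early (sketch, not the exact TRM loss)."""
    x = model.embed(x_tokens)
    y = torch.zeros_like(x)                               # start from a blank solution embedding
    z = torch.zeros_like(x)
    total_loss = 0.0
    for _ in range(n_steps):
        z = model.tiny_net(torch.cat([x, y, z], dim=-1))  # refresh the scratchpad
        y = model.tiny_net(torch.cat([x, y, z], dim=-1))  # refine the answer
        logits = model.output_head(y)
        answer_loss = F.cross_entropy(logits.transpose(1, 2), y_target)
        solved = (logits.argmax(-1) == y_target).all(dim=-1).float()     # 1.0 if fully correct
        halt_prob = torch.sigmoid(halt_head(y.mean(dim=1))).squeeze(-1)  # predicted stop probability
        halt_loss = F.binary_cross_entropy(halt_prob, solved)            # learn to stop once solved
        total_loss = total_loss + answer_loss + halt_loss
    return total_loss / n_steps
```

At inference time, the same loop runs until the predicted stop probability crosses a threshold, which is what keeps the number of refinement passes from growing unnecessarily. The `tiny_net` in these sketches is a plain MLP, matching the fixed-grid Sudoku setting; for ARC-style inputs, the same loop would wrap a small self-attention block instead.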

What the results do - and don't - mean

TRM demonstrates that small, targeted models can be extremely efficient on narrow, structured reasoning tasks. It improves its answers incrementally and benefits greatly from data augmentation. The paper also emphasizes that architecture choices - such as preferring MLPs over attention for smaller grids - depend on the dataset, and TRM consistently beats larger general-purpose systems in those scenarios.

However, the findings don't imply that large language models are obsolete as a path toward more general capabilities. TRM operates on well-defined grid problems and, because it is not a generative system, it isn't suited to open-ended, text-based, or multimodal tasks.

Instead, it represents a promising building block for reasoning tasks, not a replacement for transformer-based language models. Further experiments adapting TRM to new domains are already underway and could expand its potential applications.

Independent replication and tests using the private ARC-AGI datasets from the ARC Institute are still pending.

Summary
  • Researchers at Samsung SAIL Montreal have developed the Tiny Recursive Model (TRM), a compact neural network with just seven million parameters that surpasses much larger models like o3-mini and Gemini 2.5 Pro on structured reasoning tasks such as Sudoku and the ARC-AGI benchmarks.
  • TRM works by running a tight correction loop to iteratively refine its output, significantly boosting accuracy on benchmarks with far less computational power than typical large language models.
  • While TRM excels at narrow, grid-based tasks, it is not designed for open-ended or generative problems, and should be seen as a specialized model rather than a replacement for large transformer-based models.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.