A new architecture called the Energy-Based Transformer aims to teach AI models how to solve problems analytically and step by step.

Most current AI models operate much like what Daniel Kahneman described as "System 1 thinking": they're fast, intuitive, and excel at pattern recognition. But according to a study from researchers at UVA, UIUC, Stanford, Harvard, and Amazon GenAI, these models often fail at tasks that require the slower, more analytical "System 2 thinking" - such as complex logical reasoning or advanced mathematics.

The paper, "Energy-Based Transformers are Scalable Learners and Thinkers," asks whether these kinds of reasoning skills can emerge purely from unsupervised learning. The researchers' answer is a new architecture: the Energy-Based Transformer (EBT).

How Energy-Based Transformers work

The EBT approach treats thinking as an iterative optimization process. Instead of generating an answer in a single step, the model starts with a random solution. It then evaluates this solution by calculating an "energy" value.

The lower the energy, the better the prediction fits the context. Through repeated adjustments using gradient descent, the answer is gradually refined until the energy reaches a minimum. This lets the model spend more computation on harder problems.
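To make that loop concrete, here is a minimal, hypothetical sketch in PyTorch of the refinement process described above. The names `EnergyModel` and `think`, along with the hyperparameters `num_steps` and `step_size`, are illustrative assumptions, not the paper's actual architecture or training setup.

```python
# Minimal sketch of an EBT-style "thinking" loop (illustrative only).
# EnergyModel, think, num_steps, and step_size are stand-ins, not the
# paper's actual architecture or hyperparameters.
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Toy energy function: maps a context embedding plus a candidate
    prediction to a single scalar energy (lower = better fit)."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1)
        )

    def forward(self, context, candidate):
        return self.scorer(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def think(model, context, num_steps=10, step_size=0.1):
    """Start from a random candidate and refine it by gradient descent
    on the energy with respect to the candidate itself."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(num_steps):
        energy = model(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        candidate = (candidate - step_size * grad).detach().requires_grad_(True)
    return candidate.detach()

model = EnergyModel(dim=64)
context = torch.randn(8, 64)    # a batch of context embeddings
answer = think(model, context)  # refined prediction with minimized energy
```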

Image: Gladstone et al.

The idea of framing this process in terms of energy isn't new - Meta's chief AI scientist Yann LeCun and others have discussed "energy-based models" for years.

More efficient learning and generalization

In experiments, the researchers compared EBTs with an advanced Transformer variant (Transformer++). Their results suggest EBTs scale more efficiently: the paper reports up to a 35 percent higher scaling rate with respect to data, parameter count, and compute, pointing to better data and computational efficiency.

The real strength, however, shows up in what the authors call "thinking scalability" - the ability to boost performance by allocating extra compute at runtime. On language tasks, EBTs improved performance by up to 29 percent, especially on problems that differed significantly from their training data.
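As a rough illustration of this idea, the hypothetical `think` helper sketched above can simply be run for more refinement steps when extra compute is available at inference time; the step counts below are arbitrary.

```python
# "Thinking scalability" in the sketch above: allocate more refinement
# steps at inference time to harder or out-of-distribution inputs.
quick_answer = think(model, context, num_steps=5)     # fast, System-1-like pass
careful_answer = think(model, context, num_steps=50)  # extra runtime compute
```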

Image: Gladstone et al.

In image denoising tests, EBTs outperformed Diffusion Transformers (DiTs) while requiring 99 percent fewer computation steps. The study also found that EBTs learned image representations that delivered roughly ten times better classification accuracy on ImageNet-1k, suggesting a deeper understanding of content.

Significant hurdles remain

Despite these promising results, open questions remain. The main issue is compute: according to the paper, training EBTs requires 3.3 to 6.6 times more computing power (FLOPs) than standard Transformers, an overhead that could be a barrier for many real-world applications. The study also measures "System 2 thinking" mainly through perplexity improvements rather than actual reasoning tasks, and because of the researchers' limited compute budget, there are no comparisons to state-of-the-art reasoning models.

All scaling predictions are based on experiments with models up to just 800 million parameters - much smaller than today's largest AI systems. Whether EBTs' advantages hold at larger scales remains to be seen.

Summary
  • Researchers from several top universities and Amazon GenAI have introduced the Energy-Based Transformer (EBT), a new AI architecture designed to handle analytical reasoning by refining answers through an iterative process, rather than generating them in a single step.
  • In tests, EBTs demonstrated more efficient learning and better generalization compared to advanced Transformer models, with notable improvements in language tasks and image processing, but at the cost of requiring 3.3 to 6.6 times more computing power to train.
  • The main limitations are significant computational demands and open questions about how well EBTs will scale to larger models, as current results are based on relatively small model sizes and do not directly compare to leading reasoning-focused AIs.