Novel "Abacus" embeddings enable AI language models to solve significantly longer and more complex addition and multiplication tasks than before. This opens up new possibilities for algorithmic reasoning.

AI language models such as OpenAI's GPT-4 show rudimentary mathematical capabilities. However, they quickly reach their limits when asked to calculate with very long numbers or to execute algorithms such as sorting zero-shot, without external tools.

A team of researchers from the University of Maryland, the Lawrence Livermore National Laboratory, the ELLIS Institute Tübingen, the Max Planck Institute for Intelligent Systems, the Tübingen AI Center, and Carnegie Mellon University has developed a new method to drastically improve the capabilities of AI language models in arithmetic tasks. Their so-called Abacus embeddings help the models to better recognize the position of individual digits in long numbers.

Conventional AI models struggle to correctly perform additions with numbers of up to 100 digits, even when they have been trained on numbers of up to 20 digits. The reason: the models lose track of where each digit sits within a long number.


The Abacus method solves this problem by assigning each digit a position that corresponds to its place value, much like the rods of an abacus: all units digits share one position code, all tens digits another, all hundreds digits another, and so on. This helps the models line up the digits correctly and add them place by place.
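
The core idea can be sketched in a few lines of Python. The function below gives every digit token a position index counted from the start of its number, which matches the paper's least-significant-digit-first notation, plus an optional random offset of the kind the authors use during training so that short training numbers still exercise large indices. Tokenization, the offset range, and the treatment of non-digit tokens are illustrative assumptions, not the paper's exact implementation.

```python
import random

DIGITS = set("0123456789")

def abacus_positions(tokens, max_offset=0):
    # Number every run of digit tokens 1, 2, 3, ... from its first digit,
    # so digits of the same significance share the same index across numbers
    # (assuming numbers are written least-significant digit first, as in the
    # paper). Non-digit tokens get index 0.
    # A random offset, drawn once per sequence, shifts all digit indices so
    # that short training numbers still exercise large position values.
    offset = random.randint(0, max_offset)
    positions, run_length = [], 0
    for tok in tokens:
        if tok in DIGITS:
            run_length += 1
            positions.append(run_length + offset)
        else:
            run_length = 0
            positions.append(0)
    return positions

# "53 + 71 = 124" written with each number reversed (units digit first):
print(abacus_positions(list("35+17=421")))
# -> [1, 2, 0, 1, 2, 0, 1, 2, 3]
```

These indices are then looked up in a learned embedding table and added to the token embeddings, so every units digit, every tens digit, and so on receives the same positional signal regardless of how long the number is.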

Abacus embeddings enable a drastic leap in performance

With this approach, the researchers were able to massively increase the accuracy and generalization ability of the models: models trained only on numbers with up to 20 digits solved additions with up to 120 digits almost error-free. That corresponds to a length generalization factor of 6; the previous best was 2.5.

The researchers achieved even better results by combining the Abacus embeddings with special network architectures: so-called "looped transformers" with "input injection", in which the input data is fed into each network layer, reduced the error rate to just 0.9%.
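
For illustration, here is a minimal sketch of the input-injection idea: a single transformer layer stands in for the looped block and is applied several times, with the original token embeddings added back before every pass. The layer type, sizes, and loop count are assumptions made for the sketch; the paper's models are decoder-only and differ in detail.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    # One transformer layer stands in for the recurrent block and is applied
    # n_loops times. "Input injection" means the original token embeddings
    # are added back to the hidden state before every pass.
    def __init__(self, d_model=256, n_heads=4, n_loops=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x_emb):                  # x_emb: (batch, seq, d_model)
        h = torch.zeros_like(x_emb)
        for _ in range(self.n_loops):
            h = self.block(h + x_emb)          # inject the input at every loop
        return h
```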

The scientists also successfully applied the approach to multiplying numbers with up to 15 digits and to sorting sequences of numbers. Here, too, Abacus embeddings improved the models' performance, in some cases drastically.

The results show how specialized data representations and model architectures can take the algorithmic reasoning capabilities of AI systems to a new level. The researchers hope that their approach will pave the way for further breakthroughs in the mathematical understanding of language models.

However, due to limited computing capacity, the team has not yet tested the approach on natural language tasks.

Summary
  • Researchers have developed a method called "Abacus position embeddings" that helps AI language models better recognize the position of individual digits in long numbers and thus solve significantly longer and more complex addition tasks.
  • With Abacus, models trained on 20-digit numbers were able to solve additions of up to 120 digits almost error-free - a generalization by a factor of 6. In combination with special network architectures, the error rate dropped to just 0.9%.
  • The approach was also successfully applied to multiplying numbers up to 15 digits and sorting sequences of numbers. The researchers hope that their method will pave the way for further improvements in the mathematical understanding of language models.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.