
Researchers have developed an algorithm that could dramatically reduce the energy consumption of artificial intelligence systems.


Scientists at BitEnergy AI created a method called "Linear-complexity multiplication" (L-Mul) that replaces complex floating-point multiplications in AI models with simpler integer additions.

According to the study "Addition is All You Need for Energy-Efficient Language Models," L-Mul could cut the energy cost of element-wise floating-point tensor multiplications by up to 95% and of dot products by 80%. The team tested the approach on a range of language, vision, and reasoning benchmarks, covering language comprehension, structural reasoning, mathematics, and common-sense question answering.

The researchers say L-Mul can be applied directly to the attention mechanism in transformer models with minimal performance loss. The attention mechanism is a core component of modern language models like GPT-4o.
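Based on the paper's description, the core trick can be sketched like this: decompose each float into an exponent and a fraction, add the exponents, add the fractions, and replace the cross term of the exact product with a small constant offset, so no mantissa multiplication is needed. The following is a minimal Python sketch for illustration only; the helper names and the `offset_bits` parameter are assumptions, and the actual L-Mul operates directly on mantissa bits in hardware-friendly integer arithmetic rather than on Python floats.

```python
import math

def l_mul(x: float, y: float, offset_bits: int = 4) -> float:
    """Approximate x * y using only additions on exponents and fractions.

    Writing x = (1 + f_x) * 2^e_x and y = (1 + f_y) * 2^e_y, the exact
    product is (1 + f_x + f_y + f_x * f_y) * 2^(e_x + e_y). The L-Mul idea
    is to drop the f_x * f_y cross term and add a small constant 2^-l
    instead, turning the multiplication into additions.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    m_x, e_x = math.frexp(abs(x))        # m in [0.5, 1), x = m * 2^e
    m_y, e_y = math.frexp(abs(y))
    f_x = 2.0 * m_x - 1.0                # fraction after shifting m to [1, 2)
    f_y = 2.0 * m_y - 1.0
    approx_mantissa = 1.0 + f_x + f_y + 2.0 ** (-offset_bits)
    return sign * math.ldexp(approx_mantissa, (e_x - 1) + (e_y - 1))

def approx_dot(q: list[float], k: list[float]) -> float:
    """Dot product with every multiplication replaced by l_mul, as in
    the attention scores q . k of a transformer layer."""
    return sum(l_mul(a, b) for a, b in zip(q, k))
```

For example, `l_mul(3.0, 5.0)` returns 14.5 instead of the exact 15, and the error stays small enough across a dot product that attention scores change little, which matches the near-lossless behavior the researchers report.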


Direct use in attention mechanisms possible

BitEnergy AI sees potential for L-Mul to strengthen academic and economic competitiveness, as well as AI sovereignty. They believe it could enable large organizations to develop custom AI models faster and more cost-effectively.

The team plans to implement L-Mul algorithms at the hardware level and develop programming APIs for high-level model design. Their goal is to train text, symbolic, and multimodal AI models optimized for L-Mul hardware.

Summary
  • Researchers at BitEnergy AI have developed an algorithm called "Linear-complexity multiplication" (L-Mul), which replaces floating-point multiplications with more efficient integer additions in AI models and could thus reduce energy requirements by up to 95 percent.
  • The method was tested on tasks such as language comprehension, reasoning, mathematics, and question answering. According to the team, the results show that applying L-Mul directly to the attention mechanism, a central component of modern language models, is almost lossless.
  • The team plans to implement L-Mul and L-Matmul kernel algorithms at the hardware level and develop APIs for high-level model design to train textual, symbolic and multimodal generative AI models optimized for use on L-Mul-native hardware.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.