Researchers at Microsoft Research and the University of Chinese Academy of Sciences have unveiled BitNet b1.58, a 1-bit language model that promises high performance at significantly reduced cost and power consumption.

Large language models such as GPT-4 have advanced rapidly in recent years, but their high energy and memory consumption, along with the associated costs, remain a burden on the environment and an obstacle to the widespread use of AI. A recent study by Shuming Ma and colleagues at Microsoft Research and the University of Chinese Academy of Sciences may offer a way forward: they present a 1-bit language model called BitNet b1.58 that performs similarly to traditional 16-bit models (FP16 or BF16) while significantly reducing latency, memory requirements, and power consumption.

BitNet b1.58 is an evolution of the original BitNet. Its parameters are ternary: each can take the value -1, 0, or 1, rather than only -1 or 1 as in the original. Three possible values correspond to log2(3) ≈ 1.58 bits per parameter, which gives the model its name. According to the researchers, adding zero increases modeling capacity and brings the model closer to the performance of classical language models.
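The 1.58-bit figure can be checked with a few lines of Python. The sketch below computes the information content of a three-valued parameter and, purely as an illustration, packs five ternary values into one byte with a base-3 encoding (3^5 = 243 fits into 256). The helper names `pack_trits` and `unpack_trits` are hypothetical, and this packing scheme is an assumption made here for illustration, not the storage format described in the study.

```python
import math

# Information content of one parameter with three equally likely states.
bits_per_ternary = math.log2(3)   # about 1.585

# For comparison: a binary parameter (-1 or 1) as in the original BitNet.
bits_per_binary = math.log2(2)    # exactly 1.0


def pack_trits(trits):
    """Pack five values from {-1, 0, 1} into one byte (hypothetical base-3 scheme)."""
    if len(trits) != 5:
        raise ValueError("expected exactly five ternary values")
    byte = 0
    for t in trits:
        if t not in (-1, 0, 1):
            raise ValueError("values must be -1, 0, or 1")
        byte = byte * 3 + (t + 1)  # map -1, 0, 1 to base-3 digits 0, 1, 2
    return byte


def unpack_trits(byte):
    """Invert pack_trits: recover the five ternary values from one byte."""
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)  # lowest base-3 digit first
        byte //= 3
    return trits[::-1]              # restore the original order


packed = pack_trits([1, 0, 0, -1, -1])
print(round(bits_per_ternary, 2), packed, unpack_trits(packed))
```

Five values per byte works out to 1.6 bits per parameter, slightly above the theoretical 1.58 bits.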

The researchers showed that, starting at 3 billion parameters, BitNet b1.58 achieves perplexity and task performance comparable to classical language models, with up to 2.71 times faster processing and 3.55 times lower memory consumption. A 3.9 billion parameter variant of BitNet b1.58 is said to perform significantly better than a LLaMA model with 3 billion parameters.

1-bit models could continue to benefit from special hardware

A key advantage of these new 1-bit models is their efficiency in matrix multiplication, which consists mostly of integer additions, an operation that consumes significantly less energy than the usual floating-point operations. The researchers suggest that these energy savings could also translate into faster computation, since the performance of many chips is limited by the energy available.
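Why ternary weights turn multiplication into addition can be shown with a minimal sketch. The function name `ternary_matvec` and the example values are hypothetical and chosen for illustration; a real inference kernel would operate on packed weights and quantized activations rather than on Python lists.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector using only additions and subtractions."""
    result = []
    for row in weights:
        acc = 0
        for w, value in zip(row, x):
            if w == 1:
                acc += value      # weight +1: add the input
            elif w == -1:
                acc -= value      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
        result.append(acc)
    return result


# Hypothetical 2x3 ternary weight matrix and integer input vector.
weights = [[1, 0, -1],
           [-1, 1, 1]]
x = [3, 5, 2]

print(ternary_matvec(weights, x))  # [1, 4]
```

No multiplication appears anywhere in the inner loop: a weight of 1 adds the input, a weight of -1 subtracts it, and a weight of 0 drops it.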

Another positive aspect is the reduction in memory requirements. Because BitNet b1.58 uses fewer bits per parameter, less data has to be moved from DRAM to the on-chip memory of an accelerator when model parameters are loaded. This makes inference faster and more efficient.
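A back-of-the-envelope estimate illustrates the scale of the saving. The sketch below counts weight storage only and assumes every parameter is stored at the stated bit width; the function name `weight_memory_gb` is hypothetical. The resulting ratio is an idealized upper bound, not the measured figure. The 3.55-fold reduction reported for the 3 billion parameter model is lower, presumably because total memory use includes more than ternary weights.

```python
import math


def weight_memory_gb(num_params, bits_per_param):
    """Idealized weight storage in gigabytes (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 3e9  # a 3 billion parameter model

fp16_gb = weight_memory_gb(num_params, 16)               # 16-bit weights
ternary_gb = weight_memory_gb(num_params, math.log2(3))  # about 1.58-bit weights

print(f"16-bit weights:  {fp16_gb:.2f} GB")
print(f"ternary weights: {ternary_gb:.2f} GB")
print(f"idealized ratio: {fp16_gb / ternary_gb:.1f}x")
```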

The study also provides comparisons to current models and shows that BitNet b1.58 with 70 billion parameters could support batch sizes up to 11 times larger and token throughput up to 8.9 times higher than a comparable LLaMA 70B model.

The researchers also point out that to fully exploit the potential of 1-bit language models, specialized hardware for these models needs to be developed. They call for further research and development in this direction to take full advantage of these new models.

Summary
  • Researchers at Microsoft Research and the University of Chinese Academy of Sciences have developed a 1-bit language model, called BitNet b1.58, that delivers similar performance to traditional 16-bit models, but with reduced latency, memory requirements, and power consumption.
  • BitNet b1.58 works with ternary parameters (-1, 0, 1) and, starting at 3 billion parameters, achieves performance comparable to classical language models, with up to 2.71 times faster processing and 3.55 times lower memory consumption.
  • The researchers emphasize that the development of specialized hardware is required to fully exploit the potential of 1-bit language models and call for further research in this direction.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.