
The European companies are showing the first result of their cooperation: an Aleph Alpha language model slimmed down by 80 percent.

Large language models like OpenAI's GPT-3 or Google's PaLM have well over a hundred billion parameters. Despite new insights into the role of training data from DeepMind's Chinchilla, even larger models are to be expected.

In fact, language models such as Google's Switch Transformer already exist with 1.6 trillion parameters, but they rely on sparse modeling, in Google's case specifically on a mixture-of-experts Transformer architecture.

Whereas with GPT-3, for example, all parts of the neural network are involved in every processing step, sparse models such as Switch Transformer use processes in which only parts of the network relevant to the task become active. This greatly reduces the computing power required for queries to the network.
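The difference can be sketched in a few lines. The following is an illustrative toy example, not Google's actual implementation: in a Switch-style mixture-of-experts layer, a router sends each token to a single expert, so only a fraction of the layer's weights is touched per token (all names and sizes here are made up for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8

# Eight "expert" weight matrices plus a small routing matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def switch_layer(x):
    """Route each token to its single highest-scoring expert (top-1 routing)."""
    scores = x @ router               # (tokens, n_experts)
    choice = scores.argmax(axis=1)    # one expert index per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            out[mask] = x[mask] @ experts[e]
    return out

tokens = rng.standard_normal((16, d_model))
y = switch_layer(tokens)
# A dense layer of the same total parameter count would multiply every token
# by all eight matrices; here each token uses only 1/8 of the weights.
```

This is the "conditional sparsity" idea in miniature: the parameter count grows with the number of experts, but the compute per token stays roughly constant.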

Classical neural networks are trained "dense". With sparse modeling, their complexity can be reduced while maintaining approximately the same performance. | Image: Graphcore/Aleph Alpha

A European AI collaboration shows first results

Google uses sparse modeling in the case of Switch Transformer to further scale language models. But conversely, it can also be used to train smaller networks with similar performance to larger models.

That's exactly what AI chipmaker Graphcore and AI startup Aleph Alpha have now done. The two European AI companies announced a collaboration in June 2022 that aims, among other things, to develop large European AI models. The German startup recently launched Europe's fastest commercial AI data center.

Aleph Alpha CEO Jonas Andrulis pointed to the advantages of Graphcore's hardware for sparse modeling last summer, saying, "Graphcore’s IPU offers a new opportunity to evaluate advanced technological approaches such as conditional sparsity. These architectures will undoubtedly play a role in Aleph Alpha’s future research."

Graphcore and Aleph Alpha demonstrate lightweight Luminous language model

The two companies were able to slim Aleph Alpha's 13-billion-parameter "Luminous Base" language model down to 2.6 billion parameters. They also showed the slimmed-down variant running Lumi, a "conversational module" for Luminous.

At the Super Computing Conference 2022 (SC22) in Texas, Aleph Alpha and Graphcore showed how the sparse variant of Luminous drives the Lumi module. Lumi is a kind of "chatbot mode" of the language model. | Image: Aleph Alpha

The sparse modeling removed nearly 80 percent of the model's weights while preserving most of its capabilities, according to the press release.


The new model uses the sparse matrix multiplications supported by Graphcore's Intelligence Processing Unit (IPU) and requires only 20 percent of the computing power and 44 percent of the memory of the original model, the companies said.

The small size allows the 2.6-billion-parameter model to be held entirely in the ultra-fast on-chip memory of a Graphcore IPU-POD16 Classic, achieving maximum performance. The model also requires 38 percent less power.
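A minimal sketch of how removing weights translates into compute and memory savings, assuming simple magnitude pruning (the press release does not detail Aleph Alpha's actual procedure, and the matrix sizes here are arbitrary): drop the ~80 percent smallest-magnitude weights and store the rest in a sparse (CSR) format.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))        # a dense weight matrix

# Keep only the top 20% of weights by magnitude, zero out the rest.
threshold = np.quantile(np.abs(W), 0.8)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
W_sparse = csr_matrix(W_pruned)            # stores only the nonzero entries

x = rng.standard_normal(512)
dense_out = W @ x                          # touches all 512*512 weights
sparse_out = W_sparse @ x                  # touches only the ~20% kept

sparsity = 1 - W_sparse.nnz / W.size
print(f"weights removed: {sparsity:.0%}")
```

In practice the savings depend on hardware support for irregular memory access, which is why the companies highlight the IPU's suitability for fine-grained sparse operations.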

"Sparsification" central to the next generation of AI models

For the next generation of models, "sparsification" will be critical, the companies said. It would enable specialized submodels to master selected knowledge more efficiently.

"This breakthrough in sparsification modeling impacts the commercial potential of AI companies like Aleph Alpha, enabling them to deliver high-performance AI models to customers with minimal computational requirements," the statement added.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Google is also following this path. In October 2021, AI chief Jeff Dean spoke for the first time about the search giant's AI future: Pathways is one day to become a kind of multipurpose AI system, with sparse modeling as a central element.

Summary

  • Heidelberg-based AI company Aleph Alpha has slimmed down its 13-billion-parameter Luminous Base model to 2.6 billion parameters via sparse modeling.
  • The sparse model requires only 20 percent of the FLOPs and 44 percent of the memory of the dense model and runs on Graphcore's IPU hardware. According to Aleph Alpha, most of the capabilities of the large model are preserved.
  • The companies call their work a breakthrough with implications for the commercial potential of AI companies like Aleph Alpha: "sparsification" allows them to deliver high-performance AI models with minimal computational requirements.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.