In China, the state and companies are researching AI models with trillions of parameters. They want to prove that they can develop "brain-scale" AI.
In the race to build ever-larger AI models, China is showing how cooperation between the state, universities and the private sector can produce gigantic AI models. The researchers speak of "brain-scale" AI, which they define as models with more than 100 trillion parameters.
Currently, the largest AI models include Nvidia and Microsoft's Megatron-Turing NLG with 530 billion parameters, Google's Switch Transformer with 1.6 trillion, and WuDao 2.0 with 1.75 trillion parameters.
In the West, such models are usually developed by individual companies. There are a few exceptions, such as OpenGPT-X, a language model being developed as part of the Gaia-X initiative, or the BigScience project led by the AI start-up Hugging Face, which is training a language model on a French supercomputer. The EleutherAI research collective also develops open-source models such as GPT-NeoX.
A small record on the way to the big 100-trillion-parameter model
In a new paper, researchers from Tsinghua University, Alibaba Group, Zhejiang Lab and Beijing Academy of Artificial Intelligence present BaGuaLu, a framework that enables the training of large AI models using the Mixture-of-Experts (MoE) architecture.
Like OpenAI's GPT-3, BaGuaLu relies on the Transformer architecture, but during training it forms individual expert networks that handle specific inputs while sparing the resources of the rest of the network. Instead of activating the entire network for every input, as many other AI architectures do, huge MoE models only ever activate the part of the network that is currently needed.
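The following Python/PyTorch snippet is a minimal sketch of this routing idea: a gating network picks one expert per token, so only that expert's parameters are used for the token. All names and sizes are illustrative and do not come from the BaGuaLu paper, which uses far larger models and a more sophisticated training setup.

```python
# Minimal sketch of top-1 Mixture-of-Experts (MoE) routing.
# Sizes and names are illustrative, not taken from BaGuaLu.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """MoE feed-forward layer: each token activates only one expert."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router: one score per expert
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)      # routing probabilities
        weight, expert_idx = scores.max(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                    # tokens routed to expert i
            if mask.any():
                # Only this slice of tokens runs through expert i;
                # the other experts stay idle for these tokens.
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 8 experts in total, but each token only pays for one expert's compute.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Because the parameter count grows with the number of experts while the compute per token stays roughly constant, this kind of sparse activation is what makes multi-trillion-parameter models tractable in the first place.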
In an initial test, the researchers trained a model with 1.93 trillion parameters using their framework, surpassing Google's Switch Transformer. They also show that the framework supports models with 14.5 trillion and even a full 174 trillion parameters.
The researchers conducted their experiments on the Chinese supercomputer "New Generation Sunway" and also showed which hurdles supercomputer technology still has to overcome for the planned gigantic models.
Brain-scale AI models could bring major advances
The team expects that giant multimodal AI models could have far-reaching implications for numerous AI applications. Multimodal means that an AI is trained on several related types of data, such as images, text, and video.
As application scenarios, the researchers cite image and video annotation, image and video generation, multimodal search, visual question answering, visual reasoning, object referencing, multimodal dialog systems, and multimodal translation. Moreover, the experience gained in these fields could carry over to other areas, such as AI in biology or chemistry.
BaGuaLu could soon be used to train the first models with more than 100 trillion parameters. That would also show whether the capabilities of AI models continue to scale as clearly with their size as they did, for example, from GPT-2 to GPT-3.