GLM-130B: The most capable AI language model currently available comes from China

Aug 27, 2022

THUDM/GLM-130B

A Chinese language model performs better than OpenAI's GPT-3 and Google's PaLM. Huawei shows a Codex alternative.

Large AI models for language, code, and images play a central role in the current proliferation of artificial intelligence. Researchers at Stanford University therefore even want to call such models "foundation models." The pioneer in the development of very large AI models is the U.S. AI company OpenAI, whose GPT-3 language model first demonstrated the usefulness of such AI systems.

In addition to many text tasks, GPT-3 also demonstrated rudimentary code capabilities. OpenAI then leveraged its close collaboration with Microsoft to use Github data to train the large code model Codex. Codex also serves as the foundation for Github's CoPilot.

China's AI firms develop powerful alternatives to Western models

Meanwhile, the list of large language models from Western companies and institutions is long.

In addition to GPT-3, there is Google's PaLM, AI21 Labs' Jurassic-1, Meta's OPT models, BigScience BLOOM, and Aleph Alpha's Luminous, for example. Code models are also available from Google, Amazon, Deepmind, and Salesforce. However, these models are primarily trained with Western data and are therefore not suitable for use in China - if access is possible or permitted at all.

Chinese companies and research institutions, therefore, began producing their own alternatives at the latest with the presentation of GPT-3. In 2021, for example, Huawei showed PanGu-Alpha, a 200 billion parameter language model trained with 1.1 terabytes of Chinese language data. The Beijing Academy of Artificial Intelligence (BAAI) unveiled Wu Dao 2.0, a 1.75 trillion parameter multimodal model, in the same year.

GLM-130B language model outperforms GPT-3

Now, researchers at China's Tsinghua University have unveiled GLM-130B, a bilingual language model that outperforms Metas OPT, BLOOM, and OpenAI's GPT-3, according to the team's benchmarks. The Chinese- and English-language model's Few-Shot performance surpassed the level of the previous top model GPT-3 in the Massive Multi-Task Language Understanding (MMLU) benchmark.

The 130 billion parameter model outperforms GPT-3 and PaLM. | Image: http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/

The team also tested GLM-130B against LAMBADA, a zero-shot benchmark for predicting the last word in a sequence of words. The benchmark is used to evaluate the language modeling capabilities of large language models.

Here, the Chinese model outperformed even previous leader PaLM - despite 410 billion fewer parameters. For the training, the team relied on a method developed at Tsinghua University (GLM), as well as 400 Nvidia A100 GPUs.

This is the first time a large language model from China has outperformed Western models. GLM-130B is available on Github and HuggingFace.

Code model PanGu coder achieves Codex performance

As a consistent evolution of PanGu, Huawei's Noah's Ark Lab and Huawei Cloud also recently showed a Chinese alternative to Copilot, Codex and other code models. PanGu-Coder completes code like the Western models and builds on the work done with PanGu. Like Codex, PanGu follows a training method similar to language models - the key difference is the training data: Code instead of text.

PanGu-Coder comes in several models, ranging from 317 million to 2.6 billion parameters. According to Huawei, the Chinese models are on par with Codex, AlphaCode, and alternatives in human evaluations - and in some cases outperform them. The company also shows a variant trained with a curated dataset (PanGu-Coder-FT) that performs even a bit better.

PanGu-Coder comes just under a year after the release of OpenAI's Codex. Huawei is thus following the pattern of PanGu-Alpha, which was also released just under a year after GPT-3.

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

AI news without the hype
Curated by humans.

Over 20 percent launch discount.
Read without distractions – no Google ads.
Access to comments and community discussions.
Weekly AI newsletter.
6 times a year: “AI Radar” – deep dives on key AI topics.
Up to 25 % off on KI Pro online events.
Access to our full ten-year archive.
Get the latest AI news from The Decoder.

Subscribe to The Decoder