AI in practice

China releases first four politically approved language models

Matthias Bastian
Image: The Chinese flag in a data stream, generative AI art, Midjourney prompted by THE DECODER
Four large generative AI models from Alibaba, Baidu, Tencent, and 360 Group have passed China's official "large model standard compliance assessment."

The assessment was carried out by the China Electronics Standardization Institute, which is under the Ministry of Industry and Information Technology, reports the Global Times. The measure aims to promote the development of AI-generated content (AIGC) in the industry and create a compliance directory.

The evaluation is based on a multidimensional model evaluation framework and an indicator system designed to cover the generalization, intelligence, and safety of large models. It covers multiple modalities, including text, speech, and visual content.

The approved models include Baidu's ERNIE Bot, which has reached about 100 million users since its launch in August, and Alibaba's Tongyi Qianwen, the only open-source model in this first group of approved large AI models.

Pan Helin, a professor at the International Business School of Zhejiang University, told the Global Times that upfront government regulation is better for the industry's growth in China than intervening later.

AI Performance: China aims to catch up

According to Dou Dejing, associate professor at Tsinghua University's School of Electronic Engineering, China's best domestic AI models have reached the level of GPT-3.5, and the technical gap with GPT-4 is narrowing.

In late October 2023, Baidu CEO Robin Li described his company's Ernie Bot 4.0 as on par with OpenAI's GPT-4. Zhou Hongyi, founder and president of 360 Security Technology, sees ChatGPT as 12 to 18 months ahead of its Chinese competitors, according to the Global Times.

The Chinese Communist Party demands that generative AI services be based on the "core values of socialism." In addition to evaluating and validating language models, it has released a dataset of 50 billion tokens in 100 million data points to train language models that reflect its political views.
