
The big advances in AI development come from more data, bigger networks, and more computing power. Does AI have a future outside the cloud?

The trend in language AIs is toward larger and larger models that swallow more and more data. These giants include Google's BERT language AI, OpenAI's GPT-2, Facebook's RoBERTa and Nvidia's Megatron. The latter is the most recent language AI, 24 times larger than BERT-large and five times larger than GPT-2.

But large AI models require a lot of computing power: Nvidia recommends 512 graphics chips for training Megatron, and anyone who wants to retrain the latest version of OpenAI's GPT-2 will push even fast server graphics cards to their limits.

Since large language models require so much computing power and energy, they only run in the cloud. Smaller models can run locally on a smartphone or a robot without an Internet connection, but they perform significantly worse.


So smaller AI models are needed that can still match the performance of their giant relatives. But how can this be done?

Back to School (of AI)

The solution could be a kind of AI school in which the small AIs learn from the big ones.

AI researchers call this process distillation: a large AI model acts as a teacher, and a small one as a student. During training, the large AI passes on its knowledge: in the case of a language AI, for example, the 20 most likely words that complete an incomplete sentence. The small AI model thus learns to reproduce the results of the large AI model - without adopting its scale.
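To make the teacher-student idea concrete, here is a minimal sketch of a distillation loss in PyTorch. It is an illustration of the general technique, not the exact recipe of any specific model: the temperature and weighting values are assumptions, and the student learns to match the teacher's softened word probabilities while still being trained on the true labels.

```python
# Minimal sketch of knowledge distillation (illustrative only).
# The teacher's softened probabilities serve as the training target for the student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target loss (match the teacher) with the usual hard-label loss.

    temperature > 1 flattens the teacher distribution, so the student also learns
    from the relative probabilities of less likely words, not just the top guess.
    alpha weights the two terms; both values here are illustrative defaults.
    """
    # Soft targets: KL divergence between softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # standard scaling from Hinton et al.

    # Hard targets: ordinary cross-entropy against the true next word.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

During training, the teacher is frozen and only provides its predictions; only the much smaller student network is updated.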

Researchers at Hugging Face have now applied this method to the language AI BERT. The result: BERT's student DistilBERT is 40 percent smaller, 60 percent faster, and achieves 95 percent of BERT's performance. In the future, DistilBERT is to be shrunk further with other methods, such as pruning, which removes some network connections.
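The "removing some network connections" mentioned above is commonly known as pruning. The following sketch shows the general idea using PyTorch's built-in pruning utilities; it is not the procedure planned for DistilBERT, and the layer size and pruning ratio are illustrative assumptions.

```python
# Illustrative magnitude pruning of a single layer (not DistilBERT's actual method).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(768, 768)  # stand-in for one transformer weight matrix

# Zero out the 30 percent of weights with the smallest magnitude (ratio is illustrative).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weight tensor to make the removal permanent.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Share of removed connections: {sparsity:.0%}")
```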

Google has similarly shrunk its own next-gen Assistant, which is expected to run locally on Pixel smartphones without an Internet connection by the end of the year. For this, the almost 100-gigabyte voice model was reduced to just under 0.5 gigabytes.

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.