
The big advances in AI development come from more data, bigger networks, and more computing power. Does AI have a future outside the cloud?

The trend in language AIs is toward larger and larger models that swallow more and more data. These giants include Google's BERT language AI, OpenAI's GPT-2, Facebook's RoBERTa and Nvidia's Megatron. The latter is the most recent language AI, 24 times larger than BERT-large and five times larger than GPT-2.

But large AI models require a lot of computing power: Nvidia recommends 512 graphics chips for training Megatron, and anyone who wants to retrain the latest version of OpenAI's GPT-2 will push even fast server graphics cards to their limits.

Since large language models require so much computing power and energy, they only run in the cloud. Smaller models can run locally on a smartphone or a robot without an Internet connection, but they perform significantly worse.


So smaller AI models are needed that can still match the performance of their giant relatives. But how can this be done?

Back to School (of AI)

The solution could be a kind of AI school in which the small AIs learn from the big ones.

AI researchers call this process distillation: a large AI model acts as a teacher, and a small one as a student. During training, the large AI passes on its knowledge: in the case of a language AI, for example, the 20 most likely words that complete an incomplete sentence. The small AI model thus learns to reproduce the results of the large AI model - without adopting its scale.
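To make the teacher-student idea concrete, here is a minimal sketch of a distillation loss in PyTorch. It is an illustration of the general technique, not the exact recipe of any specific model: the temperature and weighting values are assumptions, and the student learns to match the teacher's softened word probabilities while still being trained on the true labels.

```python
# Minimal sketch of knowledge distillation (illustrative only).
# The teacher's softened probabilities serve as the training target for the student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target loss (match the teacher) with the usual hard-label loss.

    temperature > 1 flattens the teacher distribution, so the student also learns
    from the relative probabilities of less likely words, not just the top guess.
    alpha weights the two terms; both values here are illustrative defaults.
    """
    # Soft targets: KL divergence between softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # standard scaling from Hinton et al.

    # Hard targets: ordinary cross-entropy against the true next word.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

During training, the teacher is frozen and only provides its predictions; only the much smaller student network is updated.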

Researchers at Hugging Face have now applied this method to the language AI BERT. The result: BERT's student DistilBERT is 40 percent smaller, 60 percent faster, and achieves 95 percent of BERT's performance. In the future, DistilBERT is to be shrunk further with other methods, such as pruning, which removes some network connections.
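The "removing some network connections" mentioned above is commonly known as pruning. The following sketch shows the general idea using PyTorch's built-in pruning utilities; it is not the procedure planned for DistilBERT, and the layer size and pruning ratio are illustrative assumptions.

```python
# Illustrative magnitude pruning of a single layer (not DistilBERT's actual method).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(768, 768)  # stand-in for one transformer weight matrix

# Zero out the 30 percent of weights with the smallest magnitude (ratio is illustrative).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weight tensor to make the removal permanent.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Share of removed connections: {sparsity:.0%}")
```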

Google has similarly shrunk its own next-gen Assistant, which is expected to run locally on Pixel smartphones without an Internet connection by the end of the year. For this, the almost 100-gigabyte voice model was reduced to just under 0.5 gigabytes.

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.