Does the breakthrough to general AI need more data and computing power above all else? Yann LeCun, Chief AI Scientist at Meta, comments on the recent debate about scaling sparked by Deepmind’s Gato.
The recent successes of large AI models such as OpenAI’s DALL-E 2, Google’s PaLM and Deepmind’s Flamingo have sparked a debate about their significance for progress towards general AI. Deepmind’s Gato has recently given a particular boost to the debate, which has been conducted publicly, especially on Twitter.
Gato is a Transformer model trained on numerous data modalities, including images, text, proprioception and joint torques. Gato processes all training data as a single token sequence, similar to those used by large language models. Thanks to this versatile training, Gato can chat, caption images, play video games or control robotic arms. Deepmind tested the AI model on more than 600 benchmarks.
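To make the idea of one shared token sequence more concrete, here is a minimal Python sketch of how text and continuous sensor readings could be flattened into a single list of integer tokens. It is not Deepmind's implementation: the vocabulary size, the number of bins, the toy word-level tokenizer and all function names are assumptions chosen for illustration only.

```python
TEXT_VOCAB = 32_000  # assumed size of the text vocabulary (illustrative)
NUM_BINS = 1_024     # assumed number of bins for continuous values (illustrative)


def continuous_to_token(x):
    """Map one real-valued reading (e.g. a joint torque) to a token id.

    The value is clipped to [-1, 1], split into NUM_BINS uniform bins and
    shifted above the text range so both modalities share one id space.
    """
    clipped = max(-1.0, min(1.0, x))
    bin_index = min(NUM_BINS - 1, int((clipped + 1.0) / 2.0 * NUM_BINS))
    return TEXT_VOCAB + bin_index


def text_to_tokens(text, vocab):
    """Toy word-level tokenizer; a real system would use a subword tokenizer."""
    return [vocab[word] for word in text.lower().split()]


def build_sequence(text, readings, vocab):
    """Concatenate text tokens and discretized readings into one flat sequence."""
    return text_to_tokens(text, vocab) + [continuous_to_token(r) for r in readings]


if __name__ == "__main__":
    toy_vocab = {"pick": 0, "up": 1}  # hypothetical two-word vocabulary
    print(build_sequence("pick up", [0.0, 1.0], toy_vocab))
```

Once everything is a sequence of integers, the same Transformer can be trained on it regardless of where the tokens came from, which is the property the scaling debate builds on.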
Deepmind’s Gato and scaling as a path to general artificial intelligence
Deepmind sees Gato as an important step on the way to a generalist AI model. What exactly does that path look like? According to Nando de Freitas, head of research at Deepmind, it’s all about scaling. Scaling is meant to lead Gato to its goal, possibly general artificial intelligence. At least that’s how de Freitas can be understood when he says, “The game is over.”
Cognitive scientist and AI researcher Gary Marcus suspects that de Freitas is saying out loud what many in the AI industry think. Marcus calls this approach “Scaling-Uber-Alles” and criticizes it as short-sighted.
Someone’s opinion article. My opinion: It’s all about scale now! The Game is Over! It’s about making these models bigger, safer, compute efficient, faster at sampling, smarter memory, more modalities, INNOVATIVE DATA, on/offline, … 1/N https://t.co/UJxSLZGc71
– Nando de Freitas (@NandoDF) May 14, 2022
But where does the confidence in scaling come from? It rests on a phenomenon that has been observed in numerous Transformer models since GPT-1: with more parameters and a correspondingly larger amount of training data, model performance increases, for example in language processing or image generation, sometimes by leaps and bounds.
This can also be observed in Gato: Deepmind trained three variants of the AI model. The largest variant, with a relatively small 1.18 billion parameters, was clearly ahead of the smaller models. Considering large language models with hundreds of billions of parameters and the leaps in performance observed there, de Freitas’ hopes for scaling Gato are understandable.
Meta’s AI chief Yann LeCun sees big challenges beyond scaling
Now Meta’s AI chief Yann LeCun is weighing in on the debate over the significance of recent advances. He follows up on positions he has expressed several times before, such as in a podcast with Lex Fridman on three major challenges of artificial intelligence or in a post on the development of autonomous AI.
About the raging debate regarding the significance of recent progress in AI, it may be useful to (re)state a few obvious facts:
(0) there is no such thing as AGI. Reaching “Human Level AI” may be a useful goal, but even humans are specialized.
– Yann LeCun (@ylecun) May 17, 2022
LeCun sees models like Flamingo or Gato as an indication that the research community is making “some” progress toward human-level artificial intelligence (HLAI). LeCun thinks the term general artificial intelligence is misguided.
But some fundamental concepts for the way ahead are still missing, LeCun said. According to him, some of them, such as generalized self-supervised learning, are closer to implementation than others.
But it’s not clear how many of those concepts are even needed – only the obvious ones are known. “Hence, we can’t predict how long it’s going to take to reach HLAI,” LeCun writes.
Scaling alone will not solve the problem, so LeCun calls for new concepts. Machines would have to:
- learn how the world works by observing like babies,
- learn to predict how one can influence the world through taking actions,
- learn hierarchical representations that allow long-term predictions in abstract spaces,
- properly deal with the fact that the world is not completely predictable,
- predict the effects of sequences of actions so as to be able to reason and plan,
- plan hierarchically, decomposing a complex task into subtasks,
- and do all of this in ways that are compatible with gradient-based learning.
The solution to all these tasks is not imminent, LeCun said. So scaling is necessary, but not sufficient for further progress.
According to LeCun, the game is not over yet. Unlike de Freitas, Meta’s AI chief comes to a sobering conclusion: “We have a number of obstacles to clear, and we don’t know how.”