DeepMind's Vice President of Drastic Research and Gemini co-tech lead Oriol Vinyals describes how artificial intelligence is moving from narrowly focused systems toward autonomous agents, and what challenges lie ahead.
According to Vinyals, AI is undergoing a fundamental transformation away from highly specialized systems and toward autonomous agents. Speaking on a company podcast, he explained that early AI systems such as AlphaStar, which was built to play StarCraft II, were just the beginning of this development.
Today's large language models (LLMs) and multimodal systems serve as a kind of "CPU": a foundation for more complex capabilities, Vinyals says. The next major step is giving these systems a "digital body" that allows them to interact independently with the (digital) world.
The limitations of scaling
A key challenge lies in the limitations of scaling, according to Vinyals. Simply building larger models is no longer enough, because each additional gain requires disproportionately more effort. Vinyals compares it to cleaning a room: "The first 10 minutes that you spend tidying, it's going to make a massive difference. But once you're like 7 hours in, that extra 10 minutes, it's not going to make any difference at all."
Training data is also becoming scarce. Vinyals says DeepMind is experimenting with synthetic data and untapped data sources like videos: "There's a lot of it. And we haven't quite seen a moment of take all the video data where you probably can derive a lot of knowledge, a lot of laws of physics, a lot of how the world works, even if there are no words associated with the videos necessarily, and extract that knowledge."
First steps with Gemini 2.0
With Gemini 2.0, Google DeepMind has taken first steps toward autonomous agents. According to Google's demos, the system can navigate browsers, write code, and act as a "companion" in games. But these abilities are just the beginning: "There are a lot of steps. But if you just fast-forward, anything a human can do on a browser, these things can do in principle. And then if you make them really understand what you want and really good through thinking and other techniques, they'll get better and better," says Vinyals.
The vision extends further: DeepMind is working to give agents capabilities like planning, logical thinking, and different types of memory. While Vinyals draws parallels to the human brain, he emphasizes that artificial systems might take entirely different approaches better suited to computers.
AGI, agents, and AlphaFold
On the development of Artificial General Intelligence (AGI), Vinyals takes a measured view: "If 10 years ago, 5 years ago even, I would have been given the models today, and I would say, look, there's a secret lab, this is a model, play with it and tell me if you think this is actually close to a general intelligence; I would have claimed, oh, yeah, that comes from a future where AGI basically either has happened or I can see that this is very close to it. So the closer you are, the more you find, oh, but it hallucinates. Of course, that's very important. But I think, just zooming out, it just feels like, ok, it's getting pretty close."
He expects initial breakthroughs mainly in scientific areas with clear success criteria, as was the case with AlphaFold. "And we've seen a good example very recently, of course, with AlphaFold. So in that sense, from a domains perspective, we honestly have seen some examples already of narrow but superintelligent systems. AlphaFold was only doing that. And I think probably that's the domains to think about where we're going to start seeing superintelligence, even from the general sort of capabilities these models have. You might need to do some specialization. And again, it might be worth it," says Vinyals. "Was it worth it to solve protein folding? Absolutely, right? But I think that's a good test to use."