Demis Hassabis, CEO of Google DeepMind, expects the next year to bring major progress in multimodal models, interactive video worlds, and more reliable AI agents. Speaking at the Axios AI+ Summit, Hassabis noted that Gemini's multimodal capabilities are already powering new applications. He used a scene from "Fight Club" to illustrate the point: rather than merely describing the action, the AI interpreted a character removing a ring as a philosophical symbol of renouncing everyday life. Google's latest image model draws on similar capabilities to understand visual content precisely, allowing it to generate complex outputs like infographics, something that wasn't previously possible.
Hassabis says AI agents will be "close" to handling complex tasks autonomously within a year, aligning with the timeline he predicted in May 2024. The goal is a universal assistant that works across devices to manage daily life. DeepMind is also developing "world models" like Genie 3, which generate interactive, explorable video spaces.