Demis Hassabis, CEO of Google DeepMind, expects the next year to bring major progress in multimodal models, interactive video worlds, and more reliable AI agents. Speaking at the Axios AI+ Summit, Hassabis noted that Gemini's multimodal capabilities are already powering new applications. He used a scene from "Fight Club" to illustrate the point: instead of merely describing the action, the AI interpreted a character removing a ring as a philosophical symbol of renouncing everyday life. Google's latest image model draws on similar capabilities to understand visual content precisely, letting it generate complex outputs like infographics, something that wasn't possible before.
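As a rough illustration of this kind of video understanding, here is a minimal sketch using Google's google-genai Python SDK to upload a clip and ask for a thematic reading of a scene. The model ID, file name, and prompt are assumptions chosen for illustration, not the setup Hassabis demonstrated.

```python
# Minimal sketch of multimodal video interpretation with the Gemini API.
# Assumes the google-genai SDK (pip install google-genai) and an API key
# in the environment; model ID and prompt are illustrative assumptions.
import time

from google import genai

client = genai.Client()  # reads the API key from the environment

# Upload the clip, then wait for server-side processing to finish.
video = client.files.upload(file="scene.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model ID
    contents=[
        video,
        "Don't just describe the action in this clip. "
        "What might the scene symbolize thematically?",
    ],
)
print(response.text)
```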
Hassabis says AI agents will be "close" to handling complex tasks autonomously within a year, in line with the timeline he predicted in May 2024. The goal is a universal assistant that works across devices to manage daily life. DeepMind is also developing "world models" like Genie 3, which generate interactive, explorable video spaces.
Yann LeCun, Meta's outgoing chief AI scientist, is launching a new startup built around "world models": systems designed to understand physical reality rather than just generate text. LeCun argues that Silicon Valley is currently "hypnotized" by generative AI, and he intends to build his project with heavy reliance on European talent. According to Sifted, the company will operate globally and maintain a hub in Paris.
Nvidia used the NeurIPS conference to debut new AI models for autonomous driving and speech processing. The company introduced Alpamayo-R1, a system designed to handle traffic situations through step-by-step logical reasoning. Nvidia says this approach helps the model respond more effectively to complex real-world scenarios than previous systems. The code is public, but the license limits it to non-commercial use.
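To make the "step-by-step reasoning" idea concrete, here is a toy sketch of a driving policy that emits an explicit reasoning trace before committing to an action. This is not Alpamayo-R1's architecture; the scene format, rules, and action set are invented purely to show the pattern of reasoning before acting.

```python
# Toy illustration of reason-then-act driving decisions.
# NOT Alpamayo-R1: all fields, thresholds, and actions are invented.
from dataclasses import dataclass


@dataclass
class Scene:
    ego_speed_mps: float
    lead_vehicle_gap_m: float | None  # None if no vehicle ahead
    pedestrian_in_crosswalk: bool
    traffic_light: str  # "red", "yellow", or "green"


def decide(scene: Scene) -> tuple[str, list[str]]:
    """Return (action, reasoning trace) for one scene."""
    trace: list[str] = []

    if scene.pedestrian_in_crosswalk:
        trace.append("Pedestrian occupies the crosswalk -> must yield.")
        return "brake", trace

    if scene.traffic_light == "red":
        trace.append("Signal is red -> stop before the line.")
        return "brake", trace

    if scene.lead_vehicle_gap_m is not None:
        # Rough two-second following-gap rule.
        safe_gap = 2.0 * scene.ego_speed_mps
        trace.append(
            f"Lead vehicle at {scene.lead_vehicle_gap_m:.0f} m; "
            f"2 s gap at current speed is {safe_gap:.0f} m."
        )
        if scene.lead_vehicle_gap_m < safe_gap:
            trace.append("Gap below threshold -> ease off.")
            return "decelerate", trace

    trace.append("No hazards detected -> proceed.")
    return "maintain_speed", trace


action, trace = decide(
    Scene(ego_speed_mps=14.0, lead_vehicle_gap_m=20.0,
          pedestrian_in_crosswalk=False, traffic_light="green")
)
print("\n".join(trace))
print("action:", action)
```

The point of the pattern is that every decision arrives with its intermediate steps attached, which is what lets a reasoning-style model justify its behavior in unusual traffic situations rather than producing an opaque control output.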
Nvidia also showed new tools for robotics simulation. In speech AI, the company unveiled MultiTalker, a model that can separate and transcribe overlapping conversations from multiple speakers.
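As a hedged sketch of how a model like this might be used: Nvidia typically distributes its speech models through the open-source NeMo toolkit, so the example below assumes MultiTalker loads through NeMo's generic ASR interface. The model ID is a placeholder, not a confirmed release name.

```python
# Hedged sketch: transcribing overlapped speech with an NVIDIA NeMo model.
# ASSUMPTION: MultiTalker ships via NeMo and loads through the generic
# ASRModel interface; "nvidia/multitalker" is a placeholder ID.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/multitalker")

# A multi-speaker model would return one transcript per separated
# speaker rather than a single merged string for the mixed recording.
# Depending on the NeMo version, entries may be strings or Hypothesis
# objects carrying a .text attribute.
transcripts = model.transcribe(["meeting_with_crosstalk.wav"])
for i, text in enumerate(transcripts):
    print(f"speaker/segment {i}: {text}")
```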
Programmers who rely on AI assistants tend to ask fewer questions and learn more superficially, according to new research from Saarland University. A team led by Sven Apel found that students were less critical of the code suggestions they received when working with tools like GitHub Copilot. In contrast, pairs of human programmers asked more questions, explored alternatives, and learned more from one another.
In the experiment, 19 students worked in pairs: six two-person human teams and seven students each paired with an AI assistant. According to Apel, many of the AI-assisted participants simply accepted code suggestions because they assumed the AI's output was already correct. He noted that this habit can introduce mistakes that later require significant effort to fix. Apel said AI tools can be helpful for straightforward tasks, but complex problems still benefit from real collaboration between humans.