Richard Sutton, a leading figure in reinforcement learning and a Turing Award winner, says the AI industry has lost its way.
"As AI has become a huge industry, to an extent it has lost its way," Sutton writes. He argues that recent progress has ignored the core principles needed for real intelligence.
Sutton calls for a course correction. "What is needed to get us back on track to true intelligence? We need agents that learn continually. We need world models and planning. We need knowledge that is high-level and learnable. We need to meta-learn how to generalize."
Sutton, who works at Google DeepMind, joins other prominent researchers criticizing the industry's fixation on scaling large language models. He believes real intelligence comes from experience—from agents that learn by interacting directly with their environment. Along with David Silver, he recently published a paper arguing that AI should learn by doing, not just by absorbing huge amounts of text.
Sutton says current models do the opposite: their knowledge is programmed in at design time rather than discovered through learning. He points to his influential essay "The Bitter Lesson," which argues that general methods that scale with computation ultimately beat approaches built on handcrafted human knowledge.
Sutton's path to superintelligence
Sutton says the main problem with today's systems is that they can't learn continually. They suffer from catastrophic forgetting, in which new information overwrites what they have already learned, and from what he calls loss of plasticity, a gradual decline in the ability to learn anything new at all.
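To make the failure concrete, here is a toy sketch of catastrophic forgetting (an illustration, not Sutton's code): a small network is trained on one task, then on a conflicting one, with no replay of the old data. The tasks, architecture, and hyperparameters are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(256, 1))
yA, yB = np.sin(X), -np.sin(X)   # two conflicting tasks over the same inputs

# A tiny two-layer MLP with tanh hidden units, trained by full-batch gradient descent.
W1 = rng.normal(0.0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.5, (32, 1)); b2 = np.zeros(1)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

def train(y, steps=3000, lr=0.05):
    global W1, b1, W2, b2
    for _ in range(steps):
        H, pred = forward(X)
        err = pred - y                         # gradient of 0.5 * squared error
        gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)       # backprop through tanh
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

def mse(y):
    return float(np.mean((forward(X)[1] - y) ** 2))

train(yA)
print(f"task A error after training on A: {mse(yA):.3f}")   # low: A is learned
train(yB)                                                    # same weights, no replay
print(f"task B error after training on B: {mse(yB):.3f}")   # low: B is learned
print(f"task A error after training on B: {mse(yA):.3f}")   # high: A was overwritten
```

The second task fits well, but error on the first jumps by orders of magnitude: both tasks share the same weights, and nothing protects the old knowledge while the new gradients pour in.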
To address this, Sutton proposes the Oak architecture (Options and Knowledge), a framework for building agents that can reach superintelligence by learning from experience.
Oak is built on three core principles. First, the agent must be general-purpose, starting with no specific knowledge about any particular world. Second, learning is entirely experience-driven: the agent acquires knowledge solely through direct interaction with its environment—by observing, acting, and receiving rewards. Third, the reward hypothesis applies: every goal can be reduced to maximizing a simple, scalar reward signal.
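These principles amount to the classic agent-environment loop of reinforcement learning. Below is a minimal sketch of that loop using a tabular Q-learning agent; the environment interface (`reset`/`step`) is an assumption for illustration, not part of Oak.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular Q-learning: general-purpose, with no built-in world knowledge."""
    def __init__(self, actions, lr=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> value, all zero at birth
        self.actions = actions
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon

    def act(self, s):
        if random.random() < self.epsilon:                       # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])   # exploit

    def learn(self, s, a, r, s2):
        # The reward hypothesis in one line: the update is driven entirely by
        # the scalar reward r plus the agent's own bootstrapped value estimate.
        target = r + self.gamma * max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.lr * (target - self.q[(s, a)])

def run(env, agent, episodes=100):
    # Assumed environment API: reset() -> state, step(a) -> (state, reward, done).
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = agent.act(s)
            s2, r, done = env.step(a)
            agent.learn(s, a, r, s2)
            s = s2
```

Note what is absent: the agent begins with no knowledge of any particular world, and the scalar reward `r` is the only signal that defines its goal.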
At the core of Oak is a self-reinforcing loop: the agent creates higher-level abstractions through feedback, where features that help with planning and problem solving become the basis for the next, even more abstract generation of knowledge. This process is open-ended, limited only by available computing power, and, in Sutton's view, could eventually pave the way to superintelligence.
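"Options" are Sutton's long-standing formalism for temporally extended actions: a policy paired with a termination condition, which the agent can invoke as if it were a single primitive action. The sketch below shows the data structure in rough form; the field names and the environment API are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable

State, Action = Any, Any   # placeholders for whatever the environment uses

@dataclass
class Option:
    """A temporally extended action: behave via `policy` until `terminate`."""
    can_start: Callable[[State], bool]    # initiation set: where the option applies
    policy: Callable[[State], Action]     # what to do while the option runs
    terminate: Callable[[State], bool]    # when the option ends

def run_option(env, state, option, max_steps=100):
    """Execute one option to completion. The caller sees a single high-level
    step, which is what lets planning operate over abstractions rather than
    primitive actions. Assumed env API: step(a) -> (state, reward, done)."""
    total_reward, steps = 0.0, 0
    while not option.terminate(state) and steps < max_steps:
        state, reward, done = env.step(option.policy(state))
        total_reward += reward
        steps += 1
        if done:
            break
    return state, total_reward
```

Because a completed option looks to the caller like a single action, options learned at one level can serve as primitives for the next, which is the self-reinforcing abstraction loop Sutton describes.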
But Sutton says Oak is still out of reach, because it depends on algorithms that can learn continually and stably without erasing what they have already learned. Reliable continual deep learning, according to Sutton, is the missing piece. He lays out the full proposal in a technical talk.
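Sutton and collaborators have explored one candidate for that missing piece: continual backprop, which preserves a network's plasticity by occasionally reinitializing its least-useful hidden units while normal training continues. The sketch below is a heavily simplified illustration of that idea; the utility measure and replacement rate are assumptions, not the published recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinit_low_utility_units(W_in, W_out, H, replace_fraction=0.01):
    """Reinitialize the hidden units that contribute least to the output.

    W_in:  (n_inputs, n_hidden) incoming weights
    W_out: (n_hidden, 1) outgoing weights
    H:     (batch, n_hidden) recent hidden activations
    Utility here is mean |activation| times |outgoing weight| per unit,
    an assumed stand-in for the paper's running utility estimate.
    """
    utility = np.mean(np.abs(H), axis=0) * np.abs(W_out).ravel()
    n_replace = max(1, int(replace_fraction * utility.size))
    worst = np.argsort(utility)[:n_replace]
    W_in[:, worst] = rng.normal(0.0, 0.5, size=(W_in.shape[0], worst.size))
    W_out[worst] = 0.0   # fresh units start with no influence, then relearn
    return W_in, W_out
```

Zeroing the outgoing weights means a freshly reset unit cannot disturb the network's current behavior; it only gains influence as it learns something useful.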