
Andrej Karpathy, a former Tesla and OpenAI researcher, is part of a growing movement in the AI community calling for a new approach to building large language models (LLMs) and AI systems.


On X, Karpathy shared his long-standing skepticism about reinforcement learning (RL) as a foundation for LLM training. He argues that RL reward functions are "super sus": unreliable, easy to game, and poorly suited to teaching "intellectual problem solving" skills.

This stands out because current "reasoning" models depend heavily on reinforcement learning, and companies like OpenAI see the approach as scalable and adaptable to new tasks. Reasoning models have powered most of the recent AI hype and progress, while purely pre-trained LLMs seem to have hit a plateau.

Reinforcement learning is often used to help LLMs break down tasks into logical steps and make their reasoning process more transparent. RL works best when there's a clear right or wrong answer, since the model gets positive feedback for solving problems in a step-by-step way.
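The "clear right or wrong answer" setup can be illustrated with a minimal sketch. This is not code from any lab's training pipeline; the function name and exact-match check are illustrative assumptions, showing only the kind of verifiable reward signal such RL finetuning relies on.

```python
# Hypothetical sketch of a verifiable reward: the model only gets positive
# feedback when its final answer matches a known ground truth, giving RL an
# unambiguous right-or-wrong signal to reinforce.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 for an exact match with the known answer, else 0.0."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# A math problem with a checkable answer gives a clean signal...
assert verifiable_reward("42", "42") == 1.0
# ...while a wrong answer earns nothing, however plausible it sounds.
assert verifiable_reward("41", "42") == 0.0
```

Karpathy's "super sus" complaint is precisely that for open-ended intellectual work, no such crisp ground truth exists, so proxy rewards can be gamed.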


Despite his criticism, Karpathy still sees RL finetuning as a step up from classic supervised finetuning (SFT), which just mimics human answers. He thinks RL leads to more nuanced model behavior and believes RL finetuning will "continue to grow substantially."

Still, Karpathy says real breakthroughs will need fundamentally different learning mechanisms. Humans, he points out, use much more powerful and efficient ways to learn—methods that "haven't been properly invented and scaled yet." This puts him in line with a growing group of LLM skeptics who argue that the next leap in AI will only come from new approaches.

One direction he mentions is "system prompt learning," where learning happens at the level of tokens and context, not by changing model weights. Karpathy compares this to what happens during human sleep, when the brain consolidates and stores information.

Interactive environments as the next major training paradigm for language models

Karpathy also sees promise in training LLMs through interactive environments—digital spaces where models can act and see the consequences. Earlier training phases relied on internet text for pre-training and question-and-answer data for fine-tuning, but training in environments gives models real feedback based on what they actually do.

With this approach, LLMs could go beyond simply guessing how a person might respond and start learning to make decisions, testing how well those choices work in controlled scenarios. Karpathy says these environments could be used for both training and evaluation. The main challenge now is building a large, diverse, and high-quality set of environments, much like the text datasets used in earlier training phases.
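The act-observe-learn loop described above can be sketched in a few lines. Everything here is a toy assumption (the environment, its `step` interface, and the binary-search "policy" are all hypothetical), but it shows the key difference from imitation: the feedback reflects the consequences of the agent's own actions.

```python
# Illustrative toy environment: the agent must find a hidden number.
# The reward comes from what the agent actually does, not from how well
# it mimics a human-written answer.

class GuessEnv:
    """Toy interactive environment with action-dependent feedback."""
    def __init__(self, secret: int):
        self.secret = secret

    def step(self, action: int):
        """Return (observation, reward, done) for the agent's action."""
        if action == self.secret:
            return "correct", 1.0, True
        return ("higher" if action < self.secret else "lower"), 0.0, False

env = GuessEnv(secret=7)
low, high = 0, 10
done = False
while not done:
    guess = (low + high) // 2        # the "policy": binary search on feedback
    obs, reward, done = env.step(guess)
    if obs == "higher":
        low = guess + 1
    elif obs == "lower":
        high = guess - 1
```

Scaling this idea to LLMs means replacing the toy game with large, diverse environments (tools, browsers, code sandboxes) where each action produces consequences the model can learn from, which is exactly the dataset-building challenge Karpathy highlights.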


Back in August 2024, Karpathy argued that reinforcement learning could be a breakthrough for LLM training—if it relied on truly objective, measurable reward functions. At the time, he criticized reinforcement learning from human feedback (RLHF), then the standard approach, for being too dependent on human preferences, calling it more of a "vibe check" than a real goal. He said that solving complex problems requires well-defined success criteria. Even as reasoning models advance, it doesn't seem like Karpathy believes this core issue has been solved.

Karpathy's thinking lines up with calls for a paradigm shift from DeepMind researchers Richard Sutton and David Silver in their essay "Welcome to the Era of Experience." Both argue that the next wave of advanced AI can't just copy human language or judgments. Instead, they say, future AI needs to become more robust, creative, and adaptable by learning directly from experience and independent action.

Summary
  • Andrej Karpathy is critical of reinforcement learning in large language models, especially pointing out that reward functions for cognitive tasks like problem solving are unreliable and easy to manipulate.
  • He suggests training AI systems in interactive environments where they learn through their own actions and consequences, instead of just statistically imitating human answers.
  • Karpathy’s argument echoes the views of DeepMind researchers Richard Sutton and David Silver, who also believe future AI should learn from independent experience and action rather than relying mainly on language data or human feedback.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.