Andrej Karpathy, a former Tesla and OpenAI researcher, is part of a growing movement in the AI community calling for a new approach to building large language models (LLMs) and AI systems.
On X, Karpathy shared his long-term skepticism about reinforcement learning (RL) as a foundation for LLM training. He argues that RL reward functions are "super sus": unreliable, easy to game, and not well suited for teaching "intellectual problem solving" skills.
This stands out because current "reasoning" models depend heavily on reinforcement learning, and companies like OpenAI see the approach as scalable and adaptable to new tasks. Reasoning models have powered most of the recent AI hype and progress, while purely pre-trained LLMs seem to have hit a plateau.
Reinforcement learning is often used to help LLMs break down tasks into logical steps and make their reasoning process more transparent. RL works best when there is a clear right or wrong answer, because the model can then receive unambiguous positive feedback for solving problems step by step.
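The kind of verifiable reward this describes can be made concrete with a minimal sketch. Everything below is illustrative and not tied to any particular training library: the reward only checks the final answer against a reference, which also hints at why Karpathy calls such functions gameable.

```python
# Minimal sketch of a verifiable reward function for RL finetuning.
# All names here are illustrative, not from any specific library.

def extract_final_answer(completion: str) -> str:
    """Take the text after the last 'Answer:' marker as the model's answer."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else completion.strip()

def reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

# Note that the intermediate reasoning steps get no direct feedback: only the
# final answer is scored, so a lucky or shortcut answer earns full reward.
print(reward("Let's think step by step... Answer: 42", "42"))  # 1.0
```

A single scalar per completion is exactly the kind of coarse signal Karpathy considers poorly suited to teaching "intellectual problem solving."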
Despite his criticism, Karpathy still sees RL finetuning as a step up from classic supervised finetuning (SFT), which just mimics human answers. He thinks RL leads to more nuanced model behavior and believes RL finetuning will "continue to grow substantially."
Still, Karpathy says real breakthroughs will need fundamentally different learning mechanisms. Humans, he points out, use much more powerful and efficient ways to learn—methods that "haven't been properly invented and scaled yet." This puts him in line with a growing group of LLM skeptics who argue that the next leap in AI will only come from new approaches.
One direction he mentions is "system prompt learning," where learning happens at the level of tokens and context, not by changing model weights. Karpathy compares this to what happens during human sleep, when the brain consolidates and stores information.
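Since Karpathy describes the idea only at a high level, here is one hedged way such token-level learning might look in code: the model's weights stay frozen, and distilled "lessons" accumulate in the system prompt instead. The class and its interface are invented for illustration.

```python
# Hypothetical sketch of "system prompt learning": no weight updates,
# only an evolving context. The consolidate() step is a loose analogy
# to sleep, where experience is distilled and stored.

class SystemPromptLearner:
    def __init__(self, base_prompt: str):
        self.base_prompt = base_prompt
        self.lessons: list[str] = []

    def system_prompt(self) -> str:
        """Build the current prompt from the base instructions plus lessons."""
        if not self.lessons:
            return self.base_prompt
        notes = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"{self.base_prompt}\n\nLessons learned:\n{notes}"

    def consolidate(self, lesson: str) -> None:
        """Store a distilled lesson as tokens in context, not as a gradient step."""
        if lesson not in self.lessons:
            self.lessons.append(lesson)

learner = SystemPromptLearner("You are a careful math tutor.")
learner.consolidate("Always verify arithmetic before giving a final answer.")
print(learner.system_prompt())
```

The point of the sketch is the contrast with finetuning: the "learning" is fully inspectable as text and can be edited or discarded without retraining.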
Interactive environments as the next major training paradigm for language models
Karpathy also sees promise in training LLMs through interactive environments—digital spaces where models can act and see the consequences. Earlier training phases relied on internet text for pre-training and question-and-answer data for fine-tuning, but training in environments gives models real feedback based on what they actually do.
With this approach, LLMs could go beyond simply guessing how a person might respond and start learning to make decisions, testing how well those choices work in controlled scenarios. Karpathy says these environments could be used for both training and evaluation. The main challenge now is building a large, diverse, and high-quality set of environments, much like the text datasets used in earlier training phases.
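An interactive environment of the kind described above can be sketched with the reset/step interface familiar from RL toolkits. The environment and task below are invented for illustration; the key property is that the model's reward depends on what it actually does, not on how well it imitates text.

```python
# Illustrative sketch of an interactive environment for training or
# evaluating an LLM. The arithmetic task is a stand-in for richer
# digital environments where actions have observable consequences.

class ArithmeticEnv:
    """Shows the model a problem, accepts its answer as an action,
    and returns feedback based on the outcome."""

    def __init__(self, problems: list[tuple[str, str]]):
        self.problems = problems  # (prompt, correct answer) pairs
        self.i = 0

    def reset(self) -> str:
        """Start an episode and return the first observation (the prompt)."""
        self.i = 0
        return self.problems[self.i][0]

    def step(self, action: str):
        """Score the model's action, then advance to the next problem."""
        _, truth = self.problems[self.i]
        reward = 1.0 if action.strip() == truth else 0.0
        self.i += 1
        done = self.i >= len(self.problems)
        obs = None if done else self.problems[self.i][0]
        return obs, reward, done

env = ArithmeticEnv([("2 + 2 = ?", "4"), ("3 * 3 = ?", "9")])
obs = env.reset()
obs, reward, done = env.step("4")  # the model's action; hard-coded here
```

The same loop serves both purposes Karpathy mentions: during training the rewards drive updates, and during evaluation they simply get tallied.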
Back in August 2024, Karpathy argued that reinforcement learning could be a breakthrough for LLM training—if it relied on truly objective, measurable reward functions. At the time, he criticized reinforcement learning from human feedback (RLHF), then the standard approach, for being too dependent on human preferences, calling it more of a "vibe check" than a real goal. He said that solving complex problems requires well-defined success criteria. Even as reasoning models advance, it doesn't seem like Karpathy believes this core issue has been solved.
Karpathy's thinking lines up with calls for a paradigm shift from DeepMind researchers Richard Sutton and David Silver in their essay "Welcome to the Era of Experience." Both argue that the next wave of advanced AI can't just copy human language or judgments. Instead, they say, future AI needs to become more robust, creative, and adaptable by learning directly from experience and independent action.