
Former OpenAI and Tesla AI researcher Andrej Karpathy believes the race for bigger AI language models is about to reverse course. He predicts that future models will be smaller, but still get smarter.


Most of the major AI model companies have recently released compact models that remain powerful but are much more affordable than their larger counterparts, the most recent example being OpenAI's GPT-4o mini.

Karpathy expects the trend to continue. "My bet is that we'll see models that 'think' very well and reliably that are very very small," he writes.

He attributes the current size of the most capable models to the complexity of the training, which requires LLMs to "memorize the Internet," adding that large language models excel at memorization, surpassing human capabilities.


But this also makes improvement difficult, Karpathy says, because knowledge and reasoning demonstrations are intertwined in the training data. Still, today's large models are needed, even if they aren't trained efficiently, because they can help automate the conversion of training data into "ideal, synthetic formats."

He foresees continuous improvement, with each model generating training data for the next, until a "perfect training set" emerges. Even a tiny model like GPT-2, with 1.5 billion parameters, could be considered "smart" when trained with this super data, Karpathy says.
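The idea resembles an iterative distillation pipeline: each generation of large models rewrites raw data into cleaner synthetic examples for training the next, smaller model. The sketch below is a minimal illustration of that loop, not Karpathy's or OpenAI's actual method; `rewrite_with_model` and the model names are hypothetical placeholders.

```python
# Minimal sketch of iterative synthetic-data refinement: each "teacher"
# model rewrites the corpus into cleaner training examples for the next,
# smaller model. The rewrite function and model names are placeholders.

from typing import Callable, List


def rewrite_with_model(model_name: str, document: str) -> str:
    """Hypothetical stand-in: ask the current teacher model to turn a raw
    document into a distilled, instruction-style training example."""
    return f"[{model_name} rewrite of] {document.strip()}"


def refine_corpus(raw_corpus: List[str],
                  teachers: List[str],
                  rewrite: Callable[[str, str], str]) -> List[str]:
    """Pass the corpus through successive teacher models; each pass is
    intended to yield higher-quality synthetic training data."""
    corpus = raw_corpus
    for teacher in teachers:
        corpus = [rewrite(teacher, doc) for doc in corpus]
    return corpus


if __name__ == "__main__":
    raw = ["Paris is the capital of France. (scraped web page, noisy)"]
    synthetic = refine_corpus(raw, ["large-model-gen1", "large-model-gen2"],
                              rewrite_with_model)
    print(synthetic[0])  # the kind of "ideal" example a small model would train on
```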

While scores on certain benchmarks may suffer, such as Massive Multitask Language Understanding (MMLU), which rewards broad knowledge and memorization, a smarter AI model could retrieve information more reliably and verify facts, so it wouldn't need to know as much out of the box. OpenAI's Strawberry project reportedly addresses both of these capabilities.
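To illustrate the retrieval-and-verification idea in the simplest possible terms: instead of memorizing facts, a small model can look them up in an external store and check its draft answer against the source. The snippet below is a toy sketch under that assumption; the dictionary store and exact-match lookup are stand-ins for a real search or vector-retrieval system.

```python
# Toy sketch of retrieval plus verification: facts live in an external
# store rather than in the model's weights, and a draft answer is checked
# against the retrieved source before being returned.

from typing import Dict, Optional

KNOWLEDGE_STORE: Dict[str, str] = {
    "capital of france": "Paris",
    "boiling point of water at sea level": "100 °C",
}


def retrieve(query: str) -> Optional[str]:
    """Toy retrieval: exact-match lookup; a real system would search an
    external corpus or use vector similarity."""
    return KNOWLEDGE_STORE.get(query.lower())


def answer_with_verification(query: str, draft_answer: str) -> str:
    """Compare the model's draft answer with the retrieved fact and
    prefer the source when they disagree."""
    fact = retrieve(query)
    if fact is None:
        return draft_answer + " (unverified: no source found)"
    return fact


print(answer_with_verification("capital of france", "Lyon"))  # -> "Paris"
```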

More efficient scaling

OpenAI CEO Sam Altman has made similar remarks, declaring the "end of an era" for large AI models back in April 2023 and more recently confirming that data quality, whether the data is real or synthetic, is the key success factor for further AI training. The key question, according to Altman, is how AI systems can learn more from less data.

Microsoft researchers followed this assumption when developing the Phi models. Hugging Face AI researchers recently confirmed the hypothesis as well and published a high-quality training dataset.


This doesn't mean that scaling is no longer a factor. Even small, high-quality models would still benefit from more diverse, higher-quality data, more parameters, and thus more computation. OpenAI and others are pushing for more computing power for a reason.

This step back to smaller, more efficient models could be considered a consolidation phase, optimizing current achievements before the next computing surge. OpenAI's next larger model should give a clearer indication of where this is headed.

Summary
  • Andrej Karpathy, former AI researcher at OpenAI and Tesla, expects AI language models to become smaller and more efficient in the future, instead of getting bigger and bigger. To achieve this, the training data must be optimized so that even small models can "think" reliably.
  • Large AI models are still needed: they can help automate the evaluation of training data and its conversion into ideal, synthetic formats. That way, each model improves the data for the next, until the "perfect training set" is achieved, Karpathy says.
  • Sam Altman, CEO of OpenAI, also sees data quality as a critical success factor for further AI training, saying recently that the key question is how AI systems can learn more from less data.