Leading AI companies are changing course. Instead of developing ever-larger language models, they are focusing on test-time compute, which spends more processing power while the model is running (inference) rather than during initial training.
Three sources close to the situation tell Reuters that major AI labs are running into walls. Training these massive LLMs costs tens of millions of dollars, and the complex systems often break down. It can take months just to know if a model works as intended.
The slowdown seems to be hitting everyone. The Information recently reported that OpenAI's next big model, Orion, is barely improving on GPT-4o. Google is reportedly struggling with similar issues on Gemini 2.0, while Anthropic is rumored to have paused work on its Opus 3.5 model. (Update: Anthropic CEO Dario Amodei says, "the aim here is to shift the curve and then at some point there's going to be an Opus 3.5.")
"The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing," says OpenAI co-founder Ilya Sutskever, who now runs his own AI lab, Safe Superintelligence (SSI). Sutskever stresses that what's important now is to "scale the right thing."
This is quite a turn for Sutskever, who once pushed the "bigger is better" approach that defined OpenAI's GPT models. At SSI's recent funding round, he said he wanted to try a different approach to scaling than OpenAI.
"Everyone just says scaling hypothesis. Everyone neglects to ask, what are we scaling?" Sutskever said.
Having left OpenAI only last May, he is likely aware of OpenAI's latest model, o1, which follows the new scaling paradigm, unless plans have changed since his departure.
AI labs try new approaches
AI labs are now looking at test-time compute, giving models more time to work through problems. The goal is to create AI systems that don't just calculate probabilities but think through problems step by step. Instead of quick answers, these models generate several solutions, evaluate them, and pick the best one.
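The "generate several solutions, evaluate them, and pick the best one" idea can be illustrated with a minimal best-of-N sketch in Python. This is only a toy under stated assumptions, not a description of how o1 or any lab's system works: the function names (best_of_n, toy_generate, toy_score) are hypothetical, the "model" is a noisy random guesser, and the "evaluator" is a simple arithmetic check. The point it shows is that drawing more candidates at inference time (a larger n) can only keep or improve the best score found, at the cost of more compute per question.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidate answers, score each one, and return the best."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # Step 1: generate several candidate solutions.
    candidates: List[str] = [generate(rng) for _ in range(n)]
    # Step 2: evaluate every candidate.
    scored = [(score(c), c) for c in candidates]
    # Step 3: pick the highest-scoring candidate.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy stand-ins (assumptions for illustration only).
# Task: find the integer whose square is 1764.
def toy_generate(rng: random.Random) -> str:
    """Hypothetical 'model': guesses near the right answer, with noise."""
    return str(42 + rng.randint(-5, 5))


def toy_score(candidate: str) -> float:
    """Hypothetical 'evaluator': checks the guess by squaring it."""
    return -abs(int(candidate) ** 2 - 1764)


if __name__ == "__main__":
    for n in (1, 4, 16):
        answer, quality = best_of_n(toy_generate, toy_score, n)
        print(f"n={n:2d}  best answer={answer}  score={quality}")
```

In this sketch, checking an answer is cheaper than producing a good one, which is the situation where spending extra compute at inference time tends to pay off.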
OpenAI CEO Sam Altman said in early November that his company would focus on its new o1 model and its successors. Reuters reports that other major labs such as Anthropic, xAI, Meta, and Google DeepMind are trying similar methods.
Conventional language model development may continue despite the smaller gains, because companies may end up combining both approaches to get the best balance of cost and benefit. For example, OpenAI's o1 does math better, while GPT-4 writes text more efficiently.
This shift might shake up Nvidia's control of AI hardware. While Nvidia dominates in graphics cards for training large language models, the move to test-time compute creates room for other chipmakers. Companies like Groq are making specialized chips for these tasks, though Nvidia's products still work well here too.