During his talk at NeurIPS 2024, AI pioneer and OpenAI co-founder Ilya Sutskever warned that AI development is approaching the limits of available training data.

While computing power keeps growing through better hardware, algorithms, and larger data centers, Sutskever explained that training data isn't keeping pace. He introduced the concept of "peak data," noting that "we have but one internet" and that won't change.

Sutskever compared training data to fossil fuels: both are finite resources. Unlike oil, data can be copied, so his concern likely centers on a different limit: the new knowledge and insight AI systems can extract from that data, which cannot be duplicated or expanded indefinitely.
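To make the "peak data" argument concrete, here is a rough back-of-envelope sketch in Python. The roughly 20-tokens-per-parameter ratio follows the widely cited "Chinchilla" training heuristic; the 15-trillion-token corpus size is an assumed order of magnitude for usable web text, not a figure from Sutskever's talk:

```python
# Back-of-envelope "peak data" arithmetic. All numbers are rough
# assumptions for illustration, not figures from Sutskever's talk.

TOKENS_PER_PARAM = 20   # Chinchilla-style heuristic (~20 tokens/parameter)
WEB_TOKENS = 15e12      # assumed usable web text, ~15 trillion tokens

for params in [1e9, 1e10, 1e11, 1e12, 1e13]:
    needed = params * TOKENS_PER_PARAM
    print(f"{params:.0e} params -> needs {needed:.0e} tokens "
          f"({needed / WEB_TOKENS:.2g}x the assumed web corpus)")
```

Under these assumed numbers, a ten-trillion-parameter model would already demand more than ten times the available corpus, which is the intuition behind "we have but one internet."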

From early vision to current constraints

Looking back at modern AI's beginnings, Sutskever recalled his 2014 "deep learning hypothesis": a large neural network with ten layers could perform any task a human can complete in a fraction of a second. He chose ten layers simply because that was what could be practically trained at the time.
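For a sense of what the hypothesis describes at its simplest, here is a minimal PyTorch sketch of a plain ten-layer network. The layer width and batch size are arbitrary choices for illustration, not a reconstruction of anything Sutskever actually trained:

```python
import torch
import torch.nn as nn

# Illustrative ten-layer network; sizes are arbitrary.
width = 512
layers = []
for _ in range(10):
    layers += [nn.Linear(width, width), nn.ReLU()]
net = nn.Sequential(*layers)

x = torch.randn(32, width)   # a batch of 32 inputs
print(net(x).shape)          # torch.Size([32, 512])
```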

The hypothesis rested on the similarity between artificial and biological neurons. Sutskever argued that if artificial neurons were even roughly similar to biological ones, large neural networks should be able to handle the same rapid tasks as the human brain. One key difference remained: while the brain can reconfigure itself, AI systems need roughly as much training data as they have parameters.

This idea sparked what Sutskever terms the era of pre-training, leading to models like GPT-2 and GPT-3. He credited his former colleagues Alec Radford and Anthropic co-founder Dario Amodei for driving this progress. Now, however, Sutskever believes this approach is reaching its natural limits.

New scaling horizons: agents, synthetic data, and test-time compute

Sutskever outlined several potential paths forward beyond pre-training: AI agents, synthetic data (which he described as a "big challenge"), and more computing power at inference time could all help overcome training data limits. The AI researcher recently described this period as a new "age of discovery" for the field.

Presentation slide: Sutskever highlights three directions shaping AI's future: agents, synthetic data, and inference-time compute. | Image: via Sutskever
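Of those three directions, synthetic data is the most concrete to sketch. One common recipe, not necessarily the one Sutskever has in mind, is to have a model generate candidate examples, keep only those that pass a quality filter, and feed the survivors back into training. The `generate_candidates` and `passes_filter` functions below are hypothetical stand-ins for model calls:

```python
# Sketch of a common synthetic-data recipe (assumed, not from the talk):
# generate candidates with a model, filter for quality, keep the rest.

def generate_candidates(model, prompt, n=8):
    # `model` is any callable mapping a prompt to a text sample
    return [model(prompt) for _ in range(n)]

def passes_filter(sample):
    # placeholder quality check; real filters use verifiers or reward models
    return len(sample.strip()) > 0

def synthesize(model, prompts):
    dataset = []
    for prompt in prompts:
        dataset += [s for s in generate_candidates(model, prompt)
                    if passes_filter(s)]
    return dataset

# Toy usage with a stand-in "model":
print(len(synthesize(str.upper, ["one prompt", "another prompt"])))  # 16
```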

According to Sutskever, tomorrow's AI systems will be fundamentally different from today's models. He explained that current systems are only minimally "agentic," but predicted this will change as future AI systems develop genuine abilities to think and reason independently.

This evolution, however, comes with challenges. Sutskever warned that increased reasoning ability leads to less predictability, pointing out that chess AIs already surprise even grandmasters with their moves.

On a positive note, he suggested this shift toward genuine reasoning could help reduce hallucinations. Future AI systems might use logical thinking and self-reflection to verify and correct their own statements, a capability that current systems, which rely mainly on pattern recognition and intuition, lack.
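As an illustration of what such self-reflection could look like in code, here is a hedged sketch of a verify-and-revise loop. The `draft`, `critique`, and `revise` callables are hypothetical stand-ins for language model calls, not any vendor's actual API:

```python
# Sketch of a verify-and-revise loop (illustrative, not a real API):
# draft an answer, let the model critique it, and revise until the
# critique comes back clean or the round budget is exhausted.

def answer_with_reflection(question, draft, critique, revise, max_rounds=3):
    answer = draft(question)
    for _ in range(max_rounds):
        problems = critique(question, answer)  # list of detected issues
        if not problems:                       # nothing left to fix
            break
        answer = revise(question, answer, problems)
    return answer
```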

Pre-training stagnation becomes industry consensus as Sutskever launches a new venture

This shift in thinking matches what's happening in the AI industry. OpenAI, Google, and Anthropic are reportedly hitting the ceiling with traditional pre-training methods in their newest language models.

Google's Gemini AI lead, Oriol Vinyals, recently explained that simply making models bigger isn't enough anymore: each improvement now requires exponentially more effort and resources.
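Vinyals' point about diminishing returns can be illustrated with a power-law scaling curve of the form loss ≈ a · compute^(−α), the shape reported in the scaling-law literature. The constants below are made up for illustration, not Google's numbers; the takeaway is how little the loss moves for each additional 100x of compute:

```python
# Diminishing returns under an illustrative power-law scaling curve:
# loss ≈ a * compute**(-alpha); constants are assumptions, not real data.

a, alpha = 10.0, 0.05
compute = 1.0
for _ in range(5):
    loss = a * compute ** (-alpha)
    print(f"compute {compute:9.0e} -> loss {loss:.2f}")
    compute *= 100  # 100x more compute per step cuts loss by only ~21%
```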

In response, companies like OpenAI are exploring alternatives such as "test-time compute," which gives AI models more time and computing power to process information rather than just increasing their pre-training capacity.
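One simple way to spend extra inference compute is best-of-N sampling: draw several candidate answers and keep the highest-scoring one. This is a generic sketch of the idea, not OpenAI's actual method; `model` and `score` are hypothetical callables:

```python
import random

def best_of_n(model, score, prompt, n=16):
    # More inference compute means more candidates to choose from.
    candidates = [model(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Toy usage: a "model" that guesses numbers, scored by closeness to 42.
guess = lambda _prompt: random.randint(0, 100)
closeness = lambda x: -abs(x - 42)
print(best_of_n(guess, closeness, "pick a number"))
```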

Following his departure from OpenAI in May 2024, Sutskever founded Safe Superintelligence Inc (SSI), a startup that has raised more than $1 billion at a $5 billion valuation. The company, which operates from offices in Palo Alto and Tel Aviv, focuses on developing safe superintelligent systems.

SSI plans to maintain a small team of top engineers and researchers, with most of its funding allocated to computing power and hiring. The company says it's particularly interested in recruiting employees who aren't swayed by the AI industry's hype.

Summary
  • Ilya Sutskever, co-founder of OpenAI, warns that the data available for training AI models is limited and compares it to fossil fuels that will eventually be depleted, while computing power continues to increase.
  • Sutskever sees potential solutions to the limitations of training data, such as AI agents capable of independent thinking and reasoning, the use of synthetic data, and increased computing power during inference.
  • After leaving OpenAI, Sutskever founded the startup Safe Superintelligence Inc (SSI), which aims to develop safe superintelligence with a small team of top engineers and researchers, and has raised over one billion dollars in funding.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.