A new simulator from Epoch AI shows that training models on the scale of GPT-4 would have been possible with older hardware, but at a significantly higher cost.
The simulator analyzes how efficiently a training run uses the available hardware, measured in FLOP (floating-point operations), relative to the computing power needed for training. Epoch AI's research shows that, on the same hardware, this efficiency tends to decrease as models grow larger.
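As a rough illustration (not Epoch AI's own code), hardware efficiency can be expressed as the ratio of the FLOP per second a training run actually achieves to the GPU's theoretical peak. In the sketch below, the H100 peak figure is an approximate published spec and the achieved throughput is an assumed example value:

```python
def flop_utilization(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Fraction of a GPU's theoretical peak throughput that a run actually achieves."""
    return achieved_flops_per_s / peak_flops_per_s

# Illustrative numbers: an H100 delivering 400 TFLOP/s of useful work against
# its ~989 TFLOP/s dense BF16 peak would sit at roughly 40% utilization.
print(flop_utilization(400e12, 989e12))  # ~0.40
```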
The company's data also reveals different patterns of efficiency across GPU generations. While newer architectures like the H100 sustain high efficiency at larger training scales, older GPUs like the V100 show a steeper decline in efficiency as training runs grow.
2012 tech could handle GPT-4
Epoch AI conducted an experiment simulating training on a GTX 580 GPU with 3 GB of memory. This was the same graphics card researchers used to train the groundbreaking AlexNet model in 2012.
The researchers estimate that GPT-4's training required between 1e25 and 1e26 FLOP. The simulation indicates this scale of training could have been achieved with 2012 technology, though at approximately ten times the cost of using modern hardware.
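A back-of-envelope calculation shows why the cost balloons on old hardware. The sketch below is not Epoch AI's methodology: the GTX 580's roughly 1.58 TFLOP/s single-precision peak is the card's published spec, while the 20 percent utilization figure is an assumption chosen purely for illustration.

```python
# Rough scale check: how many GTX 580 GPU-years would 1e25 FLOP take?
PEAK_FLOPS = 1.58e12     # GTX 580 single-precision peak, FLOP/s
UTILIZATION = 0.20       # assumed fraction of peak actually achieved
TRAINING_FLOP = 1e25     # lower end of the GPT-4 estimate

seconds = TRAINING_FLOP / (PEAK_FLOPS * UTILIZATION)
gpu_years = seconds / (365 * 24 * 3600)
print(f"{gpu_years:,.0f} GPU-years")  # on the order of a million GPU-years
```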
Understanding future hardware needs
The simulator can also model training runs spread across multiple data centers. Users can specify the size of each site as well as the latency and bandwidth of the connections between them, which lets researchers explore how a training run could be distributed across several locations.
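The simulator's actual input format isn't reproduced here, but a multi-data-center setup could be described with structures along these hypothetical lines:

```python
from dataclasses import dataclass

# Hypothetical structures for describing a distributed training setup;
# the simulator's real configuration schema may differ.
@dataclass
class DataCenter:
    name: str
    num_gpus: int            # size of the site

@dataclass
class Link:
    a: str
    b: str
    latency_ms: float        # round-trip latency between sites
    bandwidth_gbps: float    # interconnect bandwidth

sites = [DataCenter("dc-east", 8192), DataCenter("dc-west", 8192)]
links = [Link("dc-east", "dc-west", latency_ms=30.0, bandwidth_gbps=400.0)]
```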
The tool also allows users to analyze performance differences between modern GPUs, such as the H100 and A100, as well as the effects of different batch sizes and training across multiple GPUs. The system generates detailed log files of the simulation output.
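For a sense of what such a comparison involves, the sketch below estimates the ideal training time for a fixed compute budget on A100 and H100 clusters. The peak-throughput figures are approximate published specs; the shared utilization value and cluster size are assumptions, and the real simulator accounts for far more than this simplification does:

```python
# Illustrative comparison only: ideal training time for a fixed FLOP budget,
# assuming both GPU types reach the same utilization.
PEAK = {"A100": 312e12, "H100": 989e12}   # approx. dense BF16 peak, FLOP/s
BUDGET = 1e25                             # training compute budget, FLOP
NUM_GPUS = 10_000                         # assumed cluster size
UTILIZATION = 0.40                        # assumed fraction of peak achieved

for gpu, peak in PEAK.items():
    days = BUDGET / (peak * UTILIZATION * NUM_GPUS) / 86400
    print(f"{gpu}: ~{days:.0f} days")     # A100: ~93 days, H100: ~29 days
```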
Epoch AI says it developed the simulator to better understand advances in hardware efficiency and to assess the impact of chip export controls. The company also wants to shed light on the hardware requirements for the large-scale training runs expected later this decade.