
A new simulator from Epoch AI shows that training models on the scale of GPT-4 would have been possible with older hardware, but at a significantly higher cost.


The simulator analyzes FLOP utilization: the share of a GPU's theoretical computing power, measured in floating-point operations (FLOP), that a training run actually puts to work. Epoch AI's research shows that utilization on the same hardware tends to decrease as models grow larger.
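For readers unfamiliar with the metric, utilization is simply sustained throughput divided by the hardware's theoretical peak. A minimal sketch with hypothetical numbers (the 989 TFLOP/s peak is Nvidia's published dense BF16 figure for the H100; the sustained rate is an assumption for illustration):

```python
def flop_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Fraction of the hardware's theoretical peak FLOP/s actually sustained."""
    return achieved_flops / peak_flops

# Hypothetical: an H100 (989 TFLOP/s peak, dense BF16) sustaining
# 400 TFLOP/s during training runs at roughly 40% utilization.
print(f"{flop_utilization(400e12, 989e12):.0%}")  # 40%
```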

The company's data also reveals different efficiency patterns across GPU generations: newer architectures like the H100 sustain high utilization up to larger training scales, while older GPUs like the V100 show a steeper decline in efficiency as training size increases.

Chart: FLOP utilization of the V100, A100, and H100 across training scales. The newer H100 and A100 architectures maintain high utilization rates for longer before efficiency drops off. | Image: Epoch AI

2012 tech could handle GPT-4

Epoch AI conducted an experiment simulating training on a GTX 580 GPU with 3 GB of memory. This was the same graphics card researchers used to train the groundbreaking AlexNet model in 2012.


The researchers estimate that GPT-4's training requires between 1e25 and 1e26 floating-point operations (FLOP). The simulation indicates this scale of training could have been achieved with 2012 technology, though at approximately ten times the cost of using modern hardware.
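A back-of-envelope calculation of our own (not Epoch AI's simulation) gives a sense of the scale involved. Assuming the GTX 580's published peak of roughly 1.58 TFLOP/s in FP32 and a guessed 20 percent utilization:

```python
# Illustrative only; Epoch AI's simulator models many more factors,
# such as the 3 GB memory limit and interconnect bandwidth.
GPT4_FLOP = 1e25          # lower end of the estimated training compute
GTX580_PEAK = 1.58e12     # GTX 580 peak FP32 FLOP/s (published spec)
UTILIZATION = 0.2         # assumed utilization, a guess for illustration

seconds = GPT4_FLOP / (GTX580_PEAK * UTILIZATION)
gpu_years = seconds / (3600 * 24 * 365)
print(f"~{gpu_years:,.0f} GPU-years on a single GTX 580")  # ~1 million
```

Spread across enough cards, such a run becomes feasible in principle; the question is what all that hardware and energy would cost.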

Understanding future hardware needs

The simulator also enables complex scenarios, such as training spread across multiple data centers. Users can specify the size of each data center as well as the latency and bandwidth of the connections between them, allowing researchers to model how a training run could be distributed across several locations.
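The parameter space described here might be captured along these lines (a hypothetical sketch of the configuration, not the simulator's actual interface):

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    num_gpus: int            # size of the data center
    gpu_model: str           # e.g. "H100" or "A100"

@dataclass
class Link:
    latency_ms: float        # latency between two data centers
    bandwidth_gbps: float    # bandwidth of the connection

# Hypothetical two-site setup: 8,192 H100s per site, linked at 400 Gbit/s
sites = [DataCenter(8192, "H100"), DataCenter(8192, "H100")]
link = Link(latency_ms=10.0, bandwidth_gbps=400.0)
```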

The tool also lets users compare modern GPUs, such as the H100 and A100, and examine the effects of different batch sizes and multi-GPU training. The system generates detailed log files recording the output of each simulation.
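As a rough illustration of the GPU comparison the tool supports, here is a sketch of our own (using Nvidia's published dense BF16 peak figures; the training budget, cluster size, and 40 percent utilization are assumptions, not simulator output):

```python
# Published dense BF16 peaks in FLOP/s; everything else is assumed.
PEAK = {"A100": 312e12, "H100": 989e12}
TRAINING_FLOP = 1e24      # hypothetical training budget
GPUS, UTIL = 1024, 0.4    # assumed cluster size and utilization

for gpu, peak in PEAK.items():
    days = TRAINING_FLOP / (peak * UTIL * GPUS) / 86400
    print(f"{gpu}: ~{days:.0f} days")  # A100: ~91 days, H100: ~29 days
```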

Epoch AI says it developed the simulator to better understand advances in hardware efficiency and to assess the impact of chip export controls. The broader goal is to clarify the hardware requirements for the large-scale training runs expected this decade.

Summary
  • AI research company Epoch AI has published an interactive simulator that models the computing power required to train large language models, including distributed training setups.
  • A simulation with a GTX 580 GPU from 2012 suggests that training GPT-4 would have been possible with the technology of that time, but at a cost about ten times higher than today.
  • The tool supports more complex scenarios, such as training across multiple data centers, and is meant to help gauge the significance of future advances in hardware efficiency.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.