AI research
Maximilian Schreiner

Scaling laws for precision: AI researcher sees "perfect storm" for the end of scale

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

A new study by researchers from Harvard University, Stanford University, and other institutions shows that precision—the number of bits used to represent numbers in models—plays a more significant role in scaling laws than previously thought.

The study, titled "Scaling Laws for Precision," demonstrates that precision significantly affects language model performance. According to the researchers, previous scaling laws describing how model performance changes with parameter count and training data volume largely ignored precision.

The research team conducted over 465 training runs to test their hypotheses. They trained language models with precisions ranging from 3 to 16 bits and quantized them to various precision levels after training. The models contained up to 1.7 billion parameters and were trained on up to 26 billion tokens.
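
To make the idea of precision concrete, the following minimal sketch rounds a handful of made-up weights onto a symmetric integer grid at different bit widths and reports the mean rounding error. It is a generic round-to-nearest scheme chosen for illustration, not necessarily the quantization method used in the study; the function names and the sample weights are assumptions.

```python
def quantize(values, bits):
    """Symmetric round-to-nearest quantization onto a signed integer grid.

    Illustrative only: a generic scheme, not the study's exact method.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed grid")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    levels = 2 ** (bits - 1) - 1   # largest integer magnitude on the grid
    scale = max_abs / levels       # real-valued width of one grid step
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and quantized values."""
    quantized = quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, quantized)) / len(values)


# Hypothetical weights, invented for this example.
weights = [0.013, -0.42, 0.27, 0.051, -0.198, 0.336, -0.074, 0.402]

for bits in (16, 8, 4, 3):
    print(f"{bits:2d} bits: mean abs error = {mean_abs_error(weights, bits):.6f}")
```

The error grows as the bit width shrinks, which is the basic trade-off the study quantifies at scale.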

A key finding shows that over-trained language models become more sensitive to quantization after training. A model is considered over-trained when its ratio of training tokens to parameters significantly exceeds the "Chinchilla-optimal" value of about 20. The researchers examined ratios up to 1000.
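
The token-to-parameter ratio described above is simple arithmetic. The sketch below computes it and compares it with the Chinchilla-optimal value of about 20 quoted in the article. The helper names are hypothetical, and pairing the largest model size with the largest token count is an assumption made only for illustration; the article does not say those two figures belong to the same run.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # approximate value quoted in the article


def tokens_per_parameter(tokens, params):
    """Ratio of training tokens to model parameters."""
    return tokens / params


def overtraining_factor(tokens, params):
    """How many times the Chinchilla-optimal ratio a training run uses."""
    return tokens_per_parameter(tokens, params) / CHINCHILLA_TOKENS_PER_PARAM


# Assumed pairing: 1.7 billion parameters with 26 billion tokens.
print(tokens_per_parameter(26e9, 1.7e9))

# A hypothetical 1-million-parameter model at the maximum studied ratio of 1000.
print(overtraining_factor(1000 * 1e6, 1e6))
```

Under that assumed pairing the ratio comes out at roughly 15, below the Chinchilla-optimal value, while a ratio of 1000 corresponds to 50 times that value.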

The experiments revealed that performance degradation from post-training quantization increases with training data volume. When a model is quantized after training, additional training with more data can actually be harmful, as it amplifies quantization errors.

New precision scaling laws emerge

Based on their experiments, the researchers developed new scaling laws that incorporate precision into the equations. Another important finding concerns the compute-optimal precision during pre-training. According to the study, this is generally independent of the compute budget when jointly optimizing parameter count, data, and precision.

The common practice of training models at 16 bits is suboptimal, since many bits are unnecessary. However, training at 4 bits requires a disproportionate model size increase to maintain loss scaling. The researchers' calculations suggest that 7-8 bits are compute-optimal for larger models.
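
As a rough illustration of why fewer bits are attractive, weight storage scales linearly with bit width. The sketch below only counts bytes for the weights of a 1.7-billion-parameter model, the largest size in the study. It is back-of-the-envelope arithmetic, not the study's compute-optimality calculation, and the helper name is hypothetical.

```python
def weight_storage_bytes(params, bits):
    """Bytes needed to store `params` weights at `bits` bits each."""
    return params * bits / 8


PARAMS = 1_700_000_000  # largest model size mentioned in the article

for bits in (16, 8, 4):
    gigabytes = weight_storage_bytes(PARAMS, bits) / 1e9
    print(f"{bits:2d} bits: {gigabytes:.2f} GB")
```

This gives 3.40 GB at 16 bits, 1.70 GB at 8 bits, and 0.85 GB at 4 bits. Storage savings of this kind do not translate directly into compute savings or equal model quality, which is why the study treats precision as one more variable to trade off against parameters and data.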

The situation changes when model size is fixed from the start: larger and better-trained models should be trained with higher precision—for example, models like Llama 3.1 8B with 16 bits.

However, actual compute savings also depend on hardware support for lower precisions. In addition, the largest models in the study have 1.7 billion parameters, so the results have not been tested at the largest practical scale. The general trends should still apply to larger models.

As hardware development increasingly supports low-precision computing, these new scaling laws can help developers find the optimal balance between model size, data volume, and precision.

"The perfect storm for the end of scale"

For AI researcher Tim Dettmers from Carnegie Mellon University and Allen AI, this work is "the most important paper in a long time." He says it clearly shows that the community has reached the limits of quantization—with implications for AI research and GPUs.

Combined with physical limitations, he sees a "perfect storm" for the end of scalability. Efficient low-precision methods like 8-bit training are reaching their limits, especially for large models like Llama 3.1 with 405 billion parameters. Dettmers sees few remaining options for efficiency gains, such as larger data centers, specialized models, or knowledge distillation. He believes the paradigm will soon shift from pure scaling toward human-centered applications.

"Many of us efficiency researchers had some hunch that our data reflects this trend, but we had no hard evidence. Predictive trends that are verified by more experiments (scaling laws) is as robust evidence as you can get. So now it is very clear where we are."

Summary
  • In a new study, researchers show that the numerical precision of AI models is more important than previously thought. They conducted more than 465 training runs with language models trained at precisions between 3 and 16 bits.
  • The scientists found that 7-8 bits is compute-optimal for larger models. The common practice of 16-bit training wastes resources, while 4-bit training requires a disproportionately larger model. However, when model size is fixed, they recommend higher precision for larger, longer-trained models.
  • AI researcher Tim Dettmers sees the results as an indication of the limits of quantization. He expects a shift from pure scaling to specialized models and human-centered applications, as the increase in efficiency due to low precision reaches its limits.
Sources
Arxiv