
In an interview, Sam Altman, CEO of OpenAI, stressed the importance of high-quality data for training AI models. Altman said the company currently has enough data for the next version after GPT-4.


In an interview at the AI for Good Global Summit, Altman stressed the need for high-quality data in AI systems, whether it comes from humans or is synthetically generated. The possibility that too much AI-generated data could harm an AI system does not appear to worry him as such; in his view, low-quality data is a problem regardless of its source.

"I think what you need is high-quality data. There's low-quality synthetic data, there's low-quality human data," Altman said.

For now, OpenAI has enough data to train the next model after GPT-4, Altman said.


The OpenAI CEO also said that the company has been experimenting with generating large amounts of synthetic data to try out different ways of training AI.

But the main question is how AI systems can learn more from less data, rather than just generating massive amounts of synthetic data for training. Altman says it would be "very strange" if the best way to train a model was to "generate like a quadrillion tokens of synthetic data and feed that back in."

For Altman, the key is the ability to learn efficiently from data; he describes the core question as "how do you learn more from less data?" He cautions that OpenAI and other companies still need to figure out which data and methods work best for training increasingly powerful AI systems.

Research supports Altman's comments: studies have shown that higher-quality data leads to better AI performance. His remarks also fit OpenAI's recent strategy of spending hundreds of millions of dollars to license training data from major publishers.

Summary
  • In an interview, OpenAI CEO Sam Altman emphasizes the importance of using high-quality data to train AI models, whether it is human-generated or synthetic.
  • OpenAI is experimenting with generating large amounts of synthetic data to explore different AI training techniques, but sees the key question as how AI systems can learn more with less data.
  • According to Altman, OpenAI currently has enough data to train the next iteration after GPT-4, but acknowledges that much scientific progress is still needed to find the most appropriate data and techniques for increasingly powerful AI systems.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.