OpenAI's GPT-3.5 may be a prime example of how much efficiency potential lies in large AI models.

The model behind the free ChatGPT variant has "only" 20 billion parameters, according to a benchmark paper on code models published by Microsoft. OpenAI did not disclose the number of parameters when it introduced GPT-3.5.

Image: Microsoft

The "leak" is interesting in that OpenAI presumably massively distilled and compressed the 175 billion (!) parameter GPT-3 model it introduced in May 2020, achieving efficiency gains in inference and speed. With GPT-3.5, OpenAI has increased generation speed while significantly reducing cost.

The example also shows that pure model size is probably less important than the variety and quality of the data and the training process. For GPT-3.5, OpenAI additionally used reinforcement learning from human feedback (RLHF) to optimize output quality.
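
The reward-modeling step at the heart of RLHF is commonly trained with a pairwise (Bradley-Terry) loss: the reward model should score the human-preferred answer above the rejected one. A minimal sketch with made-up scores, not OpenAI's actual implementation:

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    # Pairwise Bradley-Terry loss: drive the reward of the human-preferred
    # answer above that of the rejected one. The trained reward model then
    # guides the language model during RL fine-tuning.
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected)))).mean()

# Toy scores a reward model might assign to preferred vs. rejected answers.
chosen = np.array([1.2, 0.4, 2.0])
rejected = np.array([0.3, 0.9, 1.1])
print(reward_model_loss(chosen, rejected))  # lower loss = better ranking
```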

Is GPT-4.5 coming?

The question now is whether OpenAI can pull off a similar feat with GPT-4. GPT-4 is rumored to be based on a much more complex Mixture-of-Experts architecture, combining 16 expert models with ~111 billion parameters each into a gigantic model with ~1.8 trillion parameters in total.
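
Whether or not the rumor is true, it illustrates why a sparse Mixture-of-Experts model with ~1.8 trillion total parameters can still be practical to serve: a router activates only a few experts per token, so the compute per token is a small fraction of the total parameter count. A minimal, illustrative sketch; the tiny dimensions and top-2 routing are assumptions for demonstration, not the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D_MODEL = 16, 2, 8  # 16 experts (rumored count), 2 active per token

# Each "expert" is a tiny weight matrix here; in the rumored GPT-4 setup
# each expert would be a ~111-billion-parameter network.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_forward(x):
    scores = x @ router                   # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the top-k experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()              # softmax over the chosen experts
    # Only the selected experts run; the other 14 contribute no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_forward(token).shape)  # (8,) - output keeps the model dimension
```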

GPT-4 is significantly more expensive to run than other AI models: GPT-3.5 costs $0.0048 per 1,000 tokens, while GPT-4 8K costs 15 times as much at $0.072 per 1,000 tokens. However, its output quality is also much better than GPT-3.5's.
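
For a sense of what the quoted 15x gap means in practice, here is the arithmetic for a hypothetical volume of one million tokens (the volume is an illustrative assumption):

```python
# Per-1,000-token prices as quoted above.
GPT35_PRICE = 0.0048
GPT4_8K_PRICE = 0.072

tokens = 1_000_000  # hypothetical monthly usage

print(f"GPT-3.5:  ${GPT35_PRICE * tokens / 1000:,.2f}")    # $4.80
print(f"GPT-4 8K: ${GPT4_8K_PRICE * tokens / 1000:,.2f}")  # $72.00
print(f"Ratio:    {GPT4_8K_PRICE / GPT35_PRICE:.0f}x")     # 15x
```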

In particular, OpenAI's original GPT-4 model "0314" from March 2023 follows input prompts more closely and produces higher-quality generations than GPT-3.5. According to our repeated anecdotal tests for editorial work over the past few months, it also outperforms OpenAI's newer GPT-4 model "0613", released this summer, which in turn generates text significantly faster.

OpenAI may have already turned the efficiency screw here. Microsoft and open-source models are reportedly squeezing OpenAI's margins.

OpenAI CEO Sam Altman said in April 2023 that he believes the era of large models may be coming to an end, and that they will improve models in other ways. Data quality will likely be a key factor.

Summary
  • OpenAI's GPT-3.5, the model behind the free ChatGPT variant, is said to have "only" 20 billion parameters, compared to 175 billion parameters for GPT-3, according to a Microsoft benchmark paper.
  • This suggests that OpenAI was able to significantly distill and compress the model, resulting in efficiency gains in inference and speed. The size of the model appears to be less important than the variety and quality of the data and the training process.
  • The question remains whether OpenAI can achieve similar efficiency gains with GPT-4, which is rumored to be based on a more complex Mixture-of-Experts architecture. GPT-4 is currently significantly more expensive to run, but also delivers higher quality than GPT-3.5.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.