
As part of his trip to Europe, OpenAI CEO Sam Altman gave an update on OpenAI's roadmap.

According to Altman, the lack of computing power is slowing down OpenAI's near-term plans and leading to customer complaints about the reliability of OpenAI's API.

The GPU shortage also limits the API for fine-tuning models, he said. OpenAI does not yet use more efficient fine-tuning methods such as low-rank adaptation (LoRA), which has been very useful to the open-source community.
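To give a rough sense of why LoRA counts as a more efficient fine-tuning method, the sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning and under a low-rank update. The layer size and rank are illustrative assumptions, not figures from OpenAI or from Altman's remarks.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Assumed, illustrative sizes: a 4096 x 4096 projection and rank 8.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"reduction factor: {full // lora}x")
```

Under these assumptions, the low-rank update trains a small fraction of the parameters, which is what makes the method attractive when GPUs are scarce.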

The 32k context window version of GPT-4 is also not yet deployable due to a lack of computing power, and access to private models for customers with budgets over $100,000 is limited. Still, Altman believes that a context window of up to one million tokens is realistic this year.


Anything beyond that, he says, will require solving the "O(n^2)" scaling problem of transformer attention: as the context window grows, the amount of computation required grows with the square of the number of tokens. Doubling the context window quadruples the computation, tripling it increases the computation ninefold, and so on. Solving this problem, Altman says, will require a scientific breakthrough.
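The quadratic relationship can be checked with a few lines of arithmetic. The following sketch only models the n^2 term of attention and ignores constant factors, the other layers of the network, and any engineering optimizations. The 8,000-token baseline is an assumption chosen for illustration.

```python
def relative_attention_cost(n_tokens: int, baseline_tokens: int) -> float:
    """Attention computation relative to a baseline context size,
    assuming cost grows with the square of the token count."""
    return (n_tokens / baseline_tokens) ** 2


baseline = 8_000  # assumed baseline context window, for illustration only

for factor in (1, 2, 3, 4):
    n = baseline * factor
    cost = relative_attention_cost(n, baseline)
    print(f"{n:>6} tokens -> {cost:.0f}x the attention computation")

# Going from a 32k window to one million tokens under the same assumption:
print(f"32k -> 1M tokens: {relative_attention_cost(1_000_000, 32_000):.1f}x")
```

Under this simplified model, moving from a 32k window to one million tokens multiplies the attention computation by roughly a thousand.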

Making GPT-4 cheaper

Reducing the cost of GPT-4 computation is a top priority for OpenAI. The company already massively reduced the cost of computation in moving from GPT-3 to GPT-3.5 and ChatGPT, and passed the savings on to customers through significantly lower API prices.

The latest models should be available within the year via the fine-tuning API, as well as a new API that can remember previous conversations, so they don't have to be sent again with each new API call. This will further reduce costs.
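Why a conversation-aware API would save money can be illustrated by counting tokens. The sketch below compares a stateless API, where every call resends the full history, with a hypothetical stateful API that only receives the newest turn. The function names and the turn lengths are assumptions for illustration; the article does not describe how OpenAI's planned API will actually work or be billed.

```python
from typing import List


def tokens_sent_stateless(turn_lengths: List[int]) -> int:
    """Total tokens sent when each call includes all previous turns plus the new one."""
    total = 0
    history = 0
    for length in turn_lengths:
        history += length  # the conversation grows by one turn
        total += history   # the whole history is sent again
    return total


def tokens_sent_stateful(turn_lengths: List[int]) -> int:
    """Total tokens sent when the server remembers the conversation (hypothetical)."""
    return sum(turn_lengths)


# Assumed example: ten turns of 100 tokens each.
turns = [100] * 10
print("stateless:", tokens_sent_stateless(turns))
print("stateful: ", tokens_sent_stateful(turns))
```

In this toy example the stateless approach sends 5,500 tokens in total, compared with 1,000 for the stateful one, and the gap widens as conversations get longer.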

ChatGPT's plugins, on the other hand, are unlikely to make it into the API, according to Altman. He believes that ChatGPT in apps is more interesting than apps in ChatGPT. According to Altman, the plugins, except for browsing, still lack product-market fit.

In this context, Altman gave assurances that OpenAI is not planning any products besides ChatGPT, as it would rather not compete with its developer community. The vision for ChatGPT is to optimize the OpenAI APIs and provide an intelligent assistant. There are many other applications for language models that OpenAI will not touch.


Multimodality won't arrive until 2024

For the coming year, OpenAI has put multimodality on its agenda. Multimodality means that an AI model can process pictures as well as text, and in the future perhaps audio and video or 3D models.

OpenAI already showed at the GPT-4 launch that the model can in principle process images, i.e., generate text or code based on images. Due to the GPU limitation mentioned above, this feature is not currently available.

Whether OpenAI is working on additional multimodal models is not known. GPT-5 is expected to add more multimodality, but will not go into training for another six months, according to Altman. Google DeepMind may therefore have a head start on multimodality with its Gemini model.

Altman also commented on his recent statement about the "end of an era of giant AI models," saying that OpenAI will continue to try to train larger models and that the scaling law still applies, i.e., larger models promise more performance. However, models will no longer double or triple in size every year, as this is not sustainable.

  • OpenAI's short-term roadmap suffers from a lack of computing power. Multimodal features for GPT-4 will therefore not be available until next year.
  • The priority for OpenAI is to make GPT-4 more efficient and less expensive, and to increase the context window. Altman thinks up to a million tokens this year is realistic.
  • ChatGPT's plugins probably won't make it into the API, he said, because they still lack product-market fit. Instead, an API is planned that will store conversations so that they do not have to be sent and processed again with every request.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.