As part of his trip to Europe, OpenAI CEO Sam Altman gave an update on OpenAI's roadmap.
According to Altman, the lack of computing power is slowing down OpenAI's near-term plans and leading to customer complaints about the reliability of OpenAI's API.
The GPU shortage also limits the fine-tuning API, he said. OpenAI does not yet use more efficient fine-tuning methods such as low-rank adaptation (LoRA), which have proven very useful in the open-source community.
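To show why LoRA is considered efficient, here is a minimal, purely illustrative NumPy sketch of the idea; the dimensions are example values and the code is not OpenAI's implementation. Instead of updating a full weight matrix, LoRA freezes it and trains only a small low-rank correction.

```python
import numpy as np

# Minimal sketch of the LoRA idea (illustrative, not OpenAI's code):
# instead of updating a full weight matrix W (d x k), freeze W and learn
# a low-rank update delta_W = B @ A with rank r << min(d, k).

d, k, r = 1024, 1024, 8                   # example layer dims and LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))           # frozen pretrained weights
A = rng.standard_normal((r, k)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-init

def forward(x):
    # effective weight is W + B @ A, applied without materializing it
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((2, k))           # a dummy batch of activations
y = forward(x)                            # shape (2, d)

print(f"full fine-tune params: {d * k:,}")        # 1,048,576
print(f"LoRA params:           {r * (d + k):,}")  # 16,384 (~1.6%)
```

Because only A and B are trained, the number of trainable parameters drops by roughly two orders of magnitude in this example, which is what makes the method attractive when GPUs are scarce.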
The 32k-token context window version of GPT-4 also cannot yet be rolled out more widely due to the lack of computing power, and access to dedicated capacity for private models, an offering aimed at customers with budgets above $100,000, remains limited. Still, Altman believes that a context window of up to one million tokens is realistic this year.
Anything beyond that, he says, will require solving the O(n²) scaling problem of transformer attention: the amount of computation grows with the square of the number of tokens in the context window. Doubling the window quadruples the computation, tripling it increases the computation ninefold, and so on. Solving this problem, Altman says, will require a scientific breakthrough.
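To make that arithmetic concrete, here is a short, purely illustrative Python calculation of how the attention score matrix grows with context length; the token counts are examples, not OpenAI figures.

```python
# Back-of-the-envelope arithmetic for the O(n^2) attention cost: the
# score matrix Q @ K^T has n x n entries per head per layer, so doubling
# the context length n quadruples the work. Token counts are illustrative.

base = 32_000
for n in (32_000, 64_000, 1_000_000):
    entries = n * n
    print(f"{n:>9,} tokens -> {entries:.1e} score entries "
          f"({(n / base) ** 2:,.0f}x the 32k cost)")
```

Going from a 32k to a one-million-token window multiplies the attention cost by nearly 1,000, which is why Altman frames anything beyond that as a research problem rather than an engineering one.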
Making GPT-4 cheaper
Reducing the cost of GPT-4 computation is a top priority for OpenAI. From GPT-3 to GPT-3.5 and ChatGPT, OpenAI already cut compute costs massively and passed the savings on to customers through significantly lower API prices.
The latest models should become available via the fine-tuning API within the year, along with a new stateful API that remembers previous conversations, so past messages no longer have to be resent with every call. This should further reduce costs.
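For illustration, here is a purely hypothetical sketch contrasting today's stateless chat calls with the kind of remembering API Altman describes. The conversations.send endpoint and conversation_id parameter are invented names for this sketch, not a published OpenAI interface.

```python
# Purely hypothetical sketch; the "remembering" endpoint and its
# parameter names are invented for illustration, not a real OpenAI API.

# Today (stateless): the full history is resent, and re-billed, on every call.
history = [
    {"role": "user", "content": "Summarize the attention mechanism."},
    {"role": "assistant", "content": "...long answer..."},
    {"role": "user", "content": "Now explain the KV cache."},
]
# client.chat.completions.create(model="gpt-4", messages=history)

# With a stateful API (hypothetical): the server keeps the history, so a
# call carries only the new message plus a conversation handle.
# client.conversations.send(
#     conversation_id="conv_abc123",   # invented identifier
#     message="Now explain the KV cache.",
# )
```

The cost saving would come from the client no longer paying for the ever-growing prefix of old messages on every request.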
ChatGPT's plugins, on the other hand, are unlikely to make it into the API, according to Altman: he believes that ChatGPT inside other apps is more interesting than other apps inside ChatGPT. With the exception of browsing, he says, the plugins have not yet found product-market fit.
In this context, Altman assured that OpenAI has no plans for products beyond ChatGPT, since it would rather not compete with its developer community. The vision for ChatGPT is to help optimize the OpenAI APIs and to be an intelligent assistant; the many other applications of language models are ones OpenAI will not touch.
Multimodality won't arrive until 2024
For the coming year, OpenAI has put multimodality on its agenda. Multimodality means that an AI model can process images as well as text, and perhaps in the future audio, video, or 3D models.
OpenAI already demonstrated at the GPT-4 launch that the model can, in principle, process images, i.e., generate text or code from image input. Due to the GPU shortage mentioned above, this feature is not yet available.
Whether OpenAI is working on additional multimodal models is not known. GPT-5 is expected to be more strongly multimodal, but according to Altman it will not go into training for another six months. Google DeepMind may therefore have a head start on multimodality with its Gemini model.
Altman also commented on his recent statement about the "end of an era of giant AI models," saying that OpenAI will continue to try to train larger models and that the scaling law still applies, i.e., larger models promise better performance. However, models will no longer double or triple in size every year, as that is not sustainable.