OpenAI has released an update to its GPT-4o model that includes bug fixes and performance improvements, but the company can't share many more details, although it says it would like to.
The update is not an entirely new model but an iteration shaped by experiments and qualitative feedback from ChatGPT users, who "tend to prefer" the new version, OpenAI writes. That puts to rest rumors that the company was testing a new frontier model. According to the release notes, the update includes "bug fixes and performance improvements."
Interestingly, OpenAI says it would like to explain in more detail how the model's responses differ, but it cannot, because research into methods for granularly evaluating and communicating improvements in model behavior is not yet advanced enough.
When possible, OpenAI points out new capabilities and specific improvements and will continue to do so. Meanwhile, the team is constantly working to improve the model by adding good data, removing bad data, and testing new research methods based on user feedback and offline evaluations. This model update follows the same approach.
This situation illustrates the challenge OpenAI and other AI companies face in accurately quantifying and communicating model improvements, which are often subtle changes based on different data and experiments.
In addition, even news of small updates creates high expectations and speculation in the AI community, as everyone looks for evidence that the current GPT-4 level, which has been in place for about a year and a half, can be significantly surpassed.
OpenAI tries to manage these expectations by stressing that the latest model is not an entirely new frontier model, but an improvement of the existing GPT-4o. Rumors have been circulating for a few days about a possible imminent release of a much more powerful model based on "Project Strawberry."
In the Chatbot Arena, where people rate LLM performance in blind comparisons without knowing which chatbot they are talking to, GPT-4o has reclaimed the top spot, slightly ahead of Google's Gemini 1.5. However, the validity of such rankings is limited and can vary greatly depending on the task; personal testing remains the only way to determine which model best fits individual needs.
For API users, the latest model snapshot, "gpt-4o-2024-08-06", supports structured outputs and raises the maximum output length to 16,384 tokens. The dynamic model "chatgpt-4o-latest" always points to the current version of GPT-4o in ChatGPT, i.e., the updated model described above. All models were trained on data up to October 2023.
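To make the structured outputs support concrete, here is a minimal sketch of a Chat Completions call against the "gpt-4o-2024-08-06" snapshot using the OpenAI Python SDK. The schema name, prompt, and token limit chosen here are illustrative assumptions; only the model name and the JSON-schema response format come from OpenAI's announcement.

```python
# Minimal sketch, assuming the OpenAI Python SDK (pip install openai)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # snapshot with structured outputs support
    messages=[
        {"role": "system", "content": "Extract the event details from the user's message."},
        {"role": "user", "content": "Team sync on Friday at 10am with Ana and Raj."},
    ],
    # Structured outputs: the model's reply must conform to this JSON schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",  # hypothetical schema name for this example
            "strict": True,            # enforce exact adherence to the schema
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "day", "participants"],
                "additionalProperties": False,
            },
        },
    },
    max_tokens=1000,  # can go up to the snapshot's 16,384-token output limit
)

# The content is a JSON string matching the schema above.
print(response.choices[0].message.content)
```

In strict mode, every property must be listed as required and additionalProperties must be false, which is why the schema above is written that way.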