Andreas Braun, CTO of Microsoft Germany, announced that GPT-4 will be introduced next week. The models will be multimodal.
At the “AI in Focus – Digital Kickoff” event, Microsoft Germany presented business applications of large language models and talked about its cooperation with OpenAI and new Azure offerings resulting from it.
As Silke Hahn reports for Heise, Braun announced a GPT-4 reveal next week: “Next week we will present GPT-4, there we have multimodal models that offer completely different possibilities – for example videos,” Braun said.
Can GPT-4 generate video?
There are two intriguing aspects to this statement: first, Braun refers to GPT-4 in the plural, which could mean that it consists of multiple models networked together. As early as 2020, there were rumors that OpenAI would train a huge multimodal AI model by merging several projects.
Second, Braun explicitly talks about “videos”. However, one should not jump to the conclusion that GPT-4 is a full-scale video generator. The technology exists, but it is still very experimental and computationally intensive.
Braun could also have been referring to video as an input modality, meaning that GPT-4 could process video or image prompts and respond with text. It is possible, for example, that GPT-4 will be able to describe the content of an image, video, or audio clip and then use that description as context for further text tasks.
It fits that GPT-4’s context window is said to be four times larger than ChatGPT’s, and that with Whisper, OpenAI has trained a powerful speech recognition model that can automatically convert audio from videos to text, making spoken video content usable for AI training.
Microsoft Germany does not comment on Braun’s statement
A spokesperson for Microsoft Germany declined to comment on Braun’s GPT-4 statement. He did, however, point to a March 16 event titled “The Future of Work with AI,” where Microsoft CEO Satya Nadella plans to talk about using AI tools for productivity.
That would be a fitting setting for GPT-4’s unveiling – with the caveat that GPT-4 is still an OpenAI product. But the billion-dollar collaboration blurs the lines between the two companies, and Microsoft had previously secured exclusive rights to the GPT-3 model.
OpenAI CEO Sam Altman said in the fall of 2021 that GPT-4 would definitely be a text-based model without multimodality, but that he expected multimodal models to eventually overtake pure text models in text generation.
Those plans may have changed since then: the GPT-4 planned at the time could have become GPT-3.5, so that the GPT-4 now set to be presented already offers multimodality.
In mid-January, Altman announced that GPT-4 would not be released until it could be done safely and responsibly, and a little later he dampened expectations for the model’s capabilities. Social media rumors about the model’s gigantic size were “ridiculous” and made up out of thin air, Altman said.