Google has unveiled Gemini 1.5, a significant update to its line of AI models. Its main feature is an unprecedentedly long context window.
According to Google, Gemini 1.5 features a new Mixture-of-Experts (MoE) architecture that makes it more efficient to train and deploy. Demis Hassabis, CEO of Google DeepMind, noted that Gemini 1.5 Pro, the first model of this latest generation, offers performance comparable to Gemini 1.0 Ultra, but requires less computing power.
The most groundbreaking feature of Gemini 1.5 is the long context window. Gemini 1.5 Pro, the first model to be released, comes with a standard context window of 128,000 tokens. However, a limited group of developers and enterprise customers will have early access to a version that can handle up to 1 million tokens. According to Google, this will enable it to process large amounts of data in a single prompt, for example an hour of video, 11 hours of audio, codebases with more than 30,000 lines, or documents with more than 700,000 words. For comparison, OpenAI's GPT-4 Turbo has a context window of 128,000 tokens, and Anthropic's Claude 2.1 has 200,000 tokens. In its research, Google has tested the model with up to 10 million tokens, demonstrating its ability to effectively manage massive amounts of information.
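As a rough back-of-the-envelope illustration of these figures, Google's pairing of 700,000 words with 1 million tokens implies about 0.7 words per token. The sketch below uses that ratio to estimate which of the quoted context windows a document of a given word count might fit into. The ratio is only an assumption derived from the numbers above and will vary with language and tokenizer; the names `CONTEXT_WINDOWS`, `estimate_tokens`, and `windows_that_fit` are hypothetical.

```python
# Context window sizes in tokens, as quoted in the article.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro (standard)": 128_000,
    "GPT-4 Turbo": 128_000,
    "Claude 2.1": 200_000,
    "Gemini 1.5 Pro (early access)": 1_000_000,
}

# Assumption: ~0.7 words per token, implied by "700,000 words" in 1 million tokens.
WORDS_PER_TOKEN = 700_000 / 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate the token count of a text with the given word count."""
    return round(word_count / WORDS_PER_TOKEN)


def windows_that_fit(word_count: int) -> list[str]:
    """Return the names of the windows large enough for the estimated token count."""
    tokens = estimate_tokens(word_count)
    return [name for name, size in CONTEXT_WINDOWS.items() if tokens <= size]


if __name__ == "__main__":
    for words in (50_000, 120_000, 500_000):
        print(words, estimate_tokens(words), windows_that_fit(words))
```

Under this assumption, a 120,000-word manuscript would be estimated at roughly 171,000 tokens, which is too large for a 128,000-token window but within the 200,000-token and 1-million-token windows.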
Remarkably, in the "Needle In A Haystack" test, Gemini 1.5 Pro located the target text 99% of the time within data blocks of up to 1 million tokens, addressing the "lost in the middle" phenomenon, in which models tend to overlook information placed in the middle of long inputs. According to the technical report from Google's Gemini Team, the model achieves perfect accuracy in finding hidden keywords in nearly a day's worth of audio, and it effectively retrieves information from random frames within a three-hour video.
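The general shape of a needle-in-a-haystack check can be sketched as follows. This is a minimal illustration, not Google's evaluation harness: a short "needle" sentence is inserted at varying depths into filler text, and a retrieval function is asked to recover it. The names `build_haystack`, `needle_recall`, and `toy_retriever` are hypothetical, and the toy retriever is a plain string search standing in for a real model call.

```python
from typing import Callable

FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret keyword is marmalade."


def build_haystack(n_sentences: int, depth: float) -> str:
    """Return n_sentences of filler with the needle inserted at a relative depth (0.0 to 1.0)."""
    sentences = [FILLER] * n_sentences
    position = int(depth * n_sentences)
    sentences.insert(position, NEEDLE)
    return " ".join(sentences)


def needle_recall(retrieve: Callable[[str], str], n_sentences: int, depths: list[float]) -> float:
    """Fraction of depths at which the retriever's answer contains the hidden keyword."""
    hits = 0
    for depth in depths:
        answer = retrieve(build_haystack(n_sentences, depth))
        if "marmalade" in answer:
            hits += 1
    return hits / len(depths)


def toy_retriever(context: str) -> str:
    """Stand-in for a model call: returns the sentence mentioning the keyword, if any."""
    for sentence in context.split(". "):
        if "secret keyword" in sentence:
            return sentence
    return ""


if __name__ == "__main__":
    depths = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(needle_recall(toy_retriever, n_sentences=1000, depths=depths))
```

In a real evaluation, `toy_retriever` would be replaced by a prompt to the model under test, and recall would typically be reported per depth and per context length, so that a model that "loses" the middle shows up as a dip at intermediate depths.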
Gemini 1.5 Pro gets closer to Gemini 1.0 Ultra
Google evaluated the core capabilities of Gemini 1.5 Pro across a wide range of benchmarks covering text, code, image, video, and audio. Google indicates that Gemini 1.5 Pro has an 87.1% win rate over Gemini 1.0 Pro and a 54.8% win rate over Gemini 1.0 Ultra, based on 31 benchmarks. The new model demonstrates improvements in various domains, including math, science and reasoning; multilinguality; video understanding; and code.
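A quick arithmetic check shows how these percentages relate to the 31 benchmarks. Assuming a win rate is simply wins divided by the number of benchmarks compared (an assumption; the article does not spell out how Google defines a win or counts ties), 87.1% corresponds to 27 of 31 and 54.8% to 17 of 31. The helper `implied_wins` is hypothetical.

```python
TOTAL_BENCHMARKS = 31  # number of benchmarks cited in the article


def implied_wins(win_rate_percent: float, total: int = TOTAL_BENCHMARKS) -> int:
    """Return the whole number of wins whose rate, rounded to one decimal, matches."""
    for wins in range(total + 1):
        if round(100 * wins / total, 1) == win_rate_percent:
            return wins
    raise ValueError("no whole number of wins matches this win rate")


if __name__ == "__main__":
    print(implied_wins(87.1))  # versus Gemini 1.0 Pro
    print(implied_wins(54.8))  # versus Gemini 1.0 Ultra
```

Read this way, Gemini 1.5 Pro would come out ahead of Gemini 1.0 Ultra on only slightly more than half of the benchmarks, which fits the description of the two models as comparable.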
Developers and enterprise customers can access a limited preview of Gemini 1.5 Pro through AI Studio and Vertex AI. Google is offering this preview for free during the testing phase, although users should expect higher latency with the experimental long-context feature. In the future, Google plans to introduce pricing tiers based on the size of the model's context window.
"We’ll also introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model," said Jeff Dean, Google DeepMind's Chief Scientist.
If Google can maintain the accuracy and performance of the Gemini 1.5 family at context windows of 1 million tokens, or even the 10 million tokens tested in experimental models, this and subsequent models will enable new applications for multimodal models in science and other domains. It also shows that Google is capable of iterating on its Gemini family rather quickly, and that the race between Google on one side and OpenAI and Microsoft on the other has only just begun.