Google I/O 2024 was Google's biggest AI show yet. Here's a look at some of the announcements.
Gemini 1.5 Pro and Gemini 1.5 Flash
Google CEO Sundar Pichai announced that Gemini 1.5 Pro's context window will double from one million to two million tokens. The model is now available via API to all interested developers.
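For a sense of scale, here is a back-of-envelope estimate of what a two-million-token window can hold, using the common rough heuristic of about 4 characters and 0.75 words per English token (an approximation, not an official Gemini tokenizer figure):

```python
# Back-of-envelope capacity of a 2-million-token context window.
# ~4 characters and ~0.75 words per token are rough English-text
# heuristics, not official Gemini tokenizer figures.
context_tokens = 2_000_000
approx_chars = context_tokens * 4           # about 8 million characters
approx_words = int(context_tokens * 0.75)   # about 1.5 million words

print(approx_chars)  # 8000000
print(approx_words)  # 1500000
```

That is roughly the length of several thousand pages of text in a single prompt.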
Google introduced a new model called Gemini 1.5 Flash. It is optimized for speed and efficiency and is suited to summarization, chat applications, image and video captioning, and data extraction from long documents.
According to Demis Hassabis, CEO of Google DeepMind, it is lighter and less expensive than Gemini 1.5 Pro, but just as powerful. This was achieved through "distillation": transferring the core capabilities of Pro into the smaller model. Gemini 1.5 Flash has the same multimodal capabilities and a one-million-token context window.
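The distillation idea mentioned above can be illustrated with a minimal sketch: the small "student" model is trained to match the softened output distribution of the large "teacher" model. This is a generic textbook formulation, not Google's actual training recipe; all names and numbers here are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; higher T yields softer distributions,
    # which expose more of the teacher's "dark knowledge" to the student.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's softened outputs and the
    # student's. Minimizing it pushes the student to mimic the teacher.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical outputs give zero loss; divergent outputs give a positive loss.
print(round(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)        # True
```

In practice this loss is combined with the usual training loss on ground-truth labels, but the mimicry term is what lets a smaller model inherit the behavior of a larger one.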
Both models are now available in public preview in Google's AI Studio and Vertex AI. The open-source Gemma family gains the PaliGemma vision-language model and the powerful Gemma 2 with 27 billion parameters.
New features for Gemini in Google Workspace
Google announced new features for Gemini in Google Workspace. According to the Google Workspace blog, the Gemini sidebar in Workspace now uses Gemini 1.5 Pro for more detailed answers.
The Gmail app is getting features like email summaries, contextual reply suggestions and Gmail Q&A. The "Writing Help" feature is now supported in Gmail and Docs for Spanish and Portuguese on desktop.
The new features are available to Workspace Labs and Alpha users and will roll out to businesses and consumers next month via Gemini for Workspace add-ons and the Google One AI Premium plan.
Another new feature is the ability to create a virtual teammate with its own Workspace account. This teammate can be configured for specific tasks, such as monitoring and tracking projects, organizing information, providing context, identifying trends from data analysis, and collaborating with the team.
In Google Chat, the virtual teammate can join all relevant rooms and answer questions based on conversation history, Gmail threads, and anything else it has access to. But according to Aparna Pappu, vice president and GM of Workspace, this is just a technical demonstration for now.
Google still has a lot of work to do on integrating such agentic experiences into Workspace, including giving third parties the ability to create their own versions.
Gemini Live and personalized chatbots "Gems"
New ways to interact with Gemini include chat in Google Messages and a mobile conversation experience called Gemini Live with natural-sounding voice technology.
Gemini Advanced subscribers will soon be able to create personalized versions of Gemini called Gems that can act as fitness coaches, coding partners or writing coaches - Google's alternative to GPTs. Gems can be set up by simply describing what you want them to do and how you want them to respond.
Google SGE continues to roll out
Gemini-generated summaries, now called "AI Overviews," will roll out to all U.S. users in Google Search this week, along with support for more complex queries that combine multiple questions. Users will also soon be able to search using video. Other countries will follow soon.
Those who sign up for Search Labs will get access to more AI features, such as language simplification, complex search, meal and travel planning, and video search. All of the new search features are based on a version of Gemini adapted for Search.
Project Astra
Astra is Google's vision of a multimodal AI assistant for everyday life. It can process text, video, and audio in real time. In a video, Google showed Astra identifying a speaker, crayons, and other objects from camera input and voice prompts.
It was able to answer questions about the objects, explain them, or generate creative output about the objects. Astra could also recognize and explain diagrams or program code on a whiteboard.
The application ran on a smartphone and on a prototype pair of camera-equipped glasses. Some of these features will come to Google products such as the Gemini app later this year.
AI for images, video and music: Imagen 3, Veo, Music AI
Google also unveiled its latest AI models for creating media content: Veo, for creating 1080p videos, and Imagen 3, for generating images from text descriptions.
Veo is supposed to have an advanced understanding of natural language and visual semantics, and can produce videos over one minute in length. Veo will be available immediately to select creators in Google's VideoFX tool, and will also be integrated into YouTube Shorts and other products.
Imagen 3 promises photorealistic images with fewer visual artifacts. According to Google, it is the company's most capable text-to-image model to date. It will be available through Google's ImageFX tool.
Google is also testing the Music AI Sandbox, a set of tools to help create songs and beats, with musicians including Wyclef Jean and Bjorn. It is part of the MusicFX experiment.
The experiments are available in 110 countries and 37 languages. Google is working with artists to develop the tools responsibly. All generated content is digitally watermarked with SynthID. Verified users can try the experiments at labs.google.
AI-based photo chatbot
Google announced Ask Photos with Gemini, an AI chatbot for the Google Photos application. The feature will be available to Google One subscribers in the U.S. in the coming months.
With Ask Photos, users can use Gemini to find specific images in their gallery by asking questions like, "Show me the best photo of each national park I've visited." The AI takes GPS information into account and decides which images to select. Users can give Gemini feedback on which images they prefer.
Ask Photos can also find the best photos from a vacation and generate captions for social media. Queries are processed in the cloud but not stored, Google says, emphasizing that privacy is respected. The feature is experimental and initially available to paid users.
Trillium (TPU v6)
Google also introduced the Trillium chip (TPU v6) for AI data centers, which is nearly five times faster than the previous version. According to CEO Sundar Pichai, demand for AI compute has increased by a factor of one million over the past six years. Google's custom chips are one of the few alternatives to Nvidia's market-dominating processors.
The Trillium chip delivers 4.7 times the processing power and is 67 percent more power efficient than its predecessor, the TPU v5e. The chips are used in pods of 256 chips that can be scaled to hundreds of pods. The new chip will be available to Google Cloud customers by the end of 2024.
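The scaling numbers above translate into simple arithmetic. The 256-chip pod size and the 4.7x compute figure come from the announcement; the 100-pod deployment size below is an illustrative assumption, not a Google number.

```python
# Pod math for Trillium (TPU v6). The 256-chip pod size and the 4.7x
# throughput figure come from the announcement; the 100-pod deployment
# size is an illustrative assumption.
chips_per_pod = 256
pods = 100
total_chips = chips_per_pod * pods            # 25,600 chips in this deployment

v6_vs_v5e = 4.7  # Trillium throughput relative to TPU v5e
v5e_equivalent = round(total_chips * v6_vs_v5e)  # v5e chips for the same work

print(total_chips)      # 25600
print(v5e_equivalent)   # 120320
```

In other words, a hundred-pod Trillium deployment would match the throughput of roughly 120,000 of the previous-generation chips, while also drawing less power per operation.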
More AI for Android
Google is integrating its AI technology directly into the Android operating system. With Circle to Search, students can now circle a homework problem on screen to get step-by-step help with physics and math problems.
Android's built-in Gemini assistant is designed to better understand context and will soon be used in more applications, such as inserting generated images into messages or finding information in YouTube videos and PDFs.
Images generated by Gemini can be dragged and dropped into apps like Gmail or Google Messages. For YouTube videos, the "Ask This Video" feature can search for specific information in a video. Gemini Advanced users can use Ask This PDF to quickly find answers in PDF documents without having to flip through pages.
Gemini Nano with multimodality will be available on Pixel devices later this year. This on-device model will understand images, sound and speech in addition to text. In Talkback, it will provide clearer descriptions of images and warn of phone scams. More AI capabilities for Android are on the way. Developers can now work with Gemini Nano and Gemini in Android Studio.