Google demonstrates the central role that large AI models will play in the company’s future. The vision is that search will be available everywhere – even in real life.
Even before Google unveiled new hardware like the Pixel 7, Pixel 6a, or Pixel Buds Pro, CEO Sundar Pichai spoke at length about the company’s latest AI achievements: translation models trained on monolingual text enable 24 new languages in Google Translate, and buildings detected by artificial intelligence now account for 20 percent of all buildings listed in Google Maps.
Since July 2020, AI has increased the number of detected structures on the African continent fivefold, from 60 million to almost 300 million, he said. According to Pichai, the AI model developed by Google is freely available and is used by the United Nations and the World Bank, among others.
Visually stunning: With Immersive View, 3D models of major cities can be streamed live from the Google Cloud to any device, thanks to the AI-driven merging of aerial imagery, satellite imagery, and photos. Google also uses so-called neural rendering techniques to enable virtual visits to restaurants. This makes Google Maps a potential backbone of an augmented reality cloud.
DeepMind technology improves YouTube
Last year at I/O, Google introduced automatically generated chapters for YouTube videos. This year, Pichai is announcing more video enhancements thanks to multimodal AI models from DeepMind. The technology analyzes a video’s text, audio, and images to generate even better chapter suggestions, according to Pichai.
Speech recognition for automatic YouTube video transcriptions is now available for all Android and iOS users. Automatically generated translations are also available for YouTube on mobile platforms, and an update for the Ukrainian language will follow soon.
For Google’s Workspace products, Pichai showed off the summary feature recently released for Google Docs: a language model generates summaries for longer documents at the push of a button. This feature is expected to appear soon for products like Google Chat and Google Meet. Pichai also announced AI models for Google Meet that improve video and lighting quality.
Multimodal search: CTRL+F for real life
Senior Vice President Prabhakar Raghavan, responsible for Google Search among other things, reveals upcoming updates to the search engine’s recently released Multisearch feature. With Multisearch, users can combine image and text searches, such as taking a picture of a water bottle and searching for a variant with a flower motif.
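Google has not disclosed how Multisearch works internally, but the underlying idea, combining an image query and a text query into a single search, can be illustrated with a toy retrieval sketch. The embeddings below are hand-made, three-dimensional stand-ins (real systems use learned vectors with hundreds of dimensions); combining the two modalities by averaging is likewise just one simple assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical hand-made embeddings in a tiny 3-dimensional space:
# axis 0 ~ "bottle-ness", axis 1 ~ "floral pattern", axis 2 ~ "food".
catalog = {
    "plain water bottle":  [0.9, 0.1, 0.0],
    "floral water bottle": [0.9, 0.8, 0.0],
    "floral plate":        [0.1, 0.9, 0.4],
}

image_query = [1.0, 0.0, 0.0]   # photo of a plain water bottle
text_query  = [0.0, 1.0, 0.0]   # the added text "with flowers"

# Fuse the two modalities into one query vector (simple average here).
combined = [(i + t) / 2 for i, t in zip(image_query, text_query)]

# Rank catalog items by similarity to the combined query.
ranked = sorted(catalog, key=lambda k: cosine(combined, catalog[k]), reverse=True)
print(ranked[0])  # the floral water bottle ranks first
```

The point of the sketch: neither the image query alone (which matches the plain bottle best) nor the text alone (which also matches the floral plate) finds the right item, but their combination does.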
Later this year, Multisearch will also handle local searches: anyone who searches with a photo of a pizza and the “near me” function will be shown pizzerias nearby. The same is supposed to work for numerous kinds of objects, from food to everyday goods.
The next evolution of Multisearch will be Scene Exploration: Instead of a single image, users will pan the camera over a scene and receive answers to questions that match the context of the image.
Raghavan shows an example in which the system picks out high-quality dark chocolate without nuts from a whole shelf of products in a supermarket. He says the technology has numerous applications, such as quickly identifying rare plants in conservation work or finding a specific cream in a pharmacy.
Raghavan describes Scene Exploration as “CTRL+F for the world”, i.e. a universal search function for the real world. This idea is likely to come into its own with AR glasses in particular, to which Google also clearly committed itself at I/O.
LaMDA 2: Beta test for Google’s AI future
Last year, Google already gave a glimpse into the development of its large AI models LaMDA and MUM. While MUM is intended to serve as the basis for multimodal search, LaMDA is a large language model designed for direct conversational interaction. There was already an update on LaMDA’s capabilities and issues in early 2022. According to Pichai, thousands of Google employees have tested the language model since its development began.
At this year’s developer conference, Google showed LaMDA 2, an improved version of the large language model. As announced last year, Google is holding back on a broad release for now.
Instead, LaMDA 2 will gradually be made available to selected users over the course of the year via the “AI Test Kitchen” app. The app will roll out in the US in the coming months and will initially be invitation-only.
For now, the app contains three LaMDA 2 demos, each constraining the model to a different task. In “Imagine It”, LaMDA generates imaginative descriptions and is probably the least constrained of the three.
In “Talk About It”, LaMDA is supposed to talk exclusively about a specific topic, such as dogs in the “Dogs Edition”. For questions that stray from the topic, LaMDA is supposed to steer the conversation back to dogs.
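Google has not published how this topic constraint is implemented (in LaMDA 2 it is presumably part of the model's fine-tuning, not a separate filter). As a crude stand-in for the behavior the article describes, the guardrail idea can be sketched as a keyword check wrapped around a hypothetical model reply:

```python
TOPIC = "dogs"
# Hypothetical on-topic vocabulary; a real system would use a learned classifier.
TOPIC_WORDS = {"dog", "dogs", "puppy", "breed", "bark", "leash"}

def constrained_reply(user_message: str, model_reply: str) -> str:
    """Pass the model's reply through only if the user stays on topic;
    otherwise steer the conversation back to the topic."""
    words = set(user_message.lower().split())
    if words & TOPIC_WORDS:
        return model_reply
    return f"That's interesting, but let's get back to {TOPIC}!"

# On-topic question: the (hypothetical) model reply is returned unchanged.
print(constrained_reply("Which dog breed is easiest to train?",
                        "Border Collies are famously trainable."))
# Off-topic question: the guardrail redirects back to dogs.
print(constrained_reply("What's the weather like today?",
                        "It is sunny."))
```

This is deliberately simplistic; its only purpose is to make the "stay on topic, redirect otherwise" control flow concrete.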
In “List It”, the language model breaks a goal down into a list of useful ideas or subtasks. In a demonstration, for example, LaMDA creates step-by-step instructions for planting a vegetable garden.
The app lets users provide feedback that is meant to improve the model over the long term. Google wants to collaborate with researchers from different disciplines, human rights activists, and policymakers to collect this feedback.
In the future, other AI models could be tested in the app. Google can thus use the existing mobile infrastructure to test and further develop its own AI products in a controllable environment.
Towards the end of the LaMDA presentation, Pichai talks about the impressive capabilities of the large language model PaLM and shows an example in which the model correctly answers a question posed in Bengali and translates it into English. He emphasizes that PaLM was never explicitly trained to answer questions or to translate.
Large language models are likely to play an even more central role in Google’s products in the future: “We are so optimistic about the potential of language models. One day we hope we can answer questions on more topics in any language you speak, making knowledge even more accessible in Search and across all of Google,” Pichai concludes.