
Google's multimodal AI model will enable complex search queries. The first search capabilities are expected to roll out in the coming weeks.

In May 2021, Google unveiled MUM (Multitask Unified Model), a multimodal artificial intelligence. The AI model follows the trend of multimodally trained Transformer models such as OpenAI's DALL-E or CLIP. It is trained on text, image and video data in 75 languages.

According to Google, MUM represents the future of the search engine. The artificial intelligence is said to be significantly more powerful than the current BERT model and to have a deeper understanding of the world.

At its own "Search On" conference, Google shared new details about MUM and announced MUM-based features for Google Search.


Google's multimodal AI model finds the right socks

To illustrate the advantage of multimodal models, Google uses a simple example: the current Google search knows what a lion looks like, how it sounds, and how to spell its name. MUM, on the other hand, knows that a lion, although it is a cat, does not make a good pet.

This ability of multimodal models to represent implicit connections between different concepts was also demonstrated by OpenAI's study of CLIP's neurons.

In practice, Google wants to use this ability for better search results and also give users the ability to make multimodal queries, such as a picture with a question about it. At the conference, Google showed two examples of this: finding colorful socks and gathering tips for bicycle repair.

Video: Google

In the first demonstration, the user scans a patterned shirt with Google Lens and then uses text input to ask Google to find socks with the same pattern.


In a second demonstration, the user takes a picture of a bicycle part and asks for repair tips. The MUM AI recognizes the part and suggests appropriate YouTube tutorials. According to Google, this is especially useful if you don't even know the name of the broken part.

Multimodal search: Google Lens becomes part of Google Search

To enable users to search with images and text together, Google will integrate its image analysis software, Lens, into the Google app on iOS and the Chrome web browser. Going forward, Lens will always be available in the Google universe. According to Google, this means, for example, that while scrolling through images on a home decor blog, it will be possible to search for products shown in an image.

Multimodal search will be rolled out and extensively tested in the coming months. Presumably, Google wants to make sure that the biases inherent in giant AI models do not trickle down to the end user.

Because of MUM: Google Search gets new design

Google has also announced a redesign of Google Search. A new "Things to know" box will display useful information, such as instructions or further tips. In addition, search refinements will be suggested and more topic suggestions contributed by MUM will be displayed.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

MUM will also display topic suggestions for videos in search, including topics not directly mentioned in the video, which Google says is only possible through the multimodal model. Some of these features are expected to appear in the coming weeks.

In addition to MUM, Google showed off improvements to Google Maps, better shopping features and a tool that shows city planners where adding greenery to counter heat waves is particularly worthwhile.

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.