AI in practice

Google to unveil new language model and Bard image AI - Report

Matthias Bastian

Image: Midjourney prompted by THE DECODER

Google's I/O 2023 developer conference starts on May 10, and some information has leaked in the run-up.

According to CNBC, Google plans to present PaLM 2, the successor to its large language model PaLM. The new model is said to support more than 100 languages and is internally codenamed "Unified Language Model". PaLM 2 is expected to handle a wide range of programming and math tasks, as well as creative writing and text analysis.

Google unveiled the first PaLM model in April 2022. With 540 billion parameters, it was one of the largest and arguably most powerful models at the time, even capable of explaining simple jokes. The name stands for "Pathways Language Model"; Pathways, in turn, is Google's long-term vision for more general AI models.

Google's chatbot Bard is based on PaLM, which has been available to developers via an API since March 2023. With Med-PaLM 2, Google also offers a PaLM version fine-tuned for medical tasks.

Realizing your full potential with generative AI

Google is unveiling the new PaLM model and other generative AI innovations under the banner of "helping people realize their full potential." In addition to PaLM 2, Google will introduce "generative experiences" for its Bard chatbot and Google Search.

For Bard, an AI image generator could become available directly in the chat, CNBC suggests. Google could use its Imagen model, which was announced in May 2022 and has performed well in benchmarks, but has yet to be rolled out publicly. Google is also reportedly implementing image and template generation in AI Workspace Collaborator, Google's AI solution for its office suite.

Microsoft offers an "advanced version" of OpenAI's DALL-E 2 image system in its Bing chatbot, but its results cannot match Midjourney v5 or high-quality Stable Diffusion implementations.

Multi Bard, Big Bard, Giant Bard

According to CNBC, Google is experimenting internally with a multimodal version of Bard that is trained on more data and can solve complex math and coding problems. OpenAI has likewise announced a multimodal version of GPT-4 and ChatGPT that can understand images and solve image-related tasks, such as generating captions. So far, this GPT-4 version is only available to a small circle of users.

Besides this "Multi Bard" model, Google is said to be working on "Big Bard" and "Giant Bard". The names suggest that Google is developing Bard models at different performance levels, which will probably be deployed on a cost-benefit basis.

Google CEO Sundar Pichai recently referred to the current version of Bard as a "souped-up Civic" and said that a true sports car is yet to come: "But we are going to be training fast. We clearly have more capable models."

Just a few days ago, an internal document was leaked in which senior AI engineer Luke Sernau described Google's AI lead over open source as slim. He sees the open-source movement at an advantage, in part because it can optimize smaller models with high-quality data more quickly. OpenAI faces the same problem, he said.

Instead, Google should take the lead in the open-source movement and work with it, as it did with Android, Sernau argued. Mastering the ecosystem is more important than providing the most powerful models. According to Bloomberg, the document has been shared thousands of times internally.
