Update
  • Added video demo and statement from Gemini co-lead Oriol Vinyals

Update from December 9, 2023:

Gemini co-lead Oriol Vinyals addressed criticism of Google's staged Gemini hands-on demo in a post on X, stating that "all the user prompts and outputs in the video are real, shortened for brevity."

According to Vinyals, the criticized video "illustrates what the multimodal user experiences built with Gemini could look like" and was made "to inspire developers."

He even took the time to demo the developer environment, generating AI output from a combination of images and text prompts similar to what Google showed in the video.


Video: Oriol Vinyals via X

It's not real-time video analysis combined with speech, as Google showed in the video below. But it does show that the underlying capabilities needed for such a use case are part of Gemini Pro and Ultra - which isn't surprising, since we already know similar capabilities from GPT-4's vision mode.
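For illustration, here is a minimal sketch of what such an image-plus-text prompt looks like in practice, assuming the google-generativeai Python SDK and the gemini-pro-vision model available at launch; the image file and prompt text are made up for this example and are not taken from Google's or Vinyals' demo.

```python
import google.generativeai as genai
import PIL.Image

# Configure the SDK with an API key (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# Load a single still frame - analogous to the frames Google
# extracted from its demo video instead of streaming live footage.
frame = PIL.Image.open("video_frame.jpg")  # hypothetical file name

# Gemini's multimodal model accepts a mixed list of images and text
# as a single prompt and returns a text response.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content([
    frame,
    "What is the person in this image drawing? Describe it briefly.",
])

print(response.text)
```

Turning calls like this into the seamless, voice-driven, real-time interaction shown in the video would still require speech recognition, frame selection, and low latency on top - which is exactly what the staged demo glossed over.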

Original article from December 8, 2023:

Google took a fake-it-till-you-make-it approach to demonstrating Gemini's multimodal capabilities

A staged demo video leaves developers and employees in doubt about the true capabilities of Google's new Gemini language model.

In the video, titled "Hands-on with Gemini: Interacting with multimodal AI," Google shows off the AI model's impressive voice interaction and real-time visual response capabilities.


After the video's release, however, it turned out that the voice interaction did not exist and the demonstration was not in real time. Instead, Google used still images from the video with specific text prompts to get the results. In the video description, Google states: "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."

According to Bloomberg, Google admits that the actual demonstration involved the use of still images from the video and text prompts, rather than Gemini predicting or responding to changes in real time. You can check out a making-of of the video on Google's developer blog.

Gemini fake demo faces internal criticism

According to sources from Bloomberg and The Information, Google employees have expressed concern and criticism internally about the demo video. One Google employee stated that the video painted an unrealistic picture of how easy it is to achieve impressive results with Gemini.

The staged demo also became the subject of memes and jokes within the company, with employees sharing images and comments poking fun at the discrepancies between the video and the actual AI system.


Despite the controversy surrounding the demo video, Google insists that all user input and output shown in the video is real, even if the video suggests a real-time implementation that does not yet exist.

Eli Collins, vice president of product at Google DeepMind, told Bloomberg that the duck-drawing demo is still in the research stage and not yet part of Google's products.

"It’s a new era for us," Collins told Bloomberg. "We’re breaking ground from a research perspective. This is V1. It’s just the beginning."

Google also published benchmark results in a misleading way. It compared Gemini Ultra's top score on the well-known language understanding benchmark MMLU, achieved with a more complex prompting method (CoT@32), against GPT-4's score with the standard 5-shot method reported by OpenAI. With the same 5-shot prompting method on MMLU, Google's largest model scores 2.7 percentage points lower than GPT-4.

Although Gemini achieved the best overall MMLU score with CoT@32, the way Google presents this result is questionable. Like the staged real-time video, it suggests that Google tried at all costs to portray Gemini as superior to GPT-4, rather than roughly on par with it, which is probably closer to the truth.

Summary
  • A demo video of Google's new Gemini language model appears to show impressive voice interaction and real-time visual response capabilities, but turns out to be staged.
  • Google admits that the video uses still images and targeted text prompts instead of real-time interactions, which has led to internal criticism and concern among employees.
  • Despite the controversy, Google insists that the user input and output shown is real, though it was not based on speech and not generated in real time. The technology is still in the research stage.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.