Update from December 9, 2023:
  • Added video demo and statement from Gemini co-lead Oriol Vinyals


Gemini co-lead Oriol Vinyals addressed criticism of Google's staged Gemini hands-on demo on X, stating that "all the user prompts and outputs in the video are real, shortened for brevity."

According to Vinyals, the criticized video "illustrates what the multimodal user experiences built with Gemini could look like" and was made "to inspire developers."

He even took the time to demo the developer environment, generating AI output with a combination of images and prompts similar to what Google showed in the video.


Video: Oriol Vinyals via X

It's not real-time video analysis combined with speech, as Google showed in the video below. But it does show that the underlying capabilities needed for such a use case are part of Gemini Pro and Ultra - which isn't surprising, since GPT-4's vision features already offer similar capabilities.

Original article from December 8, 2023:

Google took a fake-it-till-you-make-it approach to demonstrating Gemini's multimodal capabilities

A staged demo video leaves developers and employees in doubt about the true capabilities of Google's new Gemini language model.

In the video, titled "Hands-on with Gemini: Interacting with multimodal AI," Google shows off the AI model's impressive voice interaction and real-time visual response capabilities.


After the demonstration, however, it turned out that the voice interaction did not exist and the demo was not in real time. Instead, Google used still images from the video with specific text prompts to get the results. In the video description, Google states: "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."

According to Bloomberg, Google admits that the actual demonstration used still images from the video and text prompts, rather than Gemini predicting or responding to changes in real time. A making-of post about the video is available on Google's developer blog.

Gemini fake demo faces internal criticism

According to sources from Bloomberg and The Information, Google employees have expressed concern and criticism internally about the demo video. One Google employee stated that the video painted an unrealistic picture of how easy it is to achieve impressive results with Gemini.

The staged demo also became the subject of memes and jokes within the company, with employees sharing images and comments poking fun at the discrepancies between the video and the actual AI system.


Despite the controversy surrounding the demo video, Google insists that all user input and output shown in the video is real, even if the video suggests a real-time implementation that does not yet exist.

Eli Collins, vice president of products at Google DeepMind, told Bloomberg that the duck-drawing demo is still in the research stage and not yet part of Google's products.

"It’s a new era for us," Collins told Bloomberg. "We’re breaking ground from a research perspective. This is V1. It’s just the beginning."

Google also presented benchmark results in a misleading way. It compared Gemini Ultra's top score on the well-known language understanding benchmark MMLU, obtained with a more complex prompting method (chain-of-thought with 32 samples, CoT@32), against the standard 5-shot result OpenAI reported for GPT-4. Using the same 5-shot prompting on MMLU, Google's largest model scores 2.7 percentage points lower than GPT-4.
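The gap can be reconstructed from the published figures. The percentages in the sketch below are not from this article but are the scores reported in Google's Gemini technical report and OpenAI's GPT-4 reporting (roughly 83.7% and 86.4% on 5-shot MMLU, respectively), so treat them as illustrative:

```python
# Rough sketch of the MMLU comparison discussed above.
# Scores (in percent) as published by Google and OpenAI; cited here
# for illustration, not taken from this article.
gemini_ultra = {"CoT@32": 90.0, "5-shot": 83.7}
gpt4 = {"5-shot": 86.4}

# Like-for-like comparison: the same 5-shot prompting for both models.
gap = gpt4["5-shot"] - gemini_ultra["5-shot"]
print(f"Gemini Ultra trails GPT-4 by {gap:.1f} points with 5-shot prompting")
# -> Gemini Ultra trails GPT-4 by 2.7 points with 5-shot prompting
```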

Although Gemini achieved the best overall MMLU score with CoT@32, the way Google presents this result is questionable. Like the staged real-time video, it shows that Google has tried at all costs to portray Gemini as superior to GPT-4, rather than as roughly on par with it, which is probably closer to the truth.

Summary
  • A demo video of Google's new Gemini language model gives the impression of impressive voice interactions and real-time visual response capabilities but turns out to be staged.
  • Google admits that the video uses still images and targeted text prompts instead of real-time interactions, which has led to internal criticism and concern among employees.
  • Despite the controversy, Google insists that the user input and output shown is real, though not based on speech and not in real time. The technology is still in the research stage.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.