Google's Gemini 2.0 can hold real-time conversations over video

Dec 13, 2024

Google has released a new streaming API for its Gemini 2.0 multimodal model that enables real-time interactions through audio, video, and text. Developer Simon Willison demonstrated the technology in a one-minute iPhone video showing a live conversation with Gemini about objects it could see through the camera. The API is now available in preview for developers who want to test it, though some technical setup is required. The release coincides with OpenAI's introduction of a similar capability for ChatGPT that lets the AI discuss smartphone video content in real time.

