Google has released a new streaming API, the Multimodal Live API, for its Gemini 2.0 multimodal model that enables real-time interactions through audio, video, and text. Developer Simon Willison demonstrated the technology in a one-minute iPhone video, showing a live conversation with Gemini about objects it could see through the camera. The API is now available in preview for developers who want to test it, though some technical setup is required. The release comes as OpenAI introduced a similar capability for ChatGPT that lets the AI discuss smartphone video content in real time.
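As a rough sketch of what that setup might look like, the snippet below uses the preview google-genai Python SDK to open a Live API session and stream a single text turn. The model name, API version, and config values are assumptions based on the preview documentation, not details from the article; audio and camera frames are streamed over the same session in the full demo.

```python
# Minimal sketch of a Gemini 2.0 Live API session using the preview
# google-genai Python SDK (pip install google-genai). Model name, API
# version, and config values are assumptions, not taken from the article.
import asyncio
from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",                 # placeholder key
    http_options={"api_version": "v1alpha"} # preview API version (assumed)
)

MODEL_ID = "gemini-2.0-flash-exp"           # preview model name (assumed)
CONFIG = {"response_modalities": ["TEXT"]}  # audio output is also possible

async def main():
    # Open a bidirectional streaming session (WebSocket under the hood).
    async with client.aio.live.connect(model=MODEL_ID, config=CONFIG) as session:
        # Send one text turn; audio or video frames can be streamed similarly.
        await session.send(input="What can you see right now?", end_of_turn=True)
        # Print the model's streamed response chunks as they arrive.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())
```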
