Google has released a new streaming API for its Gemini 2.0 multimodal model that enables real-time interaction through audio, video, and text. Developer Simon Willison demonstrated the technology in a one-minute iPhone video, holding a live conversation with Gemini about objects it could see through the camera. The API is available in preview for developers who want to test it, though some technical setup is required. The release comes as OpenAI introduces a similar capability for ChatGPT that lets the AI discuss smartphone video content in real time.