Google adds image-to-video and Veo 3 Fast to the Gemini API

Update from July 31, 2025:

Google has rolled out the Veo 3 Fast version and image-to-video support through its API. According to Google, Veo 3 Fast is engineered for speed and cost efficiency, aimed at developers who need to iterate quickly or generate content at scale, such as for programmatic advertising or rapid A/B testing. Google says Veo 3 Fast still delivers "high quality."

Both Veo 3 and Veo 3 Fast accept text and image prompts, generate videos at 720p and 24 fps, and output eight-second clips by default, with one video per request. They share the same technical specs, including a maximum of 1,024 tokens per text input and native audio generation.

Veo 3 Fast is priced at $0.40 per second of video with audio, while standard Veo 3 costs $0.75 per second. That makes the standard model 87.5 percent more expensive, or Veo 3 Fast about 47 percent cheaper. An eight-second video clip runs $3.20 with Veo 3 Fast or $6.00 with Veo 3, and five minutes of video costs $120 with Veo 3 Fast compared to $225 with Veo 3. Google doesn't specify exactly how the two models differ in output quality.
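The arithmetic behind these figures can be reproduced with a short sketch. It assumes billing is strictly per second of generated video at the quoted rates; the dictionary keys and function names are illustrative labels chosen for this example, not official API model identifiers.

```python
from decimal import Decimal

# Per-second prices for video with audio, as quoted in the article (USD).
# The keys are hypothetical labels for this example, not Gemini API model IDs.
PRICE_PER_SECOND = {
    "veo-3": Decimal("0.75"),
    "veo-3-fast": Decimal("0.40"),
}


def clip_cost(model: str, seconds: int) -> Decimal:
    """Cost of `seconds` of generated video, assuming pure per-second billing."""
    return PRICE_PER_SECOND[model] * seconds


def premium_percent(base: str, other: str) -> Decimal:
    """How much more `other` costs than `base`, as a percentage of `base`."""
    base_price = PRICE_PER_SECOND[base]
    other_price = PRICE_PER_SECOND[other]
    return (other_price - base_price) / base_price * 100


if __name__ == "__main__":
    for model in PRICE_PER_SECOND:
        print(model, "8 s:", clip_cost(model, 8), "| 300 s:", clip_cost(model, 300))
    print("Veo 3 premium over Veo 3 Fast (%):", premium_percent("veo-3-fast", "veo-3"))
```

Under these assumptions, `clip_cost("veo-3-fast", 8)` comes to $3.20 and `clip_cost("veo-3", 300)` to $225. `Decimal` is used instead of floats so that the dollar amounts stay exact.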

Image-to-Video Feature

A new image-to-video feature is now available in both Veo 3 and Veo 3 Fast via the API. Users can combine a single image with a text prompt to generate dynamic videos with audio. According to Google, this feature helps maintain stylistic consistency and allows for more precise control over movement, narrative structure, and sound through the prompt.

Integration happens through the same Gemini API as before. Google says videos generated from images are billed at the same rate as text-to-video outputs for each model. These new features are available now in a paid preview through its API. Developers can use the API documentation and the Veo Cookbook to build their own applications.

Article from July 17, 2025:

Google's Veo 3 video generation model launches on Gemini API with a hefty price tag

Google’s Veo 3 video generation model is now available through the Gemini API, with a price point that puts it among the more expensive options for AI video.

The Gemini API integration targets developers looking to bring advanced video generation into their own apps or build production-ready prototypes. For now, the API is limited to text-to-video, but image-to-video support—already live in the Gemini app—is on the way. Veo 3 is Google’s first model that can generate high-resolution video and synchronized audio from a single text prompt. It creates visuals, dialog, music, and sound effects all at once.

To help developers get started, Google AI Studio offers an SDK template and a starter app for quick prototyping. Access requires an active Google Cloud project with billing enabled. Google says Veo 3 has already been used millions of times across the Gemini app, Flow, and Vertex AI.

$0.75 per second for video with audio

Veo 3 access through the Gemini API is only available on Google Cloud’s paid tier. Pricing is $0.75 per second for 720p, 24 fps video with audio in 16:9 format, 25 cents more per second than Veo 2, which did not include sound. Google has also announced a "Veo 3 Fast" mode that’s both faster and cheaper, but it’s not yet available in the API.

At current rates, an eight-second video costs $6, and a five-minute video costs $225. Because generating the perfect result often takes multiple tries, costs can rise quickly. For example, if you need ten times as much footage to end up with five minutes of usable video, the total cost could reach $2,250. Still, Google is likely betting that for some use cases, this might be cheaper than traditional video production.
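A rough budgeting sketch makes the effect of retries concrete. It assumes the $0.75 per-second rate quoted above and a simple multiplier for how much footage has to be generated per second of usable output; the function name and the `footage_ratio` parameter are hypothetical, introduced only for illustration.

```python
from decimal import Decimal

# Veo 3 price per second of video with audio, as quoted in the article (USD).
VEO3_PRICE_PER_SECOND = Decimal("0.75")


def footage_budget(usable_seconds: int, footage_ratio: int = 1) -> Decimal:
    """Estimated cost when `footage_ratio` seconds must be generated
    for every second of usable video (1 means no discarded takes)."""
    generated_seconds = usable_seconds * footage_ratio
    return generated_seconds * VEO3_PRICE_PER_SECOND


if __name__ == "__main__":
    five_minutes = 5 * 60
    print("No retries:", footage_budget(five_minutes))
    print("Ten times the footage:", footage_budget(five_minutes, footage_ratio=10))
```

With a ratio of 1, five minutes comes to $225; with a ratio of 10, the same five usable minutes come to $2,250.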

Real-world examples

Google says Cartwheel uses Veo 3 to turn 2D videos into realistic 3D character animations, mapping the generated movements onto rigged models for client projects.

Game studio Volley uses Veo 3 to create cutscenes for its role-playing game "Wit's End", allowing developers to quickly experiment with new story ideas and visuals. So far, these examples point to fairly specialized use cases, which could suggest that Google doesn't have larger integrations to highlight yet. It's also possible that some companies are using Veo 3 behind the scenes but aren't ready to go public.
