Imagen Video: Google unveils high-definition text-to-video AI

Google's latest generative AI, Imagen Video, creates short videos based on text input. It achieves a new quality standard in resolution and frame rate.

Shortly after Meta with Make-a-Video, Google introduces its own text-to-video system: It draws on the diffusion technique and other insights from the image AI Imagen, the most powerful image AI publicly presented to date, but uses a more complex setup based on a "cascade of video diffusion models," according to Google's research team.

"A bunch of autumn leaves falling on a calm lake to form the text 'Imagen Video'. Smooth. | Video: Google

Compared to the Imagen image model, the Imagen video model was extended to the temporal domain and trained with images and videos at the same time. The strengths of the Imagen image model have been retained, according to Google. For text processing, Google relied on a large, pre-trained Transformer language model (T5-XXL), as it did for the Imagen image model.

Imagen Video achieves HD resolution and smooth frame rates

Imagen Video's outstanding feature is its high resolution combined with a relatively high frame rate: the system achieves HD standard with 1280 x 768 pixels at 24 frames per second. Meta generated videos using Make-a-Video for initial testing with a maximum resolution of 768 x 768 videos at a lower frame rate.

Imagen Video, like generative imaging systems, masters various artistic styles - from pixel art to Van Gogh - has an understanding of 3D objects and can spell words correctly, according to Google. The latter feature was previously only possible with Imagen image.

Imagen Video masters various artistic styles, understands 3D structures and text. | Image: Google

The system also offers a "high degree of controllability and world knowledge" and can generate videos very accurately to the text command, the research team writes - and demonstrates.

Prompt: "Sprouts in the shape of text 'Imagen Video' coming out of a fairytale book."
Model Output: pic.twitter.com/FVgnM0UAAn

- Durk Kingma (@dpkingma) October 5, 2022

As with Meta's make-a-video, however, the length of the videos is limited: the output is currently around five seconds maximum. So Imagen Video tends to generate longer animations rather than videos - but a solution to this is also on the horizon.

Recommendation

AI research

Study reveals AI models have hidden capabilities they can't access through normal prompts

Long AI videos: Phenaki also comes from Google

The fact that AI can also generate long videos was demonstrated just last week by the text-to-video AI system Phenaki, which can generate entire story scenes from prompts that build on each other. In a demo, the Phenaki research team showed an approximately two-minute-long video that was generated along a small script.

At the time of Phenaki's release, it wasn't clear who was behind the paper for review reasons. Now, Imagen video lead author Jonathan Ho reveals on Twitter that Phenaki is also a Google project.

The next step, according to Ho, is to combine the image quality of Imagen Video and the coherence and video length of Phenaki.

Imagen Video will not be released for now

As with Imagen for images, Google is not releasing the model at this time. The reason is the same: Imagen Video was trained with partly "problematic data".

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

While internal testing had successfully filtered out much explicit and violent content, social biases and stereotypes were still replicated. It is a challenge to recognize and filter these, the research team writes.

With a similar reasoning, Meta also refrained from releasing Make-a-Video for the time being, but at least held out the prospect of a demo in the near future.

Imagen Video: Google unveils high-definition text-to-video AI

Imagen Video achieves HD resolution and smooth frame rates

Study reveals AI models have hidden capabilities they can't access through normal prompts

Long AI videos: Phenaki also comes from Google

Imagen Video will not be released for now

Trump advisors are pushing a regulation targeting what they call "woke" AI models in the tech sector

Anthropic appears to tighten the usage limits for Claude code

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Imagen Video: Google unveils high-definition text-to-video AI

Imagen Video achieves HD resolution and smooth frame rates

Long AI videos: Phenaki also comes from Google

Imagen Video will not be released for now

Share

Bank details