Text to coherent HD video: Google merges Phenaki and Imagen Video

Recently, Google introduced two text-to-video models, which it is now combining in a new approach. The result is high-resolution, longer videos generated purely from text.

First, Google showed off Imagen Video, a text-to-video system based on the Imagen image AI that can produce short clips based on text input. The language understanding of a large language model (T5-XXL) is fundamental to the architecture. Imagen Video was trained simultaneously with images and videos.

At the same time, another Google team demonstrated Phenaki, a text-to-video AI that is also trained with videos and images. It can generate minute-long videos based on long text. The Phenaki team uses a transformer architecture with time-dependent causal attention that can string events together along a temporal sequence described in a series of sequential prompts.

Google merges Phenaki and Imagen Video to create long HD videos out of text

The Google research team had already hinted at the possibility of a merger with Imagen Video during the presentation of Phenaki. This has now happened and Google is presenting the result as part of a presentation of current AI projects.

First, Phenaki generates a coherent video based on sequential prompts. Imagen Video then takes the output from Phenaki (prompt and video) and upscales it. Compared to other super-resolution systems, a particular strength of Imagen Video is its ability to incorporate text into the super-resolution module, Google writes.

Alonso Martinez, a lead AI researcher at Google who is involved in the development of Phenaki, believes that at current rates of progress, the technology could be used to produce a major television show in as little as two years.

AI short film created with text-to-video. https://t.co/DNEsTga216 https://t.co/7D5myWmYvS

At the rate of advance in this area of research, we are likely ~2 years out from seeing a major tv show fully created using similar techniques.

prompt in comments!#generativeart #AIart pic.twitter.com/dNVX14bC2N

— alonso martinez (@alonsorobots) November 2, 2022

The technology is still in its infancy, according to Google. You can watch a presentation of the combination of Phenaki and Imagen Video in the following video starting at minute 28:25.

Imagen comes to Googles AI kitchen

Google's first text-to-image systems are expected to be available soon in the AI Kitchen test app (Android / iOS). With the image AI Imagen, Google recently presented what is probably the most powerful model of this kind, but has not published it so far, primarily for ethical reasons.

The rollout in the test kitchen app could indicate a change in strategy here, which would make sense from an economic perspective considering the successes of DALL-E 2, Midjourney and Stable Diffusion in an emerging market.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Recommendation

AI in practice

Text to coherent HD video: Google merges Phenaki and Imagen Video

Google merges Phenaki and Imagen Video to create long HD videos out of text

Imagen comes to Googles AI kitchen

Anthropic study reveals how malicious examples can bypass LLM safety measures at scale

Midjourney launches powerful AI image editing tool - struggles with regulation

AI search company Perplexity eyes major funding boost

Spirit LM: Meta's AI division paves the way for its own Advanced Voice Mode

Apple's local AI agent framework paves the way for more useful Apple Intelligence

Apple AI researchers question OpenAI's claims about o1's reasoning capabilities

Tesla unveils Cybercab robot taxi, but robot Optimus is the bigger deal

Text to coherent HD video: Google merges Phenaki and Imagen Video

Google merges Phenaki and Imagen Video to create long HD videos out of text

Imagen comes to Googles AI kitchen

Share

Bank details