Deepmind: Transframer AI dreams 30-second video from an image

Aug 20, 2022

DALL-E 2 prompted by MIXED

Deepmind's new video AI, Transframer, can handle a whole range of image and video tasks - and dream up 30-second videos from a single frame.

Generative AI systems have moved from research labs to industrial and consumer applications in recent years, a shift kicked off by OpenAI's large language model GPT-3. In April 2022, the company introduced the image generation system DALL-E 2, which indirectly spawned alternatives such as Midjourney and Stable Diffusion.

Google sister Deepmind is now showing Transframer, an AI model that could offer a glimpse of the next generation of generative AI models.

Deepmind Transframer: A model with many tasks

Deepmind's Transframer is a visual prediction framework that can solve eight image modeling and processing tasks at once, such as depth estimation, instance segmentation, object recognition or video prediction.

Transframer takes a set of context images with associated annotations, such as time stamps or camera viewpoints, and uses them to answer a query for a new image.

Transframer provides a framework for multiple image tasks. | Image: Deepmind
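
The context-and-query setup can be pictured as a small data structure. The following Python sketch is purely illustrative: the names `AnnotatedFrame`, `PredictionTask`, `video_prediction` and `view_synthesis`, as well as the annotation keys `time` and `camera`, are hypothetical and not taken from Deepmind's code. It only shows how two different tasks might be phrased in the same format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Image = List[List[float]]  # placeholder type: a grayscale image as nested lists


@dataclass
class AnnotatedFrame:
    image: Image
    annotation: Dict[str, Any]  # e.g. {"time": 0.0} or {"camera": (x, y, z)}


@dataclass
class PredictionTask:
    context: List[AnnotatedFrame]  # frames the model is allowed to see
    query: Dict[str, Any]          # annotation of the frame to be predicted


def video_prediction(frames: List[Image], fps: float) -> PredictionTask:
    """Context: frames stamped with their time. Query: the next time step."""
    context = [AnnotatedFrame(img, {"time": i / fps}) for i, img in enumerate(frames)]
    return PredictionTask(context, {"time": len(frames) / fps})


def view_synthesis(views: List[Image], cameras: list, target_camera) -> PredictionTask:
    """Context: views tagged with their camera. Query: an unseen camera pose."""
    context = [AnnotatedFrame(img, {"camera": cam}) for img, cam in zip(views, cameras)]
    return PredictionTask(context, {"camera": target_camera})
```

In this sketch, the only difference between predicting the next video frame and rendering a new viewpoint is the kind of annotation attached to the context and the query.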

The model processes compressed images using a U-net whose outputs are passed to a DCTransformer decoder. Specifically, the images are compressed using the discrete cosine transform (DCT), which is also used in JPEG compression. The DCTransformer is specialized for DCT tokens.
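
To give a rough idea of what a DCT-based image representation looks like, here is a minimal Python sketch of a JPEG-style block transform. It is not Transframer's actual tokenization: the 8x8 block size, the quantization step of 10 and the `(row, column, value)` token format are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

N = 8  # block size used by JPEG; assumed here for illustration


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: C^T * coeffs * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def sparse_tokens(coeffs, step=10.0):
    """Quantize coefficients and keep the nonzero ones as (row, col, value)."""
    tokens = []
    for u, row in enumerate(coeffs):
        for v, val in enumerate(row):
            q = round(val / step)
            if q != 0:
                tokens.append((u, v, q))
    return tokens


# A uniformly gray block needs only a single token after quantization.
flat = [[100.0] * N for _ in range(N)]
print(sparse_tokens(dct2(flat)))
```

Smooth image regions concentrate their energy in a few low-frequency coefficients, so most quantized values are zero and can be dropped. That sparsity is what makes a DCT representation compact.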

Transframer generates new angles and whole videos

In addition to traditional image tasks such as depth estimation and object detection, Transframer is also capable of synthesizing new viewpoints of an object and predicting video trajectories.

In a short tweet, Deepmind shows about six 30-second videos that Transframer dreamed up from a single input image. Despite the low resolution, some consistency can be seen.

Transframer is a general-purpose generative framework that can handle many image and video tasks in a probabilistic setting. New work shows it excels in video prediction and view synthesis, and can generate 30s videos from a single image: https://t.co/wX3nrrYEEa 1/ pic.twitter.com/gQk6f9nZyg

- DeepMind (@DeepMind) August 15, 2022

Deepmind says the results show that a framework such as Transframer is suitable for challenging image and video modeling tasks. Transframer can also act as a multitasker to solve image and video analysis problems that previously used specialized models, the researchers said.
