CAT4D from Google DeepMind turns videos into simple 3D scenes

A new AI system from Google DeepMind can turn ordinary videos into dynamic 3D scenes. The team, which includes researchers from Columbia University and UC San Diego, calls its creation CAT4D.

The system uses a diffusion model to take a video shot from a single viewpoint and generate views of the scene from multiple perspectives. It then combines these viewpoints into a dynamic 3D scene. The end result? A video in which you can look at the subject from many angles.

Video: Google DeepMind

Until now, capturing something like this required elaborate setups with multiple cameras recording the same scene simultaneously. CAT4D simplifies the process by working with regular video footage.

Training challenges and solutions

The team faced a major problem: there was little existing data to train their AI on. To work around this, they got creative and mixed real-world footage with computer-generated content. The training data included multi-view images of static scenes, single-perspective videos, and synthetic 4D data.

The diffusion model learns to create images from specific angles at specific moments in time. According to the researchers, CAT4D produces higher-quality results than similar systems, though it still struggles to generate videos longer than the original footage.

Technology like CAT4D could find its way into several industries, the researchers say. Game developers might use it to create virtual environments, while filmmakers and AR developers could incorporate it into their workflows.

Anyone interested in seeing more examples can check out the project's GitHub page.