
Google researchers have developed ReCapture, a new AI technique that allows users to modify camera movements in videos after they've been recorded. The system aims to bring professional-grade video editing capabilities to casual users.


Changing camera angles in existing footage has traditionally been difficult. Existing methods often fail to preserve complex motion and fine detail across different types of video content.

Rather than using an explicit 4D representation as an intermediate step, ReCapture taps into the motion knowledge stored in generative video models. The researchers reframed the task as video-to-video translation using Stable Video Diffusion.

Video: Zhang et al.


Two-step process combines temporal and spatial layers

ReCapture operates in two phases. First, it creates an "anchor video" – an initial version of the desired output with new camera movements. This preliminary version might contain some temporal inconsistencies and visual artifacts.

To generate the anchor video, the system can use diffusion models like CAT3D, which create videos from multiple angles. Alternatively, it can generate the anchor through frame-by-frame depth estimation and point cloud rendering.
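The depth-based route can be pictured with a toy reprojection: estimate a depth map per frame, lift each pixel into a 3D point cloud, then render the points from the new camera pose. Below is a minimal NumPy sketch under a pinhole camera model, not the authors' implementation; `reproject_frame`, `K`, and `T_new` are illustrative names, and a real pipeline would add occlusion handling and hole filling.

```python
import numpy as np

def reproject_frame(depth, K, T_new):
    """Lift pixels to a point cloud via depth, then project them into a
    new camera pose. Illustrative sketch only: ignores occlusion and holes."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates (u, v, 1) in row-major order
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(float)
    # Back-project: X = depth * K^-1 * pixel
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    # Move points into the new camera's frame and project with K
    cam = (T_new @ pts_h)[:3]
    proj = K @ cam
    uv = proj[:2] / proj[2:3]
    # Per-pixel target coordinates in the new view
    return uv.T.reshape(h, w, 2)
```

With the identity pose, every pixel maps back onto itself, which makes for a quick sanity check.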

Flowchart: Two-stage video synthesis architecture with anchor video generation and LoRA-based fine-tuning for motion control.
The ReCapture architecture combines spatial and temporal LoRA modules to improve video synthesis. The system uses anchor videos and masking for precise motion control and contextual image generation. | Image: Zhang et al.

In the second phase, ReCapture applies masked video fine-tuning. This step fine-tunes a generative video model on the original footage so that it produces realistic motion and temporal changes while cleaning up the anchor video's artifacts.
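The "masked" part can be illustrated with a toy objective: penalize reconstruction error only where a mask marks known content, leaving the model free to fill in the rest. This is a hedged stand-in, not the paper's actual loss; `masked_mse` is an invented name.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Mean squared error restricted to regions where mask == 1.
    Toy stand-in for a masked fine-tuning objective."""
    sq_err = (pred - target) ** 2
    # Average only over masked pixels; guard against an empty mask
    return float((sq_err * mask).sum() / np.maximum(mask.sum(), 1))
```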

The system incorporates a temporal LoRA (Low-Rank Adaptation) layer to optimize the model for the input video. This layer specifically handles temporal changes, allowing the model to understand and replicate the anchor video's specific dynamics without requiring full model retraining.
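The core idea behind any LoRA layer is small: keep the pretrained weight frozen and learn only a low-rank update (B times A) added on top. The NumPy sketch below shows that mechanism in isolation; in ReCapture the actual LoRA modules sit inside a video diffusion model, and `LoRALinear` here is purely illustrative.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: frozen weight W plus a trainable low-rank
    update B @ A, scaled by alpha / rank. Illustrative only."""
    def __init__(self, W, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen pretrained weight
        self.A = rng.normal(0, 0.02, (rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))             # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + (alpha / rank) * B A x; the update starts as a no-op
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one; training then updates only A and B, a tiny fraction of the full parameter count, which is why no full model retraining is needed.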

Picture gallery with six rows of video sequences: butterfly on flower, tiger, drinks being photographed, Pomeranian dog, swan in water and car-to-robot transformation.
ReCapture enables camera perspectives in existing videos to be changed retrospectively. The example sequences demonstrate these perspective changes for various motifs - from nature shots to technical scenes. | Image: Zhang et al.

A spatial LoRA layer ensures image details and content remain consistent with the new camera movements. The generative video model can perform zooming, panning, and tilting while maintaining the original video's characteristic movements.
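A new camera movement ultimately comes down to a sequence of per-frame camera poses. As a hedged illustration of what "panning" means in matrix terms, here is a toy trajectory generator; `pan_trajectory` is an invented helper, not part of ReCapture.

```python
import numpy as np

def pan_trajectory(n_frames, total_angle_deg):
    """Generate a horizontal pan as a sequence of 4x4 camera poses
    rotating about the y-axis. Illustrative helper for building a
    target camera path."""
    poses = []
    for t in np.linspace(0.0, np.radians(total_angle_deg), n_frames):
        c, s = np.cos(t), np.sin(t)
        R = np.array([[c, 0, s],
                      [0, 1, 0],
                      [-s, 0, c]])
        T = np.eye(4)
        T[:3, :3] = R
        poses.append(T)
    return poses
```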


The project website and research paper provide additional technical details, including post-processing techniques like SDEdit to enhance image quality and reduce blur.

Generative AI for video is still experimental

While the researchers see their work as progress toward user-friendly video manipulation, ReCapture remains a research project far from commercial release. Google hasn't yet brought any of its numerous video AI projects to market, though its Veo project might be close.

Meta also recently introduced its Movie Gen model, but like Google, it isn't commercializing it. And let's not get into Sora, OpenAI's video frontier model, which was unveiled earlier this year but hasn't been seen since. Currently, startups like Runway are leading the video AI market, having launched their latest Gen-3 Alpha model last summer.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Google researchers have developed ReCapture, a method for retroactively adjusting camerawork in videos so that even non-professionals can perform professional post-production.
  • ReCapture works in two steps: creating an anchor video with the desired new camera work, which may still contain errors, and then using masked video fine-tuning with temporal and spatial LoRA layers to improve the video synthesis.
  • While the researchers see ReCapture as an important step towards user-friendly video manipulation, there is still a long way to go before it can be used commercially.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.