Netflix has open-sourced an AI framework that removes objects from videos and automatically adjusts the physical effects those objects had on the rest of the scene. The system is called VOID, short for "Video Object and Interaction Deletion." Its distinguishing feature is that it does not stop at erasing the object: it also accounts for downstream physical effects, such as collisions, that the object originally caused.
VOID is built on top of Alibaba's CogVideoX video diffusion model, fine-tuned with synthetic data from Google's Kubric and Adobe's HUMOTO for interaction detection. Google's Gemini 3 Pro analyzes the scene and identifies the affected areas, while Meta's SAM2 segments the objects to be removed. An optional second pass uses optical flow to correct any shape distortions.
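To make the division of labor concrete, here is a minimal Python sketch of how such a multi-stage pipeline could be wired together. It is an illustration under stated assumptions, not VOID's actual code or API: the `RemovalPipeline` class, its field names, the `union` helper, and the toy `demo` are all hypothetical, and both the order of the first two stages and the merging of their masks into one are guesses. Each stage is a plain callable standing in for the role a real model would play.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Frame = List[List[int]]  # toy stand-in for an image: a 2D grid of pixel values
Video = List[Frame]
Mask = List[List[bool]]  # True where a pixel should be regenerated


def union(a: Mask, b: Mask) -> Mask:
    """Combine two masks pixel by pixel (assumption: the real system may encode them differently)."""
    return [[x or y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


@dataclass
class RemovalPipeline:
    """Hypothetical orchestration of the stages described above."""
    segment_object: Callable[[Video, str], List[Mask]]  # role of the segmentation model
    find_affected: Callable[[Video, str], List[Mask]]   # role of the scene-analysis model
    inpaint: Callable[[Video, List[Mask]], Video]       # role of the video diffusion model
    refine: Optional[Callable[[Video, Video], Video]] = None  # optional second pass

    def run(self, video: Video, target: str) -> Video:
        object_masks = self.segment_object(video, target)
        affected_masks = self.find_affected(video, target)
        # Regenerate both the object and the regions it influenced.
        masks = [union(o, a) for o, a in zip(object_masks, affected_masks)]
        result = self.inpaint(video, masks)
        if self.refine is not None:
            result = self.refine(video, result)
        return result


def demo() -> Video:
    """Run the pipeline on one 2x3 frame with trivial stand-in stages."""
    video = [[[1, 9, 5], [1, 1, 5]]]  # 9 marks the object, 5 marks its side effect
    pipeline = RemovalPipeline(
        segment_object=lambda v, t: [[[p == 9 for p in row] for row in f] for f in v],
        find_affected=lambda v, t: [[[p == 5 for p in row] for row in f] for f in v],
        # Stand-in "inpainting": overwrite masked pixels with 0.
        inpaint=lambda v, ms: [
            [[0 if m else p for p, m in zip(row, mask_row)] for row, mask_row in zip(f, mf)]
            for f, mf in zip(v, ms)
        ],
    )
    return pipeline.run(video, "ball")
```

Calling `demo()` blanks both the object pixel and the pixels marked as its side effect while leaving the rest of the frame untouched. Keeping each stage behind a simple callable means the optional refinement pass can be omitted or swapped without touching the rest of the flow.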
The project was developed by Netflix researchers in collaboration with INSAIT Sofia University. Code, paper, and demo are available on GitHub, arXiv, and Hugging Face. The system ships under the Apache 2.0 license, which means it can be used commercially.