Netflix launches open‑source VOID model project on GitHub, expanding AI toolkit.

Published by
SectorHQ Editorial

Netflix has released a new AI project called VOID on GitHub: a video‑object and interaction deletion model that removes objects from footage along with their physical effects. The model is built on CogVideoX and fine‑tuned for interaction‑aware video inpainting, according to the project's repository.

Key Facts

  • Key company: Netflix

Netflix’s foray into open‑source AI isn’t just a vanity release; it’s a fully‑fledged research‑grade pipeline that lets developers erase objects from video while also wiping out the physical ripples those objects cause. The VOID (Video Object and Interaction Deletion) model, now live on GitHub, builds on Alibaba’s CogVideoX‑Fun‑V1.5‑5b‑InP checkpoint and adds a second “warped‑noise refinement” pass to tighten temporal consistency, according to the project’s README. In practice, the system can remove a guitarist from a clip and automatically make the instrument fall to the floor, rather than hovering in mid‑air—a step beyond traditional inpainting that only paints over static backgrounds.

The codebase is deliberately modular. Pass 1 handles the base inpainting using a transformer checkpoint that takes an interaction‑aware mask as input; Pass 2 refines the result with a warped‑noise model that smooths artifacts across frames. The repository ships with a Jupyter notebook that pulls the necessary weights from HuggingFace, sets up the environment, and runs a demo on a sample video, but it also provides scripts for custom pipelines. To generate the masks, the authors rely on Meta’s SAM2 segmentation model and Google’s Gemini API, meaning users need a GPU with at least 40 GB of VRAM (e.g., an A100) and an active Gemini API key, as outlined in the setup instructions.
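The two‑pass structure described above can be sketched in plain Python. The function names and placeholder operations below are illustrative assumptions, not the repository's actual API: the real Pass 1 runs a CogVideoX transformer checkpoint and the real Pass 2 runs a warped‑noise model, whereas this sketch stands in toy arithmetic to show how the passes chain together.

```python
# Illustrative sketch of VOID's two-pass pipeline (hypothetical names;
# the real repo drives CogVideoX checkpoints, not this toy arithmetic).
from dataclasses import dataclass


@dataclass
class Frame:
    pixels: list  # flattened pixel values, standing in for a tensor
    mask: list    # interaction-aware mask, 1 = region to delete


def pass1_inpaint(frame: Frame) -> Frame:
    # Pass 1: base inpainting fills every masked pixel. As a
    # placeholder, masked pixels are simply zeroed; the real model
    # predicts clean background content here.
    filled = [0 if m else p for p, m in zip(frame.pixels, frame.mask)]
    return Frame(pixels=filled, mask=frame.mask)


def pass2_refine(frames: list) -> list:
    # Pass 2: warped-noise refinement tightens temporal consistency.
    # Placeholder: blend each frame with its predecessor so values
    # vary smoothly across time.
    refined = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        blended = [(a + b) / 2 for a, b in zip(prev.pixels, cur.pixels)]
        refined.append(Frame(pixels=blended, mask=cur.mask))
    return refined


def run_pipeline(frames: list) -> list:
    # Chain the two passes exactly as the modular codebase does:
    # inpaint every frame first, then refine the whole sequence.
    return pass2_refine([pass1_inpaint(f) for f in frames])
```

The point of the modularity is visible even in the sketch: either pass can be swapped out (say, for a higher-resolution refiner) without touching the other.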

Training data for VOID is generated through a hybrid pipeline that stitches together HUMOTO and Kubric synthetic‑scene tools, producing “quadmask” videos that encode four‑value masks for foreground, background, and interaction layers. The repository’s directory tree reflects this workflow, with separate folders for raw video, masks, and prompt JSON files that describe the desired background after deletion. The prompts are intentionally limited to background descriptions—developers are told not to mention the object being removed, a design choice that forces the model to infer the clean scene from context alone.
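The quadmask and prompt conventions can be illustrated with a short sketch. The four integer codes and the helper functions below are assumptions for illustration; the README names foreground, background, and interaction layers, and the exact encoding lives in the repository's data pipeline.

```python
# Sketch of the quadmask idea: each pixel carries one of four codes.
# Code assignments here are assumed, not taken from the repo.
BACKGROUND, FOREGROUND, INTERACTION, OVERLAP = 0, 1, 2, 3


def encode_quadmask(fg_layer, inter_layer):
    """Combine boolean foreground/interaction layers into one
    four-value mask (hypothetical helper)."""
    mask = []
    for f, i in zip(fg_layer, inter_layer):
        if f and i:
            mask.append(OVERLAP)       # assumed code for overlap
        elif f:
            mask.append(FOREGROUND)
        elif i:
            mask.append(INTERACTION)
        else:
            mask.append(BACKGROUND)
    return mask


def validate_prompt(prompt: str, object_name: str) -> bool:
    """Enforce the background-only prompt rule: the prompt must not
    mention the object being removed."""
    return object_name.lower() not in prompt.lower()
```

Under this rule, a prompt like "an empty wooden stage" passes for a guitar removal, while "remove the guitar from the stage" would be rejected, forcing the model to infer the clean scene from context alone.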

Beyond the technical heft, Netflix frames VOID as a democratizing effort: by open‑sourcing a model that can handle interaction‑aware video editing, the streaming giant hopes to lower the barrier for creators, advertisers, and researchers who need clean footage without costly manual rotoscoping. The project’s authors—Saman Motamed, William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, and Ta‑Ying Cheng—list their affiliations as Netflix and INSAIT/Sofia University, underscoring a cross‑institution collaboration that blends industry resources with academic rigor.

The release also hints at future directions for Netflix’s AI strategy. While the company has largely kept its recommendation and content‑generation models under wraps, VOID shows a willingness to contribute tooling that could eventually feed into its own production pipeline—think automated cleanup of user‑generated clips or dynamic ad insertion that respects scene continuity. For now, the model sits at the cutting edge of video inpainting, and its open‑source nature invites the community to push it further, whether by scaling to higher resolutions, reducing hardware requirements, or integrating it with other generative video frameworks.

Sources

Primary source

Reporting based on verified sources and public filings. Sector HQ editorial standards require multi-source attribution.
