Chinese AI company Kling AI has introduced "Video O1," billed as the "world's first unified multimodal video model": a system designed to handle both video generation and editing tasks within a single framework.
According to Kling AI, Video O1 integrates several tasks that previously required separate tools. The model can generate videos of three to ten seconds from prompts or reference images, and it can also edit existing footage, for example swapping protagonists, changing the weather, or adjusting styles and colors. Video O1 handles such requests in a single prompt, letting users add a subject, modify the background, and change the visual style simultaneously.
Processing multiple inputs simultaneously
The model processes different types of input at the same time, accepting up to seven references (images, videos, or subjects) alongside text as prompts. Users can edit videos with text commands like "remove passersby" or "change daylight to twilight" without manual masking or keyframes.
Users can upload characters, props, or scenes, which the system then uses in different contexts. Actions or camera movements can also serve as references. Kling says the system understands the input data well enough to keep subjects, people, or products consistent across different shots.
Video O1 relies on a multimodal transformer architecture, though the company hasn't shared many details. Kling introduced a "Multimodal Visual Language" (MVL) to act as an interactive bridge between text and multimodal signals. The model uses reasoning chains to deduce events, enabling intelligent video generation that moves beyond simple pattern reconstruction, echoing the kind of language Google used to describe its own recent advancements with Nano Banana Pro.
Internal tests show performance gains over competitors
Kling AI tested Video O1 internally against Google Veo 3.1 and Runway Aleph. In tasks involving video creation from image references, Video O1 reportedly performed far better than Google's "ingredients to video" feature. For video transformations, meaning edits to existing videos, evaluators reportedly preferred O1 over Runway Aleph 2.3 times as often. However, these figures come from Kling AI's own internal tests and haven't been verified externally.

Video O1 is available now via Kling's web interface. While the Chinese company may have taken a step forward with O1, the market remains highly competitive. At almost the same time, Runway unveiled Gen-4.5, its most powerful video model to date. Alongside Western companies like Google, OpenAI, and Midjourney, Kling competes with Chinese rivals such as Hailuo, Seedance, and Vidu, which focus primarily on cost efficiency.