ByteDance shows impressive progress in AI video with Seedance 2.0
Key Points
- ByteDance has released Seedance 2.0, a multimodal AI video generation model that processes images, videos, audio, and text simultaneously to create short videos with automatic sound effects.
- The standout feature is its reference capability: the model can adopt camera work, movements, and effects from uploaded reference videos, then replace characters or extend clips based on that input.
- The release pushed up share prices for Chinese media and AI companies, coming shortly after competitor Kuaishou unveiled its Kling 3.0 model, which also takes a multimodal approach.
ByteDance has released Seedance 2.0 to a limited group of users. The previous model was already one of the most capable AI video generators available. The new version pushes things even further.
The multimodal video generation model handles up to four types of input at once: images, videos, audio, and text. Users can combine up to nine images, three videos, and three audio files, for a maximum of twelve files in total. Generated videos run between 4 and 15 seconds and automatically come with sound effects or music.
The demo videos come straight from ByteDance and were almost certainly cherry-picked from a larger batch of generated clips. It is not yet clear how reliably the model hits this quality bar in real-world use, what it costs, or how long generation takes. What we're seeing is likely a best-case scenario, and even when capabilities look impressive on paper, hurdles such as consistency still stand between them and professional workflows. Still, the quality on display is genuinely impressive.
Prompt: The camera follows a man in black clothing who flees quickly. Behind him, a crowd of people pursues him. The camera switches to a sideways chase shot. The figure knocks over a roadside fruit stand in panic, picks himself up and runs on. The excited shouts of the crowd can be heard in the background.
Prompt: A girl elegantly hangs up laundry. Once she has finished, she takes the next item of clothing from the bucket and shakes it out vigorously.
According to ByteDance, the standout new feature is its reference capability: the model can pick up camera work, movements, and special effects from uploaded reference videos and seamlessly extend existing clips. Video editing tasks such as replacing or adding characters also work.
Users write simple text commands like "Take @image1 as the first image of the scene. First person perspective. Take the camera movement from @Video1. The scene above is based on @Frame2, the scene on the left on @Frame3, the scene on the right on @Frame4."
The user records a camera movement ...
... which the AI model transfers into a generated video, along with other elements.
For compliance reasons, realistic human faces are currently blocked in uploaded materials. Seedance 2.0 is only available as a beta on the official Jimeng website at jimeng.jianying.com.
Prompt: The figure in the picture has a guilty expression on her face, her eyes look to the left and right, then she leans out of the picture frame. She quickly stretches her hand out of the frame, reaches for a Coke and takes a sip, then shows a satisfied expression on her face. At this moment, footsteps can be heard. The figure in the picture hastily puts the Coke back in its original place. A western cowboy comes along, takes the Coke from the cup and walks away. Finally, the camera moves forward, the background slowly fades to black, only a spotlight from above illuminates a can of Coke. An artfully designed subtitle with a narrator's voice appears at the bottom of the screen: "Yikou Cola - you have to try it!"
The release comes just days after competitor Kuaishou unveiled its Kling 3.0 model, which also supports multimodal input and output. The AI video race is heating up in China's stock market too: according to the South China Morning Post, the launch of these powerful video models pushed share prices of Chinese media and AI companies up by as much as 20 percent.