
ByteDance, the company behind TikTok, has introduced Seedance 1.0, a new AI video generation model.


According to ByteDance, Seedance 1.0 outperforms existing models in several areas, including how well it follows user prompts, the quality of motion, and image sharpness. On the benchmarking platform Artificial Analysis, Seedance 1.0 ranks first for both text-to-video and image-to-video tasks, beating out competitors like Google's Veo 3, Kling 2.0 from Kuaishou, and OpenAI's Sora.

Seedance 1.0 is designed to turn simple prompts into complex videos. The model can handle not just single scenes, but longer sequences with multiple camera angles and consistent characters throughout. ByteDance says that compared to other models, Seedance 1.0 is more likely to stick to the details in a prompt—whether that's specific movements, camera changes, or visual styles.

Large-scale data and extensive filtering

According to ByteDance, Seedance 1.0 was trained on a massive collection of video clips gathered from public and licensed sources. The clips went through several rounds of cleaning to remove features like logos, subtitles, or violent content. Both automated and manual annotation added detailed descriptions that covered movement, appearance, and style, giving the model a better foundation for handling complex prompts.


Seedance 1.0’s training process happened in several stages. First, the model learned from a broad set of image and video data, then it was adapted specifically for image-to-video tasks. Fine-tuning followed, using carefully selected clips, along with reward training where humans picked better outputs—such as videos with more natural movement or scenes that matched the prompt more closely. This feedback loop directly shaped the model's development.

Seedance 1.0 and speed

One of Seedance 1.0’s standout features is its speed at the quality level it delivers. Generating five seconds of Full HD video takes about 41 seconds, which ByteDance says is significantly faster than comparable models. However, the launch of Google's Veo 3 Fast may have negated this advantage. Seedance 1.0 does not currently support audio generation.

ByteDance plans to integrate Seedance 1.0 into its own platforms, such as Doubao and Jimeng. The model is aimed at both professional users and the general public, supporting use cases from marketing and content production to simple video editing via voice commands.

Summary
  • ByteDance unveiled Seedance 1.0, a new AI model for video generation. According to the company's tests, Seedance 1.0 outperforms established systems such as Google's Veo, Kuaishou's Kling, and OpenAI's Sora on text-to-video and image-to-video tasks.
  • Seedance 1.0 was trained using a large set of cleaned, detailed, annotated video clips. The training process included several stages, such as reward training, which used human feedback to improve the implementation of movements, camera angles, and styles.
  • Seedance 1.0 produces complex videos with multiple scenes and camera changes faster than many competing products. However, it does not support audio output. ByteDance plans to use Seedance 1.0 on its platforms, such as Doubao and Jimeng, for professional and everyday applications.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.