Tencent announced HunyuanVideo, a new open-source AI model for video generation that aims to match the capabilities of existing commercial systems. With more than 13 billion parameters, Tencent says it's the largest publicly available model of its kind.

According to the technical documentation, HunyuanVideo performs better than current systems like Runway Gen-3 and Luma 1.6, as well as three major Chinese video generation models. The system shows particularly strong results in tests of motion quality.

Video: Tencent

The model can handle multiple tasks, including generating videos from text descriptions, converting still images into videos, creating animated avatars, and producing audio for video content.
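For the text-to-video mode, a minimal sketch of what inference might look like is shown below. It assumes the Hugging Face diffusers integration (HunyuanVideoPipeline) and the community checkpoint "hunyuanvideo-community/HunyuanVideo"; the model ID and generation settings here are assumptions for illustration and may differ from Tencent's official inference scripts.

    import torch
    from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
    from diffusers.utils import export_to_video

    # Assumed community checkpoint on the Hugging Face Hub, not Tencent's
    # official repository; swap in whichever weights you actually use.
    model_id = "hunyuanvideo-community/HunyuanVideo"

    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.float16
    )
    pipe.vae.enable_tiling()  # lowers VRAM use when decoding video latents
    pipe.to("cuda")

    # Generate a short clip from a text prompt and write it to disk.
    frames = pipe(
        prompt="A cat walks on the grass, realistic style.",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(frames, "output.mp4", fps=15)

Small resolutions and frame counts like these keep the memory footprint manageable; running a 13-billion-parameter video model at full resolution requires high-end data-center GPUs.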

Tencent's engineers developed a multi-stage training process for HunyuanVideo. The model starts with low-resolution image training at 256 pixels, then moves to mixed-scale training at higher resolutions.

The final stage involves progressive video and image training, where both resolution and video length increase gradually. The development team reports this approach leads to better convergence and higher quality video output.
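Tencent describes this curriculum only at a high level, but the progression could be sketched roughly as follows. The stage names, resolutions, frame counts, and step budgets below are illustrative placeholders, not Tencent's actual training configuration.

    from dataclasses import dataclass

    @dataclass
    class Stage:
        name: str
        resolution: int  # square edge length in pixels
        num_frames: int  # 1 = still images only
        steps: int       # optimizer steps budgeted for the stage

    # Hypothetical schedule mirroring the described progression:
    # low-res images -> mixed-scale images -> progressively longer,
    # higher-resolution video mixed with images.
    CURRICULUM = [
        Stage("low-res image pretraining", 256, 1, 200_000),
        Stage("mixed-scale image training", 512, 1, 100_000),
        Stage("short low-res video + images", 256, 33, 100_000),
        Stage("longer mid-res video + images", 512, 65, 80_000),
        Stage("full-length high-res video + images", 720, 129, 60_000),
    ]

    for stage in CURRICULUM:
        # Each stage would resume from the previous stage's weights,
        # with the data pipeline resized to the new dimensions.
        print(f"{stage.name}: {stage.resolution}px, "
              f"{stage.num_frames} frame(s), {stage.steps} steps")

The idea is the same as progressive growing in image models: cheap low-resolution steps establish semantics and composition before expensive high-resolution, long-sequence steps refine motion and detail.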

HunyuanVideo is open source

By releasing HunyuanVideo as open source, Tencent aims to reduce the gap between proprietary and open systems. The company has published the code on GitHub and plans ongoing development with new features.

The release puts Tencent in direct competition with established players like Runway and OpenAI's Sora project, as well as several other Chinese companies developing video models, including KLING.

Summary
  • Tencent launches HunyuanVideo, an open-source AI video generation model with 13 billion parameters that, according to technical documentation, outperforms existing systems such as Runway Gen-3 and Luma 1.6 in terms of motion quality.
  • The model undergoes multi-stage training, starting with low-resolution images (256 pixels), followed by mixed-scale training, and progressive video and image training with increasing resolution and video length.
  • By releasing it as open source on GitHub, Tencent aims to bridge the gap between closed and open systems. The system can generate video from text, convert images to video and create avatar animations.