
Alibaba has released Wan2.2, the latest version of its open-source video generation model. The smallest version can generate 720P videos on a single RTX 4090 GPU.

The company says Wan2.2 brings significant improvements in generation quality and model capabilities compared to Wan2.1. The model is available under the Apache 2.0 license and comes in three main versions: T2V-A14B for text-to-video, I2V-A14B for image-to-video, and TI2V-5B for combined text-and-image-to-video generation.

The A14B models generate 5-second videos at 720P and 16fps. For the TI2V-5B model, Alibaba specifies a 720P-class output resolution of 1280×704 or 704×1280 pixels.
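As a back-of-the-envelope check (my own arithmetic, not from Alibaba's documentation), the clip lengths and frame rates above translate into the following raw frame counts:

```python
def frame_count(seconds: int, fps: int) -> int:
    """Frames in a clip of the given length, ignoring any model-specific
    constraints on valid frame counts that video diffusion models
    sometimes impose."""
    return seconds * fps

# A14B models: 5 seconds at 16 fps
print(frame_count(5, 16))  # -> 80

# TI2V-5B: 5 seconds at 24 fps
print(frame_count(5, 24))  # -> 120
```

So the compact 5B model actually produces half again as many frames per clip as the larger A14B models, thanks to its higher frame rate.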

MoE architecture boosts efficiency

The biggest change in Wan2.2 is the introduction of a Mixture-of-Experts (MoE) architecture in its video diffusion models. The A14B models use a two-expert design, totaling 27 billion parameters, but with only 14 billion active parameters per inference step.

The first expert focuses on the early denoising stages, where noise is high and the overall layout of the video is established. The second expert takes over in the later stages to refine fine details.
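The routing idea can be sketched in a few lines. This is an illustrative toy, not Wan2.2's implementation: the `boundary` timestep and the expert functions are hypothetical stand-ins for the model's actual switch point and its two 14B diffusion experts.

```python
from typing import Callable

def pick_expert(timestep: int, boundary: int,
                high_noise_expert: Callable, low_noise_expert: Callable) -> Callable:
    """Route a single denoising step to one of two experts.

    Early steps (large timestep, high noise) go to the expert that lays
    out global structure; later steps go to the detail-refinement expert.
    Only one expert runs per step, which is why 14B of the 27B total
    parameters are active at a time.
    """
    return high_noise_expert if timestep >= boundary else low_noise_expert

# Toy denoising loop: timesteps counting down from high noise to low.
layout = lambda t: f"layout@{t}"
detail = lambda t: f"detail@{t}"
steps = [pick_expert(t, 500, layout, detail)(t) for t in (900, 700, 400, 100)]
print(steps)  # ['layout@900', 'layout@700', 'detail@400', 'detail@100']
```

The key property is that compute per step stays at the cost of a single 14B expert, while the combined model has 27B parameters of capacity.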

Alibaba says it has also significantly expanded the training dataset for Wan2.2, using 65.6 percent more images and 83.2 percent more videos than Wan2.1.

Compact 5B model for consumer hardware

Alongside the 27B MoE models, Alibaba has developed a more compact 5B model called TI2V-5B. This version can generate 5-second 720P videos in under 9 minutes on a single consumer GPU like the RTX 4090, making it the fastest model to reach this quality on that hardware.

TI2V-5B supports both text-to-video and image-to-video generation in a unified framework, producing 720P videos at 24fps. For the larger A14B models, Alibaba recommends at least 80GB of VRAM for single-GPU inference.

Integration and availability

The models are available through Hugging Face and ModelScope. Wan2.2 is already integrated with ComfyUI and Diffusers.

Recommendation

A Hugging Face Space is available for direct use of the TI2V-5B model.

Summary
  • Alibaba has released Wan2.2, the latest version of its open-source video generation model, which can produce 720P videos on a single RTX 4090 GPU and is available under the Apache 2.0 license in three main versions for different types of video generation.
  • The new model introduces a Mixture-of-Experts (MoE) architecture that uses two specialized components to improve efficiency and quality, with the A14B models totaling 27 billion parameters but only 14 billion active per inference step.
  • Wan2.2 is accessible on Hugging Face and ModelScope, with the compact TI2V-5B version able to generate high-quality 5-second 720P videos quickly on consumer hardware, and integration already available for platforms like ComfyUI and Diffusers.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.