Alibaba has released Wan2.2, the latest version of its open-source video generation model. The smallest version can generate 720P videos on a single RTX 4090 GPU.
The company says Wan2.2 brings significant improvements in generation quality and model capabilities compared to Wan2.1. The model is available under the Apache 2.0 license and comes in three main versions: T2V-A14B for text-to-video, I2V-A14B for image-to-video, and TI2V-5B for combined text-and-image-to-video generation.
The A14B models generate 5-second videos at 720P and 16fps. For the TI2V-5B model, Alibaba defines 720P as a non-standard 1280×704 or 704×1280 pixels.
MoE architecture boosts efficiency
The biggest change in Wan2.2 is the introduction of a Mixture-of-Experts (MoE) architecture in its video diffusion models. The A14B models use a two-expert design with 27 billion parameters in total, of which only 14 billion are active at each inference step.
The first expert handles the early denoising stages, where noise is high and the video's overall layout is established. The second expert takes over in the later stages to refine details.
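In practice, this amounts to routing each denoising step to one of two full-size experts based on the noise level. Here is a minimal sketch of that routing logic in Python; the class, the threshold value, and the expert interfaces are illustrative assumptions, not Alibaba's published code:

```python
import torch
import torch.nn as nn

class TwoExpertDenoiser(nn.Module):
    """Illustrative two-expert MoE routing for a video diffusion model.

    Assumptions: each expert is a full 14B denoising network, and routing
    is a simple timestep threshold (the value below is hypothetical).
    Only one expert runs per step, which is how 27B total parameters
    cost only 14B active parameters at any given inference step.
    """

    def __init__(self, high_noise_expert: nn.Module, low_noise_expert: nn.Module,
                 boundary_timestep: int = 875):
        super().__init__()
        self.high_noise_expert = high_noise_expert  # establishes overall layout
        self.low_noise_expert = low_noise_expert    # refines fine detail
        self.boundary_timestep = boundary_timestep  # hypothetical switch point

    def forward(self, latents: torch.Tensor, timestep: int,
                cond: torch.Tensor) -> torch.Tensor:
        # Diffusion timesteps count down from ~1000 (pure noise) to 0 (clean),
        # so large timesteps mean high noise: route those to the layout expert.
        if timestep >= self.boundary_timestep:
            return self.high_noise_expert(latents, timestep, cond)
        return self.low_noise_expert(latents, timestep, cond)
```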
Alibaba says it has also significantly expanded the training dataset for Wan2.2, using 65.6 percent more images and 83.2 percent more videos than Wan2.1.
Compact 5B model for consumer hardware
Alongside the 27B MoE models, Alibaba has developed a more compact 5B model called TI2V-5B. This version can generate a 5-second 720P video in under 9 minutes on a single consumer GPU like the RTX 4090, which Alibaba says makes it one of the fastest models at this quality level on consumer hardware.
TI2V-5B supports both text-to-video and image-to-video generation in a unified framework, producing 720P videos at 24fps. For the larger A14B models, Alibaba recommends at least 80GB of VRAM for single-GPU inference.
Integration and availability
The models are available through Hugging Face and ModelScope. Wan2.2 is already integrated with ComfyUI and Diffusers.
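For the Diffusers route, usage should follow the pattern of earlier Wan checkpoints. The sketch below is a hedged example, not official Wan2.2 documentation: the repository ID is an assumption based on the Wan family's naming convention, and the generation parameters are guesses derived from the specs above.

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed repo ID, following the naming of earlier Wan Diffusers checkpoints.
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# TI2V-5B's 720P is 1280x704; 121 frames at 24fps is roughly a 5-second clip.
# Resolution, frame count, and guidance scale here are illustrative values.
video = pipe(
    prompt="A red panda climbing a snowy pine tree, cinematic lighting",
    height=704,
    width=1280,
    num_frames=121,
    guidance_scale=5.0,
).frames[0]

export_to_video(video, "wan22_sample.mp4", fps=24)
```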
A Hugging Face Space is available for direct use of the TI2V-5B model.