Alibaba has launched Wan2.5-Preview, a new video model capable of generating short clips with synchronized audio.
The system combines text, images, video, and audio in a single architecture, putting it in the same category as Google's Veo 3. Details about how Wan2.5-Preview works are sparse. Alibaba says it used reinforcement learning from human feedback and calls the model "a solid step [...] on our journey towards a 'World Model'". The company has published no technical report and has not disclosed its training data.
Wan2.5-Preview generates 10-second, 1080p videos with audio tracks that can include multiple voices, background music, and sound effects. In a demo video posted on X, Alibaba strings together several clips to show off its audio generation. At first glance, the audio and visuals seem to match, but a closer look reveals that drumming and music often fall out of sync, and the model struggles to maintain consistent faces.
Video: Alibaba
The system takes text, images, or audio as input. Users can, for example, upload a photo and use a text prompt to make a video with matching music. Alibaba advertises "cinematic aesthetics" and a "cinematographic control system."
Wan2.5-Preview also offers image generation and editing at wan.video. The tool can produce photorealistic images, various art styles, and diagrams. Image editing works via voice commands, such as changing colors or combining different concepts.

Access and Pricing
Wan2.5-Preview is not open source, unlike earlier Alibaba models. Alibaba has not responded to requests for a code release, and there are no signs this will change.
The service is available on wan.video with monthly subscriptions starting at $6.50, or with pay-as-you-go credits. Depending on the plan, each clip costs between 13 and 25 cents. API pricing is between 5 and 15 cents per second, compared with 15 to 40 cents per second for Veo 3's API, so Wan2.5-Preview's highest rate only matches Veo 3's lowest.
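To put the per-second API rates in terms of a full clip, the rough sketch below multiplies them by the 10-second clip length mentioned earlier. It assumes billing scales linearly with duration and ignores any resolution tiers, minimum charges, or discounts, none of which are described here. The dictionary and function names are illustrative and not part of any Alibaba or Google API.

```python
# Per-second API price ranges in US cents, as quoted in the article.
API_CENTS_PER_SECOND = {
    "Wan2.5-Preview": (5, 15),
    "Veo 3": (15, 40),
}

CLIP_SECONDS = 10  # clip length cited for Wan2.5-Preview


def clip_cost_cents(model: str, seconds: int = CLIP_SECONDS) -> tuple[int, int]:
    """Return the (low, high) clip cost in cents.

    Assumption: the price is strictly linear in clip duration.
    """
    low, high = API_CENTS_PER_SECOND[model]
    return low * seconds, high * seconds


if __name__ == "__main__":
    for model in API_CENTS_PER_SECOND:
        low, high = clip_cost_cents(model)
        print(f"{model}: ${low / 100:.2f} to ${high / 100:.2f} per {CLIP_SECONDS}-second clip")
```

Under these assumptions, a 10-second clip through the API would cost $0.50 to $1.50 with Wan2.5-Preview and $1.50 to $4.00 with Veo 3.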
Alibaba's previous model, Wan2.2, was open source under the Apache 2.0 license and could generate 720p videos on consumer GPUs like the RTX 4090.