Alibaba's Wan2.5-Preview lets users turn photos and text prompts into videos with matching audio

Sep 25, 2025

GPT-4o prompted by THE DECODER

Alibaba has launched Wan2.5-Preview, a new video model capable of generating short clips with synchronized audio.

The system combines text, images, video, and audio in a single architecture, putting it in the same category as Google's Veo 3. Details about how Wan2.5-Preview works are sparse. Alibaba mentions that reinforcement learning with human feedback was used and calls the model "a solid step [...] on our journey towards a 'World Model'". There is no technical report, and Alibaba has not disclosed what training data it used.

Wan2.5-Preview generates 10-second, 1080p videos with audio tracks that can include multiple voices, background music, and sound effects. In a demo video posted on X, Alibaba strings together several clips to show off its audio generation. At first glance, the audio and visuals seem to match, but a closer look reveals that the on-screen drumming often falls out of sync with the music, and the model struggles to keep faces consistent.

Video: Alibaba

The system takes text, images, or audio as input. Users can, for example, upload a photo and use a text prompt to make a video with matching music. Alibaba advertises "cinematic aesthetics" and a "cinematographic control system."

Wan2.5-Preview also offers image generation and editing at wan.video. The tool can produce photorealistic images, images in various art styles, and diagrams. Image editing works through natural-language commands, such as changing colors or combining different concepts.

The wan.video interface, with its drop-down menu of generation modes (text-to-video selected) and its bar of format and duration settings, looks almost identical to OpenAI's Sora. | Image: Screenshot by THE DECODER

Access and Pricing

Wan2.5-Preview is not open source, unlike earlier Alibaba models. Alibaba has not responded to requests for a code release, and there are no signs this will change.

The service is available on wan.video with monthly subscriptions starting at $6.50, or with pay-as-you-go credits. Depending on the plan, each clip costs between 13 and 25 cents. API pricing is between 5 and 15 cents per second, which is well below Veo 3's API cost of 15 to 40 cents per second.

Alibaba's previous model, Wan2.2, was open source under the Apache 2.0 license and could generate 720p videos on consumer GPUs like the RTX 4090.
