
Chinese tech giant Alibaba has introduced Qwen VLo, a multimodal AI model designed to analyze, generate, and edit images.

According to Alibaba, Qwen VLo uses a progressive generation approach: it builds images step by step, from left to right and top to bottom, while continuously refining its output. Alibaba says this method gives users more control over results, especially with longer text outputs. The company has not disclosed technical details, but Qwen VLo likely relies on an autoregressive method similar to what GPT-4o uses, rather than a diffusion-based approach.
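To make the left-to-right, top-to-bottom ordering concrete, here is a minimal toy sketch in Python. It is not Alibaba's implementation, which has not been published. The functions `raster_order`, `toy_next_token`, and `generate_grid` are hypothetical names invented for illustration, and a random choice stands in for a real model's prediction. The sketch only shows the general autoregressive idea: each position is filled in a fixed order, conditioned on everything generated before it.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top to bottom, left to right within each row."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a model: choose a token given the earlier tokens."""
    if not context:
        return rng.randrange(vocab_size)
    # Repeat the previous token half the time, purely to show that the
    # choice depends on what has already been generated.
    if rng.random() < 0.5:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_grid(height, width, vocab_size=4, seed=0):
    """Fill a grid of image tokens one position at a time in raster order."""
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    context = []  # every token generated so far, in generation order
    for row, col in raster_order(height, width):
        token = toy_next_token(context, vocab_size, rng)
        grid[row][col] = token
        context.append(token)
    return grid


if __name__ == "__main__":
    for line in generate_grid(3, 6):
        print(" ".join(str(token) for token in line))
```

A diffusion-based generator, by contrast, would typically refine the whole image at once over several denoising steps rather than committing to one position at a time.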

Image editing with natural language

Qwen VLo can interpret complex editing instructions in natural language, letting users swap backgrounds, insert new objects, change visual styles, or even blend multiple images into one.

Portrait of a Shiba Inu with golden brown fur and a black collar in front of a neutral background.
With a series of editing steps, the prompt "Generate a cute shiba inu" leads to… | Image: Alibaba
3D Shiba Inu avatar with glasses and red QwenVLo cap in glass ball on meadow
…a cartoon dog wearing a cap and headset inside a glass ball. | Image: Alibaba

The system supports both artistic and technical image modifications. For example, it can generate segmentation maps, perform edge detection, or create depth maps with colored overlays on demand.

Glass dome on wooden table with bright pink dog sculpture and hand writing in a notebook with a pen.
Qwen VLo can identify image segments and estimate depth maps. | Image: Alibaba

Qwen VLo is designed to handle images with variable resolutions and aspect ratios, including extreme formats like 4:1 or 1:3, though this feature is not yet active. The model also works in multiple languages, including Chinese and English.

Early preview with limitations

Qwen VLo is currently available in preview through Qwen Chat, Alibaba's web interface. The company notes that the model still struggles with generation errors, inconsistencies with source images, and following detailed instructions. Alibaba says it plans to keep improving the model's reliability and stability.

Until now, Alibaba has been a reliable source of competitive AI language models. In April, for example, it released Qwen3 along with its model weights. That track record has made the company an important contributor to open AI research. It's not clear why Qwen VLo hasn't been released with model weights or whether this signals a broader shift in Alibaba's approach to open publishing.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Alibaba has released Qwen VLo, a multimodal AI model that can analyze, generate, and edit images, which is now available as a preview version through a web interface.
  • The image generation method builds images step by step, offering more control for complex cases, such as longer text prompts or targeted image editing.
  • Qwen VLo can follow detailed editing instructions in everyday language, combine several images, change backgrounds, add new elements, or adjust the visual style.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.