Alibaba has updated its Qwen image model with new editing tools for both visual and semantic changes. 

Qwen-Image-Edit is built on Alibaba's 20-billion-parameter Qwen-Image model and combines two processing strategies: Qwen2.5-VL handles semantic control, while a Variational Autoencoder (VAE) manages the visual appearance. Alibaba hasn't shared detailed technical information about the architecture yet.

According to Alibaba, the system can handle everything from simple touch-ups to complex semantic edits. Appearance editing lets users change specific areas while keeping the rest of the image untouched. Semantic editing can change pixels across the entire image while keeping the main subject consistent.

Video: Alibaba

Two editing modes for different workflows

For semantic editing, Alibaba demonstrates how the model can create new IP content featuring its Capybara mascot. Even when most pixels change, the character remains recognizable.

Eight illustrations of the Qwen Capybara mascot in various roles: as a painter with an easel, a chef with vegetables, a guitarist, a magician in a tailcoat, a basketball player, a gardener with a watering can, an astronaut in a space suit, and a ballerina in a tutu.
Qwen Image Edit generates new versions of the Capybara mascot that can be used as stickers in messenger apps and other formats. | Image: Alibaba

Other use cases include generating new perspectives with 90- or 180-degree object rotations and using style transfer for avatar creation, such as converting portraits into Studio Ghibli-style images.

Eight images in four pairs: toddler facing forward and in profile, golden dog facing forward and from the side, black raven facing forward and from behind on a branch, lion in profile and from behind on a rock.
The model generates new viewpoints for people, animals, and objects. | Image: Alibaba

Qwen Image Edit can also add signs with realistic reflections, remove stray hairs, change letter colors, and edit backgrounds or clothing.

Two images of a group of penguins on a coast: on the left, the original scene; on the right, the same scene with an orange wooden sign added.
Qwen Image Edit places a wooden sign reading "Welcome to Penguin Beach" in front of a penguin colony and generates natural shadows. | Image: Alibaba

Bilingual text editing with step-by-step correction

One of Qwen Image Edit's main strengths is its ability to edit text in both Chinese and English. The system can add, remove, or change text directly in images while preserving the original font, size, and style.

Three images of Scrabble tiles on white paper: Left
Qwen Image Edit updates Scrabble tiles from "Health Insurance" to "Financial Planning," maintaining the original look. | Image: Alibaba

Users can draw bounding boxes around incorrect or unwanted text. The model then updates those marked areas. While it sometimes struggles with rare or unusual characters like "稽," users can make step-by-step edits, marking specific spots and having the model refine the results until they are satisfied.
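
The mark-and-refine workflow described above can be pictured with a small sketch. This is not Alibaba's implementation, and Alibaba has not published one in this article; it only illustrates the idea of applying an edit inside a marked box and leaving everything else untouched, then repeating with further boxes. The names `Box`, `apply_region_edit`, `refine`, and the stand-in `edit_pixel` function are hypothetical, and a tiny grayscale grid stands in for a real image.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def apply_region_edit(image: Image, box: Box,
                      edit_pixel: Callable[[int], int]) -> Image:
    """Return a copy of image with edit_pixel applied only inside box."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Clamp the box so marks that overshoot the image are still valid.
    left, right = max(0, box.left), min(width, box.right)
    top, bottom = max(0, box.top), min(height, box.bottom)
    result = [row[:] for row in image]  # copy, so the input stays intact
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = edit_pixel(image[y][x])
    return result


def refine(image: Image, boxes: List[Box],
           edit_pixel: Callable[[int], int]) -> Image:
    """Apply one marked region at a time, each on the previous result."""
    for box in boxes:
        image = apply_region_edit(image, box, edit_pixel)
    return image


# Stand-in for the model's edit: paint the marked pixels white.
original = [[0] * 4 for _ in range(3)]
edited = refine(original, [Box(1, 0, 3, 2), Box(3, 2, 10, 10)],
                lambda value: 255)
```

In this toy version each pass touches only the pixels inside its box, which mirrors the article's point that users can correct one spot at a time without disturbing the rest of the image.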

Two Chinese calligraphy texts on yellowish paper side by side, with the image on the right showing corrected characters compared to the original on the left.
The tool replaces incorrect characters and lets users directly mark the areas that need changes. | Image: Alibaba

Alibaba says Qwen Image Edit delivers state-of-the-art performance on public image editing benchmarks, though it hasn't shared specific numbers. The model is available through Qwen Chat's "Image Editing" feature and can also be found on GitHub, Hugging Face, and ModelScope.

Qwen Image Edit reflects just how quickly targeted image editing and text rendering are advancing. Until recently, it was difficult for AI to change only specific parts of an image without disrupting everything else.

Black Forest Labs has also entered the space with FLUX.1 Kontext, a model that combines text-to-image generation and image editing. But FLUX.1 Kontext still shows visible artifacts in longer editing chains and sometimes has trouble following prompts accurately.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Alibaba is expanding its Qwen Image model with new editing capabilities, including the ability to edit text within images in both Chinese and English.
  • Qwen Image Edit combines semantic and visual controls, offering two modes: precise adjustments to specific areas and broader, consistent edits such as style changes, viewpoint shifts, and the addition or removal of objects and text.
  • The model can be accessed through Qwen Chat, GitHub, Hugging Face, and ModelScope, and Alibaba reports that it delivers state-of-the-art results for image editing tasks.
Jonathan writes for THE DECODER about how AI tools can improve both work and creative projects.