
StableDrag's simple point-and-click image editing makes turning Mona Lisa's head easy

Jonathan Kemper

Many AI image generators already offer inpainting, a powerful tool for modifying image content with text prompts. Point-based editing makes such adjustments even easier.

Researchers from Nanjing University and Tencent have developed a new AI-based image editing method called StableDrag that allows elements to be easily moved to new positions while maintaining the correct perspective, according to their paper.

The method builds on recent advances in AI image editing such as FreeDrag, DragDiffusion, and DragGAN, and the researchers report significantly better benchmark results than these earlier methods.

One example is changing the viewing direction of the "Mona Lisa" by moving her nose a little to the right. The input image with the source (red) and destination (blue) points is shown on the left, the result of DragDiffusion in the middle, and that of StableDrag-Diff on the right.

Image: Cui et al.

The tool works well on photos, illustrations, and AI-generated images, and it handles human faces as well as subjects like cars, landscapes, and animals.

The key innovations are a point tracking method that precisely localizes the updated handle points and a confidence-based strategy that maintains high image quality at each editing step, the researchers explain. The confidence value rates the quality of the current edit. If it drops too low, the method falls back to the original image features, which preserves the source material without limiting editing options.
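
The two ideas can be illustrated with a minimal Python sketch. It is not the researchers' implementation: it assumes a plain cosine-similarity search in a small window as a stand-in for the paper's tracking method, and the function names `track_point` and `choose_supervision` as well as the threshold of 0.8 are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[int, int]  # (row, column)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track_point(
    feature_map: List[List[Vector]],
    template: Vector,
    center: Point,
    radius: int,
) -> Tuple[Point, float]:
    """Search a square window around `center` for the best match to `template`.

    Returns the best location and its similarity score, which this sketch
    treats as the confidence value (an assumption, not the paper's definition).
    """
    height, width = len(feature_map), len(feature_map[0])
    cy, cx = center
    best_point, best_score = center, float("-inf")
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            score = cosine(feature_map[y][x], template)
            if score > best_score:
                best_point, best_score = (y, x), score
    return best_point, best_score


def choose_supervision(
    current: Vector,
    original: Vector,
    confidence: float,
    threshold: float = 0.8,  # hypothetical value, chosen for illustration
) -> Vector:
    """Use the current features while confidence is high, else fall back
    to the features taken from the original image."""
    return current if confidence >= threshold else original
```

In an editing loop, `track_point` would be called after each update step to relocate the handle point, and `choose_supervision` would decide which features guide the next step. The fallback keeps a poorly tracked edit from drifting further away from the source image.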

While text-to-image generation has advanced rapidly and can now produce highly realistic photos, image manipulation is still catching up. Some AI models offer inpainting to alter selected areas via text input, but StableDrag's point-based editing promises more precision. The researchers say they will open-source the code soon.

Apple is taking a different approach to image manipulation with MGIE, which uses text prompts to add, remove, or change objects without requiring users to select specific regions.
