StableDrag's simple point-and-click image editing makes turning Mona Lisa's head easy

Many AI image generators already provide a powerful tool for modifying image content with text, called inpainting. Point-based editing makes adjustments even easier.

Researchers from Nanjing University and Tencent have developed a new AI-based image editing method called StableDrag that allows elements to be easily moved to new positions while maintaining the correct perspective, according to their paper.

The method builds on recent advances in AI image editing like FreeDrag, DragDiffusion, and DragGAN, and delivers significantly better results in benchmarks.

One example is changing the viewing direction of the "Mona Lisa" by moving her nose a little to the right. The input image with source (red) and destination (blue) points is shown on the left, the result of DragDiffusion in the middle, and the result of StableDrag-Diff on the right.

Example of a Mona Lisa whose head is turned by AI image processing until she is looking head-on into the camera — Image: Cui et al.

The tool works well on photos, illustrations, and other AI-generated images, and handles human faces as well as subjects like cars, landscapes, and animals.

The key innovations are a point tracking method that precisely localizes the updated target points and a confidence-based strategy that maintains high image quality at each step, the researchers explain. A confidence value rates the editing quality at each step. If it drops too low, the method reverts to the original image features, preserving the source material without limiting editing options.
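To make the two ideas more concrete, here is a minimal Python sketch. It is not the researchers' implementation: the function names `track_point` and `supervision_feature`, the use of cosine similarity for matching, the dictionary-based feature map, and the threshold of 0.9 are all assumptions chosen for illustration. It only shows the general pattern of locating a moved point by feature matching and falling back to the original features when the match is weak.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track_point(feature_map, template, center, radius=1):
    """Search a small window around `center` for the position whose feature
    best matches `template` (the feature of the original source point).

    feature_map: dict mapping (x, y) -> feature vector (hypothetical layout).
    Returns (best_position, confidence), where confidence is the best
    similarity found in the window.
    """
    cx, cy = center
    best_pos, best_score = center, -1.0
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            pos = (cx + dx, cy + dy)
            if pos not in feature_map:
                continue  # skip positions outside the image
            score = cosine(feature_map[pos], template)
            if score > best_score:
                best_pos, best_score = pos, score
    return best_pos, best_score


def supervision_feature(feature_map, template, pos, confidence, threshold=0.9):
    """Pick the feature that guides the next editing step.

    If confidence is below the (assumed) threshold, fall back to the
    original template feature; otherwise use the current feature at `pos`.
    """
    if confidence < threshold:
        return template
    return feature_map[pos]


# Toy example: three pixels with 2-D features.
features = {(0, 0): [1.0, 0.0], (1, 0): [0.9, 0.1], (0, 1): [0.0, 1.0]}
template = [1.0, 0.0]

pos, conf = track_point(features, template, center=(1, 0), radius=1)
guide = supervision_feature(features, template, pos, conf)
```

In this toy setup, the search window around `(1, 0)` contains a pixel whose feature matches the template exactly, so the tracked point moves there and the current feature is used as the guide. With a weaker match, the sketch would return the template instead, which mirrors the fallback behavior described above.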

While AI image generation from text has advanced rapidly and can now produce highly realistic photos, image manipulation is still catching up. Some AI models offer inpainting to alter selected areas with text input, but StableDrag's point-based editing promises more precision. The researchers say they will open source the code soon.

Apple is taking a different manipulation approach with MGIE, which uses text prompts to add, remove or change objects without selecting specific regions.
