
Researchers from universities in China and Singapore, along with ByteDance, have created PhotoDoodle, an impressive new AI system for image editing. The model can learn different artistic styles from just a few sample images and then accurately implement specific editing instructions.


PhotoDoodle builds on the Flux.1 image generation model developed by German startup Black Forest Labs, leveraging its diffusion transformer architecture and pre-trained parameters.

Building on Flux.1's foundation

The researchers first developed OmniEditor, a version of Flux.1 adapted for general image editing using LoRA (Low-Rank Adaptation). Instead of rewriting all of the network's weights, LoRA adds small, trainable low-rank matrices alongside them. These matrices can be trained without drastically altering the original model, enabling everything from small concept changes to complete style transformations. Broader changes of the kind OmniEditor makes require larger, higher-rank versions of these typically small adapters.
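To make the mechanism concrete, here is a minimal PyTorch sketch of the LoRA idea: a frozen linear layer is augmented with a trainable low-rank update, and the rank controls how much the adapter can change the model's behavior. Class and parameter names are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + B A x."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay untouched
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen base output plus the small trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Higher rank -> more capacity, as needed for broad edits like OmniEditor;
# a low rank suffices for narrow, single-style adapters.
layer = LoRALinear(nn.Linear(3072, 3072), rank=128)
```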

OmniEditor was trained on a dataset called SeedEdit, which the team likely sourced from experiments with ByteDance's image editing model of the same name, introduced last year. The paper doesn't provide specific details about the dataset's origin.

PhotoDoodle adds playful elements such as monsters, magical effects and decorative illustrations while retaining the original image composition. | Image: Huang et al.

The researchers then trained OmniEditor to replicate individual artists' styles using a LoRA variant called EditLoRA. By studying selected pairs of images, EditLoRA learns the nuances of each artistic style. According to the paper, the training data was created in collaboration with the artists themselves.
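The two-stage setup can be pictured as two adapters stacked on the same frozen backbone: a higher-rank, general-purpose editing adapter (the OmniEditor stage) that is trained first and then frozen, and a much smaller per-artist adapter (the EditLoRA stage) trained on that artist's before/after pairs. The sketch below is an assumption about how such stacking could look in PyTorch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StackedLoRALinear(nn.Module):
    """Frozen base layer with two additive low-rank adapters: a higher-rank
    general-editing adapter (OmniEditor-style, frozen after pre-training) and
    a low-rank per-artist adapter (EditLoRA-style) that alone is trained on
    the artist's small set of before/after image pairs."""
    def __init__(self, base: nn.Linear, general_rank: int = 128, style_rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False

        def make_adapter(rank: int, trainable: bool) -> nn.ParameterDict:
            A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01, requires_grad=trainable)
            B = nn.Parameter(torch.zeros(base.out_features, rank), requires_grad=trainable)
            return nn.ParameterDict({"A": A, "B": B})

        # In practice the general adapter's weights would be loaded from the
        # OmniEditor training run; zeros/random values here are placeholders.
        self.general = make_adapter(general_rank, trainable=False)
        self.style = make_adapter(style_rank, trainable=True)

    def forward(self, x):
        out = self.base(x)
        out = out + x @ self.general["A"].T @ self.general["B"].T  # general editing behavior
        out = out + x @ self.style["A"].T @ self.style["B"].T      # one artist's style
        return out
```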

This approach solves a critical problem: harmoniously inserting decorative elements into images while maintaining the right perspective, context, and desired style. The researchers note that previous methods, which either changed an entire image's style or only edited small areas, couldn't adequately address this challenge.

How position encoding cloning keeps everything in place

A key component of PhotoDoodle is "position encoding cloning": the edited image reuses the original image's positional encodings, so the model effectively remembers where every pixel of the original sits.

PhotoDoodle transforms everyday photos using various artistic styles - from cute cartoon monsters to hand-drawn lines and color effects. | Image: Huang et al.

When adding new elements, PhotoDoodle uses this stored position information to place them precisely and blend them seamlessly into the image. This technique requires no additional parameter training, making the process more efficient.
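Flux.1-style diffusion transformers attach positional information to each latent patch; position encoding cloning amounts to giving the edited image's tokens exactly the same positional encodings as the corresponding source tokens before both are processed together by attention. The following sketch uses simple additive embeddings for clarity (Flux.1 itself uses rotary position IDs), so treat it as an illustration of the idea rather than the paper's code.

```python
import torch

def build_token_sequence(src_tokens, tgt_tokens, pos_emb):
    """Concatenate source-image and edited-image tokens for a diffusion
    transformer while *cloning* the positional encodings: both token sets
    receive the identical per-location embedding, so token i of the edit
    is tied to the same image patch as token i of the source.

    src_tokens, tgt_tokens: (batch, n_patches, dim)
    pos_emb:                (n_patches, dim) positional encodings
    """
    src = src_tokens + pos_emb           # source patches at their positions
    tgt = tgt_tokens + pos_emb           # edited patches reuse the SAME positions
    return torch.cat([src, tgt], dim=1)  # joint sequence for attention

# Toy shapes: a 16x16 latent grid of 256 patches with 64-dim tokens.
b, n, d = 1, 256, 64
seq = build_token_sequence(torch.randn(b, n, d), torch.randn(b, n, d), torch.randn(n, d))
print(seq.shape)  # torch.Size([1, 512, 64])
```

Because the correspondence is fixed by the shared positions, nothing extra has to be learned to keep the added elements aligned with the source image.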

The system also relies on "noise-free" conditioning: the original image is fed to the model without the added diffusion noise that the target latent receives, which helps prevent unintentional alterations to the background during editing.


Setting a new standard for image editing

The team conducted extensive testing to demonstrate PhotoDoodle's capabilities. The system accurately implemented prompts like "Make the cat a little whiter" and "Add a pink monster climbing on the building."

When compared to existing methods, PhotoDoodle achieved superior results across various benchmarks measuring aspects like the similarity between images and text descriptions. It significantly outperformed comparison models in both targeted edits and global image changes.
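A common way to measure how well an edited image matches its text instruction is a CLIP similarity score. The snippet below shows how such a score can be computed with the Hugging Face transformers CLIP model; the specific checkpoint, file name, and the assumption that this mirrors the paper's benchmark setup are mine, not the authors'.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative CLIP-score computation (checkpoint choice is an assumption).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, prompt: str) -> float:
    """Cosine similarity between an edited image and its edit instruction."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

print(clip_score("edited.jpg", "Add a pink monster climbing on the building"))
```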

The comparison of PhotoDoodle with existing AI image editing systems shows clear differences in the implementation quality of specific prompts. | Image: Huang et al.

Looking toward single-image training

The research team acknowledges that PhotoDoodle currently requires dozens of image pairs and thousands of training steps. Their next goal is to develop a system that can learn styles from just a single pair of images.

To support further research in this area, the scientists have published a dataset containing six different artistic styles and more than 300 image pairs. The code is available on GitHub.

Summary
  • Researchers from universities in China and Singapore, as well as ByteDance, have developed PhotoDoodle, an AI that learns different styles from a few sample images and accurately applies editing prompts to input images.
  • PhotoDoodle is based on Black Forest Labs' Flux.1 image generation model, using its architecture and pre-trained parameters. A general image processing model called OmniEditor is first trained with a custom dataset and then adapted to specific artist styles.
  • In experiments, PhotoDoodle achieved better results than existing methods in implementing editing prompts. The researchers have published a dataset of over 300 image pairs in six styles to encourage further research in this area.