
Google DeepMind is adding a new image editing model to the Gemini app that can make dramatic changes to photos on demand while keeping people and animals recognizable.


The new "Gemini 2.5 Flash Image Generation" model builds on Gemini's earlier native image generation tools but follows prompts far more closely. Google says it often outperforms the GPT-4o model used in ChatGPT, especially at following text prompts for image edits. While many pure image models still struggle with prompt accuracy, Gemini 2.5 Flash gets it right more often.

A key feature is "character consistency": the model can keep a person, animal, or object visually consistent across multiple images, even as poses, backgrounds, or lighting change.

Gemini 2.5 Flash keeps characters consistent across new scenes. Whether it outperforms more complex fine-tuning remains to be seen. | Image: Google DeepMind

This opens up new possibilities for creating image series or product shots from multiple angles. Google says the model is ideal for generating consistent brand assets and product catalogs, and claims Gemini 2.5 Flash outperforms other image systems on a wide range of editing tasks.

Gemini 2.5 Flash outperforms previous models on several human-rated image editing benchmarks (Elo score). | Image: Google

The model also supports precise, localized edits through text prompts. Users can blur backgrounds, remove blemishes, add colors, or erase entire objects without manual selection. A template app called "PixShop" shows off these editing features with a simple interface and prompt controls.

PixShop demonstrates Gemini 2.5 Flash's text-based editing tools. | Image: Google DeepMind

Image composition, style transfer, and real-world reasoning

Gemini 2.5 Flash can blend up to three images at once. For example, you can combine a product photo and a room photo to create a realistic interior scene. Complex compositions with several elements can be generated from a single prompt. Google also offers an interactive canvas tool for multi-image fusion.

Gemini 2.5 Flash blends multiple images into one composition. | Image: Google DeepMind

The model handles style transfer too, moving patterns, colors, or textures from one object to another while keeping the shape and details intact. Typical examples include dresses with butterfly patterns or boots with floral textures.

Gemini 2.5 Flash applies patterns and styles across objects. | Image: Google DeepMind

Gemini 2.5 Flash can also visualize simple cause-and-effect, which Google calls "real-world reasoning." In one demo, the model generates an image of a balloon drifting toward a cactus, then another image showing what happens next.

The model can illustrate cause and effect, such as a balloon meeting a cactus. | Image: Google DeepMind

These semantic features draw on Gemini 2.5's world knowledge, Google says. You can try them yourself using a painting app that follows text instructions.


Available to users and developers

The Gemini 2.5 Flash image tools are now available in the Gemini app. Instead of selecting the "Imagen" image model in the chat bar, you need to switch to the "Flash" language model at the top left to use the new features. The setup might be a little confusing at first, but it makes sense given Gemini's language-based editing approach.

To use Gemini 2.5 Flash image editing, select the "Flash" language model in the Gemini app. | Image: Screenshot by THE DECODER

After picking the right model, you can upload an image and give Gemini editing instructions. Every image comes with both a visible watermark and an invisible SynthID digital watermark.

Gemini 2.5 Flash Image is also available in preview through the Gemini API, Google AI Studio, and Vertex AI. Pricing is $30 per million output tokens. Each image uses about 1,290 tokens, or roughly $0.039 per image, the same as Gemini 2.0 Flash Image.
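The per-image figure follows directly from the token price. As a rough sketch of that arithmetic in Python, using only the two numbers quoted above (the function names are illustrative, and the roughly 1,290 tokens per image is an approximation, so real bills may differ):

```python
# Figures quoted in the article; the token count per image is approximate.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0
TOKENS_PER_IMAGE = 1290


def cost_per_image(tokens_per_image: int = TOKENS_PER_IMAGE,
                   price_per_million: float = PRICE_PER_MILLION_OUTPUT_TOKENS_USD) -> float:
    """Estimated USD cost of one generated image."""
    return tokens_per_image * price_per_million / 1_000_000


def batch_cost(num_images: int) -> float:
    """Estimated USD cost of generating num_images images."""
    return num_images * cost_per_image()


def images_for_budget(budget_usd: float) -> int:
    """Number of whole images that fit into a given USD budget."""
    return int(budget_usd // cost_per_image())


if __name__ == "__main__":
    print(f"Per image: ${cost_per_image():.4f}")
    print(f"1,000 images: ${batch_cost(1000):.2f}")
    print(f"Images for $10: {images_for_budget(10.0)}")
```

At these rates, a 1,000-image product catalog would come to about $38.70 in output tokens, and a $10 budget covers 258 images.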

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Google DeepMind has added the Gemini 2.5 Flash image editing model to the Gemini app, letting users make AI-powered image changes using text prompts, such as removing objects or creating visually matching image series.
  • The model supports displaying people or objects in different poses and lighting, merges multiple images, handles style transfer, and makes simple visual adjustments, all guided by text instructions.
  • This feature is available in the Gemini app when the "Flash" model is turned on, and developers can access Gemini 2.5 Flash through the Gemini API, Google AI Studio, and Vertex AI.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.