Google’s Gemini 2.5 Flash gives you speed when you need it and reasoning when you can afford it

GPT-4 prompted by THE DECODER

Google has released an early preview of Gemini 2.5 Flash, a faster, more flexible version of its lightweight AI model.

Developers can try it now through the Gemini API using Google AI Studio and Vertex AI. The model is also available to users in the Gemini app.

Built on the 2.0 Flash foundation, the new version is designed to offer stronger reasoning while optimizing for speed and cost.

Google describes it as a hybrid model that gives developers more control over how much "thinking" the system does. With this control, users can set budgets to balance quality, response time, and cost.

Even with "thinking" turned off, Gemini 2.5 Flash still outperforms its predecessor. Turning it on improves output quality, but the price increases—from $0.004 to $3.50 per response.

Despite the higher cost, the model is still cheaper than comparable systems. Only OpenAI's o4-mini comes close in terms of price performance.

Flash extends Google’s hybrid Gemini 2.5 lineup

The release of Flash complements Google’s broader Gemini 2.5 series of hybrid reasoning models. While Flash is designed for speed and affordability, Gemini 2.5 Pro targets more demanding tasks with full-scale reasoning and multimodal support.

Gemini 2.5 Pro is Google’s most capable model to date, leading several benchmark tests. It performs well across math, science, and programming tasks, scoring 18.8% on the "Humanity’s Last Exam" and 63.8% on SWE-Bench Verified. Pro is available now in Google AI Studio and to Gemini Advanced subscribers.

But Gemini 2.5 Pro comes with higher pricing. Input tokens cost $1.25 per million for prompts up to 200,000 tokens and $2.50 beyond that. Output tokens—including "thinking"—are priced at $10 per million under 200k tokens, and $15 above.

Recommendation

AI in practice

Meta's LibGen controversy reveals how desperate AI companies are for quality training data

Together, Gemini 2.5 Flash and Pro offer developers more flexibility across speed, cost, and reasoning power—part of Google’s broader strategy to deliver scalable AI options for a wide range of use cases.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Google’s Gemini 2.5 Flash gives you speed when you need it and reasoning when you can afford it

Flash extends Google’s hybrid Gemini 2.5 lineup

Meta's LibGen controversy reveals how desperate AI companies are for quality training data

Amazon's new DeepFleet mode helps robots deliver your packages even faster

Black Forest Labs opens its AI image model FLUX.1 context [dev] for private use

Meta reportedly offers top OpenAI researchers up to $300 million over four years

Cloudflare CEO Matthew Prince sees trouble ahead for the open web

New Othello experiment supports the world model hypothesis for large language models

ChatGPT might be draining your brain, MIT warns - what ‘cognitive debt’ means for you

Google’s Gemini 2.5 Flash gives you speed when you need it and reasoning when you can afford it

Flash extends Google’s hybrid Gemini 2.5 lineup

Share

Bank details