Google's latest model focuses on price and speed. Gemini 3 Flash aims to deliver reasoning similar to more expensive models at a lower cost. The main question is whether this faster, cheaper model is good enough to replace mid-tier alternatives.
Google has released Gemini 3 Flash, the latest model in its Gemini series. According to Google, it’s designed to offer reasoning capabilities similar to those of the larger Gemini 3 Pro, but at a significantly lower price. The target audience is developers who previously had to pick between speed and advanced features.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3 Flash | $0.50 | $3.00 |
| Gemini 3 Pro | $2.00 | $12.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| GPT-5.2 Extra High (OpenAI) | $1.75 | $14.00 |
Developers can cut costs further with context caching, which reduces the price of reused input tokens by up to 90 percent. The batch API takes another 50 percent off asynchronous jobs.
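The quoted prices and discounts can be combined into a back-of-the-envelope cost estimate. The sketch below uses the per-million-token rates from the table above; treating cached input as 10 percent of the normal rate and stacking the 50 percent batch discount multiplicatively are simplifying assumptions for illustration, not Google's official billing rules.

```python
# Rates from the pricing table above (USD per 1M tokens).
FLASH_INPUT_PER_M = 0.50
FLASH_OUTPUT_PER_M = 3.00

def estimate_cost(input_tokens, output_tokens,
                  cached_fraction=0.0, batch=False):
    """Rough Gemini 3 Flash cost estimate in USD.

    Assumes cached input tokens bill at 10% of the normal input rate
    (the "up to 90 percent" discount) and that the 50% batch discount
    applies on top -- check Google's pricing page for the exact rules.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + 0.10 * cached) * FLASH_INPUT_PER_M / 1e6
    output_cost = output_tokens * FLASH_OUTPUT_PER_M / 1e6
    total = input_cost + output_cost
    if batch:
        total *= 0.5  # batch API: 50% off async jobs
    return round(total, 6)

# A 100k-token prompt with 80% served from cache, a 2k-token answer,
# submitted as a batch job:
print(estimate_cost(100_000, 2_000, cached_fraction=0.8, batch=True))  # 0.01
```

At these rates, even a large mostly-cached prompt lands around a cent per request, which is the kind of margin the article's cost argument rests on.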
According to an analysis by Artificial Analysis, Gemini 3 Flash outperforms Gemini 2.5 Pro while running roughly three times faster at a fraction of the cost. Google says even the model's lowest "thinking level" often outperforms older versions cranked up to their maximum settings.

Google Search now runs on Gemini 3 Flash worldwide
Google has made Gemini 3 Flash the standard model for AI Mode in Google Search, so it's now handling the bulk of daily search queries. According to Google, Gemini 3 Flash is better at interpreting user intent, pulling in up-to-date information and links, and organizing answers with visuals and suggestions. This model is also supposed to handle more complex, multi-part questions, such as planning a trip or quickly learning a new topic.
Google's published benchmarks show Gemini 3 Flash hitting 90.4 percent on GPQA Diamond, a PhD-level science test. On Humanity's Last Exam, it scored 33.7 percent solo and 43.5 percent with search and code tools. The AIME 2025 math test came in at 95.2 percent without tools and 99.7 percent with code execution. On SWE-bench Verified, a benchmark of real-world software engineering tasks, 3 Flash reaches 78 percent. Google says that's actually better than Gemini 3 Pro, though it still falls short of GPT-5.2 and Claude Opus 4.5.

What matters more to developers is whether a model works reliably on everyday tasks. Google claims 3 Flash can adjust how long it thinks based on difficulty, and typically chews through fewer tokens than 2.5 Pro on normal workloads.
Visual reasoning and code execution get an upgrade
Google says Gemini 3 Flash handles visual and spatial reasoning better than before, which should help with video analysis. The model can also run code to zoom into images, count objects, or make edits. Developers need to turn on "thought signatures" in the API or use the new Interactions API to access these features.
The model is available through Google AI Studio, the Gemini API, Google Antigravity, Gemini CLI, and Android Studio. Enterprise users can get it through Vertex AI. Google calls out the Gemini CLI as a good fit for developers who spend a lot of time in the terminal.
A few companies have started building on Gemini 3 Flash. Gaming platform Astrocade uses it to power a system that generates full game plans and working code from a single prompt.
Nick Walton, who runs Latitude, says the model lets his team handle harder tasks in their AI game engine without paying for expensive models like Sonnet 4.5. Resemble AI uses 3 Flash for spotting deepfakes in real time. The company says it analyzes multimodal content four times faster than Gemini 2.5 Pro did.
Google also recently shipped a "Deep Think" mode for Gemini Ultra subscribers, which sits at the opposite end of the speed spectrum. It lets the model reason in parallel for tougher problems, but takes much longer to respond. That trade-off limits it to niche use cases for now; as OpenAI's router rollback showed, most users aren't willing to wait longer for better AI answers.