Google has officially launched the stable versions of its Gemini 2.5 Flash and Pro models, marking them as production-ready after a successful preview phase.
Both models have already posted strong results in industry benchmarks, and according to anecdotal reports and our own experience, this performance carries over into real-world use.
Alongside these releases, Google is previewing a new variant: Gemini 2.5 Flash-Lite. The company describes Flash-Lite as the fastest and most cost-effective model in the Gemini 2.5 lineup so far.
Developers can now access Flash-Lite in Google AI Studio and Vertex AI, with the stable Flash and Pro models also available through these platforms and the Gemini app. Google Search uses custom versions of Flash and Flash-Lite as well.
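For developers trying the new model, a minimal sketch of a Flash-Lite call via the `google-genai` Python SDK might look like the following. The model ID `gemini-2.5-flash-lite` is an assumption here; check Google's current model list for the exact name, and note that the request only runs when a `GEMINI_API_KEY` is configured.

```python
import os

# Assumed model ID; verify against Google's published model list.
MODEL_ID = "gemini-2.5-flash-lite"

def ask(prompt: str) -> str:
    # The google-genai SDK is imported lazily so this sketch only
    # requires the package when an API key is actually configured.
    from google import genai
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(model=MODEL_ID, contents=prompt)
    return response.text

if os.environ.get("GEMINI_API_KEY"):
    print(ask("Classify the sentiment of: 'Great phone, but the battery fades fast.'"))
```

The same model is addressable through Vertex AI with different client setup; the call shape (model ID plus contents) is the common pattern.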
Flash-Lite: Speed and efficiency at a lower price point
According to Google, Gemini 2.5 Flash-Lite outperforms its predecessor (2.0 Flash-Lite) in benchmarks for programming, math, science, logical reasoning, and multimodal tasks. In tests like GPQA (science), AIME (math), and LiveCodeBench (code generation), Flash-Lite scores substantially higher than earlier Lite models and even closes the gap with larger models in some areas.
Flash-Lite pricing is the same whether or not "Thinking" mode is enabled: $0.10 per million input tokens and $0.40 per million output tokens. However, with "Thinking" enabled the model generates significantly more output tokens—so-called reasoning traces—to improve results, so actual usage costs are typically higher.
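The effect on the bill is easy to see with a back-of-the-envelope calculation at the listed rates. The token counts below are illustrative, not measurements:

```python
# Gemini 2.5 Flash-Lite list prices, converted to USD per token.
INPUT_PRICE = 0.10 / 1_000_000   # $0.10 per million input tokens
OUTPUT_PRICE = 0.40 / 1_000_000  # $0.40 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Same 2,000-token prompt; the second run assumes a long reasoning
# trace inflating the output side (hypothetical token counts).
print(f"{request_cost(2_000, 100):.6f}")    # plain response  → 0.000240
print(f"{request_cost(2_000, 4_000):.6f}")  # with reasoning  → 0.001800
```

Because output tokens cost four times as much as input tokens, the reasoning trace dominates the price of a "Thinking" request even when the prompt is unchanged.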
Google says Gemini 2.5 Flash-Lite is especially well-suited for high-volume, low-latency tasks like translation and classification. Benchmark results support this, with Flash-Lite posting 86.8% in FACTS Grounding and 84.5% in Multilingual MMLU. Visual benchmarks are also strong, with scores of 72.9% on MMMU and 57.5% for image understanding.

Like the other Gemini 2.5 models, Flash-Lite supports multimodal input, tool integrations such as Google Search and code execution, and context windows up to one million tokens.
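To get a feel for what a one-million-token window holds, here is a rough fit check using the common heuristic of about four characters per token for English text. The ratio varies by tokenizer and language, so this is an estimate only; the API's token-counting endpoint gives exact numbers.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the Gemini 2.5 spec
CHARS_PER_TOKEN = 4         # rough heuristic for English text (assumption)

def estimated_tokens(text: str) -> int:
    # Heuristic only; use the API's token counter for real numbers.
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve: int = 8_192) -> bool:
    """Leave `reserve` tokens of headroom for the model's output."""
    return estimated_tokens(text) <= CONTEXT_WINDOW - reserve

# A 150,000-word document (~750k characters, ~187k estimated tokens)
# fits comfortably, with room for several more of the same size.
book = "word " * 150_000
print(fits_in_context(book))  # → True
```

By this estimate, the window accommodates several full-length books in a single request, which is what makes long-document tasks practical on these models.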
The entire Gemini 2.5 family is designed for hybrid reasoning, aiming to balance high performance with low cost and latency. Google positions these models on the Pareto frontier of cost versus capability: at a given price and latency, they are meant to deliver the strongest available performance.