Google's Gemini 3.5 Flash follows Anthropic and OpenAI in making newer AI models significantly pricier

Key Points

  • Google Deepmind has released Gemini 3.5 Flash, a new AI model that delivers more than 280 output tokens per second, making it the fastest model in its intelligence class, though it comes at 5.5 times the operating cost of its predecessor.
  • Token prices have tripled, and because agent tasks consume significantly more tokens, total benchmark costs actually exceed those of the more expensive Pro model, raising questions about cost efficiency.
  • While Gemini 3.5 Flash shows its strongest improvements in agentic and multimodal tasks, it has a notable weakness in programming, where it falls clearly behind competitors like GPT-5.5 and Claude Opus 4.7.

Google's new Gemini 3.5 Flash is a step up from its predecessor, but it costs more than five times as much to run. High token consumption on agent tasks pushes total costs past the pricier Pro model in benchmark testing.

Google Deepmind has released Gemini 3.5 Flash, the latest version of its Flash model family. Flash was long positioned as the cheaper, faster alternative to Google's more powerful Pro models. An analysis by Artificial Analysis, which got early access, found that Gemini 3.5 Flash costs 5.5 times as much to run in benchmark testing as Gemini 3 Flash and nearly twice as much as Gemini 3.1 Pro. The context window stays at one million tokens.

Gemini 3.5 Flash has gotten much more expensive than its predecessor, both in token price and token consumption. | Image: Artificial Analysis

Token prices alone have tripled: Google now charges $1.50 per million input tokens and $9.00 per million output tokens, up from $0.50 and $3.00 for Gemini 3 Flash. Per token, that's still cheaper than Gemini 3.1 Pro at $2.00 and $12.00.
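To make the per-token comparison concrete, here is a minimal Python sketch that prices one request at the list prices quoted above. The model labels are informal dictionary keys, not official API model IDs, and the workload of 200,000 input and 10,000 output tokens is a hypothetical example, not a figure from Artificial Analysis.

```python
# List prices in USD per million tokens, as quoted in the article.
# The keys are informal labels, not official API model identifiers.
PRICES = {
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
    "gemini-3.5-flash": {"input": 1.50, "output": 9.00},
    "gemini-3.1-pro": {"input": 2.00, "output": 12.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the quoted list prices."""
    price = PRICES[model]
    return (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000


# Hypothetical workload, chosen only for illustration.
workload = {"input_tokens": 200_000, "output_tokens": 10_000}

for model in PRICES:
    print(f"{model}: ${request_cost(model, **workload):.2f}")
```

For an identical token count, Gemini 3.5 Flash comes out at three times the cost of Gemini 3 Flash and three quarters the cost of Gemini 3.1 Pro. Any difference beyond that has to come from how many tokens each model consumes.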

In practice, though, the math flips. Gemini 3.5 Flash burns through so many more tokens on agent-based tasks that total costs end up 75 percent higher than Gemini 3.1 Pro, according to Artificial Analysis.

How much the price hike stings will depend on the application. But Google is following a broader industry trend. Anthropic's Opus 4.7 had a hidden price increase of roughly 30 to 40 percent over its predecessor due to higher token consumption. OpenAI's GPT-5.5 jumped even more, about 50 to 90 percent over GPT-5.4. There, token consumption went down, but base prices went up. Google raised both.

For developers and companies, raw token price is becoming less useful as a standalone metric. What matters now is efficiency: how many tokens a model actually needs to finish a job.

Smarter, but hallucinations remain a problem

Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, nine points above Gemini 3 Flash. That puts it ahead of Grok 4.3 (high, 53) and Claude Sonnet 4.6 (max, 52). Gains show up across nearly every category tested. As always, benchmarks only capture specific scenarios; real-world performance only becomes clear over extended use with everyday and novel tasks.

Gemini 3.5 Flash improves markedly in knowledge accuracy and hallucination reduction but still trails top models on hallucination rate. | Image: Artificial Analysis
Gemini 3.5 Flash performs much better than its predecessor in benchmarks. | Image: Artificial Analysis

On AA Omniscience, which measures knowledge accuracy and hallucination tendency, Gemini 3.5 Flash improves by 11 points. Its hallucination rate drops to 61 percent, down 31 percentage points from Gemini 3 Flash. That drop sounds impressive until you look at the leaders: MiMo-V2.5-Pro and Grok 4.3 (high) both sit at just 25 percent.

Gemini 3.5 Flash cuts its hallucination rate sharply but still trails top models. In answer accuracy, it's even slightly worse than its (strong) predecessor. | Image: Artificial Analysis

Agent tasks show the biggest gains and drive the biggest costs

Agentic tasks have historically been a weak spot for Gemini. That's where 3.5 Flash improves the most. On GDPval-AA, which tests real agent tasks with web and shell access, it hit an Elo score of 1,656. That is a massive leap over Gemini 3 Flash (1,204) and Gemini 3.1 Pro (1,314), and just barely behind GPT-5.4 (xhigh, 1,674).

That performance comes at a cost. Gemini 3.5 Flash needs an average of 49 turns per task, more than any other model tested. Claude Opus 4.7 (max) takes 45, GPT-5.4 (xhigh) takes 40, and Gemini 3.1 Pro only needs 23. All those extra interaction steps drive input token consumption way up.
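One plausible reason extra turns hit input tokens so hard: in a typical agent loop, each turn re-sends the conversation so far, so billed input grows faster than the turn count. The Python sketch below models that under simplifying assumptions that do not come from Artificial Analysis: a fixed starting context, a fixed number of tokens added per turn, and no prompt caching or context trimming. The token counts are hypothetical.

```python
def cumulative_input_tokens(turns, system_tokens, tokens_added_per_turn):
    """Total input tokens billed when every turn re-sends the full context.

    Simplified model: the context starts at `system_tokens` and grows by a
    fixed `tokens_added_per_turn` after each turn, with no caching or trimming.
    """
    total = 0
    context = system_tokens
    for _ in range(turns):
        total += context  # the whole current context is billed as input
        context += tokens_added_per_turn
    return total


# Turn counts from the article; token sizes are hypothetical.
flash_35 = cumulative_input_tokens(49, system_tokens=2_000, tokens_added_per_turn=1_500)
pro_31 = cumulative_input_tokens(23, system_tokens=2_000, tokens_added_per_turn=1_500)

print(f"49 turns: {flash_35:,} input tokens")
print(f"23 turns: {pro_31:,} input tokens")
print(f"ratio: {flash_35 / pro_31:.1f}x")
```

Under these assumptions, roughly doubling the number of turns more than quadruples cumulative input tokens. Real workloads will differ, particularly where prompt caching discounts repeated context.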

Gemini 3.5 Flash nearly matches GPT-5.4 on agent tasks but needs the most interaction steps of any model tested. | Image: Artificial Analysis

Output token usage barely changed: 73 million versus 72 million for Gemini 3 Flash. Input tokens are the culprit, pushing Gemini 3.5 Flash past Gemini 3.1 Pro in total cost despite lower per-token prices.
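The reported ratios allow a rough back-of-the-envelope check. Because Gemini 3.5 Flash's input and output prices are both exactly 3 times those of Gemini 3 Flash and exactly 0.75 times those of Gemini 3.1 Pro, dividing a total-cost ratio by the price ratio gives the implied multiple in price-weighted token volume. The Python sketch below does that division using only figures quoted in this article. The function name is illustrative, and "nearly twice" is taken as the 75 percent figure.

```python
def implied_volume_multiple(cost_ratio, price_ratio):
    """Price-weighted token volume multiple implied by a total-cost ratio.

    Only valid when input and output prices differ by the same factor,
    which holds for the list prices quoted in the article.
    """
    return cost_ratio / price_ratio


# 5.5x the benchmark cost of Gemini 3 Flash at 3x the token prices.
vs_predecessor = implied_volume_multiple(5.5, 1.50 / 0.50)

# 75 percent higher cost than Gemini 3.1 Pro at 0.75x the token prices.
vs_pro = implied_volume_multiple(1.75, 1.50 / 2.00)

# Output tokens: 73 million versus 72 million for Gemini 3 Flash.
output_growth = 73 / 72

print(f"vs Gemini 3 Flash: {vs_predecessor:.2f}x price-weighted tokens")
print(f"vs Gemini 3.1 Pro: {vs_pro:.2f}x price-weighted tokens")
print(f"output token growth vs Gemini 3 Flash: {output_growth:.3f}x")
```

The price-weighted volume comes out at about 1.8 times that of Gemini 3 Flash and about 2.3 times that of Gemini 3.1 Pro. With output tokens up by barely more than 1 percent against the predecessor, nearly all of that growth has to sit on the input side.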

Coding remains a weak spot

Programming is where fast, capable, cheap models are in highest demand, and it's where Gemini 3.5 Flash falls short. On the Artificial Analysis Coding Index, which combines Terminal-Bench Hard and SciCode, it scores just 45. That's well behind Gemini 3.1 Pro Preview (55) and far behind GPT-5.5 (xhigh, 59) and GPT-5.4 (xhigh, 57). Claude Opus 4.7 (max, 53) and Claude Sonnet 4.5 (max, 51) also beat it.

In coding, Gemini 3.5 Flash trails its own Pro model by ten points, despite costing more in practice due to higher token consumption. | Image: Artificial Analysis

For a model that matches these rivals on the overall intelligence index, that's a striking gap. Its strengths clearly lie in agentic and multimodal tasks, but coding is one of the most important use cases for agentic AI, which limits the practical value of those agent gains.

The fastest model at its intelligence level

Gemini 3.5 Flash clocks over 280 output tokens per second, roughly 70 percent faster than Gemini 3 Flash, according to Artificial Analysis. No other model with similar intelligence comes close to that output rate.

Gemini 3.5 Flash combines high intelligence with the fastest output speed in its class. | Image: Artificial Analysis

Unlike many rivals, it also supports video and audio input alongside text and images. Claude Opus 4.7, Grok 4.3, and GPT-5.5 are limited to image input, per Artificial Analysis. On the multimodal benchmark MMMU-Pro, Gemini 3.5 Flash scores 84 percent, the highest result ever recorded. Google takes the top two spots, with Gemini 3.1 Pro second at 82 percent.

Google takes the top two places in the multimodal MMMU-Pro benchmark. | Image: Artificial Analysis

The rising prices reflect a deeper shift: today's AI models are built for complex, multi-step tasks where they plan on their own, use tools, and work through many rounds of interaction. That agentic behavior needs more compute per task than a simple chatbot exchange.

Rising costs and murky ROI will force companies to rethink AI spending

Unless inference costs for the underlying hardware drop as fast as compute per task goes up, prices for stronger models will keep climbing. For simpler use cases, cheaper older models or smaller options like Gemini 3.1 Flash-Lite will still be around.

For companies, AI return on investment is getting harder to pin down. Isolated tasks like code generation or translation are easier to measure (faster turnaround, lower staffing costs), but even there, the picture is muddier than it looks.

Knowledge work is where it gets really fuzzy. How do you put a number on a better decision memo or a strategy paper finished in half the time with AI? And what about downstream costs: time spent checking for errors or the learning that doesn't happen when AI does the work?

Those productivity gains tend to be spread thin across departments, show up late, and are hard to separate from other factors. Paying for pricier models is a bet that the efficiency gains will be worth it and that AI-assisted work is just how things will be done. A deep dive into this topic is available in our AI Radar #2.
