Analysts say Google now leads the AI performance race with Gemini 3 Pro

Nov 18, 2025

Gemini 3 Pro debuts with major gains in logic, multimodality, and speed, along with a new agent platform that hints at where Google wants AI to go next.

Google has introduced Gemini 3, calling it the company's most intelligent model yet. CEO Sundar Pichai and Google DeepMind leaders say the new lineup is designed to push forward logical reasoning, multimodal understanding, and agent capabilities.

Gemini 3 Pro is launching as a preview and will roll out across Google products, including the Gemini app, AI Studio, Vertex AI, and the AI mode in Google Search. It marks the first time a new Gemini model is available in Search on day one. Google says Gemini 3 delivers stronger contextual understanding and more nuanced responses. The answers are meant to be intelligent, concise, and direct, avoiding cliches and flattery in favor of real insight.

Stronger reasoning pushes Gemini 3 to new benchmark highs

Google highlights Gemini 3 Pro's performance with a long list of benchmark results. It reportedly leads the LMArena rankings with an Elo score of 1501 and shows what Google describes as PhD-level reasoning on tests like Humanity's Last Exam at 37.5 percent without tools and GPQA Diamond at 91.9 percent. That places it ahead of xAI's recent Grok 4.1 release. The model also posts strong scores in math at 23.4 percent on MathArena Apex and in multimodal understanding at 81 percent on MMMU-Pro.

According to the official model card, Gemini 3 Pro is built on a sparse mixture-of-experts transformer architecture. Google trained it on a large multimodal dataset that includes publicly available web documents, licensed data, synthetic AI data, and user data from Google products and services. The model's knowledge cutoff is January 2025.

Multimodal performance becomes a core advantage for Gemini 3

One of Gemini 3's defining traits is its native multimodality, allowing it to process text, images, video, and audio. Google reports top-tier results at 81 percent on MMMU-Pro and 87.6 percent on Video-MMMU. The model's strengths show up clearly in interface understanding. On the ScreenSpot-Pro benchmark, which tests how well a model can locate elements on a screen, Gemini 3 Pro scores 72.7 percent. That pushes it past the earlier leader Holo2 at 66.1 percent, even though Holo2 was purpose-built for UI navigation. It also far outperforms competitors like Claude 4.5 Sonnet at 36.2 percent and GPT-5.1 at 3.5 percent, and it represents a major jump over Gemini 2.5 Pro at 11.4 percent.

Google says these capabilities open up practical uses such as analyzing sports footage to improve technique or generating code for advanced visualizations. In Search's AI mode, Gemini 3 can produce new immersive visual layouts. And in Chrome, the model is expected to act as a more reliable browser agent.

Deep Think and the Antigravity platform

Alongside Gemini 3 Pro, Google is introducing a new Deep Think mode designed for harder reasoning tasks. In testing, Deep Think surpasses the standard model's already strong results, reaching 41.0 percent on Humanity's Last Exam and 45.1 percent on the ARC-AGI-2 benchmark. Google says Deep Think will first be available to safety testers before rolling out to Google AI Ultra subscribers.

For developers, Google is launching Google Antigravity, a new agent-focused development platform. The goal is to shift AI from a passive assistant to what Google calls an active partner. Agents get direct access to the editor, terminal, and browser and can plan, execute, and verify complex software tasks on their own.

Early analysis suggests Gemini 3 takes the lead in the model race

Independent evaluations appear to support Google's claims. The analytics firm Artificial Analysis, which received early access to Gemini 3 Pro, says the model now leads the market and scores three points higher than GPT-5.1 on the Artificial Analysis Intelligence Index.

The team reports on X that the model takes first place on five of ten core benchmarks, including GPQA Diamond, MMLU-Pro, and HLE. They say Gemini 3 Pro is particularly strong in coding tasks, agent tasks, and multimodal reasoning, where it posts the highest MMMU-Pro score. Artificial Analysis also notes that the model's results on the AA-Omniscience benchmark, which measures knowledge and hallucinations, suggest a comparatively large model size, similar to Anthropic's Opus 4.1.

Performance boosts come with higher operating costs

Artificial Analysis says Gemini 3 Pro's top performance also brings higher costs. For contexts under 200,000 tokens, pricing is 2 dollars per million input tokens and 12 dollars per million output tokens. That makes it more expensive than both Gemini 2.5 Pro and GPT-5.1, which each cost 1.25 dollars per million input tokens and 10 dollars per million output tokens.

Google still prices Gemini 3 Pro below other high-end models like Claude 4.5 Sonnet and Grok 4.1, both at 3 and 15 dollars. It also remains far cheaper than the most expensive options, including Claude 4.1 Opus at 15 and 75 dollars and GPT-5 Pro at 15 and 120 dollars.

For larger contexts above 200,000 tokens, Gemini 3 Pro jumps to 4 dollars for input and 18 dollars for output. Deep Think is expected to cost even more.
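To make the two pricing tiers concrete, here is a minimal Python sketch that estimates the cost of a single request from the rates quoted above. It is an illustration, not Google's billing logic. The names `Rates` and `request_cost` are hypothetical. The sketch also assumes that the tier is chosen by the input token count alone, that the higher rate then applies to every token in the request, and that a request of exactly 200,000 input tokens falls into the lower tier. The article does not specify any of these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates quoted in the article for Gemini 3 Pro.
STANDARD = Rates(input_per_million=2.0, output_per_million=12.0)
LONG_CONTEXT = Rates(input_per_million=4.0, output_per_million=18.0)

# Assumption: the tier switches once the input exceeds this many tokens.
CONTEXT_THRESHOLD = 200_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request under the assumed tier rule."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = STANDARD if input_tokens <= CONTEXT_THRESHOLD else LONG_CONTEXT
    return (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # 100,000 input tokens and 10,000 output tokens at the standard rates.
    print(f"{request_cost(100_000, 10_000):.2f}")
    # 300,000 input tokens and 10,000 output tokens at the long-context rates.
    print(f"{request_cost(300_000, 10_000):.2f}")
```

Under these assumptions, the same 10,000-token answer costs considerably more once the prompt crosses the threshold, because both the input and the output rate rise.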

The model is more token-efficient than Gemini 2.5 Pro, but the higher rates still pushed the cost of running the Artificial Analysis benchmark index up by 12 percent compared to the older model. Analysts note that Gemini 3 Pro compensates with speed, generating up to 128 output tokens per second, which is faster than models like GPT-5.1.

The reliability analysis presents a mixed picture. Gemini 3 Pro reaches 88 percent accuracy in knowledge tests, one of the highest reported scores, but Artificial Analysis also observes a higher hallucination rate than competing models. Google does not give specific hallucination metrics in the model card, describing them only as a known limitation of foundation models.
