Google's new experimental Gemini model leads AI rankings until you strip away the formatting

Google's latest version of its Gemini AI model (Exp-1114) has achieved top scores across most test categories in the Chatbot Arena, now sharing the leading position with OpenAI's GPT-4o, according to testing platform lmarena.ai.

Based on more than 6,000 community evaluations, Gemini-Exp-1114 leads the rankings in mathematics, image processing, and creative writing categories. For programming tasks, the model ranks third. In head-to-head comparisons, Gemini wins 50 percent of matches against GPT-4o, 56 percent against o1-preview, and 62 percent against Claude 3.5 Sonnet.
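To put those head-to-head percentages in perspective, a win rate can be converted into an approximate rating gap. The following is a minimal sketch, not lmarena.ai's actual computation: it assumes an Elo-style logistic model with the conventional 400-point scale and ignores ties. The function name `elo_gap` and the `reported` dictionary are illustrative, not taken from the leaderboard's code.

```python
import math


def elo_gap(win_rate: float, scale: float = 400.0) -> float:
    """Rating gap implied by a head-to-head win rate.

    Assumes an Elo-style logistic model (base 10, 400-point scale)
    and no ties; both are simplifying assumptions for illustration.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return scale * math.log10(win_rate / (1.0 - win_rate))


# Win rates for Gemini-Exp-1114 as reported in the article.
reported = {"GPT-4o": 0.50, "o1-preview": 0.56, "Claude 3.5 Sonnet": 0.62}

gaps = {opponent: elo_gap(rate) for opponent, rate in reported.items()}
for opponent, gap in gaps.items():
    print(f"{opponent}: {gap:+.0f} rating points")
```

Under these assumptions, a 50 percent win rate corresponds to no rating gap at all, while 56 and 62 percent correspond to gaps of roughly 40 and 85 points, which is why small shifts from a style adjustment can reorder the top of the table.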

Once style control is factored in, Gemini's position changes significantly. Style control assesses pure content performance by discounting formatting elements such as text length and headings, so that models cannot score higher simply through longer or visually enhanced responses. Under these adjusted metrics, Gemini drops to fourth place.

The experimental Gemini version is publicly available through Google's AI Studio platform.

Gemini 2 or just a minor update?

Gemini, first introduced in December 2023 and updated to version 1.5 in February 2024, currently offers a Pro variant that processes up to one million tokens, with a beta version handling up to ten million tokens. The system works with text, images, audio, video, and code. Google integrates Gemini across various products, including Workspace, Google Search, and the Gemini app.

Reports suggest Google plans to introduce Gemini 2 in December, though its performance reportedly falls short of expectations. Whether this new experimental version represents a variant of Gemini 2 remains unclear.
