
The latest version of Google's Gemini AI model (Exp-1114) has achieved top scores across most test categories in the Chatbot Arena and now shares the leading position with OpenAI's GPT-4o, according to testing platform lmarena.ai.

Based on more than 6,000 community evaluations, Gemini-Exp-1114 leads the rankings in the mathematics, image processing, and creative writing categories. For programming tasks, the model ranks third. In head-to-head comparisons, Gemini wins 50 percent of matches against GPT-4o, 56 percent against o1-preview, and 62 percent against Claude 3.5 Sonnet.

Under style control metrics, which assess content quality while discounting presentation factors such as response length and headings, Gemini's position changes significantly. These adjusted rankings, designed to prevent models from scoring higher simply by producing longer or more visually polished responses, place Gemini in fourth place.

The experimental Gemini version is publicly available through Google's AI Studio platform.

Gemini 2 or just a minor update?

Gemini, first introduced in December 2023 and updated to version 1.5 in February 2024, currently offers a Pro variant that processes up to one million tokens, with a beta version handling up to ten million tokens. The system works with text, images, audio, video, and code. Google integrates Gemini across various products, including Workspace, Google Search, and the Gemini app.

Reports suggest Google plans to introduce Gemini 2 in December, though its performance reportedly falls short of expectations. Whether this new experimental version represents a variant of Gemini 2 remains unclear.

Summary
  • Google's AI model Gemini (Exp-1114) shares first place with OpenAI's GPT-4o in the Chatbot Arena based on more than 6,000 community evaluations, and leads in the mathematics, image processing, and creative writing categories.
  • With style control applied, Gemini falls to fourth place. The adjusted score suggests that part of its strong overall result comes from style factors such as text length and formatting.
  • The model wins 50 percent of head-to-head comparisons against GPT-4o, 56 percent against o1-preview, and 62 percent against Claude 3.5 Sonnet. The experimental version is publicly available on Google's AI Studio platform.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.