
The latest version of Google's Gemini AI model (Exp-1114) has achieved top scores across most test categories in the Chatbot Arena and now shares the leading position with OpenAI's GPT-4o, according to testing platform lmarena.ai.

Based on more than 6,000 community evaluations, Gemini-Exp-1114 leads the rankings in the mathematics, image processing, and creative writing categories. For programming tasks, the model ranks third. In head-to-head comparisons, Gemini wins 50 percent of matches against GPT-4o, 56 percent against o1-preview, and 62 percent against Claude 3.5 Sonnet.

Under style control metrics, which assess content quality while discounting presentation factors such as response length and headings, Gemini's position changes significantly. These adjusted rankings, designed to prevent models from scoring higher simply by producing longer or more visually polished responses, place Gemini in fourth place.

The experimental Gemini version is publicly available through Google's AI Studio platform.

Gemini 2 or just a minor update?

Gemini, first introduced in December 2023 and updated to version 1.5 in February 2024, currently offers a Pro variant that processes up to one million tokens, with a beta version handling up to ten million tokens. The system works with text, images, audio, video, and code. Google integrates Gemini across various products, including Workspace, Google Search, and the Gemini app.

Reports suggest Google plans to introduce Gemini 2 in December, though its performance reportedly falls short of expectations. Whether this new experimental version represents a variant of Gemini 2 remains unclear.

Summary
  • Google's AI model Gemini (Exp-1114) shares first place with OpenAI's GPT-4o in the Chatbot Arena based on more than 6,000 community evaluations, and leads in the mathematics, image processing, and creative writing categories.
  • With style control applied, Gemini falls to fourth place. The adjusted score suggests that part of its strong overall result comes from style factors such as text length and formatting.
  • The model wins 50 percent of head-to-head comparisons against GPT-4o, 56 percent against o1-preview, and 62 percent against Claude 3.5 Sonnet. The experimental version is publicly available on Google's AI Studio platform.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.