AI in practice

Jun 29, 2024

GPT-4o and Claude 3.5 Sonnet dominate vision language models

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.


LMSYS Org has added image recognition to the Chatbot Arena to compare vision language models (VLMs) from OpenAI, Anthropic, Google, and other AI vendors. Within two weeks, it collected more than 17,000 user preferences in more than 60 languages.

GPT-4o and Claude 3.5 Sonnet performed significantly better at image recognition than Gemini 1.5 Pro and GPT-4 Turbo. While Claude 3 Opus outperforms Gemini 1.5 Flash as a language model, the two perform similarly as VLMs. The open-source model Llava-v1.6-34b is slightly better than Claude-3-Haiku.

The collected data shows common applications such as image description, math problems, document comprehension, meme explanation, and story writing. Next, the team plans to add support for multiple images, as well as PDFs, video, and audio.

The Large Model Systems Organization (LMSYS Org) is an open research organization founded by UC Berkeley students and faculty in collaboration with UCSD and CMU.


Sources

LMSYS Org

Maximilian Schreiner

