Musk unveils Grok 4 as xAI’s new AI model that beats OpenAI and Google on major benchmarks

Amid backlash over antisemitic content and major leadership shakeups, Elon Musk has introduced Grok 4, the latest flagship AI model from xAI.

The launch comes during a period of upheaval for Musk's companies, with reports of xAI's chief scientist Igor Babuschkin and X CEO Linda Yaccarino both departing shortly before the announcement.

Musk positioned Grok 4 as a leap forward for artificial intelligence, claiming it outperforms competitors like OpenAI and Google by a wide margin across multiple benchmarks.

New versions, features, and premium options

xAI has released two versions of its new model: Grok 4 and Grok 4 Heavy. Grok 4 Heavy uses a multi-agent setup in which several agents tackle the same problem in parallel and then compare their results, similar to a study group. According to xAI, this approach yields significantly stronger benchmark results.
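xAI has not published how Grok 4 Heavy coordinates its agents, so the following is only a minimal sketch of the general "study group" idea, not xAI's method. It assumes each agent is a plain function that returns an answer string and that results are compared by simple majority vote; the function and agent names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical agent type: any callable that maps a question to an answer.
Agent = Callable[[str], str]


def solve_with_agents(question: str, agents: Sequence[Agent]) -> str:
    """Run all agents concurrently and return the most common answer."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote (an assumption); ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Toy stand-ins for real model calls.
def careful_agent(question: str) -> str:
    return "4"


def hasty_agent(question: str) -> str:
    return "5"


def second_careful_agent(question: str) -> str:
    return "4"


result = solve_with_agents(
    "What is 2 + 2?",
    [careful_agent, hasty_agent, second_careful_agent],
)
print(result)
```

A real system would likely compare results in a richer way than a vote, for example by letting agents critique each other's reasoning, but the parallel-then-compare structure is the same.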

The latest update brings multimodal capabilities, so the model can process both text and images. There is also a "Grok 4 Code" version aimed at developers for coding assistance, and "Grok 4 Voice" for natural-sounding speech output. Grok 4 maintains real-time internet access through DeepSearch, drawing especially from data on Musk's X platform.

Access to Grok 4 is priced at $30 per month. For $300 per month, the new "SuperGrok Heavy" subscription provides early access to Grok 4 Heavy and upcoming features.

Performance and benchmarks

Musk claims Grok 4 surpasses even advanced graduate students in every subject, though he acknowledges the model sometimes lacks common sense and has yet to make new scientific discoveries. According to xAI, that is only a matter of time.

To back up its performance, xAI highlights results from the demanding "Humanity's Last Exam" benchmark, which covers math, humanities, and science. Grok 4 scored 25.4 percent without external tools, ahead of Google's Gemini 2.5 Pro (21.6 percent) and OpenAI's o3 (high) (21 percent). With tools, Grok 4 Heavy reached 44.4 percent, widening the gap.
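To put the no-tools figures in perspective, the short sketch below computes Grok 4's lead in percentage points and in relative terms. It uses only the scores quoted above; the helper name `lead_over` is illustrative.

```python
# Humanity's Last Exam scores without tools, in percent, as reported by xAI.
scores = {
    "Grok 4": 25.4,
    "Gemini 2.5 Pro": 21.6,
    "o3 (high)": 21.0,
}


def lead_over(leader: str, rival: str) -> tuple[float, float]:
    """Return (lead in percentage points, relative lead in percent)."""
    points = round(scores[leader] - scores[rival], 1)
    relative = round(100 * (scores[leader] / scores[rival] - 1), 1)
    return points, relative


for rival in ("Gemini 2.5 Pro", "o3 (high)"):
    points, relative = lead_over("Grok 4", rival)
    print(f"Grok 4 vs {rival}: +{points} points ({relative}% relative)")
```

The 44.4 percent result for Grok 4 Heavy was achieved with tools, so it is not directly comparable to the no-tools scores used here.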

On the challenging ARC-AGI-2 test, Grok 4 set a new high with a score of around 16 percent (ARC Prize reports 15.9 percent), nearly doubling the next-best commercial competitor, Claude Opus 4.

Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%

This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA pic.twitter.com/YbCMLXPJ2e

- ARC Prize (@arcprize) July 10, 2025

In the Artificial Analysis Intelligence Index, which aggregates several benchmarks, Grok 4 now leads the field, surpassing OpenAI, Google, Anthropic, and DeepSeek. This is the first time an xAI model has taken the top spot. Grok 4 also performs best in the SWE-Bench coding benchmark and a range of other standard tests.

xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model.

We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude... pic.twitter.com/Vc9781SIzd

- Artificial Analysis (@ArtificialAnlys) July 10, 2025

Controversy over antisemitic content

The Grok 4 launch has been overshadowed by controversy after a version of Grok integrated into X generated antisemitic posts, praised Adolf Hitler, and criticized Jewish executives in Hollywood.

xAI responded by temporarily restricting Grok’s automated account, deleting the offensive posts, and updating the system prompt to remove language that encouraged politically incorrect statements. During the nearly hour-long launch event, Musk and his team did not address these incidents.

On X, Musk explained that Grok had been too compliant with user instructions and too easily manipulated.
