Musk unveils Grok 4 as xAI’s new AI model that beats OpenAI and Google on major benchmarks
Key Points
- Elon Musk introduced Grok 4, xAI’s newest AI model, during a period of leadership changes and controversy over antisemitic content generated by the system on X.
- Grok 4 and its advanced version, Grok 4 Heavy, feature multimodal capabilities, coding and voice options, and a multi-agent setup that allows stronger performance on benchmarks. The top-tier “SuperGrok Heavy” subscription is priced at $300 per month and provides early access to advanced features.
- xAI claims Grok 4 now outperforms rivals from OpenAI, Google, and Anthropic on major benchmarks, taking first place in the Artificial Analysis Intelligence Index and showing significant gains on coding and reasoning tests, though concerns remain about its vulnerability to generating harmful content.
Amid backlash over antisemitic content and major leadership shakeups, Elon Musk has introduced Grok 4, the latest flagship AI model from xAI.
The launch comes during a period of upheaval for Musk's companies: xAI's chief scientist Igor Babuschkin and X CEO Linda Yaccarino both reportedly departed shortly before the announcement.
Musk positioned Grok 4 as a leap forward for artificial intelligence, claiming it outperforms competitors like OpenAI and Google by a wide margin across multiple benchmarks.
New versions, features, and premium options
xAI has released two versions of its new model: Grok 4 and Grok 4 Heavy. Grok 4 Heavy uses a multi-agent setup, allowing several agents to tackle problems simultaneously and compare results, similar to a study group. This approach leads to significantly stronger benchmark results.
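xAI has not published how Grok 4 Heavy's agents coordinate or compare their results, so the following is only a rough illustration of the general "study group" idea, not xAI's method. It assumes the simplest possible comparison rule, a majority vote over the agents' answers. The names `solve_with_agents`, `agent_a`, `agent_b`, and `agent_c` are hypothetical stand-ins, not part of any xAI API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# An "agent" here is any function that maps a question to an answer string.
# In a real system each agent would be a separate model call (assumption).
Agent = Callable[[str], str]


def solve_with_agents(question: str, agents: List[Agent]) -> str:
    """Run every agent on the same question in parallel and return the most common answer."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map keeps the answers in the same order as the agents list.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote (assumed comparison rule); ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in agents with fixed answers, for demonstration only.
def agent_a(question: str) -> str:
    return "42"


def agent_b(question: str) -> str:
    return "41"


def agent_c(question: str) -> str:
    return "42"


if __name__ == "__main__":
    print(solve_with_agents("What is 6 * 7?", [agent_a, agent_b, agent_c]))
```

A majority vote is only one way to compare results. A production system could just as well have the agents critique each other's answers or hand them to a separate judging model.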
The latest update brings multimodal capabilities, so the model can process both text and images. There is also a "Grok 4 Code" version aimed at developers for coding assistance, and "Grok 4 Voice" for natural-sounding speech output. Grok 4 maintains real-time internet access through DeepSearch, drawing especially from data on Musk's X platform.
Access to Grok 4 is priced at $30 per month. For $300 per month, the new "SuperGrok Heavy" subscription provides early access to Grok 4 Heavy and upcoming features.
Performance and benchmarks
Musk claims Grok 4 surpasses even advanced graduate students in every subject, though he acknowledges the model sometimes lacks common sense and has yet to make new scientific discoveries. According to xAI, that is only a matter of time.
To back up its performance, xAI highlights results from the demanding "Humanity's Last Exam" benchmark, which covers math, humanities, and science. Grok 4 scored 25.4 percent without external tools, ahead of Google's Gemini 2.5 Pro (21.6 percent) and OpenAI's o3 (high) (21 percent). With tools, Grok 4 Heavy reached 44.4 percent, widening the gap.
On the challenging ARC-AGI-2 test, Grok 4 set a new high with a score of 16.2 percent, nearly doubling the next-best commercial competitor, Claude Opus 4. ARC Prize's own announcement puts the Grok 4 (Thinking) score slightly lower, at 15.9 percent.
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%
This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA pic.twitter.com/YbCMLXPJ2e
- ARC Prize (@arcprize) July 10, 2025
In the Artificial Analysis Intelligence Index, which aggregates several benchmarks, Grok 4 now leads the field, surpassing models from OpenAI, Google, Anthropic, and DeepSeek. This is the first time an xAI model has taken the top spot. According to xAI, Grok 4 also performs best on the SWE-Bench coding benchmark and a range of other standard tests.
xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model.
We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude... pic.twitter.com/Vc9781SIzd
- Artificial Analysis (@ArtificialAnlys) July 10, 2025
Controversy over antisemitic content
The Grok 4 launch has been overshadowed by controversy after a version of Grok integrated into X generated antisemitic posts, praised Adolf Hitler, and criticized Jewish executives in Hollywood.
xAI responded by temporarily restricting Grok’s automated account, deleting the offensive posts, and updating the system prompt to remove language that encouraged politically incorrect statements. During the nearly hour-long launch event, Musk and his team did not address these incidents.
On X, Musk explained that Grok had been too compliant with user instructions and too easily manipulated.