Amid backlash over antisemitic content and major leadership shakeups, Elon Musk has introduced Grok 4, the latest flagship AI model from xAI.
The launch comes during a period of upheaval for Musk's companies, with reports of xAI's chief scientist Igor Babuschkin and X CEO Linda Yaccarino both departing shortly before the announcement.
Musk positioned Grok 4 as a leap forward for artificial intelligence, claiming it outperforms competitors like OpenAI and Google by a wide margin across multiple benchmarks.
New versions, features, and premium options
xAI has released two versions of its new model: Grok 4 and Grok 4 Heavy. Grok 4 Heavy uses a multi-agent setup in which several agents tackle the same problem simultaneously and then compare results, similar to a study group. According to xAI, this approach yields significantly stronger benchmark results.
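xAI has not published how Grok 4 Heavy coordinates its agents, so the following is only a minimal sketch of the general idea described above: several independent solvers work on the same question in parallel, and their answers are then compared. Here the comparison is a simple majority vote, which is an assumption made for illustration. The `solve_with_agents` function and the toy agents are hypothetical names, not part of any xAI API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical agent type: any callable that maps a question to an answer.
Agent = Callable[[str], str]


def solve_with_agents(question: str, agents: Sequence[Agent]) -> str:
    """Run every agent on the same question in parallel and return
    the most common answer (simple majority vote)."""
    if not agents:
        raise ValueError("at least one agent is required")

    # Each agent works on the question independently and concurrently.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))

    # Compare results: pick the answer most agents agree on.
    # Ties go to the answer that appears first in the agent order.
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner


# Toy stand-ins for real models, used only to illustrate the voting step.
def careful_agent(question: str) -> str:
    return "4"


def hasty_agent(question: str) -> str:
    return "5"


def second_careful_agent(question: str) -> str:
    return "4"


if __name__ == "__main__":
    agents = [careful_agent, hasty_agent, second_careful_agent]
    print(solve_with_agents("What is 2 + 2?", agents))
```

A real system would likely replace the majority vote with something richer, such as agents critiquing each other's reasoning, but the structure of running agents in parallel and then reconciling their answers stays the same.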
The latest update brings multimodal capabilities, so the model can process both text and images. There is also a "Grok 4 Code" version aimed at developers for coding assistance, and "Grok 4 Voice" for natural-sounding speech output. Grok 4 maintains real-time internet access through DeepSearch, drawing especially from data on Musk's X platform.
Access to Grok 4 is priced at $30 per month. For $300 per month, the new "SuperGrok Heavy" subscription provides early access to Grok 4 Heavy and upcoming features.
Performance and benchmarks
Musk claims Grok 4 surpasses even advanced graduate students in every subject, though he acknowledges the model sometimes lacks common sense and has yet to make new scientific discoveries. According to xAI, that is only a matter of time.
To back up its performance, xAI highlights results from the demanding "Humanity's Last Exam" benchmark, which covers math, humanities, and science. Grok 4 scored 25.4 percent without external tools, ahead of Google's Gemini 2.5 Pro (21.6 percent) and OpenAI's o3 (high) (21 percent). With tools, Grok 4 Heavy reached 44.4 percent, widening the gap.
On the challenging ARC-AGI-2 test, Grok 4 set a new high with a score of 15.9 percent as reported by ARC Prize, nearly doubling the next-best commercial competitor, Claude Opus 4.
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%
This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA
- ARC Prize (@arcprize) July 10, 2025
In the Artificial Analysis Intelligence Index, which aggregates several benchmarks, Grok 4 now leads the field, surpassing models from OpenAI, Google, Anthropic, and DeepSeek. This is the first time an xAI model has taken the top spot. Grok 4 also performs best in the SWE-Bench coding benchmark and a range of other standard tests.
xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model.
We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude...
- Artificial Analysis (@ArtificialAnlys) July 10, 2025
Controversy over antisemitic content
The Grok 4 launch has been overshadowed by controversy after a version of Grok integrated into X generated antisemitic posts, praised Adolf Hitler, and criticized Jewish executives in Hollywood.
xAI responded by temporarily restricting Grok’s automated account, deleting the offensive posts, and updating the system prompt to remove language that encouraged politically incorrect statements. During the nearly hour-long launch event, Musk and his team did not address these incidents.
On X, Musk explained that Grok had been too compliant with user instructions and too easily manipulated.