Date: 2025-07-10
Musk unveils Grok 4 as xAI's new AI model that beats OpenAI and Google on major benchmarks

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.
Amid backlash over antisemitic content and major leadership shakeups, Elon Musk has introduced Grok 4, the latest flagship AI model from xAI.

The launch comes during a period of upheaval for Musk's companies, with reports of xAI's chief scientist Igor Babuschkin and X CEO Linda Yaccarino both departing shortly before the announcement.

Musk positioned Grok 4 as a leap forward for artificial intelligence, claiming it outperforms competitors like OpenAI and Google by a wide margin across multiple benchmarks.

New versions, features, and premium options

xAI has released two versions of its new model: Grok 4 and Grok 4 Heavy. Grok 4 Heavy uses a multi-agent setup, allowing several agents to tackle problems simultaneously and compare results, similar to a study group. This approach leads to significantly stronger benchmark results.
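
xAI has not published how Grok 4 Heavy's agents are coordinated, so the following is only a minimal sketch of the general "study group" idea, assuming the simplest possible comparison rule: every agent answers the same question in parallel, and the most common answer wins. The function name `solve_with_study_group` and the toy agents are hypothetical and stand in for real model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# An "agent" here is any function that maps a question to an answer string.
# In a real system this would be a model call; that part is an assumption.
Agent = Callable[[str], str]


def solve_with_study_group(question: str, agents: Sequence[Agent]) -> str:
    """Ask every agent the same question in parallel and return the majority answer."""
    if not agents:
        raise ValueError("at least one agent is required")

    # Run all agents concurrently; pool.map keeps the answers in agent order.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))

    # Compare results: pick the most frequent answer.
    # On a tie, Counter.most_common returns the answer that appeared first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Hypothetical toy agents used only to exercise the voting logic.
def careful_agent(question: str) -> str:
    return "4"


def hasty_agent(question: str) -> str:
    return "5"


def second_careful_agent(question: str) -> str:
    return "4"


if __name__ == "__main__":
    group = [careful_agent, hasty_agent, second_careful_agent]
    print(solve_with_study_group("What is 2 + 2?", group))
```

Majority voting is only one way to "compare results"; a production system could just as well have agents critique each other's work or use a separate judge model, and nothing in the announcement says which approach xAI uses.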

The latest update brings multimodal capabilities, so the model can process both text and images. There is also a "Grok 4 Code" version aimed at developers for coding assistance, and "Grok 4 Voice" for natural-sounding speech output. Grok 4 maintains real-time internet access through DeepSearch, drawing especially from data on Musk's X platform.

Access to Grok 4 is priced at $30 per month. For $300 per month, the new "SuperGrok Heavy" subscription provides early access to Grok 4 Heavy and upcoming features.

Performance and benchmarks

Musk claims Grok 4 surpasses even advanced graduate students in every subject, though he acknowledges the model sometimes lacks common sense and has yet to make new scientific discoveries. According to xAI, that is only a matter of time.

To back up its performance, xAI highlights results from the demanding "Humanity's Last Exam" benchmark, which covers math, humanities, and science. Grok 4 scored 25.4 percent without external tools, ahead of Google's Gemini 2.5 Pro (21.6 percent) and OpenAI's o3 (high) (21 percent). With tools, Grok 4 Heavy reached 44.4 percent, widening the gap.

On the challenging ARC-AGI-2 test, Grok 4 set a new high with a score of 16.2 percent, nearly doubling the next-best commercial competitor, Claude Opus 4.

In the Artificial Analysis Intelligence Index, which aggregates several benchmarks, Grok 4 now leads the field, surpassing models from OpenAI, Google, Anthropic, and DeepSeek. This is the first time an xAI model has taken the top spot. Grok 4 also performs best in the SWE-bench coding benchmark and a range of other standard tests.

Controversy over antisemitic content

The Grok 4 launch has been overshadowed by controversy after a version of Grok integrated into X generated antisemitic posts, praised Adolf Hitler, and criticized Jewish executives in Hollywood.

xAI responded by temporarily restricting Grok’s automated account, deleting the offensive posts, and updating the system prompt to remove language that encouraged politically incorrect statements. During the nearly hour-long launch event, Musk and his team did not address these incidents.

On X, Musk explained that Grok had been too compliant with user instructions and too easily manipulated.

  • Elon Musk introduced Grok 4, xAI’s newest AI model, during a period of leadership changes and controversy over antisemitic content generated by the system on X.
  • Grok 4 and its advanced version, Grok 4 Heavy, feature multimodal capabilities, coding and voice options, and a multi-agent setup that allows stronger performance on benchmarks. The top-tier “SuperGrok Heavy” subscription is priced at $300 per month and provides early access to advanced features.
  • xAI claims Grok 4 now outperforms rivals from OpenAI, Google, and Anthropic on major benchmarks, taking first place in the Artificial Analysis Intelligence Index and showing significant gains on coding and reasoning tests, though concerns remain about its vulnerability to generating harmful content.
