Elon Musk's AI company xAI has unveiled its new language model Grok 3. Initial tests show strengths in complex tasks, but also clear weaknesses. It will be significantly more expensive to use.
xAI unveiled its latest AI model Grok 3 on Monday, marking a significant advancement in the company's AI capabilities. According to Elon Musk, training the new model required ten times more computing power than its predecessor and used a Memphis data center equipped with approximately 200,000 GPUs.
The Grok 3 family introduces several variants, including a streamlined mini version that trades accuracy for speed. A notable addition are the new "Reasoning" models, designed specifically for mathematical and scientific problems. Users can adjust these capabilities through the "Think" and "Big Brain" settings in the Grok interface. The company notes that this release isn't final: the model is still being trained, and the team plans to roll out improvements over the coming weeks.
Grok 3: Strong benchmarks and praise from OpenAI co-founder
According to the AI benchmarking platform lmarena.ai, Grok 3 has achieved an unprecedented score above 1400 in the Chatbot Arena, leading across all categories including programming and outperforming models from OpenAI, Anthropic, and Google. However, real-world performance may differ from benchmark results. For instance, while Claude 3.5 Sonnet scores lower on coding benchmarks than some models, many users still consider it the better choice for most programming tasks.
OpenAI co-founder Andrej Karpathy, who received early access to Grok 3, highlights the model's logical reasoning capabilities. The "Think" function successfully handles complex tasks such as estimating the training FLOPs for GPT-2 or creating hexagon grids for board games, abilities previously limited to OpenAI's premium o1-pro model. The function also improves accuracy on simple tasks that often trip up language models, such as counting letters and comparing decimal numbers.
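To give a sense of what the GPT-2 task involves, here is a minimal back-of-the-envelope sketch. It assumes the widely used rule of thumb that training compute is roughly 6 x parameters x training tokens. The function name and the token count are hypothetical choices for illustration only. Neither comes from xAI or from Karpathy's test, and the number of tokens GPT-2 was actually trained on has to be estimated.

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute using the common 6 * N * D approximation.

    n_params: number of model parameters (N)
    n_tokens: number of training tokens processed (D)
    """
    return 6.0 * n_params * n_tokens


# Assumed inputs, not taken from the article:
N_PARAMS = 1.5e9   # largest GPT-2 variant, about 1.5 billion parameters
N_TOKENS = 10e9    # hypothetical token count, for illustration only

flops = estimate_training_flops(N_PARAMS, N_TOKENS)
print(f"Estimated training compute: {flops:.2e} FLOPs")
# prints "Estimated training compute: 9.00e+19 FLOPs"
```

The hard part of the task is not this multiplication but estimating the inputs, in particular the token count, from indirect information.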
New search function with teething problems
Karpathy reports that the new DeepSearch function matches Perplexity's research tool in quality, providing relevant answers to current topics like upcoming Apple launches and Palantir stock movements. However, he identified several significant issues: the model sometimes generates fake URLs, makes unsupported claims, and only cites X posts when specifically prompted. It also appears unaware of its own existence, having omitted xAI from a list of major AI labs.
These limitations mean DeepSearch hasn't yet reached the quality level of OpenAI's "Deep Research." Beyond search, the model also struggles with humor and ethical questions.
Higher prices for new functions
The new release brings significant price changes: X is doubling the price of its Premium+ subscription to $50 per month. In addition, xAI is introducing a separate SuperGrok subscription for $30 per month that unlocks advanced features such as unlimited image generation and higher reasoning limits. Grok will also be available through a web interface at www.grok.com, though this service isn't yet accessible in the UK and EU.
Looking ahead, xAI plans to launch a voice mode and enterprise API in the coming weeks. The company also intends to release Grok 2 as an open-source model once Grok 3 achieves stability.