xAI has introduced Grok 4 Fast, a lighter version of its flagship model. According to the company, it performs on par with Grok 4 on most tasks while using about 40 percent less compute. That efficiency also translates into lower costs: xAI says the price per task can drop by as much as 98 percent.
In benchmarks like GPQA Diamond (85.7 percent) and AIME 2025 (92.0 percent), Grok 4 Fast scores close to models like Grok 4 and even GPT-5. The company highlights that the model cuts down on so-called "thinking tokens," using an average of 40 percent fewer tokens to reach similar results. The gap becomes most obvious on complex problems, where other models take more intermediate steps and require more computation.
Earlier versions relied on separate models for simple answers and reasoning-heavy tasks. Grok 4 Fast combines both approaches into one architecture, with its behavior controlled through the system prompt. This fits the broader trend toward hybrid models.
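To illustrate what "controlled through the system prompt" can look like in practice, here is a minimal sketch against xAI's OpenAI-compatible API. The model identifier and the prompt wording are illustrative assumptions, not taken from xAI's documentation; the point is only that one model serves both quick replies and step-by-step reasoning depending on how it is prompted.

```python
# Minimal sketch: steering one unified model with different system prompts.
# Assumes the OpenAI-compatible xAI endpoint; the model name and the prompt
# wording below are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

def ask(question: str, deep_reasoning: bool) -> str:
    # The same model handles both modes; only the system prompt changes.
    system = (
        "Think through the problem step by step before answering."
        if deep_reasoning
        else "Answer briefly and directly, without intermediate reasoning."
    )
    response = client.chat.completions.create(
        model="grok-4-fast",  # hypothetical identifier for illustration
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is the capital of Australia?", deep_reasoning=False))
print(ask("Prove that the sum of two odd integers is even.", deep_reasoning=True))
```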
The system has also been trained to use external tools on its own, including web browsing and code execution. On benchmarks like BrowseComp (44.9 percent) and X Bench Deepsearch (74 percent), it outperforms Grok 4. In the LMArena-Search benchmark, it even tops OpenAI's o3-websearch, which previously held the lead. In Text Arena, Grok 4 Fast currently ranks 8th, ahead of other models in a similar size range.
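For developers, tool use of this kind is typically exposed through function calling. The sketch below assumes the OpenAI-compatible function-calling format; the web_search tool, its schema, and the model name are assumptions made for illustration, and the application itself would execute the search and return the result to the model.

```python
# Minimal sketch of letting the model request an external tool call.
# Assumes the OpenAI-compatible function-calling format; the web_search
# tool and the model name are illustrative assumptions.
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return a short list of result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="grok-4-fast",  # hypothetical identifier
    messages=[{"role": "user", "content": "Who won the most recent Tour de France?"}],
    tools=tools,
)

# If the model decides it needs the tool, it returns a structured call that
# the application executes and feeds back in a follow-up request.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```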
A single model for different tasks
Grok 4 Fast is available through grok.com, iOS and Android apps, and the xAI API. It comes in two versions: one optimized for reasoning-heavy work and another for quick answers. Both support a 2-million-token context window. Pricing ranges from $0.05 to $1.00 per million tokens, depending on token type. For now, Grok 4 Fast is also free to use via OpenRouter and Vercel.
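To see how per-token pricing plays out, here is a back-of-the-envelope cost estimate for a single request. The rates are placeholders chosen within the $0.05 to $1.00 per million token range quoted above, not xAI's published price sheet.

```python
# Rough cost estimate for one request from its token counts.
# The default rates are placeholder values within the $0.05-$1.00 per
# million token range mentioned above, not official xAI pricing.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 0.20,    # assumed $ per million input tokens
                 output_rate: float = 0.50    # assumed $ per million output tokens
                 ) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a long prompt that uses part of the 2-million-token context window.
print(f"${request_cost(500_000, 4_000):.4f}")  # ~$0.1020 at the assumed rates
```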