xAI has introduced Grok 4 Fast, a lighter version of its flagship model. According to the company, it performs on par with Grok 4 on most tasks while using about 40 percent less compute. That efficiency also translates into lower costs: xAI says the price per task can drop by as much as 98 percent.
In benchmarks like GPQA Diamond (85.7 percent) and AIME 2025 (92.0 percent), Grok 4 Fast scores close to models like Grok 4 and even GPT-5. The company highlights that the model cuts down on so-called "thinking tokens," using an average of 40 percent fewer tokens to reach similar results. The gap becomes most obvious on complex problems, where other models take more intermediate steps and require more computation.
Earlier versions relied on separate models for simple answers and reasoning-heavy tasks. Grok 4 Fast combines both approaches into one architecture, with its behavior controlled through the system prompt. This fits the broader trend toward hybrid models.
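To illustrate what "controlled through the system prompt" can look like in practice, here is a minimal sketch against xAI's OpenAI-compatible API. The model identifier and the prompt wording are illustrative assumptions, not taken from xAI's documentation; the point is only that one model serves both quick replies and step-by-step reasoning depending on how it is prompted.

```python
# Minimal sketch: steering one unified model with different system prompts.
# Assumes the OpenAI-compatible xAI endpoint; the model name and the prompt
# wording below are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

def ask(question: str, deep_reasoning: bool) -> str:
    # The same model handles both modes; only the system prompt changes.
    system = (
        "Think through the problem step by step before answering."
        if deep_reasoning
        else "Answer briefly and directly, without intermediate reasoning."
    )
    response = client.chat.completions.create(
        model="grok-4-fast",  # hypothetical identifier for illustration
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is the capital of Australia?", deep_reasoning=False))
print(ask("Prove that the sum of two odd integers is even.", deep_reasoning=True))
```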
The system has also been trained to use external tools on its own, including web browsing and code execution. On benchmarks like BrowseComp (44.9 percent) and X Bench Deepsearch (74 percent), it outperforms Grok 4. In the LMArena-Search benchmark, it even tops OpenAI's o3-websearch, which previously held the lead. In Text Arena, Grok 4 Fast currently ranks 8th, ahead of other models in a similar size range.
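For developers, tool use of this kind is typically exposed through function calling. The sketch below assumes the OpenAI-compatible function-calling format; the web_search tool, its schema, and the model name are assumptions made for illustration, and the application itself would execute the search and return the result to the model.

```python
# Minimal sketch of letting the model request an external tool call.
# Assumes the OpenAI-compatible function-calling format; the web_search
# tool and the model name are illustrative assumptions.
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return a short list of result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="grok-4-fast",  # hypothetical identifier
    messages=[{"role": "user", "content": "Who won the most recent Tour de France?"}],
    tools=tools,
)

# If the model decides it needs the tool, it returns a structured call that
# the application executes and feeds back in a follow-up request.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```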
A single model for different tasks
Grok 4 Fast is available through grok.com, iOS and Android apps, and the xAI API. It comes in two versions: one optimized for reasoning-heavy work and another for quick answers. Both support a 2-million-token context window. Pricing ranges from $0.05 to $1.00 per million tokens, depending on token type. For now, Grok 4 Fast is also free to use via OpenRouter and Vercel.
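To see how per-token pricing plays out, here is a back-of-the-envelope cost estimate for a single request. The rates are placeholders chosen within the $0.05 to $1.00 per million token range quoted above, not xAI's published price sheet.

```python
# Rough cost estimate for one request from its token counts.
# The default rates are placeholder values within the $0.05-$1.00 per
# million token range mentioned above, not official xAI pricing.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 0.20,    # assumed $ per million input tokens
                 output_rate: float = 0.50    # assumed $ per million output tokens
                 ) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a long prompt that uses part of the 2-million-token context window.
print(f"${request_cost(500_000, 4_000):.4f}")  # ~$0.1020 at the assumed rates
```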