
Cerebras Systems plans to strengthen its AI inference capabilities by building new data centers across North America and Europe.


The company plans to concentrate 85 percent of its capacity in the United States, with three facilities already operational in Santa Clara, Stockton, and Dallas. Additional centers will open in Minneapolis (Q2 2025), Oklahoma City and Montreal (Q3), and Atlanta and France (Q4).

[Stacked column chart: Cerebras' AI inference capacity in 2025 by quarter and region, showing 20-fold growth from Q1 to Q4.]
Cerebras plans major expansion of its AI inference capacity in 2025, focusing primarily on U.S. locations. | Image: Cerebras

At the heart of these facilities is Cerebras' Wafer Scale Engine, a specialized chip architecture optimized for AI workloads. The company says the CS-3 systems across its data centers will together process 40 million Llama-70B tokens per second for inference tasks.
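
As a rough back-of-envelope check on what that figure implies, the Python sketch below converts aggregate throughput into daily serving capacity. Only the 40 million tokens/second comes from Cerebras; the response length and per-user usage are made-up assumptions for illustration.

```python
# Back-of-envelope: what 40 million Llama-70B tokens/second of
# aggregate inference capacity could serve. Only the throughput
# figure is Cerebras' claim; the usage numbers are assumptions.

AGGREGATE_TOKENS_PER_SEC = 40_000_000   # Cerebras' stated 2025 target
TOKENS_PER_RESPONSE = 500               # assumed average chat reply
RESPONSES_PER_USER_PER_DAY = 20         # assumed usage pattern

tokens_per_day = AGGREGATE_TOKENS_PER_SEC * 86_400  # seconds in a day
daily_users = tokens_per_day // (TOKENS_PER_RESPONSE * RESPONSES_PER_USER_PER_DAY)

print(f"tokens generated per day: {tokens_per_day:.2e}")
print(f"users servable per day under these assumptions: {daily_users:,}")
```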

The Oklahoma City facility will house more than 300 CS-3 systems. Built to Level 3+ standards, the center is protected against tornadoes and earthquakes and has triple-redundant power supplies. Operations are scheduled to begin in June 2025.


Early adoption by industry leaders

Several prominent AI companies have already signed on to use Cerebras' infrastructure, including French startup Mistral with its Le Chat assistant and AI answer engine Perplexity. HuggingFace and AlphaSense have also committed to the platform.

The technology particularly benefits reasoning models like DeepSeek-R1 and OpenAI o3, which generate large numbers of intermediate tokens during their thought processes and can therefore take several minutes to produce an answer.
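
A short sketch makes that connection concrete: because the model must emit its entire chain of thought token by token before the final answer appears, latency scales linearly with decode speed. The trace length and throughput figures below are illustrative assumptions, not benchmarks of Cerebras or any GPU system.

```python
# Why decode throughput dominates latency for reasoning models:
# the whole hidden reasoning trace is generated before the answer.
# All numbers here are illustrative assumptions.

REASONING_TOKENS = 10_000  # assumed length of a hidden reasoning trace

scenarios = {
    "assumed GPU serving speed": 60,      # tokens/second
    "assumed 10x faster inference": 600,  # tokens/second
}

for label, tokens_per_sec in scenarios.items():
    minutes = REASONING_TOKENS / tokens_per_sec / 60
    print(f"{label}: {minutes:.1f} min to generate {REASONING_TOKENS:,} tokens")
```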

The expansion represents part of Cerebras' broader 2025 scaling strategy, with some locations operated in partnership with Emirati company G42. In Montreal, Bit Digital subsidiary Enovum will manage the facility, which promises inference speeds ten times faster than current GPUs when it launches in July 2025.

Cerebras Systems, a U.S.-based company, specializes in developing AI chips with a unique approach: using entire wafers as single chips, called "Wafer Scale Engines." The WSE-3 represents their third generation of this technology.

The system is currently used at Argonne National Laboratory, Pittsburgh Supercomputing Center, and GlaxoSmithKline. However, it has limitations: it doesn't natively support CUDA (Nvidia's programming standard) and offers less server compatibility than Nvidia solutions.

Summary
  • AI chip manufacturer Cerebras Systems is building data centers in North America and Europe to provide high-speed inference capabilities, with the ability to process up to 40 million Llama-70B tokens per second.
  • The data centers will utilize unique wafer-scale chips, where an entire silicon wafer serves as a single chip. The planned facility in Oklahoma City alone will house approximately 300 of these systems.
  • Early adopters of this new infrastructure include AI startup Mistral, search engine Perplexity, as well as HuggingFace and AlphaSense. The first locations are slated to become operational by mid-2025.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.