
Cerebras Systems plans to strengthen its AI inference capabilities by building new data centers across North America and Europe.


The company plans to concentrate 85 percent of its capacity in the United States, with three facilities already operational in Santa Clara, Stockton, and Dallas. Additional centers will open in Minneapolis (Q2 2025), Oklahoma City and Montreal (Q3), and Atlanta and France (Q4).

[Stacked column chart: Cerebras' AI inference capacity in 2025 by quarter and region, showing 20-fold growth from Q1 to Q4.]
Cerebras plans major expansion of its AI inference capacity in 2025, focusing primarily on U.S. locations. | Image: Cerebras

At the heart of these facilities is Cerebras' Wafer Scale Engine, a specialized chip architecture optimized for AI workloads. The company says the CS-3 systems across its data centers will together process 40 million Llama-70B tokens per second for inference tasks.
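
As a rough back-of-envelope check on what that figure implies, the Python sketch below converts aggregate throughput into daily serving capacity. Only the 40 million tokens/second comes from Cerebras; the response length and per-user usage are made-up assumptions for illustration.

```python
# Back-of-envelope: what 40 million Llama-70B tokens/second of
# aggregate inference capacity could serve. Only the throughput
# figure is Cerebras' claim; the usage numbers are assumptions.

AGGREGATE_TOKENS_PER_SEC = 40_000_000   # Cerebras' stated 2025 target
TOKENS_PER_RESPONSE = 500               # assumed average chat reply
RESPONSES_PER_USER_PER_DAY = 20         # assumed usage pattern

tokens_per_day = AGGREGATE_TOKENS_PER_SEC * 86_400  # seconds in a day
daily_users = tokens_per_day // (TOKENS_PER_RESPONSE * RESPONSES_PER_USER_PER_DAY)

print(f"tokens generated per day: {tokens_per_day:.2e}")
print(f"users servable per day under these assumptions: {daily_users:,}")
```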

The Oklahoma City facility will house more than 300 CS-3 systems. Built to Level 3+ standards, the center is protected against tornadoes and earthquakes and has triple-redundant power supplies. Operations are scheduled to begin in June 2025.


Early adoption by industry leaders

Several prominent AI companies have already signed on to use Cerebras' infrastructure, including French startup Mistral with its Le Chat assistant and AI answer engine Perplexity. HuggingFace and AlphaSense have also committed to the platform.

The technology particularly benefits reasoning models like DeepSeek-R1 and OpenAI o3, which generate large numbers of intermediate tokens during their thought processes and can therefore take several minutes to produce an answer.
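
A short sketch makes that connection concrete: because the model must emit its entire chain of thought token by token before the final answer appears, latency scales linearly with decode speed. The trace length and throughput figures below are illustrative assumptions, not benchmarks of Cerebras or any GPU system.

```python
# Why decode throughput dominates latency for reasoning models:
# the whole hidden reasoning trace is generated before the answer.
# All numbers here are illustrative assumptions.

REASONING_TOKENS = 10_000  # assumed length of a hidden reasoning trace

scenarios = {
    "assumed GPU serving speed": 60,      # tokens/second
    "assumed 10x faster inference": 600,  # tokens/second
}

for label, tokens_per_sec in scenarios.items():
    minutes = REASONING_TOKENS / tokens_per_sec / 60
    print(f"{label}: {minutes:.1f} min to generate {REASONING_TOKENS:,} tokens")
```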

The expansion represents part of Cerebras' broader 2025 scaling strategy, with some locations operated in partnership with Emirati company G42. In Montreal, Bit Digital subsidiary Enovum will manage the facility, which promises inference speeds ten times faster than current GPUs when it launches in July 2025.

Cerebras Systems, a U.S.-based company, specializes in developing AI chips with a unique approach: using entire wafers as single chips, called "Wafer Scale Engines." The WSE-3 represents their third generation of this technology.

The system is currently used at Argonne National Laboratory, Pittsburgh Supercomputing Center, and GlaxoSmithKline. However, it has limitations: it doesn't natively support CUDA (Nvidia's programming standard) and offers less server compatibility than Nvidia solutions.

Summary
  • AI chip manufacturer Cerebras Systems is building data centers in North America and Europe to provide high-speed inference capabilities, with the ability to process up to 40 million Llama-70B tokens per second.
  • The data centers will utilize unique wafer-scale chips, where an entire silicon wafer serves as a single chip. The planned facility in Oklahoma City alone will house approximately 300 of these systems.
  • Early adopters of this new infrastructure include AI startup Mistral, search engine Perplexity, as well as HuggingFace and AlphaSense. The first locations are slated to become operational by mid-2025.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.