AMD is aiming to challenge Nvidia's dominance in the AI chip market with its new Instinct MI350 series accelerators. The company hopes the chips will offer advantages in certain workloads and lower total costs, but software remains a sticking point.
AMD recently unveiled two new AI chips, the Instinct MI350X and MI355X, based on its latest CDNA 4 architecture and built on TSMC's 3-nanometer process. Each chip contains up to 185 billion transistors and supports new data formats like FP4 and FP6. Both models come with 288 gigabytes of HBM3E memory, which is crucial for AI applications.
The air-cooled MI350X draws 1,000 watts, while the MI355X uses 1,400 watts and can be cooled by air or liquid. On paper, the MI355X offers only a slightly higher TFLOPS rating than the MI350X, but SemiAnalysis expects it to deliver more than 10 percent better real-world performance.
How do AMD's new chips compare to Nvidia's?
According to SemiAnalysis, the MI355X could compete with Nvidia's HGX B200 on performance per total cost of ownership (TCO) for certain AI workloads, especially when running small to mid-size language models. TCO factors in not just the purchase price but also power and maintenance, where AMD claims a 33 percent advantage for self-operated systems. AMD says the MI355X offers 1.6 times the memory and 2.2 times the FP6 performance of Nvidia's B200. However, according to SemiAnalysis, Nvidia's B300 is 1.3 times faster than the MI355X at FP4 calculations.
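The performance-per-TCO metric above can be sketched as a toy calculation. All numbers here are hypothetical placeholders, not figures from AMD or SemiAnalysis; the point is only that a chip with lower raw throughput can still win once power and maintenance are priced in:

```python
# Toy performance-per-TCO comparison. Every number below is a made-up
# placeholder, not a vendor figure.

def tco(purchase_price, yearly_power_cost, yearly_maintenance, years=4):
    """Total cost of ownership: purchase price plus operating costs."""
    return purchase_price + years * (yearly_power_cost + yearly_maintenance)

def perf_per_tco(throughput, **costs):
    """Normalized performance: workload throughput per dollar of TCO."""
    return throughput / tco(**costs)

# Hypothetical system A: slightly lower throughput, cheaper to run.
system_a = perf_per_tco(9_000, purchase_price=250_000,
                        yearly_power_cost=30_000, yearly_maintenance=10_000)
# Hypothetical system B: higher throughput, costlier to buy and operate.
system_b = perf_per_tco(10_000, purchase_price=300_000,
                        yearly_power_cost=45_000, yearly_maintenance=15_000)

print(system_a > system_b)  # the cheaper-to-run system wins on perf/TCO
```

This is the shape of the comparison SemiAnalysis is making, not its actual model.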
So the MI350 looks like a real competitor. But when compared to Nvidia's top-tier GB200 NVL72 system, the MI355X falls behind for very large models or training new models, according to SemiAnalysis. One reason is the smaller "world size" - only 8 MI355X chips can communicate at full speed, while Nvidia's GB200 NVL72 allows 72 chips to do so. As a result, complex workloads requiring frequent communication between many chips could run at least 18 times slower on AMD's setup, SemiAnalysis writes.
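Why world size matters can be illustrated with a rough, bandwidth-only model of a ring all-reduce, the collective operation that synchronizes gradients during training. The model below ignores latency, overlap, and real topologies; the 76.8 GB/s figure is AMD's XGMI number from this article, while the scale-out bandwidth is a purely hypothetical placeholder:

```python
# Rough bandwidth-only model of a ring all-reduce: each GPU sends and
# receives 2*(p-1)/p times the message size. Latency and overlap are
# ignored; the slowest link in the ring bounds the whole operation.

def ring_allreduce_time(n_bytes, world_size, link_gbps):
    """Seconds to all-reduce n_bytes across world_size GPUs over links
    of link_gbps gigabytes per second."""
    traffic = 2 * (world_size - 1) / world_size * n_bytes
    return traffic / (link_gbps * 1e9)

msg = 8e9  # 8 GB of gradients; a made-up message size

# Within one server, the ring stays on fast scale-up links (76.8 GB/s
# per the article). Spanning 72 GPUs across many servers forces the
# ring onto a slower network (10 GB/s here is hypothetical).
fast = ring_allreduce_time(msg, 8, link_gbps=76.8)
slow = ring_allreduce_time(msg, 72, link_gbps=10.0)

print(f"{slow / fast:.1f}x slower once the ring leaves the fast domain")
```

The ratio is dominated by link bandwidth, not GPU count, which is why a large, fast scale-up domain like the GB200 NVL72's pays off for communication-heavy workloads.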
SemiAnalysis also criticizes AMD's marketing of a "128 GPU rack" built from MI355X chips as a rack-scale solution. In reality, it consists of 16 separate servers with 8 GPUs each rather than a tightly integrated rack-scale system.
AMD has increased the data exchange speed within an 8-GPU cluster (scale-up) over its XGMI interconnect to 76.8 gigabytes per second, but Nvidia's comparable systems remain 1.6 times faster.
Software, partnerships, and what's next
Software is a critical piece of AMD's AI push. With ROCm version 7, AMD has improved AI application performance, claiming an average 3.5x boost over the previous version. Support for Triton, a programming tool for AI, has also gotten better. Still, the ROCm Collective Communication Library (RCCL) - key for multi-chip collaboration - remains a weak point and is essentially a copy of Nvidia's NCCL software.
SemiAnalysis reports that AMD is working to expand its ecosystem of "neocloud" providers that rent out AMD compute power. AMD itself rents capacity from companies like AWS and Oracle. The company also offers its own "AMD Developer Cloud," with MI300X GPUs available for as little as $1.99 per hour to spur competition. At the same time, AMD is working to bring its AI engineers' salaries in line with the broader market.
Interest in the new chips is strong among major cloud platforms and AI research labs. AWS is planning to make a larger purchase of AMD GPUs for the first time. Meta, the company behind Facebook, is starting to train models on AMD hardware, and Oracle is preparing to deploy 30,000 MI355X accelerators. Microsoft is ordering smaller quantities of the MI355X, but is showing interest in the upcoming MI400.
The MI400 series is expected in the second half of 2026 and is intended to be a true rack-scale solution that could compete with Nvidia's VR200 NVL144 system. It will use "UALink over Ethernet," an AMD-developed method for high-speed connections over standard Ethernet, similar to Nvidia's NVLink. SemiAnalysis is skeptical that this will match the performance of a dedicated solution. Looking further ahead, the MI500 UAL256 with 256 chips is planned for late 2027. The MI350 series is available for order now and will see broader availability starting in Q3 2025.