AMD is aiming to challenge Nvidia's dominance in the AI chip market with its new Instinct MI350 series accelerators. The company hopes the chips will offer advantages in certain workloads and lower total costs, but software remains a sticking point.
AMD recently unveiled two new AI chips, the Instinct MI350X and MI355X, based on its latest CDNA 4 architecture and built on TSMC's 3-nanometer process. Each chip contains up to 185 billion transistors and supports new data formats like FP4 and FP6. Both models come with 288 gigabytes of HBM3E memory, which is crucial for AI applications.
The air-cooled MI350X draws 1,000 watts, while the MI355X uses 1,400 watts and can be cooled by air or liquid. On paper, the MI355X offers only a slightly higher TFLOPS rating than the MI350X, but SemiAnalysis expects it to deliver more than 10 percent better real-world performance.
How do AMD's new chips compare to Nvidia's?
According to SemiAnalysis, the MI355X could compete with Nvidia's HGX B200 on performance per total cost of ownership (TCO) for certain AI workloads, especially when running small to mid-size language models. TCO factors in not just the purchase price but also power and maintenance, where AMD claims a 33 percent advantage for self-operated systems. AMD says the MI355X offers 1.6 times the memory and 2.2 times the FP6 performance of Nvidia's B200. However, according to SemiAnalysis, Nvidia's B300 is 1.3 times faster than the MI355X at FP4 calculations.
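The performance-per-TCO metric above can be sketched as a toy calculation. All numbers here are hypothetical placeholders, not figures from AMD or SemiAnalysis; the point is only that a chip with lower raw throughput can still win once power and maintenance are priced in:

```python
# Toy performance-per-TCO comparison. Every number below is a made-up
# placeholder, not a vendor figure.

def tco(purchase_price, yearly_power_cost, yearly_maintenance, years=4):
    """Total cost of ownership: purchase price plus operating costs."""
    return purchase_price + years * (yearly_power_cost + yearly_maintenance)

def perf_per_tco(throughput, **costs):
    """Normalized performance: workload throughput per dollar of TCO."""
    return throughput / tco(**costs)

# Hypothetical system A: slightly lower throughput, cheaper to run.
system_a = perf_per_tco(9_000, purchase_price=250_000,
                        yearly_power_cost=30_000, yearly_maintenance=10_000)
# Hypothetical system B: higher throughput, costlier to buy and operate.
system_b = perf_per_tco(10_000, purchase_price=300_000,
                        yearly_power_cost=45_000, yearly_maintenance=15_000)

print(system_a > system_b)  # the cheaper-to-run system wins on perf/TCO
```

This is the shape of the comparison SemiAnalysis is making, not its actual model.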
So the MI350 looks like a real competitor. But when compared to Nvidia's top-tier GB200 NVL72 system, the MI355X falls behind for very large models or training new models, according to SemiAnalysis. One reason is the smaller "world size" - only 8 MI355X chips can communicate at full speed, while Nvidia's GB200 NVL72 allows 72 chips to do so. As a result, complex workloads requiring frequent communication between many chips could run at least 18 times slower on AMD's setup, SemiAnalysis writes.
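Why world size matters can be illustrated with a rough, bandwidth-only model of a ring all-reduce, the collective operation that synchronizes gradients during training. The model below ignores latency, overlap, and real topologies; the 76.8 GB/s figure is AMD's XGMI number from this article, while the scale-out bandwidth is a purely hypothetical placeholder:

```python
# Rough bandwidth-only model of a ring all-reduce: each GPU sends and
# receives 2*(p-1)/p times the message size. Latency and overlap are
# ignored; the slowest link in the ring bounds the whole operation.

def ring_allreduce_time(n_bytes, world_size, link_gbps):
    """Seconds to all-reduce n_bytes across world_size GPUs over links
    of link_gbps gigabytes per second."""
    traffic = 2 * (world_size - 1) / world_size * n_bytes
    return traffic / (link_gbps * 1e9)

msg = 8e9  # 8 GB of gradients; a made-up message size

# Within one server, the ring stays on fast scale-up links (76.8 GB/s
# per the article). Spanning 72 GPUs across many servers forces the
# ring onto a slower network (10 GB/s here is hypothetical).
fast = ring_allreduce_time(msg, 8, link_gbps=76.8)
slow = ring_allreduce_time(msg, 72, link_gbps=10.0)

print(f"{slow / fast:.1f}x slower once the ring leaves the fast domain")
```

The ratio is dominated by link bandwidth, not GPU count, which is why a large, fast scale-up domain like the GB200 NVL72's pays off for communication-heavy workloads.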
SemiAnalysis also criticizes AMD's marketing of a "128 GPU rack" built from MI355X chips as a rack-scale solution. In reality, it consists of 16 separate servers with 8 GPUs each rather than a tightly integrated rack-scale system.
AMD has increased the data exchange speed within an 8-GPU cluster (scale-up) over its XGMI interconnect to 76.8 gigabytes per second, but Nvidia's comparable systems remain 1.6 times faster.
Software, partnerships, and what's next
Software is a critical piece of AMD's AI push. With ROCm version 7, AMD has improved AI application performance, claiming an average 3.5x boost over the previous version. Support for Triton, a programming tool for AI, has also gotten better. Still, the ROCm Collective Communication Library (RCCL) - key for multi-chip collaboration - remains a weak point and is essentially a copy of Nvidia's NCCL software.
SemiAnalysis reports that AMD is working to expand its ecosystem of "neocloud" providers that rent out AMD compute power. AMD itself rents capacity from companies like AWS and Oracle. The company also offers its own "AMD Developer Cloud," with MI300X GPUs available for as little as $1.99 per hour to spur competition. At the same time, AMD is working to bring its AI engineers' salaries in line with the broader market.
Interest in the new chips is strong among major cloud platforms and AI research labs. AWS is planning to make a larger purchase of AMD GPUs for the first time. Meta, the company behind Facebook, is starting to train models on AMD hardware, and Oracle is preparing to deploy 30,000 MI355X accelerators. Microsoft is ordering smaller quantities of the MI355X, but is showing interest in the upcoming MI400.
The MI400 series is expected in the second half of 2026 and is intended to be a true rack-scale solution that could compete with Nvidia's VR200 NVL144 system. It will use "UALink over Ethernet," an AMD-developed method for high-speed connections over standard Ethernet, similar to Nvidia's NVLink. SemiAnalysis is skeptical that this will match the performance of a dedicated solution. Looking further ahead, the MI500 UAL256 with 256 chips is planned for late 2027. The MI350 series is available for order now and will see broader availability starting in Q3 2025.