A five-month investigation by SemiAnalysis reveals that AMD's new MI300X AI chips fall short of their potential due to major software problems, leaving Nvidia's market dominance unchallenged.
The research found that AMD's software is plagued by bugs that make training AI models nearly impossible without significant debugging. While AMD struggles with quality assurance and ease of use, Nvidia keeps widening the gap by rolling out new features, libraries, and performance updates.
The analysts ran extensive tests, including GEMM benchmarks and single-node training, only to find that AMD can't overcome what they call the "CUDA moat" - Nvidia's strong software advantage.
On paper, the MI300X looks impressive, offering 1,307 TFLOPS of peak FP16 compute and 192 GB of HBM3 memory. Nvidia's H100, by comparison, offers 989 TFLOPS and 80 GB of memory, though Nvidia's newer H200 narrows the memory gap with its 141 GB configuration. AMD systems also offer a lower total cost of ownership thanks to lower prices and more affordable Ethernet networking.
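To put the spec-sheet gap in proportion, the figures quoted above can be turned into simple ratios. The snippet below is a minimal sketch that uses only the numbers cited in this article; the dictionary layout and the `ratio` helper are illustrative names, not anything from the SemiAnalysis report.

```python
# Spec-sheet figures as quoted in this article.
# Peak FP16 throughput in TFLOPS, memory capacity in GB.
SPECS = {
    "MI300X": {"fp16_tflops": 1307, "memory_gb": 192},
    "H100": {"fp16_tflops": 989, "memory_gb": 80},
    "H200": {"memory_gb": 141},  # the article quotes only memory for the H200
}


def ratio(chip_a, chip_b, key):
    """Return how many times larger chip_a's figure is than chip_b's."""
    return SPECS[chip_a][key] / SPECS[chip_b][key]


print(f"MI300X vs H100, peak FP16: {ratio('MI300X', 'H100', 'fp16_tflops'):.2f}x")
print(f"MI300X vs H100, memory:    {ratio('MI300X', 'H100', 'memory_gb'):.2f}x")
print(f"MI300X vs H200, memory:    {ratio('MI300X', 'H200', 'memory_gb'):.2f}x")
```

On these numbers, AMD's paper advantage is roughly a third more peak FP16 compute than the H100 and 2.4 times its memory, with the memory lead shrinking to about 1.36 times against the H200.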
Hardware advantages overshadowed by software problems
However, these advantages mean little in practice. According to SemiAnalysis, comparing these specs is like "comparing cameras by merely examining megapixel count" - suggesting that AMD is merely playing a numbers game without delivering enough real-world performance.
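The gap between paper specs and delivered performance is typically measured by timing a matrix multiplication (GEMM) and converting the result into achieved throughput. The sketch below shows only that arithmetic, using the standard count of 2·M·N·K floating-point operations for an M×K by K×N multiply. It is not SemiAnalysis's benchmark code, and the timing in the usage lines is a hypothetical placeholder, not a measured result.

```python
def gemm_flops(m, n, k):
    """Floating-point operations in an (m x k) @ (k x n) matrix multiply.

    Each of the m*n output elements needs k multiplies and k adds.
    """
    return 2 * m * n * k


def achieved_tflops(m, n, k, seconds):
    """Throughput in TFLOPS for one GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def utilization(achieved, peak):
    """Fraction of the spec-sheet peak that was actually delivered."""
    return achieved / peak


# Hypothetical example: an 8192 x 8192 x 8192 GEMM timed at 1.5 ms.
# The 1.5 ms figure is an assumption for illustration only.
tflops = achieved_tflops(8192, 8192, 8192, 1.5e-3)
share = utilization(tflops, 1307)  # 1307 = MI300X peak FP16 TFLOPS cited above
print(f"{tflops:.0f} TFLOPS achieved, {share:.0%} of peak")
```

A chip with a higher peak but lower utilization can end up slower in practice than a rival with a lower peak, which is the point of the megapixel analogy.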
The analysts had to work directly with AMD engineers to fix numerous bugs just to get usable benchmark results. In contrast, Nvidia's systems worked smoothly right out of the box.
"AMD's Out of the Box Experience is very difficult to work with and can require considerable patience and elbow grease to move towards a usable state," they write.
In a particularly telling detail, SemiAnalysis revealed that Tensorwave, the largest cloud provider of AMD GPUs, had to give AMD's own team free access to GPUs (the same hardware Tensorwave had purchased from AMD) just to fix software issues.
SemiAnalysis recommends that AMD CEO Lisa Su invest heavily in software development and testing. Specifically, they suggest allocating thousands of MI300X chips for automated testing, following Nvidia's approach, and simplifying the complex environment variables while shipping better default settings. "Make the out-of-the-box experience usable!" they write.
While SemiAnalysis wants to see AMD succeed as a competitor to Nvidia, they say "unfortunately, there is still much work to be done." Without major improvements to its software, AMD risks falling further behind as Nvidia prepares to launch its next-generation Blackwell chips, though reports suggest Nvidia's next-gen rollout isn't going entirely smoothly either.