The latest round of the MLPerf Inference benchmark is dominated by Nvidia's H200 GPU - but the competition barely showed up. Instead of serving as a comparison of the available AI chips, MLPerf increasingly shows Nvidia competing against itself.

In the latest MLPerf Inference benchmarks, Nvidia leads with its Hopper GPUs, especially the H200 model, which offers 76% more high-bandwidth memory (HBM3e) and 43% more memory bandwidth than the H100. For the first time, the benchmark suite was expanded to include tests with the large Llama 2 70B and Stable Diffusion XL models.
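Those percentages line up with Nvidia's published specs (H100 SXM: 80 GB at 3.35 TB/s; H200: 141 GB HBM3e at 4.8 TB/s), as this quick back-of-the-envelope check shows:

```python
# Sanity check of the H200-vs-H100 deltas using Nvidia's published specs.
h100_mem_gb, h200_mem_gb = 80, 141      # HBM capacity in GB
h100_bw_tbs, h200_bw_tbs = 3.35, 4.8    # memory bandwidth in TB/s

mem_gain = (h200_mem_gb / h100_mem_gb - 1) * 100
bw_gain = (h200_bw_tbs / h100_bw_tbs - 1) * 100

print(f"Memory:    +{mem_gain:.0f}%")   # Memory:    +76%
print(f"Bandwidth: +{bw_gain:.0f}%")    # Bandwidth: +43%
```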

The memory-enhanced H200 GPUs generated up to 31,000 tokens per second in their MLPerf debut with TensorRT-LLM, a record for the MLPerf Llama 2 benchmark. Even if Nvidia's GPUs deliver only a fraction of this performance in practice, that would still put them at Groq's speed level.

In the "Open Division", Nvidia also demonstrated three techniques for speeding up inference: Structured Sparsity, Pruning, and DeepCache. They are said to increase efficiency by up to 74%.

Nvidia competes with Nvidia, Intel also joins in

Nvidia was the only vendor to submit results in all tests. Intel participated with Gaudi2 and CPU results, while Google contributed only a TPU v5e result. Gaudi2 did not match Nvidia's performance, but according to Intel it offers a better price-performance ratio - an advantage Intel will likely try to press with the next-generation Gaudi3. Gaudi3 itself was completely absent, however, as were AMD's MI300X and Cerebras' systems. Qualcomm's Cloud AI cards made an appearance but were underwhelming.

To summarize: MLPerf is increasingly becoming an Nvidia benchmark in which the company competes against itself. The other vendors are holding back - but still seem to find customers for their AI accelerators. That may change next year, when Nvidia releases its Blackwell generation and the new chips from AMD and Intel are in use.

Summary
  • Nvidia dominated the latest round of the MLPerf inference benchmark with its Hopper GPUs, particularly the H200, which has 76% more HBM3e memory and 43% more bandwidth than the H100.
  • The H200 GPU achieved a record of up to 31,000 tokens/second in its MLPerf debut, while Nvidia demonstrated three inference acceleration techniques in the "Open Division" that are said to increase efficiency by up to 74%.
  • Nvidia was the only vendor to deliver results in all tests, while Intel participated with Gaudi2 and CPU results, and Google contributed only a TPU v5e result. Other vendors such as AMD, Cerebras, and Qualcomm held back or failed to impress.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.