Content
newsletter Newsletter

Nvidia leads this year's MLPerf inference benchmark. New data shows performance leaps with Hopper and new hardware.

Ad

In the MLPerf benchmark, hardware vendors and service providers compete with their AI systems. The test is hosted by MLCommons and aims to transparently compare different chip architectures and system variants.

Today MLPerf released new results of the MLPerf Inference 3.0 benchmark. A new feature is a network environment that tests the AI performance of different systems under more realistic conditions: Data is streamed to an inference server. The test is designed to more accurately reflect how data enters the AI accelerator and is output in the real world, thus revealing bottlenecks in the network.

Nvidia Hopper makes significant year-over-year gains

According to Nvidia, the H100 Tensor Core GPUs in the DGX H100 systems have up to 54 percent more inference performance than last year due to software optimizations. This jump is seen in RetinaNet inference, with other models such as 99% accurate BERT running 12% faster, ResNet-50 running 13% faster, and 3D U-Net used in medical applications running 31% faster.

Ad
Ad
Nvidia's H100 delivers significantly higher performance in some cases this year due to software optimizations. | Image: Nvidia

In presenting the results, Nvidia emphasized that they see themselves as the clear leader in performance, but also as the equally important leader in the versatility of their architecture. Nvidia is the only company to present results for all tasks in MLPerf Inference 3.0.

A comparison with some accelerators that participated in MLPerf 3.0. | Image: Nvidia

Compared to an A100 GPU, the H100 GPU is also significantly stronger at inferencing transformer models, such as BERT 99.9, thanks to the Transformer engine, where the H100 delivers more than four times the performance.

As a result, the card promises to deliver big performance gains for many generative AI models, such as those that generate text, images, or 3D models.

New L4 card up to 3 times faster than predecessor

New to the benchmark is Nvidia's L4 Tensor GPU, which the company recently unveiled at GTC. The card is already available from some cloud providers and delivers 2.2 to 3.1 times the inference performance of its predecessor, the T4, in the benchmarks.

Nvidia's L4 is significantly faster than the predecessor T4. | Image: Nvidia

Nvidia's Jetson AGX Orin for the Edge is also up to 63 percent more energy efficient and up to 81 percent more powerful than last year thanks to numerous improvements.

Recommendation

In the newly added network test, Nvidia's DGX A100 systems delivered 96 percent of maximum local performance on the BERT model, which is the performance the system delivers when the model is running locally. According to Nvidia, the reason for the slight performance drop is due to CPU latency. In the RestNet 50 test, which is run exclusively on GPUs, the DGX systems achieved 100 percent of local performance.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.