MLPerf: Nvidia claims leadership in performance and versatility

Nvidia leads this year's MLPerf inference benchmark. New results show performance leaps for Hopper and new hardware.

In the MLPerf benchmark, hardware vendors and service providers compete with their AI systems. The test is hosted by MLCommons and aims to transparently compare different chip architectures and system variants.

Today MLCommons released new results for the MLPerf Inference 3.0 benchmark. New this round is a network test that measures the AI performance of systems under more realistic conditions: data is streamed over a network to an inference server. The test is designed to reflect more accurately how data reaches the AI accelerator and how results are returned in the real world, which can reveal bottlenecks in the network.

Nvidia Hopper makes significant year-over-year gains

According to Nvidia, software optimizations give the H100 Tensor Core GPUs in its DGX H100 systems up to 54 percent more inference performance than last year. The largest jump is in RetinaNet inference. Other models also run faster: BERT at 99 percent accuracy by 12 percent, ResNet-50 by 13 percent, and 3D U-Net, which is used in medical applications, by 31 percent.

Nvidia's H100 delivers significantly higher performance in some cases this year due to software optimizations. | Image: Nvidia

In presenting the results, Nvidia emphasized that it sees itself not only as the clear leader in performance but also, just as importantly, as the leader in the versatility of its architecture. Nvidia is the only company to submit results for all tasks in MLPerf Inference 3.0.

A comparison with some of the accelerators that participated in MLPerf Inference 3.0. | Image: Nvidia

Compared to the A100 GPU, the H100 is also significantly stronger at transformer inference: thanks to its Transformer Engine, it delivers more than four times the performance on BERT 99.9.

According to Nvidia, this should translate into large performance gains for many generative AI models, such as those that generate text, images, or 3D models.

New L4 card up to 3 times faster than predecessor

New to the benchmark is Nvidia's L4 Tensor Core GPU, which the company recently unveiled at GTC. The card is already available from some cloud providers and, in the benchmarks, delivers 2.2 to 3.1 times the inference performance of its predecessor, the T4.

Nvidia's L4 is significantly faster than the predecessor T4. | Image: Nvidia

Nvidia's Jetson AGX Orin for edge devices is also up to 63 percent more energy efficient and up to 81 percent faster than last year, thanks to numerous improvements.

In the newly added network test, Nvidia's DGX A100 systems delivered 96 percent of their maximum local performance on BERT, that is, the performance the system achieves when the model runs locally. According to Nvidia, the slight drop is due to CPU latency. In the ResNet-50 test, which runs entirely on GPUs, the DGX systems achieved 100 percent of local performance.