
Nvidia's GTC 2024 is all about generative AI - and the hardware that makes the current boom possible. With its next-generation Blackwell architecture, Nvidia plans to set new standards.

According to Nvidia CEO Jensen Huang, Blackwell will be the driving force behind a new industrial revolution. The platform promises to enable generative AI with large language models of up to several trillion parameters.

According to Nvidia, the architecture includes the world's most powerful chip with 208 billion transistors. Specifically, Blackwell combines two dies manufactured on TSMC's 4NP process, linked at 10 TB/s so that they operate as a single CUDA GPU. In addition, Blackwell includes a second-generation Transformer Engine that enables AI applications at FP4 precision, enhanced NVLink communication technology for data exchange between up to 576 GPUs, and a new RAS engine that, among other things, enables predictive maintenance for AI systems. A dedicated decompression engine is also designed to accelerate database queries.

Blackwell combines two dies into one GPU. | Image: Nvidia

A single Blackwell GPU delivers 10 petaFLOPS of AI compute in FP8 and 20 petaFLOPS in FP4. Using the new Transformer Engine with so-called "micro tensor scaling", Nvidia promises twice the compute, twice the model size, and twice the bandwidth. The chip is also equipped with 192 gigabytes of HBM3e memory.
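
Nvidia has not published the details of micro tensor scaling, but the name suggests something like block-wise microscaling formats, where each small block of values shares one scale factor so that a 4-bit number format can still cover a wide dynamic range. Here is a minimal NumPy sketch of that idea; the block size and the uniform 4-bit-style grid are assumptions for illustration, not Nvidia's spec:

```python
import numpy as np

# Illustrative values -- assumptions for this sketch, not Nvidia's spec.
# Real FP4 (E2M1) is a non-uniform grid; a uniform one is used here
# for simplicity.
BLOCK = 32
FP4_MAX = 6.0  # largest magnitude representable in E2M1 FP4

def quantize_microscaled(x: np.ndarray):
    """Quantize a float32 vector to a coarse 4-bit-style grid,
    with one shared scale factor per block of 32 values."""
    x = x.reshape(-1, BLOCK)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_MAX  # one scale per block
    scale[scale == 0] = 1.0                                 # avoid division by zero
    q = np.clip(np.round(x / scale), -FP4_MAX, FP4_MAX)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).reshape(-1)

x = np.random.randn(4 * BLOCK).astype(np.float32)
q, s = quantize_microscaled(x)
print("max abs error:", np.abs(x - dequantize(q, s)).max())
```

Because every 32-value block carries its own scale, outliers in one block do not destroy the precision of the rest of the tensor, which is what makes such low-bit formats usable for large models.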


Compared to the H100 GPU, Blackwell is said to offer four times the training performance, up to 25 times the energy efficiency, and up to 30 times the inference performance. The latter is a clear signal that Nvidia is taking on the inference-focused chips currently trying to steal market share from the top dog. However, the 30x figure only applies to so-called mixture-of-experts (MoE) models such as GPT-4; for classic dense transformer models such as GPT-3, the leap is about 7x. MoE models are becoming increasingly important, though; Google's Gemini is also based on this principle. The leap is made possible by the new NVLink generation and the NVLink Switch chip with 7.2 TB/s of bandwidth, which enable much more efficient communication between GPUs - previously a bottleneck for MoE models.
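
To see why interconnect bandwidth matters so much here: in an MoE model, each token is dynamically routed to a few of many expert sub-networks, and in large deployments those experts are spread across GPUs, so every MoE layer triggers all-to-all traffic between devices. A toy routing sketch (all sizes and names are illustrative, not any vendor's implementation):

```python
import numpy as np

# Toy mixture-of-experts router: every token is sent to its top-2 experts.
# In a multi-GPU deployment the experts live on different devices, so each
# routing step implies all-to-all traffic between GPUs -- exactly the
# communication that a faster NVLink generation is meant to speed up.
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16  # illustrative sizes

rng = np.random.default_rng(0)
router_w = rng.standard_normal((DIM, NUM_EXPERTS))

def route(tokens: np.ndarray):
    """Return the top-k expert indices and softmax weights per token."""
    logits = tokens @ router_w                      # (n_tokens, NUM_EXPERTS)
    topk = np.argsort(logits, axis=1)[:, -TOP_K:]   # chosen expert indices
    gates = np.take_along_axis(logits, topk, axis=1)
    gates = np.exp(gates - gates.max(axis=1, keepdims=True))
    return topk, gates / gates.sum(axis=1, keepdims=True)

tokens = rng.standard_normal((4, DIM))
experts, weights = route(tokens)
print(experts)  # different tokens hit different experts -> cross-GPU traffic
```

Since the token-to-expert assignment changes with every batch, the traffic pattern cannot be planned ahead of time, which is why raw GPU-to-GPU bandwidth becomes the limiting factor.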

Nvidia expects the Blackwell platform to be used by nearly all major cloud providers and server manufacturers. Companies such as Amazon Web Services, Google, Meta, Microsoft, and OpenAI will be among the first to deploy Blackwell.

New DGX SuperPOD with 11.5 ExaFLOPS

The new generation also includes a new version of the DGX SuperPOD. The DGX SuperPOD features a new, highly efficient, liquid-cooled, rack-scale architecture and delivers 11.5 ExaFLOPS of FP4-precision AI supercomputing performance and 240 terabytes of fast memory. The system can scale to tens of thousands of chips with additional racks.

At the heart of the SuperPOD is the GB200 NVL72, which uses Nvidia's fifth-generation NVLink to connect 36 Nvidia GB200 Grace Blackwell Superchips (36 Grace CPUs and 72 Blackwell GPUs in total) into a single supercomputer. According to Nvidia, the GB200 Superchips deliver up to 30 times the performance of the same number of Nvidia H100 Tensor Core GPUs for inference workloads with large language models.

Thanks to the new NVLink chip, one DGX GB200 NVL72 is basically "one giant GPU," Huang said. It delivers 720 petaFLOPS for FP8 training and 1.44 ExaFLOPS for FP4 inference.
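
The rack- and pod-level numbers follow directly from the per-GPU rates quoted above, which allows a quick plausibility check; that the SuperPOD figure corresponds to eight NVL72 racks is our inference from the arithmetic:

```python
# Back-of-the-envelope check of the quoted figures, using the
# per-GPU rates stated earlier (10 petaFLOPS FP8, 20 petaFLOPS FP4).
FP8_PER_GPU, FP4_PER_GPU = 10, 20   # petaFLOPS
GPUS_PER_NVL72 = 72

print(GPUS_PER_NVL72 * FP8_PER_GPU)         # 720  petaFLOPS FP8 per rack
print(GPUS_PER_NVL72 * FP4_PER_GPU / 1000)  # 1.44 ExaFLOPS FP4 per rack

# Eight such racks (576 GPUs, matching the NVLink domain size mentioned
# above) yield the SuperPOD headline figure:
print(8 * GPUS_PER_NVL72 * FP4_PER_GPU / 1000)  # 11.52 ~ "11.5 ExaFLOPS"
```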


Nvidia also introduced the DGX B200 system, a platform for AI model training, fine-tuning, and inference. The DGX B200 is the sixth generation of the air-cooled DGX design and combines eight B200 Tensor Core GPUs with CPUs. Both systems will be available later this year.

Summary
  • Nvidia unveiled its new Blackwell platform, designed to enable generative AI with language models of up to several trillion parameters.
  • The new generation of Blackwell is said to deliver four times the training performance, up to 25 times the energy efficiency, and up to 30 times the inference performance of the H100 GPU.
  • This high inference performance applies to mixture-of-experts models such as GPT-4, while dense models such as GPT-3 still run about 7 times faster.
  • The acceleration is made possible by a new generation of the Transformer Engine and a new NVLink.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.