Google's AI Hypercomputer gets a major upgrade with TPU v5p and Nvidia Blackwell integration

Maximilian Schreiner

Image: Midjourney prompted by THE DECODER

At its annual Next '24 developer conference, Google unveiled further developments to its AI Hypercomputer architecture. The focus is on new performance-optimized hardware components such as the Cloud TPU v5p and Nvidia's upcoming Blackwell GPUs.

Google Cloud used Next '24 to announce a series of enhancements to its AI Hypercomputer architecture, including general availability of the TPU v5p and integration with Nvidia's latest Blackwell platform. The updates are designed to accelerate the training and deployment of demanding AI models.

The Cloud TPU v5p is now generally available and, according to Google, is its most powerful and scalable TPU generation to date. A single TPU v5p pod contains 8,960 interconnected chips, more than twice as many as a TPU v4 pod. In addition, the TPU v5p offers more than twice the per-chip FLOPS and three times the high-bandwidth memory (HBM) of the previous generation.
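
For context, a TPU v4 pod contains 4,096 chips, a figure from Google's earlier TPU v4 documentation rather than this announcement, so the pod-level claim can be sanity-checked with a few lines of Python:

    # Sanity check of the pod-size claim. The TPU v4 pod size (4,096
    # chips) is a previously published Google figure, not part of this
    # announcement.
    v5p_pod_chips = 8_960
    v4_pod_chips = 4_096
    print(f"v5p pod vs. v4 pod: {v5p_pod_chips / v4_pod_chips:.2f}x the chips")
    # -> v5p pod vs. v4 pod: 2.19x the chips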

This makes the TPU v5p particularly well suited to training large AI models. To support this, Google Kubernetes Engine (GKE) now offers full support for TPU v5p clusters and multi-host serving. According to Google, the latter lets a group of model servers distributed across multiple hosts be managed and monitored as a single logical unit.
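
What multi-host TPU programming looks like from the framework side can be illustrated with a minimal JAX sketch. This is not GKE-specific code from the announcement, just the general pattern: every host in a TPU slice runs the same program, and collectives span all chips on all hosts. How the coordinator is discovered is handled by the runtime and assumed here.

    # Minimal multi-host JAX sketch: each host in a TPU slice runs this
    # same program. On a managed platform such as GKE, the coordinator
    # address and process indices are discovered automatically.
    import jax
    import jax.numpy as jnp

    jax.distributed.initialize()  # connect this host to the rest of the slice

    print(f"process {jax.process_index()}/{jax.process_count()}: "
          f"{jax.local_device_count()} local chips, "
          f"{jax.device_count()} chips in total")

    # A pmap'd all-reduce: every chip on every host contributes a one,
    # so each output entry equals the global chip count.
    x = jnp.ones(jax.local_device_count())
    total = jax.pmap(lambda v: jax.lax.psum(v, axis_name="chips"),
                     axis_name="chips")(x)
    print(total)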

Google launches Blackwell instances in early 2025

Google Cloud is also expanding its GPU lineup. A3 Mega, a new instance type based on Nvidia H100 GPUs, will be generally available next month and will offer twice the GPU-to-GPU network bandwidth of the existing A3 instances.

Google also announced that the new Nvidia Blackwell platform will come to its AI Hypercomputer architecture in two configurations: Google Cloud customers will get access to Nvidia's HGX B200 and GB200 NVL72 systems in early 2025. The HGX B200 systems are designed for the most demanding AI, data analytics, and HPC workloads, while the liquid-cooled GB200 NVL72 systems target real-time large language model inference and trillion-parameter model training.
