
GTC 2026: With Groq 3 LPX, Nvidia adds dedicated inference hardware to its platform for the first time


Key Points

  • Nvidia fleshed out the Vera Rubin platform at GTC 2026: The POD comprises 40 racks with 1,152 Rubin GPUs and 60 exaflops of compute. The central NVL72 rack is expected to deliver 4x training performance and 10x inference performance per watt compared to Blackwell. With the Groq 3 LPX rack, Nvidia is also introducing a dedicated low-latency inference pipeline.
  • Nvidia is founding the Nemotron Coalition with partners like Mistral AI, Perplexity, and Cursor to develop open frontier models - while tying model builders more closely to its own infrastructure. NemoClaw is a security stack for AI agents that CEO Jensen Huang compared to Linux and Kubernetes in terms of importance.
  • DLSS 5 uses AI to add photorealistic lighting to game scenes and is set to ship for the RTX 50 series in fall 2026. Digital Foundry called the initial results "astonishing," but the gaming community pushed back - many users see the altered faces as a generic AI filter that destroys developers' artistic intent.

At GTC 2026, Nvidia expanded the Vera Rubin platform it introduced at CES with custom CPU racks, dedicated inference chips, a new storage architecture, an inference operating system, open model alliances, and agent security software.

Nvidia introduced the Vera Rubin platform at CES 2026 back in January. At GTC 2026, the company significantly expanded that framework. The platform now includes seven chips and five rack types, grouped into what Nvidia calls the Vera Rubin POD: 40 racks, 1.2 quadrillion transistors, nearly 20,000 Nvidia dies, 1,152 Rubin GPUs, 60 exaflops of compute, and 10 petabytes per second of scale-up bandwidth.

NVL72 rack serves as the POD's central compute engine

The Vera Rubin NVL72 rack is the core compute unit. It integrates 72 Rubin GPUs, 36 Vera CPUs, ConnectX-9 SuperNICs, and BlueField-4 DPUs across 18 compute trays and 9 NVLink switch trays. All told, Nvidia says the single 19-inch-wide rack packs 1.3 million individual components and roughly 1,300 chips, weighing about 4,000 pounds.

Nvidia claims up to 4x training performance and 10x inference performance per watt compared to Blackwell. The sixth-generation NVLink delivers 3.6 terabytes per second of bandwidth per GPU and 260 terabytes per second across the full rack. The backbone consists of four modular copper cable cartridges holding 5,000 copper cables that span more than two miles.
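The quoted rack-level bandwidth follows directly from the per-GPU figure. A quick back-of-the-envelope check, using only the numbers from Nvidia's announcement:

```python
# Sanity check of Nvidia's NVLink 6 bandwidth figures (values from the announcement).
per_gpu_tbps = 3.6      # NVLink bandwidth per GPU, in TB/s
gpus_per_rack = 72      # Rubin GPUs in an NVL72 rack

aggregate_tbps = per_gpu_tbps * gpus_per_rack
print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.1f} TB/s")
# 259.2 TB/s, consistent with the quoted ~260 TB/s across the rack
```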


One of the bigger improvements, according to CEO Jensen Huang, is assembly. The compute trays are completely cable-free, hose-free, and fanless. A PCB midplane replaces traditional cabling, which Nvidia says cuts assembly time per tray from nearly two hours down to five minutes.

Rubin Ultra scales to 576 and 1,152 GPUs

Above the NVL72, Nvidia introduces two additional scaling tiers. Vera Rubin Ultra NVL576 uses a new two-layer all-to-all NVLink topology that connects eight NVL72 racks - each with 72 Rubin Ultra GPUs - into a single 576-GPU NVLink domain via copper and direct optical connections. Nvidia has already built a working prototype called Polyphe based on the older GB200 architecture.

Beyond that, Nvidia announced the Kyber rack, which doubles the NVLink domain per rack to 144 GPUs. Instead of horizontal server trays, the design uses vertical layers: compute hardware with four Rubin Ultra GPUs and two Vera CPUs at the front, a midplane behind it, and an NVLink backplane at the rear. The cable-free design is meant to cut installation time significantly. Eight Kyber racks together form the NVL1152 with 1,152 GPUs. Nvidia describes Kyber as the foundation for its next-next-generation architecture, Feynman. This gives Rubin Ultra three scale-up options: NVL72, NVL144, and NVL576.

A single Rubin Ultra reportedly delivers 100 petaflops in the FP4 data format. The GPU consists of four compute dies rather than two, each exceeding 800 square millimeters, paired with 16 HBM4e memory stacks totaling one terabyte of capacity. A complete NVL144 Kyber system reaches 15 FP4 exaflops, according to Nvidia.
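The system-level FP4 figure is consistent with the per-GPU number. A minimal cross-check, again using only the values Nvidia quotes:

```python
# Cross-check of the quoted FP4 compute figures (values from Nvidia's announcement).
pflops_per_gpu = 100    # Rubin Ultra FP4 petaflops
gpus_nvl144 = 144       # GPUs in a full NVL144 Kyber system

exaflops = pflops_per_gpu * gpus_nvl144 / 1000
print(f"NVL144 aggregate: {exaflops:.1f} FP4 exaflops")
# 14.4 exaflops, in line with the quoted "15 FP4 exaflops"
```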


Dedicated CPU racks address a blind spot in agentic AI

The new Vera CPU rack houses 256 liquid-cooled Vera processors alongside 64 BlueField-4 DPUs, more than 22,500 cores, and 400 terabytes of memory. Nvidia says a single rack can sustain over 22,500 concurrent reinforcement learning or agent sandbox environments. The rationale addresses a problem that was easy to overlook during the GPU-centric focus of recent years: agentic AI systems don't run exclusively on GPUs. Tool calling, SQL queries, compilation, and sandbox execution still require CPUs.

The Vera processor itself features 88 custom Olympus Arm cores, LPDDR5X memory with up to 1.2 terabytes per second of bandwidth, and NVLink C2C for direct connection to Rubin GPUs, according to Nvidia's CPU announcement.

Groq 3 LPX gives Nvidia a dedicated inference pipeline

One of the more interesting announcements is a direct result of Nvidia's quasi-acquisition of Groq: with Groq 3 LPX, Nvidia introduces a dedicated inference pipeline for the first time. The rack contains 32 compute trays with eight LPUs each, connected via a direct chip-to-chip spine consisting of thousands of paired copper connections. Multiple LPX racks can operate as a single inference engine.

The LPUs are designed for low-latency token generation at lower operating costs. This type of specialized hardware has spawned several startups in recent years, including Cerebras, which has a deal with OpenAI among others. With Groq 3 LPX, customers can now buy comparable hardware directly from Nvidia, letting the company leverage its platform advantage.

Combined with the NVL72, the system reportedly delivers up to 35x more tokens and 10x more revenue opportunity for trillion-parameter models compared to Blackwell. Availability is planned for the second half of the year.

CMX storage, inference OS, and Spectrum 6 networking round out the stack

The new CMX platform based on BlueField-4 STX offloads the KV cache into a dedicated high-bandwidth storage layer. The KV cache is a buffer where a language model stores intermediate computation results from a conversation so it doesn't have to recalculate them from scratch with every new token. The longer a conversation or agent chain gets, the more memory this cache eats up.
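The memory pressure CMX targets is easy to quantify. A rough sketch of how KV-cache size scales with context length; all model dimensions here are hypothetical, chosen only to make the arithmetic concrete:

```python
# Rough illustration of why KV-cache memory grows linearly with context length.
# The layer/head/dimension values are hypothetical, not tied to any specific model.
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Two cached tensors per layer (keys and values),
    # each of shape seq_len x n_kv_heads x head_dim.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

for tokens in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9} tokens -> {gib:8.1f} GiB per sequence")
```

With these assumed dimensions the cache costs 320 KiB per token, so a single million-token context consumes 320 GiB on its own, which is why offloading it to a dedicated storage layer and reusing it across sessions is attractive.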

According to the technical blog, CMX treats this temporary inference context as a reusable AI-native data type that can be shared across individual conversation turns, entire sessions, and different agents. Nvidia claims 5x higher token throughput and 5x better power efficiency compared to conventional storage approaches.

On top of that sits Dynamo 1.0, an open-source inference operating system designed to distribute GPU and memory resources across the cluster. Nvidia integrates it into frameworks like LangChain, SGLang, and vLLM. According to Nvidia, Dynamo is already supported by AWS, Azure, Google Cloud, Oracle, CoreWeave, Together AI, Nebius, Cursor, Perplexity, and Pinterest.

The Spectrum-6 SPX networking racks tie the entire POD together into a single supercomputer. The new Spectrum-6 switch delivers 102.4 terabits per second across 512 lanes at 200 gigabits per second using co-packaged optics integrated directly into the chip. Nvidia replaces conventional pluggable transceivers with integrated silicon photonics, which should deliver higher energy efficiency and lower latency.

MGX rack architecture handles energy management from chip to grid

The third-generation MGX rack architecture forms the mechanical foundation for all five rack types. According to Nvidia, NVL and ETL racks share the same physical infrastructure: enclosures, trays, cable cartridges, liquid cooling manifolds, busbars, and more. All racks are designed for 45 degrees Celsius warm-water inlet temperature and are 100 percent liquid-cooled.

Nvidia also introduces what it calls Intelligent Power Smoothing: capacitors with 6x more energy storage than the previous generation (400 joules per GPU) smooth out load spikes, reducing peak current demand by up to 25 percent. Dynamic Max-Q lets data centers allocate power per rack dynamically depending on the workload, which Nvidia says enables up to 30 percent more GPUs within the same power budget.
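The two percentages are roughly two sides of the same coin. A back-of-the-envelope reading (our simplification, not Nvidia's stated methodology): if each GPU's peak draw falls by 25 percent, a fixed power budget fits correspondingly more GPUs.

```python
# Sketch of the headroom argument: lower peak draw per GPU means more GPUs
# fit in the same rack power budget. A simplification, not Nvidia's methodology.
peak_reduction = 0.25
extra_gpus = 1 / (1 - peak_reduction) - 1
print(f"~{extra_gpus:.0%} more GPUs in the same power budget")
# ~33%, in the same ballpark as Nvidia's "up to 30 percent"
```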

Nvidia has contributed the GB200 NVL72 design to the Open Compute Project. More than 80 partners form the ecosystem for manufacturing and distributing the rack systems, according to the company.

Digital twins let operators plan and run AI factories before hardware ships

With the DSX reference design and the DSX Air system, Nvidia extends its reach to the planning and operation of entire facilities. DSX bundles compute, networking, storage, power, and cooling into a blueprint for AI factories. DSX Air turns that blueprint into a digital twin - a complete simulation of the environment before any hardware is delivered.

Companies like CoreWeave, Siam.AI, and Hydra Host are already using these simulations to reduce the time to first production token, according to Nvidia.

Nemotron Coalition and NemoClaw bring open models with built-in guardrails

On the model side, Nvidia is founding the Nemotron Coalition, an alliance of Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab. The goal is to jointly develop open frontier models that are freely available rather than locked behind proprietary interfaces. Nvidia is providing DGX Cloud compute for the effort but doesn't disclose how much. The first model will be developed jointly by Mistral AI and Nvidia and will later underpin the Nemotron 4 family.

Officially, this is about open models. In practice, Nvidia ties model builders more closely to its own infrastructure. That logic continues in the expansion of open model families: Nvidia is expanding Nemotron for agentic systems, extending the model portfolio for robotics and autonomous vehicles with Cosmos and Isaac GR00T, and pushing into biotech and drug discovery with BioNeMo, Proteina Complexa, and nvQSP.

To make sure these models run securely in production agent applications, Nvidia is introducing NemoClaw, its own software stack. Huang presented the platform during his GTC keynote and compared OpenClaw to earlier infrastructure standards: "OpenClaw gave us, gave the industry exactly what it needed at exactly the time. Just as Linux gave the industry exactly what it needed at exactly the time, just as Kubernetes showed up at exactly the right time, just as HTML showed up." Every company needs an OpenClaw strategy today, Huang said: "For the CEOs, the question is, what's your OpenClaw strategy?"

NemoClaw isn't an Nvidia clone of the open-source agent framework OpenClaw. Instead, it's a variant of OpenClaw with guardrails. OpenClaw provides the basic building blocks for AI agents: a runtime, memory, and reusable skills. NemoClaw adds a security and privacy layer via Nvidia's Agent Toolkit and OpenShell that controls which actions an agent can perform and which data it can access. Nvidia developed NemoClaw together with OpenClaw creator Peter Steinberger.

Nvidia itself describes NemoClaw as an early alpha release: "Expect rough edges. We are building toward production-ready sandbox orchestration, but the starting point is getting your own environment up and running."

Adobe partnership and space modules expand Nvidia's reach

Nvidia also announced a partnership with Adobe at GTC. According to Nvidia, Adobe plans to integrate Firefly, Firefly Foundry, Acrobat, Frame.io, and a new cloud-native 3D digital twin solution for marketing with CUDA X, NeMo, Cosmos, Agent Toolkit, and Omniverse.

On top of that, Nvidia unveiled compute modules for space. A Space-1 Vera Rubin module, together with IGX Thor and Jetson Orin, is designed to bring AI processing into orbit. Listed users include Aetherflux, Axiom Space, Kepler, Planet, Sophia Space, and Starcloud. The Rubin module reportedly delivers up to 25x more AI compute for space-based inference than an H100, according to Nvidia.

DLSS 5 promises photorealistic lighting but faces pushback from gamers

Nvidia also announced DLSS 5 at GTC - a neural rendering technique set to arrive in fall 2026 for the RTX 50 series. Unlike previous DLSS versions, this isn't about upscaling or frame generation. Instead, it's an AI-powered lighting layer designed to enhance scenes with photorealistic light, shadows, and material behavior. Nvidia calls it the biggest leap in graphics since real-time ray tracing. The technology was developed over three years, according to the company.

According to Digital Foundry's hands-on report, DLSS 5 uses only color information and motion vectors from the game engine. The AI network semantically recognizes different surfaces - skin, hair, water, metal - and processes each one differently. Geometry, textures, and materials remain unchanged, according to Nvidia. Digital Foundry tested the technology in titles including Resident Evil Requiem, Hogwarts Legacy, Assassin's Creed Shadows, Oblivion Remastered, and Starfield, describing the results for environments, materials, and foliage as "astonishing."

There's a catch, though: the demo was still running on two RTX 5090s - one GPU played the game while the second exclusively ran DLSS 5. Nvidia says the final version will run on a single GPU but acknowledges that significant optimization and VRAM work remains. Digital Foundry already spotted some visual errors and describes the current state as a "snapshot."

The gaming community's reaction was considerably more negative than Nvidia's own assessment. Numerous users describe the altered faces as a generic AI filter that destroys the artistic intent of game developers.

Digital Foundry itself acknowledges the open question of whether Nvidia's interpretation of photorealism is actually what gamers and developers want. Nvidia points out that developers will get customization options and that the feature remains optional. Feedback from participating studios has been positive, according to the company.

We already reported yesterday on all the news regarding physical AI at GTC 2026.
