NVIDIA Lays Out Two-Year AI Roadmap With Beast Rubin GPU, Vera CPU And NVL576

[Image: Jensen Huang presenting CSP racks]
Building out the datacenters—or as NVIDIA calls them, “AI factories”—required for training and operating state-of-the-art AI models is extremely expensive. So much so that even hyperscalers like Google and Microsoft are going to have to plan these purchases well in advance. To that end, NVIDIA offered a sneak peek of its upcoming products at today’s GTC opening keynote, and you might want to sit down for this one.

The company’s extant Blackwell processors are optimally deployed in the form of “NVL72” racks: full racks integrated with high-speed NVLink interconnects such that the entire rack can be treated as a single GPU. The “72” in the name comes from the number of individual Blackwell GPUs in the rack. Well, NVIDIA’s next-gen parts are going to be called Vera Rubin NVL144. That means twice as many GPUs, right?
[Image: Vera Rubin NVL144 slide]

No, actually. Jensen apologized on stage: the company is changing the way it counts this figure with the next generation. With surprising transparency, he explained that Blackwell is in fact two silicon GPU dice on a single package. Rubin is too, but starting with the next generation, NVIDIA will count the “NVL” number by individual GPU dice rather than packages. The amount of silicon per rack hasn’t really changed much, but Rubin apparently still offers a 3.3x performance uplift over Blackwell Ultra.
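
If that’s confusing, here’s a quick back-of-envelope sketch of the renaming, using only the figures from the keynote as reported above (the variable names are ours, purely for illustration):

```python
# Back-of-envelope check of the NVL naming change.
packages_per_rack = 72  # same rack layout as Blackwell NVL72
dice_per_package = 2    # Blackwell and Rubin both put two GPU dice on one package

blackwell_nvl = packages_per_rack                 # old scheme counts packages -> "NVL72"
rubin_nvl = packages_per_rack * dice_per_package  # new scheme counts dice -> "NVL144"
print(blackwell_nvl, rubin_nvl)                   # 72 144
```

In other words, a Vera Rubin NVL144 rack still holds 72 GPU packages, just like an NVL72 rack today.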

[Image: Rubin Ultra NVL576 slide]

That doesn’t make the next step after Vera Rubin NVL144 any less impressive, though. After Vera Rubin in the second half of next year, NVIDIA will apparently be bringing out Rubin Ultra NVL576 in the second half of 2027. Even considering that Rubin Ultra combines four reticle-sized GPU dice on a single package, that’s still twice as many GPU packages in a single rack as with Vera Rubin or Blackwell.
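
Working that out from the counts above (again, variable names are ours):

```python
# Rubin Ultra rack math under the new dice-based NVL numbering.
nvl_count = 576        # GPU dice per Rubin Ultra NVL576 rack
dice_per_package = 4   # four reticle-sized dice per Rubin Ultra package

packages_per_rack = nvl_count // dice_per_package
print(packages_per_rack)       # 144 packages
print(packages_per_rack / 72)  # 2.0x the packages of a Blackwell or Vera Rubin rack
```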

The specifications are in the slides, but the short version is that Vera Rubin NVL144 will apparently more than triple the just-announced Blackwell Ultra GB300 in AI inference performance, reaching 3.6 exaFLOPS in a single rack. It’ll have 13 TB/s of “scale up, not scale out” HBM4 memory bandwidth (because remember, the whole rack functions like one GPU), and 28.8 TB/s of off-rack bandwidth thanks to ConnectX-9 InfiniBand networking.
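
That “more than triple” figure lines up if you take the roughly 1.1 exaFLOPS of FP4 inference NVIDIA has quoted for a Blackwell Ultra GB300 NVL72 rack; that baseline number isn’t in the slides summarized here, so treat this as a rough, assumed comparison:

```python
# Rough generational uplift check. The GB300 NVL72 baseline (~1.1 exaFLOPS of
# FP4 inference) is an assumed figure, not one quoted in the article above.
vera_rubin_nvl144_ef = 3.6
gb300_nvl72_ef = 1.1  # assumption

print(round(vera_rubin_nvl144_ef / gb300_nvl72_ef, 1))  # ~3.3
```
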
[Image: NVIDIA Rubin system]

Meanwhile, Rubin Ultra will purportedly bring a full 100 petaFLOPS of FP4 compute on a single GPU package along with 1 TB of HBM4e memory. NVIDIA claims that a Rubin Ultra NVL576 rack will offer a dizzying 14 times the already-staggering performance of a Blackwell Ultra GB300 NVL72 rack. It will also boast 144 TB of HBM4e memory, if our math is correct, which is a truly absurd number.
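
The memory math does check out from NVIDIA’s own per-package figure:

```python
# Total rack memory from the per-package spec above.
nvl_count = 576           # GPU dice per Rubin Ultra NVL576 rack
dice_per_package = 4      # four dice per package
hbm4e_per_package_tb = 1  # 1 TB of HBM4e per package

packages = nvl_count // dice_per_package
print(packages * hbm4e_per_package_tb)  # 144 TB per rack
```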

Oh, and “Vera” is the successor to Grace, NVIDIA’s Arm-based datacenter CPU. Each Grace Blackwell Superchip pairs a single Grace CPU with two Blackwell GPUs; it looks like Rubin will return to a 1:1 ratio, at least for Rubin Ultra. Vera packs 88 “Custom Arm Cores”, a departure from Grace, which offered 72 Arm Neoverse V2 cores. Those cores apparently support two-way simultaneous multi-threading, as NVIDIA says Vera handles 176 threads.
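
The SMT inference is just the article’s own numbers divided out:

```python
# Vera's quoted thread count implies two hardware threads per core.
vera_cores = 88
vera_threads = 176

print(vera_threads // vera_cores)  # 2 -> two-way SMT
```
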
[Image: economics slide]

Jensen Huang said that a Rubin Ultra NVL576 rack draws 600 kilowatts and comprises 2.5 million parts. Pricing? If you have to ask, you can’t afford it. These new machines will be available in the second half of next year (Vera Rubin NVL144) and the second half of 2027 (Rubin Ultra NVL576).