NVIDIA Blackwell AI servers: overheating, architecture flaws see companies cutting orders down

TL;DR: NVIDIA’s Blackwell AI servers reportedly face ongoing issues with overheating and architectural flaws, causing major customers like Amazon, Google, Meta, and Microsoft to reduce orders and revert to Hopper AI servers. CEO Jensen Huang acknowledged a design flaw in October 2024, and mass production of GB200 AI servers was later reported to be delayed until possibly Q2 or Q3 2025.

NVIDIA’s new Blackwell AI servers reportedly ran into overheating issues and an architectural flaw last year, and it seems these issues haven’t gone away, leaving big customers (paying big bucks) stranded and moving back to Hopper AI servers.



In a new report from The Information, we’re learning that the first significant shipments of NVIDIA’s new GB200 AI servers have left big customers dealing with overheating and glitches, with the big problem being the “way chips connect”. Big customers like Amazon, Google, Meta, and Microsoft have reportedly cut their orders because of the issues.

Back in October 2024, NVIDIA CEO Jensen Huang said “we had a design flaw in Blackwell”, noting that it was “100% NVIDIA’s fault” and had nothing to do with the rumored issues with TSMC’s new CoWoS advanced packaging. A few months later, in December 2024, we reported that mass production and peak shipments of NVIDIA GB200 AI servers could be delayed until Q2 or even Q3 2025… and here we are with more issues.

It seems that cloud service providers (CSPs) are now delaying the move to Blackwell-based GB200 AI servers and going back to the solid Hopper AI GPU servers. I’m sure this story will continue to build as more comments (hopefully from NVIDIA soon) pile on.