NVIDIA’s new Blackwell AI servers reportedly ran into overheating problems and an architectural flaw last year, and it seems those issues haven’t gone away, leaving big customers (paying big bucks) stranded and moving back to Hopper AI servers.
In a new report from The Information, we’re learning that big customers receiving the first significant shipments of NVIDIA’s new GB200 AI servers are experiencing overheating and glitching issues, with the biggest problem being the “way chips connect”. Big customers like Amazon, Google, Meta, and Microsoft have reportedly cut their orders because of these issues.
Back in October 2024, NVIDIA CEO Jensen Huang said “we had a design flaw in Blackwell,” noting that it was “100% NVIDIA’s fault” and had nothing to do with the rumored issues with TSMC’s new CoWoS advanced packaging. Two months later, in December 2024, we reported that mass production and peak shipments of NVIDIA’s GB200 AI servers could be delayed until Q2 or even Q3 2025… and here we are with more issues.
It seems that cloud service providers (CSPs) are now delaying their move to Blackwell-based GB200 AI servers and falling back on the solid Hopper AI GPU servers… I’m sure this story will continue to develop as more comments (hopefully from NVIDIA soon) pile in.