We’re typically hardware-focused around these parts—it’s in the name, after all—so we’re not going to go over the software in any great detail, but the general idea behind Intel’s software approach is that its customers can buy whatever Intel hardware suits their needs and the intermediate layers will take care of everything else. But what AI hardware does Intel offer? Primarily, Gaudi AI accelerators and Xeon processors with AI-focused instruction set extensions.
Intel Announces Powerful, New Gaudi 3 AI Accelerator
Intel has new hardware coming to both of those categories soon, and it announced the new parts at its Vision event. Gaudi 3 is naturally the successor to the Gaudi 2 AI accelerator, and Intel says that the upcoming 5-nm part will offer double the FP8 performance, quadruple the BF16 performance, double the network bandwidth, and a 50% uplift in memory bandwidth compared to Gaudi 2.
In terms of overall specifications, this part is optimized both for handling very large models and for scaling out to massive proportions. Its architectural features and large 96 MB SRAM cache give it an advantage in working with very large AI models, and Gaudi 3’s twenty-four on-package 200-Gigabit Ethernet connections allow it to scale, theoretically, all the way to 1,024 nodes in a single cluster. That’d be some 8,192 Gaudi 3 chips, with theoretical FP8 compute throughput somewhere in the 15-exaflop range, according to Intel.
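For the curious, the scale-out figures above can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted in this article; the per-chip FP8 figure it prints is implied by dividing Intel's cluster-level claim by the chip count, not a specification Intel gave us here, and the function and constant names are our own.

```python
# Back-of-the-envelope check of the Gaudi 3 cluster figures quoted above.
# All inputs are Intel's claims as reported in this article, not measurements.

NODES_PER_CLUSTER = 1024        # maximum nodes in a single cluster
CHIPS_PER_CLUSTER = 8192        # Gaudi 3 chips in that cluster
CLUSTER_FP8_EXAFLOPS = 15.0     # "somewhere in the 15-exaflop range"
ETHERNET_PORTS_PER_CHIP = 24    # on-package Ethernet connections
PORT_SPEED_GBPS = 200           # gigabits per second per port


def chips_per_node(chips: int, nodes: int) -> int:
    """Number of accelerators in each node of the cluster."""
    return chips // nodes


def implied_per_chip_petaflops(cluster_exaflops: float, chips: int) -> float:
    """Per-chip FP8 throughput implied by the cluster total (1 EF = 1000 PF)."""
    return cluster_exaflops * 1000.0 / chips


def aggregate_ethernet_gbps(ports: int, port_speed_gbps: int) -> int:
    """Total Ethernet bandwidth per chip across all ports, in Gb/s."""
    return ports * port_speed_gbps


if __name__ == "__main__":
    print("Chips per node:", chips_per_node(CHIPS_PER_CLUSTER, NODES_PER_CLUSTER))
    pf = implied_per_chip_petaflops(CLUSTER_FP8_EXAFLOPS, CHIPS_PER_CLUSTER)
    print(f"Implied FP8 per chip: {pf:.2f} PFLOPS")
    gbps = aggregate_ethernet_gbps(ETHERNET_PORTS_PER_CHIP, PORT_SPEED_GBPS)
    print(f"Ethernet per chip: {gbps} Gb/s")
```

In other words, the quoted cluster works out to eight accelerators per node, roughly 1.8 petaflops of theoretical FP8 throughput per chip, and 4,800 Gb/s of aggregate Ethernet bandwidth per chip.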
Intel Gaudi 3 Performance Expectations
Intel compares Gaudi 3 directly against NVIDIA’s Hopper H100 GPU training a variety of models. In all of the example cases, Gaudi 3 comes out with a considerable performance advantage. Of course, Intel isn’t going to show us numbers that make Gaudi 3 look anything less than fantastic, right?
Well, in inferencing, Gaudi 3 looks less dominant, although performance is certainly competitive and impressive. It’s really the big Falcon-180B model where Intel’s new part can stretch its legs and run away from NVIDIA’s GPU, with up to a 4x improvement in performance—although we have to stress that these numbers are all projections.
A big point in favor of Gaudi 3 that Intel has been emphasizing to the press is its relative efficiency. Although Gaudi 3 is built on a 5-nm process while NVIDIA’s H100 is fabricated on TSMC’s 4N, Intel claims that Gaudi 3 can turn in up to 2.3x the power efficiency of NVIDIA’s part. A significant part of that efficiency gain is no doubt down to the extremely large 96 MB on-chip cache.
Gaudi 3 is coming to mezzanine (OAM) and PCIe add-in card formats as well as a universal baseboard form factor intended for use in big clusters. Intel says that developers may be able to migrate their models to run on Gaudi in as little as three lines of code; Stability AI says it took “less than a day” to move over to Intel’s hardware. These parts will be sampling to customers very soon, while mass production happens in the back half of the year.
Intel Xeon 6 Product Family Announced
But what about the new Xeons we mentioned? Well, it turns out that Intel is wrapping up its Sierra Forest Xeons, sporting up to 288 Crestmont E-cores, into the same product family as its upcoming Granite Rapids CPUs. Those parts are more traditional Xeon CPUs, with lots of big and fast P-cores. The entirety of both processor families is collectively being branded “Xeon 6” by Intel.
Intel shared few details about either processor family at the moment, although the chipmaker did offer the slide above promising a yearly power reduction of over one megawatt in the case that someone replaces some old 2nd-gen Xeons with brand spanking new Sierra Forest parts. We reckon there are a few more details on Sierra Forest because those chips are expected to launch any day now, while Granite Rapids will come “shortly after,” according to Intel.