Intel announces Gaudi 3 AI accelerator: 128GB HBM2e at up to 3.7TB/sec, up to 900W power

Intel has just unveiled its next-gen Gaudi 3 AI accelerator, built from two 5nm dies made by TSMC and packing 64 fifth-generation Tensor Cores, 128GB of HBM2e memory, and up to 900W of power on air or water cooling.

Each die carries 32 Tensor Cores, for a total of 64 Tensor Cores per package, along with 48MB of SRAM for a combined 96MB of SRAM per full package. That on-package SRAM delivers 12.8TB/sec of bandwidth and is backed by the HBM on the Gaudi 3: 128GB of HBM2e offering up to 3.7TB/sec of memory bandwidth.

The previous-gen Intel Gaudi 2 AI accelerator featured 96GB of HBM with 2.45TB/sec of memory bandwidth, so the new Gaudi 3's larger 128GB of HBM2e and up to 3.7TB/sec of bandwidth are a solid generational step up.
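
For readers keeping score, that works out to roughly 1.3x the HBM capacity and 1.5x the HBM bandwidth. The quick sketch below just does that arithmetic from Intel's quoted figures (the variable names are ours, not Intel's):

```python
# Generational uplift of Gaudi 3 over Gaudi 2, using Intel's quoted memory specs.
gaudi2_hbm_gb, gaudi2_bw_tbs = 96, 2.45   # Gaudi 2: 96GB HBM, 2.45TB/sec
gaudi3_hbm_gb, gaudi3_bw_tbs = 128, 3.7   # Gaudi 3: 128GB HBM2e, 3.7TB/sec

capacity_uplift = gaudi3_hbm_gb / gaudi2_hbm_gb    # ~1.33x more HBM capacity
bandwidth_uplift = gaudi3_bw_tbs / gaudi2_bw_tbs   # ~1.51x more HBM bandwidth

print(f"HBM capacity: {capacity_uplift:.2f}x, HBM bandwidth: {bandwidth_uplift:.2f}x")
```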

In its PCIe add-in card form (HL-388), Intel's new Gaudi 3 AI accelerator uses the newer PCIe 5.0 interface with a full x16 connection and comes with TDP options of 450W and 600W, power levels we don't often see in this form factor. Intel also has an OAM version (HL-328/325L/335) with a TDP of between 450W and 900W for air-cooled servers, while the water-cooled Gaudi 3 in OAM form runs at up to 900W TDP.

Intel compared its new Gaudi 3 AI accelerator against NVIDIA's current-gen Hopper H100 and H200 AI GPUs, with Intel's in-house benchmarks showing the Gaudi 3 as up to 1.7x faster than the H100 for AI training. The exact figure depends on the Large Language Model (LLM) used, with the Gaudi 3 landing between 1.4x and 1.7x faster than the H100.

Inference performance varies far more: Intel's new Gaudi 3 AI accelerator is at times up to 10% slower than the NVIDIA H100 AI GPU, while at other times it is up to 70% faster. Intel also says the Gaudi 3 is between 1.2x and 1.3x more efficient than the H100, measured in tokens per second per card, per watt.
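
As a rough illustration of how that efficiency metric is derived (the throughput and power numbers below are hypothetical placeholders, not Intel's measurements):

```python
# Illustrative only: how a tokens-per-second, per-card, per-watt figure is computed.
# The throughput and power values here are made-up placeholders, not measured data.
def tokens_per_sec_per_card_per_watt(tokens_per_sec: float, num_cards: int, watts_per_card: float) -> float:
    """Normalize inference throughput by card count and per-card power draw."""
    return tokens_per_sec / (num_cards * watts_per_card)

# Example: a hypothetical 8-card node serving 20,000 tokens/sec at 600W per card.
print(tokens_per_sec_per_card_per_watt(20_000, 8, 600))  # ~4.17 tokens/sec per card per watt
```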

Intel will have the first samples of its new Gaudi 3 AI accelerator out in the first half of this year, with larger volumes not expected until the second half. It will be fighting NVIDIA's current-gen Hopper H100 and new H200 AI GPUs, which dominate with an estimated 90%+ share of the AI GPU market, while NVIDIA's next-gen Blackwell B200 AI GPU will be unleashed later this year and flood systems across the world in 2025.