Chinese engineer, hardware enthusiast, and Akari Akaza fan David Huang apparently got ahold of a pre-release Strix Point system sporting a Ryzen AI 9 365 processor and was able to run some benchmarks on it. From those, he came away with some pretty interesting observations on the Zen 5 core architecture as well as the realized performance of Strix Point.
Tables and graphs in this post by David Huang.
David saves the performance numbers for last, but we’ll pull them out front because this is probably what you’re really here to see. These single-core results from Geekbench 6 match clocks as closely as possible for the full-fat CPU cores, and the results are fascinating. You can click the chart to see the full list of benchmark results. If you do so, you might note that some of the benchmarks actually show regressions from Zen 4 to Zen 5. What’s that about?
It turns out that, when tested using InstructionRate to measure throughput on specific operations, Zen 5 shows some regressions versus Zen 4. Huang theorizes that certain internal structures in Zen 5 have been reduced in size to be even smaller than those of Zen 3, but what’s really interesting is that these performance regressions vanish when you measure two threads on the same core.
As you can see, Zen 5 falls slightly behind Zen 4 in this benchmark of 2-byte NOP fetch bandwidth, but when you extend the benchmark to use two threads on the same core with simultaneous multi-threading (SMT), performance improves drastically. Huang notes that AMD specifically mentioned Zen 5’s “parallel dual pipe front-end” in a slide, and concludes that Zen 5 has a similar design to Intel’s Gracemont, with dual decoders and an eight-wide rename stage.
Another interesting note is that while the Zen 5 core is said to implement full-width AVX-512 (where Zen 4’s SIMD units are only 256 bits wide), this is apparently not true for the mobile implementation of the core. Per-clock and per-core AVX performance on the Ryzen AI 9 365 is nearly identical to that of the Ryzen 7 7840U that Huang used for comparison testing. This makes for quite a difference between the performance of the mobile and desktop parts, the largest ever for an AMD CPU, according to Huang.
Finally, Huang confirms that the Zen 5 and Zen 5C cores of the Ryzen AI 9 365 are in separate CCXs, as expected. This won’t be a surprise to anyone who looked at the die shot of the chip that AMD provided, but it’s always good to have hard data. Huang notes that the unusually high latency he measured between the two may be related to dynamic adjustment of FCLK or similar factors that won’t apply to desktop or server CPUs.
There are many more graphs and lots of interesting analysis over at Huang’s blog entry, although you’ll need to bring a translator if you don’t read Chinese. Overall, Huang’s findings largely validate AMD’s claim of 16% IPC gains over Zen 4. While his results were closer to 15%, we have to recall that this is the mobile version of the CPU with half the L3 cache and half the vector execution resources compared to the desktop core. We can expect excellent things out of AMD’s Ryzen 9000 CPUs when they hit Socket AM5 next month.
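For readers who want to run this kind of comparison on their own numbers, here is a minimal sketch of how a clock-normalized uplift figure can be derived from per-subtest scores. It is not Huang's method or Geekbench's exact scoring formula. The subtest names, scores, and clock speed below are hypothetical placeholders, not his data. The sketch divides each score by clock speed, takes the new-to-old ratio per subtest, and combines the ratios with a geometric mean, which is why an overall gain can coexist with regressions in individual subtests.

```python
from math import prod


def per_clock_uplift(new_score, old_score, new_mhz, old_mhz):
    """Ratio of score-per-MHz, new chip over old chip (1.0 means no change)."""
    return (new_score / new_mhz) / (old_score / old_mhz)


def geometric_mean(values):
    """Geometric mean, the usual way to combine benchmark ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


# Hypothetical (new_score, old_score) pairs. These are NOT Huang's results.
subtests = {
    "subtest_a": (1200, 1000),
    "subtest_b": (1100, 1000),
    "subtest_c": (950, 1000),  # an individual regression
}

# Assumed matched clocks for both chips; placeholder value.
NEW_MHZ = OLD_MHZ = 5000

ratios = {
    name: per_clock_uplift(new, old, NEW_MHZ, OLD_MHZ)
    for name, (new, old) in subtests.items()
}
overall = geometric_mean(ratios.values())
regressions = [name for name, ratio in ratios.items() if ratio < 1.0]

print(f"overall per-clock uplift: {(overall - 1) * 100:.1f}%")
print(f"subtests that regressed: {regressions}")
```

With matched clocks the MHz terms cancel, but keeping them in the function makes it usable when the two chips cannot be pinned to the same frequency.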