Comment by wmf 8 days ago

For AMD, I think the Infinity Fabric is the bottleneck, so increasing the memory clock without increasing the IF clock does nothing. It's also possible that 8 cores with a massive cache simply don't need more bandwidth.
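
A rough way to see the first point: treat the read bandwidth a CCD can actually get as the minimum of the DRAM peak and its fabric link ceiling. This sketch assumes a link moves 32 bytes per fabric clock (FCLK) for reads, a 2000 MHz FCLK, and a 128-bit (dual-channel) DDR5 bus. All three are illustrative assumptions, not measured values. Once the link ceiling sits below the DRAM peak, raising the memory data rate stops changing the result.

```python
# Illustrative model only: the per-clock width and FCLK are assumptions.
READ_BYTES_PER_FCLK = 32  # assumed read width of one fabric link per FCLK cycle
BUS_BITS = 128            # dual-channel DDR5 desktop bus


def dram_peak_gbs(data_rate_mts: float, bus_bits: int = BUS_BITS) -> float:
    """Theoretical DRAM peak in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * data_rate_mts / 1000


def link_read_gbs(fclk_mhz: float, links: int = 1) -> float:
    """Read ceiling of the CCD's fabric link(s) in GB/s."""
    return READ_BYTES_PER_FCLK * fclk_mhz * links / 1000


def effective_read_gbs(data_rate_mts: float, fclk_mhz: float, links: int = 1) -> float:
    """A CCD can't read faster than the slowest stage on the path."""
    return min(dram_peak_gbs(data_rate_mts), link_read_gbs(fclk_mhz, links))


# Sweep memory speed at a fixed FCLK: the DRAM peak rises, the effective value does not.
for rate in (4800, 6000, 7200, 8000):
    print(rate, dram_peak_gbs(rate), effective_read_gbs(rate, fclk_mhz=2000))
```

Raising `fclk_mhz` instead moves the plateau, which is the "increase the IF clock too" half of the argument.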

sliken 8 days ago

My understanding is that the single-CCD chips (like the 9800X3D) have 2 IF links per CCD, while the dual-CCD chips (like the 9950X) have 1. Keep in mind these CCDs are shared with Turin (12 channel), Threadripper Pro (8 channel), Siena (6 channel), and Threadripper (4 channel).

The configurations with more CCDs have 1 IF link per CCD, while those with fewer have 2. Presumably AMD wouldn't bother with the 2-link chips unless it helped.

  • CobaltFire 8 days ago

    This was only true for Epyc, and only for a small number of low-CCD-count SKUs.

    Consumer platforms do NOT do this; it has actually been discussed in depth in the Threadripper Pro space. The low-CCD-count parts were hamstrung by the shortage of IF links, so they got a far smaller bump from populating more than 4 memory channels than they could have.
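
The same min() reasoning shows why extra channels help a low-CCD part so little: total fabric bandwidth scales with the number of CCDs and links, not with the number of DIMMs. The CCD counts, the DDR5-5200 data rate, and the 64 GB/s per-link read figure (the number quoted further down this thread) are assumptions chosen for illustration, not specs for any particular SKU.

```python
LINK_READ_GBS = 64.0  # assumed read ceiling of one fabric link
CHANNEL_BITS = 64     # one DDR5 DIMM channel (two 32-bit subchannels)


def dram_gbs(channels: int, data_rate_mts: float) -> float:
    """Theoretical DRAM peak in GB/s for the populated channels."""
    return channels * CHANNEL_BITS / 8 * data_rate_mts / 1000


def package_read_gbs(ccds: int, links_per_ccd: int, channels: int, data_rate_mts: float) -> float:
    """Package read ceiling: the smaller of total fabric bandwidth and DRAM bandwidth."""
    fabric = ccds * links_per_ccd * LINK_READ_GBS
    return min(fabric, dram_gbs(channels, data_rate_mts))


# Hypothetical configurations at DDR5-5200: does going from 4 to 8 channels help?
for ccds in (2, 4):
    for links in (1, 2):
        for channels in (4, 8):
            print(ccds, links, channels, round(package_read_gbs(ccds, links, channels, 5200), 1))
```

Under these assumptions a 2-CCD, 1-link-per-CCD part gains nothing from 8 channels, a 4-CCD, 1-link part gains only part of the extra DRAM bandwidth, and with 2 links per CCD the 4-CCD part could use all of it.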

    • sliken 7 days ago

      Ah, interesting and disappointing. I've been looking for more memory bandwidth. The M4 Max is tempting, even if only half the bandwidth is available to the CPU cores. I was also looking at the low-end Epyc parts, like the Epyc 9115 (Turin, 12 channel) or the Epyc 8124P (Siena, 6 channel). Both are in the $650-$750 range, but it's frustratingly hard to figure out what they're actually capable of.

      I do look forward to AMD's Strix Halo (256 bit at 8533 MT/s).
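
For scale, the theoretical peak of that bus is the bus width in bytes times the transfer rate. This uses only the figures quoted in the comment; it's a paper number, and sustained bandwidth (and the share the CPU cores can use) will be lower.

```python
def peak_gbs(bus_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal): bytes per transfer x transfers per second."""
    return bus_bits / 8 * data_rate_mts / 1000


# 256-bit bus at 8533 MT/s, the figures quoted in the comment
print(round(peak_gbs(256, 8533), 1))  # 273.1
```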

  • Dylan16807 8 days ago

    I can't find anything to back that up.

    That said, each link gives a CCD 64GB/s of read bandwidth and 32GB/s of write bandwidth. 8000 MT/s memory on a 128-bit bus would get up to 128GB/s. So being stuck with one link would bottleneck badly enough to hide the effects of memory speed.
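
The arithmetic can be checked directly. This sketch takes the per-link figures from the comment (64 GB/s read, 32 GB/s write) and a 128-bit bus at 8000 MT/s, then reports what one CCD could reach with one or two links. The two-link case is hypothetical, since whether consumer parts wire up both links is exactly what's in dispute here.

```python
LINK_READ_GBS = 64.0   # per-link read figure quoted in the comment
LINK_WRITE_GBS = 32.0  # per-link write figure quoted in the comment
DRAM_PEAK_GBS = 128 / 8 * 8000 / 1000  # 128-bit bus at 8000 MT/s = 128.0 GB/s


def ccd_caps(links: int) -> tuple[float, float]:
    """Read and write ceilings (GB/s) for one CCD, limited by its links and by DRAM."""
    read_cap = min(DRAM_PEAK_GBS, links * LINK_READ_GBS)
    write_cap = min(DRAM_PEAK_GBS, links * LINK_WRITE_GBS)
    return read_cap, write_cap


for links in (1, 2):
    read_cap, write_cap = ccd_caps(links)
    print(f"{links} link(s): read {read_cap:.0f} GB/s "
          f"({read_cap / DRAM_PEAK_GBS:.0%} of DRAM peak), write {write_cap:.0f} GB/s")
```

With one link the CCD tops out at half the DRAM read peak, so faster memory beyond roughly DDR5-4000 on this bus shows up mostly as unchanged benchmark numbers.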

    • sliken 8 days ago

      I've been paying close attention and found various hints at AnandTech (RIP), Chips and Cheese, and STH.

      It doesn't make much difference to most apps, but I believe the single-CCD chips (like the 9700X) have better bandwidth to the IOD than the dual-CCD chips, like the 9900X and 9950X.

      Similarly, on the server chips you can get 2, 4, 8, or 16 CCDs. To get 16 cores you can use 2 CCDs or 16 CCDs! But the sweet spot (max bandwidth per CCD) is at 8 CCDs, where you get a decent number of cores and twice the bandwidth per CCD of the 16-CCD parts. Keep in mind the Genoa/Turin EPYC chips have 12 DDR5 channels, i.e. 24 32-bit subchannels, for a 768-bit-wide memory interface. They're not nearly as constrained as the desktop parts.
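
To put rough numbers on the server side: a 768-bit interface at a given DDR5 rate is a fixed DRAM pool, and dividing it by the CCD count shows what each CCD could draw if traffic were spread evenly. The DDR5-6000 rate, the 64 GB/s per-link read figure (quoted elsewhere in this thread), and the rule that parts with up to 8 CCDs get two links per CCD while 16-CCD parts get one (following the comment's claim) are all illustrative assumptions.

```python
BUS_BITS = 768        # 12 DDR5 channels = 24 x 32-bit subchannels
DATA_RATE_MTS = 6000  # assumed DDR5 speed; adjust for the actual platform
LINK_READ_GBS = 64.0  # assumed per-link read ceiling


def dram_pool_gbs(bus_bits: int = BUS_BITS, data_rate_mts: float = DATA_RATE_MTS) -> float:
    """Theoretical DRAM peak for the whole socket in GB/s."""
    return bus_bits / 8 * data_rate_mts / 1000


def per_ccd_read_gbs(ccds: int) -> float:
    """Per-CCD read ceiling: evenly split DRAM share, capped by the CCD's links."""
    links = 2 if ccds <= 8 else 1  # assumption taken from the comment, not a spec
    fair_share = dram_pool_gbs() / ccds
    return min(fair_share, links * LINK_READ_GBS)


for ccds in (2, 4, 8, 16):
    per_ccd = per_ccd_read_gbs(ccds)
    print(ccds, round(per_ccd, 1), round(per_ccd * ccds, 1))  # per-CCD and total GB/s
```

Under these assumptions, 8 CCDs is the smallest configuration whose combined draw reaches the full DRAM pool, and each of its CCDs gets twice the share of a CCD in a 16-CCD part.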

      Wish I could paste in a diagram, but check out:

      https://www.amd.com/content/dam/amd/en/documents/epyc-techni...

      Page 7 has a diagram of a 96-core chip with one GMI (IF) port per CCD and a 32-core chip with two GMI ports per CCD.

      That's a generation old, I believe; with Turin the max CCD count is now 16, not 12.

      • Dylan16807 8 days ago

        So "GMI3-wide" and similar terms are the important things to search for.

        Some diagrams: https://www.servethehome.com/amd-epyc-genoa-gaps-intel-xeon-...

        From another page: "The most noteworthy aspect is that there is a new GMI3-Wide format. With Client Zen 4 and previous generations of Zen chiplets, there was 1 GMI link between the IOD and CCD. With Genoa, in the lower core count, lower CCD SKUs, multiple GMI links can be connected to the CCD."

        And it seems like all the chiplets have two links, but everything I can find says they just don't hook up both on consumer parts.
