Comment by kccqzy

What they mean is that they are rewriting low level synchronization primitives in order not to penalize AMD CPUs. For example on the AMD Rome CPUs, the cross-CCD latency of atomic instructions could be as high as 200 nanoseconds even when the instructions supposedly access a memory location already in the cache. Common code patterns like multiple cores atomically incrementing a single counter would have borderline acceptable performance on Intel but terrible performance on AMD.

Or consider things like CPU core allocators, which now need to be CCD-aware when allocating cores within a CPU to a container.