Comment by qingcharles
Comment by qingcharles 2 days ago
That's weird, I looked it up earlier and found the P6 (Pentium Pro) was the first to actually make the xor optimization into a zero clock operation.
https://fanael.github.io/archives/topic-microarchitecture-ar...
A few paragraphs down from that:
“I assume that the ability to recognize that the exclusive-or zeroing idiom doesn't really depend on the previous value of a register, so that it can be dispatched immediately without waiting for the old value — thus breaking the dependency chain — met the same fate; the Pentium Pro shipped without it.
Some of the cut features were introduced in later models: segment register renaming, for example, was added back in the Pentium II. Maybe dependency-breaking zeroing XOR was added in later P6 models too? After all, it seems such a simple yet important thing, and indeed, I remember seeing people claim that's the case in some old forum posts and mailing list messages. On the other hand, some sources, such as Agner Fog's optimization manuals say that not only it was never present in any of the P6 processors, it was also missing in Pentium M.”