Comment by jmmv 2 days ago

6 replies

> It gets better though! Since this is a very common operation, x86 CPUs spot this “zeroing idiom” early in the pipeline and can specifically optimise around it: the out-of-order tracking system knows that the value of “eax” (or whichever register is being zeroed) does not depend on the previous value of eax, so it can allocate a fresh, dependency-free zero register renamer slot.

While this is probably true ("probably" because I haven't checked it myself, but it makes sense), the CPU could do the exact same thing for "mov eax, 0", couldn't it? (Does it?)

adrian_b 2 days ago

Most Intel/AMD CPUs do the same thing for a few alternative instructions, e.g. "sub rax, rax".

I do not think that anyone bothers to do this for a "mov eax, 0", because neither assembly programmers nor compilers use such an instruction. Either "xor reg,reg" or "sub reg,reg" has been the recommended instruction for clearing registers since 1978, i.e. since the launch of the Intel 8086, because the 8086 lacked a "clear" instruction like those of the competing CPUs from DEC or Motorola.

One should remember that what is improperly named "exclusive or" in computer jargon is actually simultaneously addition modulo 2 and subtraction modulo 2 (because these 2 operations are identical; the different methods of carry and borrow generation distinguish addition from subtraction only for moduli greater than 2).

The subtraction of a thing from itself is null, which is why clearing a register is done by subtracting it from itself, either with word subtraction or with bitwise modulo-2 subtraction, a.k.a. XOR.
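A quick way to check this, as a minimal Python sketch (the helper names and the 32-bit register width are assumptions made for illustration, not anything from the article): rebuilding XOR bit by bit from addition or subtraction modulo 2 gives the same result as the built-in `^`, and both ways of "subtracting a value from itself" give zero.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit register such as eax


def xor_via_add_mod2(a: int, b: int, width: int = 32) -> int:
    """Rebuild a ^ b by adding the operands bit by bit, modulo 2."""
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        result |= ((bit_a + bit_b) % 2) << i
    return result


def xor_via_sub_mod2(a: int, b: int, width: int = 32) -> int:
    """Same as above, but subtracting bit by bit, modulo 2."""
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        # Python's % always returns a non-negative result, so (0 - 1) % 2 == 1.
        result |= ((bit_a - bit_b) % 2) << i
    return result


def clear_with_xor(x: int) -> int:
    """Model of "xor reg,reg": bitwise modulo-2 subtraction of x from itself."""
    return (x ^ x) & MASK32


def clear_with_sub(x: int) -> int:
    """Model of "sub reg,reg": word subtraction of x from itself."""
    return (x - x) & MASK32
```

With any 32-bit inputs, `xor_via_add_mod2(a, b)` and `xor_via_sub_mod2(a, b)` both agree with `a ^ b`, and `clear_with_xor(x)` and `clear_with_sub(x)` both return 0 regardless of `x`.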

(The true "exclusive or" operation is a logical operation distinct from the addition/subtraction modulo 2. These 2 distinct operations are equivalent only for 2 operands. For 3 or more operands they are different, but programmers still incorrectly use the term XOR when they mean the addition modulo 2 of 3 or more operands. The true "exclusive or" is the function that is true only when exactly one of its operands is true, unlike "inclusive or", which is true when at least one of its operands is true. To these 2 logical "or" functions correspond the 2 logical quantifiers "There exists a unique ..." and "There exists a ...".)
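To make the 2-operand versus 3-operand distinction concrete, here is a small Python sketch (the function names `parity`, `exactly_one`, and `disagreements` are made up for this example). It enumerates every input and lists the ones where addition modulo 2 and "exactly one operand is true" give different answers.

```python
from itertools import product


def parity(*bits: int) -> int:
    """Addition modulo 2 of any number of bits (what chaining ^ computes)."""
    return sum(bits) % 2


def exactly_one(*bits: int) -> int:
    """1 only when exactly one of the operands is 1."""
    return int(sum(bits) == 1)


def disagreements(n: int) -> list:
    """All n-bit inputs on which the two functions differ."""
    return [
        bits
        for bits in product((0, 1), repeat=n)
        if parity(*bits) != exactly_one(*bits)
    ]
```

For two operands `disagreements(2)` is empty, so the two functions coincide. For three operands the only disagreement is `(1, 1, 1)`: the sum modulo 2 is 1, while "exactly one is true" is false.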

lucozade 2 days ago

> couldn't it? (Does it?)

It could, of course. It can do pretty much any pattern matching it likes. But I doubt very much it would, because that pattern is way less common.

As the article points out, the XOR saves 3 bytes of instructions for a really, really common pattern (to zero a register, particularly the return register).

So there's very good reason to perform the XOR preferentially and hence good reason to optimise that very common idiom.

Other approaches, e.g. adding a new "zero <reg>" instruction, are basically worse, as they're not backward compatible and don't really improve anything other than making the assembly a tiny bit more human readable.

electroly 2 days ago

Sure, lots of longer instructions have this effect. "xor eax,eax" is interesting because it's short. That zero immediate in "mov eax,0" is bigger than the entire "xor eax,eax" instruction.
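For reference, the size difference can be spelled out with the raw machine-code bytes. This is a small Python sketch that hard-codes the usual 32-bit/64-bit mode encodings by hand (it does not run an assembler; the table and helper names are made up for illustration):

```python
# Hand-written encodings, as emitted by typical assemblers in 32/64-bit mode.
ENCODINGS = {
    "xor eax, eax": bytes([0x31, 0xC0]),                    # opcode + ModRM
    "sub eax, eax": bytes([0x29, 0xC0]),                    # opcode + ModRM
    "mov eax, 0":   bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  # opcode + imm32
}

IMM32_SIZE = 4  # the zero immediate carried by "mov eax, 0"


def length(instruction: str) -> int:
    """Encoded length in bytes of one of the instructions in the table."""
    return len(ENCODINGS[instruction])


def bytes_saved(short: str, long: str) -> int:
    """How many bytes the shorter instruction saves over the longer one."""
    return length(long) - length(short)
```

Here `length("xor eax, eax")` is 2 and `length("mov eax, 0")` is 5, so the 4-byte immediate alone is larger than the whole xor instruction, and `bytes_saved("xor eax, eax", "mov eax, 0")` is 3.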

MobiusHorizons 2 days ago

I believe it does in some newer CPUs. It takes extra silicon to recognize the pattern though, and compilers emit the xor because the instruction is smaller, so I doubt there is much speedup in real workloads.

pwg a day ago

> the CPU could do the exact same thing for "mov eax, 0", couldn't it?

Yes, it could, but "mov eax, 0" is still five bytes of instruction that have to be cached, fetched, and decoded, so optimizing for the shorter version is marginally better.

addaon 2 days ago

Yes, "mov r, imm" also breaks dependencies -- but the immediate needs to be encoded, so the instruction is longer.