Comment by hota_mazi a day ago

I think OP was just making a comment on the asymmetry of the syntax. Brackets [] are usually used to dereference.

Why is this written

    lea eax, [rdi + rsi]

instead of just

    lea eax, rdi + rsi

?
jcranmer 21 hours ago

When you encode an x86 instruction, your operands amount to either a register name, a memory operand, or an immediate (of several slightly different flavors). I'm no great connoisseur of ISAs, but I believe this basic trichotomy is fairly universal for ISAs. The operands of an LEA instruction are the destination register and a memory operand [1]. LEA happens to be the unique instruction where the memory operand is not dereferenced in some fashion in the course of execution; it doesn't make a lot of sense to create an entirely new syntax that works only for a single instruction.

[1] On a hardware level, the ModR/M encoding of most x86 instructions allows you to specify a register operand and either a memory or a register operand. The LEA instruction only allows a register and a memory operand to be specified; if you try to use a register and register operand, it is instead decoded as an illegal instruction.
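
To make the footnote concrete, here is a minimal Python sketch of how a decoder might tell the two cases apart. The function names `split_modrm` and `lea_second_operand` are made up for illustration; the only facts relied on are that the top two bits of ModR/M (the mod field) equal to 11 select the register form, and that LEA rejects that form.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # top 2 bits: addressing form
    reg = (byte >> 3) & 0b111  # middle 3 bits: register operand
    rm = byte & 0b111          # low 3 bits: register or memory operand
    return mod, reg, rm


def lea_second_operand(modrm):
    """Classify the second operand of LEA from its ModR/M byte.

    Hypothetical helper: returns "memory" for any memory form and
    "illegal" for the register-direct form, which LEA does not accept.
    """
    mod, _reg, _rm = split_modrm(modrm)
    if mod == 0b11:
        return "illegal"
    return "memory"
```

For most other instructions the `mod == 0b11` branch would simply mean "the second operand is a register"; LEA is the case where that form has no valid meaning.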

  • aengelke 20 hours ago

    > LEA happens to be the unique instruction where the memory operand is not dereferenced

    Not quite unique: the now-deprecated Intel MPX instructions had similar semantics, e.g. BNDCU or BNDMK. BNDLDX/BNDSTX are even weirder as they don't compute the address as specified but treat the index part of the memory operand separately.

sparkie 15 hours ago

It's due to the way the instruction is encoded. `lea` would've needed special treatment in syntax to remove the brackets.

In `op reg1, reg2`, the two registers are encoded as 3 bits each in the ModRM byte, which follows the opcode. Obviously, we can't fit 3 registers in the ModRM byte because it's only 8 bits.

In `op reg1, [reg2 + reg3]`, reg1 is encoded in the ModRM byte. The 3 bits that were previously used for reg2 are instead `0b100`, which indicates a SIB byte follows the ModRM byte. The SIB (Scale-Index-Base) byte uses 3 bits each for reg2 and reg3 as the base and index registers.

In every other instruction, the SIB byte is used for memory addressing, so the syntax of `lea` is consistent with the way it is encoded.

Encoding details of ModRM/SIB are in Volume 2, Section 2.1.5 of the ISA manual: https://www.intel.com/content/www/us/en/developer/articles/t...
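
As a rough illustration of the layout described above, here is a Python sketch that assembles the bytes for `lea eax, [rdi + rsi]` by hand. The helper `encode_lea` and the two register tables are made up for this example. It assumes 64-bit mode, only the eight legacy registers (so no REX prefix), a scale of 1, and no displacement, and it rejects the two register choices that have special meanings in this form.

```python
# 3-bit register numbers for the eight legacy registers (no REX prefix needed).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

LEA_OPCODE = 0x8D  # LEA r32, m


def encode_lea(dest, base, index):
    """Encode `lea dest, [base + index]` as opcode + ModRM + SIB.

    Hypothetical helper covering only mod=00, scale=1, no displacement.
    """
    if index == "rsp":
        raise ValueError("index field 0b100 means 'no index', so rsp cannot be an index")
    if base == "rbp":
        raise ValueError("base field 0b101 with mod=00 means disp32 with no base register")
    # ModRM: mod=00 (memory, no displacement), reg=dest, rm=0b100 (SIB follows).
    modrm = (0b00 << 6) | (REG32[dest] << 3) | 0b100
    # SIB: scale=00 (multiply index by 1), index and base register numbers.
    sib = (0b00 << 6) | (REG64[index] << 3) | REG64[base]
    return bytes([LEA_OPCODE, modrm, sib])
```

Calling `encode_lea("eax", "rdi", "rsi").hex()` gives `8d0437`: the opcode, a ModRM byte whose low three bits are `0b100`, and a SIB byte holding both address registers.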

Y_Y 21 hours ago

The way I rationalize it is that you're getting the address of something. A raw address isn't what you want the address of, so you're doing something like &(*(rdi+rsi)).

secondcoming 21 hours ago

Yes, that’s what I meant

  • HarHarVeryFunny 21 hours ago

    LEA stands for Load Effective Address, so the syntax is as if you're doing a memory access, but you are just getting the calculated address, not reading from or writing to that address.

    LEA would normally be used for things like calculating the address of an array element, or doing pointer math.
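
A toy Python model may help show the difference between computing an address and dereferencing it. Everything here is illustrative: `effective_address`, `lea`, and `mov_load` are made-up names, and memory is modeled as a plain dict from address to value rather than anything a real CPU does.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # addresses wrap around at 64 bits


def effective_address(base, index=0, scale=1, disp=0):
    """Compute base + index * scale + disp, as an x86 memory operand does."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


def lea(base, index=0, scale=1, disp=0):
    """Model of lea: return the computed address without touching memory."""
    return effective_address(base, index, scale, disp)


def mov_load(memory, base, index=0, scale=1, disp=0):
    """Model of a mov load: compute the same address, then read from it."""
    return memory[effective_address(base, index, scale, disp)]
```

With an array of 4-byte elements starting at `0x1000`, `lea(0x1000, 3, 4)` yields `0x100C`, the address of element 3, while `mov_load(memory, 0x1000, 3, 4)` returns whatever the toy memory holds at that address. The same `lea` model also covers plain arithmetic such as `lea(5, 7)`, which is how `lea eax, [rdi + rsi]` ends up being used as an add.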