Comment by miningape a day ago

Loving this series! I'm currently implementing a Z80-style emulator (Game Boy). It's my first real introduction to CISC, and it's really pushing my assembly/machine code skills, so having these blog posts come from the "other direction" is really interesting and gives me some good context.

I've implemented toy languages and bytecode compilers/VMs before, but seeing it from a professional perspective is just fascinating.

That being said, it was totally unexpected to find out that we can use "addresses" for addition on x86.
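For anyone else who was surprised by this: x86's lea ("load effective address") computes an address of the form base + index*scale + displacement and never touches memory. That makes it usable as a three-operand add or a cheap multiply. Below is a small sketch. The function names are made up, and the assembly in the comments is what GCC and Clang typically emit at -O2 on x86-64 (System V ABI), so the exact output may differ with your compiler version and flags.

```c
#include <assert.h>

/* Hypothetical example functions. Under the System V ABI, x arrives in
   edi/rdi, y in esi/rsi, and the result goes back in eax. */

int add(int x, int y) {
    return x + y;          /* typically: lea eax, [rdi+rsi] */
}

int times5(int x) {
    return x * 5;          /* typically: lea eax, [rdi+rdi*4] */
}

int add_plus_8(int x, int y) {
    return x + y + 8;      /* typically something like: lea eax, [rdi+8+rsi] */
}
```

Pasting these into Compiler Explorer is the easiest way to check what your own compiler does.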

Joker_vD a day ago

A seasoned C programmer knows that "&arr[index]" is really just "arr + index" :) So in a sense, the optimizer rewrote "x + y" into "(int)&(((char*)x)[y])", which looks scarier in C, I admit.
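Here is a sketch of the same identity in plain C. The helper names and the scratch buffer are made up. Casting an arbitrary int to char*, as in the one-liner above, is implementation-defined, so this version stays inside one real array.

```c
#include <assert.h>
#include <stddef.h>

/* &arr[index] and arr + index are the same pointer by definition:
   arr[index] means *(arr + index), and & cancels the *. */
int same_address(const int *arr, size_t index) {
    return &arr[index] == arr + index;
}

/* Adds two non-negative numbers using only address arithmetic.
   Assumes x + y <= sizeof scratch (one-past-the-end is allowed). */
static char scratch[256];

ptrdiff_t add_via_address(ptrdiff_t x, ptrdiff_t y) {
    const char *p = &scratch[x];   /* scratch + x */
    return &p[y] - scratch;        /* (scratch + x + y) - scratch == x + y */
}
```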

  • crote a day ago

    The horrifying side effect of this is that "arr[idx]" is equal to "idx[arr]", so "5[arr]" is just as valid as "arr[5]".

    Your colleagues would probably prefer if you forget this.
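For the skeptics, here is a quick sketch that compiles as standard C. The function names are invented for illustration.

```c
#include <assert.h>

/* arr[i] is defined as *(arr + i), and addition commutes, so i[arr]
   is *(i + arr): the same element. Legal C, terrible style. */
int fifth_both_ways(void) {
    int arr[] = {10, 11, 12, 13, 14, 15};
    return arr[5] == 5[arr] && 5[arr] == 15;
}

/* Works on string literals too: element 1 of "abc" is 'b'. */
char second_char(void) {
    return 1["abc"];
}
```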

    • miningape a day ago

      Mom, please come pick me up. These kids are scaring me.

    • Joker_vD a day ago

      > so "5[arr]" is just as valid as "arr[5]"

      This is, I am sure, one of the stupid legacy reasons we still write "lw a0, 4(a1)" instead of the more sensible "lw a0, a1[4]". The other one is that FORTRAN used round parentheses for both array access and function calls, and somehow that stuck.

      • kragen 16 hours ago

        Generally such constant offsets are record fields in intent, not array indices. (If they were array indices, they'd need to be variable offsets obtained from a register, not immediate constants.) It's reasonable to think of record fields as functions:

                    .equ car, 0
                    .equ cdr, 8
                    .globl length
            length: test %rdi, %rdi         # nil?
                    jz 1f                   # return 0
                    mov cdr(%rdi), %rdi     # recurse on tail of list
                    call length
                    inc %rax
                    ret
                1:  xor %eax, %eax
                    ret
        
        To avoid writing out all the field offsets by hand, ARM's old assembler and, I think, MASM come with a built-in facility for defining record layouts, but gas's macro system is powerful enough to implement one without building it into the assembler itself. It takes about 13 lines of code: http://canonical.org/~kragen/sw/dev3/mapfield.S
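In C, the compiler does that record-layout bookkeeping for you. Below is a rough equivalent of the assembly above. The names struct cell, car_offset and cdr_offset are made up, and the offsets 0 and 8 assume an LP64 target such as x86-64 Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cons cell. On LP64 targets car lands at offset 0 and
   cdr at offset 8, matching the .equ lines in the assembly. */
struct cell {
    intptr_t car;
    struct cell *cdr;
};

/* Same recursion as the assembly: nil (NULL) has length 0, otherwise
   1 + length of the tail. c->cdr compiles to a load from c + 8. */
long length(const struct cell *c) {
    if (c == NULL)
        return 0;
    return 1 + length(c->cdr);
}

/* The field offsets that the assembly version hard-codes with .equ. */
size_t car_offset(void) { return offsetof(struct cell, car); }
size_t cdr_offset(void) { return offsetof(struct cell, cdr); }
```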

        Alternatively, on non-RISC architectures, where the immediate constant isn't constrained to a few bits, it can be the address of an array, and the (possibly scaled) register is an index into it. So you might have startindex(,%rdi,4) for the %rdi'th start index:

                    .data
            startindex:
                    .long 1024
                    .text
                    .globl length
            length: mov (startindex+4)(,%rdi,4), %eax
                    sub startindex(,%rdi,4), %eax
                    ret
        
        If the PDP-11 assembler syntax had been defined to be similar to C or Pascal rather than Fortran or BASIC, we would, as you say, have used startindex[%rdi,4].

        This is not very popular nowadays, both because it isn't RISC-compatible and because it isn't reentrant. AMD64 in particular is a kind of peculiar compromise: the displacement holding the address of startindex (or startindex+4) is a sign-extended 32-bit immediate, even though the address space is 64 bits, so you could conceivably make this code fail to link by placing your data segment in the wrong place.
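For comparison, here is a C sketch of the startindex example. The table contents and the name span_length are made up. The comments describe what GCC typically does on x86-64, so check the output of your own toolchain.

```c
#include <assert.h>

/* Hypothetical table of start offsets; entry i+1 is where span i ends.
   Being a global array is what lets non-PIE code address it with an
   absolute 32-bit displacement. */
static int startindex[] = {0, 4, 9, 1024};

/* Length of span i, i.e. startindex[i+1] - startindex[i]. Requires i + 1 < 4.
   Built without PIE, this typically becomes the two-instruction mov/sub
   form above. With PIE (the default on many Linux distros) the compiler
   first loads the array's address with a RIP-relative lea, which avoids
   the link-time problem. */
int span_length(long i) {
    return startindex[i + 1] - startindex[i];
}
```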

        (Despite stupid factionalist stuff, I think I come down on the side of preferring the Intel syntax over the AT&T syntax.)

      • beng-nl 19 hours ago

        Yes, I find this one of the weird things about assembly: appending (or prepending?) a number means addition?! Even after many, many years of occasionally reading and writing assembly, I’m never completely sure what these instructions do, so I infer it from context.
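In all of these syntaxes the number is a displacement that gets added to the base register before the memory access. AT&T 8(%rdi), Intel [rdi+8] and RISC-V 4(a1) all mean "the memory at register + constant". Below is a rough C rendering with made-up helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Roughly what "mov 8(%rdi), %rax" (Intel: mov rax, [rdi+8]) does:
   add the constant 8 to the address in the base register, then load
   8 bytes from there. memcpy sidesteps alignment and aliasing issues. */
int64_t load_disp8(const void *base) {
    int64_t value;
    memcpy(&value, (const char *)base + 8, sizeof value);
    return value;
}

/* The general x86 form disp(base, index, scale) computes
   base + index*scale + disp (modulo 2^N). lea returns this number;
   mov loads from it. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t disp) {
    return base + index * scale + (uintptr_t)disp;
}
```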

    • rocqua a day ago

      That depends on sizeof(*arr), no?

      • unwind a day ago

        Not in C, no, since arithmetic on a pointer is implicitly scaled by the size of the value being pointed at (this statement kind of breaks the abstraction... oh well).

      • messe a day ago

        Nope, a[b] is equivalent to *(a + b) regardless of a and b.
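Both answers agree once you see where the scaling happens: inside "+", not inside "[]". Here is a small sketch with function names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* p + 1 advances by sizeof *p bytes, so the byte distance between
   p and p + 1 is the element size. */
size_t bytes_moved_int(const int *p) {
    return (size_t)((const char *)(p + 1) - (const char *)p);
}

size_t bytes_moved_double(const double *p) {
    return (size_t)((const char *)(p + 1) - (const char *)p);
}

/* Because the scaling is part of the addition, a[b], *(a + b) and b[a]
   all name the same element whatever the element type. */
int index_equals_deref(void) {
    double a[3] = {1.5, 2.5, 3.5};
    return a[2] == *(a + 2) && 2[a] == 3.5;
}
```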