Comment by crote a day ago

The horrifying side effect of this is that "arr[idx]" is equal to "idx[arr]", so "5[arr]" is just as valid as "arr[5]".

Your colleagues would probably prefer if you forget this.
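A minimal C sketch of why this works. `a[i]` is defined as `*(a + i)`, and adding an integer to a pointer commutes, so `i[a]` is `*(i + a)`, which is the same element. The function names below are purely illustrative.

```c
#include <stddef.h>

/* a[i] is defined as *(a + i); pointer + integer addition commutes,
   so i[a] means *(i + a), i.e. the very same element. */
int index_normal(const int *arr, size_t i)   { return arr[i]; }
int index_swapped(const int *arr, size_t i)  { return i[arr]; }
int index_explicit(const int *arr, size_t i) { return *(arr + i); }

/* The constant-index spelling from the comment. */
int sixth(const int *arr) { return 5[arr]; }

/* The swapped form is an lvalue too, so it even works for assignment. */
void set_sixth(int *arr, int value) { 5[arr] = value; }
```

All of these compile to the same load; the swapped forms are legal, just unkind to readers.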

Joker_vD a day ago

> so "5[arr]" is just as valid as "arr[5]"

This is, I am sure, one of the stupid legacy reasons we still write "lw a0, 4(a1)" instead of the more sensible "lw a0, a1[4]". The other one is that FORTRAN used round parentheses for both array access and function calls, and somehow that stuck.
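One subtlety with borrowing C's bracket syntax here: in a RISC-V load such as `lw a0, 4(a1)` the 4 is a byte displacement added to the address in `a1`, while in C `a1[4]` on an `int32_t *` is scaled by the element size to byte offset 16. A rough C model of the difference, with hypothetical function names and `memcpy` standing in for the load so alignment doesn't matter:

```c
#include <stdint.h>
#include <string.h>

/* Model of "lw rd, 4(rs1)": the displacement is in BYTES and is simply
   added to the base address; nothing is scaled. */
int32_t load_word_disp4(const void *base) {
    int32_t v;
    memcpy(&v, (const char *)base + 4, sizeof v); /* address = base + 4 */
    return v;
}

/* C's subscript scales the index by sizeof(*p):
   p[4] reads the word at byte offset 4 * sizeof(int32_t) == 16. */
int32_t subscript_4(const int32_t *p) {
    return p[4];
}
```

On an array of 32-bit words, `load_word_disp4` returns element 1 while `subscript_4` returns element 4, so `a1[4]` in assembly would need different semantics from C's.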

  • kragen 17 hours ago

    Generally such constant offsets are record fields in intent, not array indices. (If they were array indices, they'd need to be variable offsets obtained from a register, not immediate constants.) It's reasonable to think of record fields as functions:

                .equ car, 0
                .equ cdr, 8
                .globl length
        length: test %rdi, %rdi         # nil?
                jz 1f                   # return 0
                mov cdr(%rdi), %rdi     # recurse on tail of list
                call length
                inc %rax
                ret
            1:  xor %eax, %eax
                ret
    
    To avoid writing out all the field offsets by hand, ARM's old assembler and, I think, MASM have a built-in facility for defining record layouts, but gas's macro system is powerful enough to implement one without building it into the assembler itself. It takes about 13 lines of code: http://canonical.org/~kragen/sw/dev3/mapfield.S

    Alternatively, on non-RISC architectures, where the immediate constant isn't constrained to a few bits, it can be the address of an array, and the (possibly scaled) register is an index into it. So you might have startindex(,%rdi,4) for the %rdi'th start index:

                .data
        startindex:
                .long 1024
                .text
                .globl length
        length: mov (startindex+4)(,%rdi,4), %eax
                sub startindex(,%rdi,4), %eax
                ret
    
    If the PDP-11 assembler syntax had been defined to be similar to C or Pascal rather than Fortran or BASIC, we would, as you say, have used startindex[%rdi,4].

    This is not very popular nowadays, both because it isn't RISC-compatible and because it isn't reentrant. AMD64 in particular is a somewhat peculiar compromise: the immediate "offset" for startindex and startindex+4 is only 32 bits (sign-extended), even though the address space is 64 bits, so you could conceivably make this code fail to link by placing your data segment somewhere a 32-bit displacement can't reach.

    (Despite stupid factionalist stuff, I think I come down on the side of preferring the Intel syntax over the AT&T syntax.)

  • beng-nl 20 hours ago

    Yes, I find this one of the weird things about assembly: appending (or prepending?) a number means addition?! Even after many, many years of occasionally reading and writing assembly, I'm never completely sure what these instructions do, so I infer it from context.
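For anyone who, like beng-nl, finds the AT&T operands hard to read, here is a rough C rendering of kragen's two examples above. AT&T `disp(base, index, scale)` addresses `disp + base + index * scale`, with omitted parts counting as zero. The struct layout assumes an LP64 target such as x86-64 Linux, and `effective_address` and `range_length` are hypothetical names (the second assembly function is also called `length`, and its global `startindex` is passed as a parameter here).

```c
#include <stddef.h>
#include <stdint.h>

/* "8(%rdi)" is 8 + rdi; "startindex(,%rdi,4)" is startindex + rdi * 4. */
uintptr_t effective_address(intptr_t disp, uintptr_t base,
                            uintptr_t index, unsigned scale) {
    return (uintptr_t)disp + base + index * scale;
}

/* The cons cell: on LP64, car sits at offset 0 and cdr at offset 8,
   matching ".equ car, 0" and ".equ cdr, 8". */
struct cell {
    void *car;
    struct cell *cdr;
};

/* Same recursion as the assembly: nil gives 0, otherwise 1 + length of
   the tail. "mov cdr(%rdi), %rdi" is just list->cdr. */
long length(const struct cell *list) {
    if (list == NULL)
        return 0;
    return 1 + length(list->cdr);
}

/* startindex(,%rdi,4) is startindex[i] for 32-bit ints, and
   (startindex+4)(,%rdi,4) is startindex[i + 1]. */
int32_t range_length(const int32_t *startindex, size_t i) {
    return startindex[i + 1] - startindex[i];
}
```

The C compiler does the scaling for `startindex[i]` implicitly, which is exactly the `,4` that AT&T syntax makes you spell out.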

rocqua a day ago

That depends on sizeof(*arr), no?

  • unwind a day ago

    Not in C, no, since arithmetic on a pointer is implicitly scaled by the size of the value being pointed at (this statement is kind of breaking the abstraction ... oh well).

  • messe a day ago

    Nope, a[b] is equivalent to *(a + b) regardless of a and b.

    • sureglymop a day ago

      Given that, why don't we just use `*(a + b)` everywhere?

      Wouldn't that be more verbose and less confusing? (genuinely asking)

      • tomsmeding a day ago

        Do you really think that `*(a + i)` is clearer than `a[i]`?

        • sureglymop 19 hours ago

          Not necessarily. I think it's confusing when there are two fairly close ways to express the same thing.
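On rocqua's question: the scaling by `sizeof(*arr)` happens inside the pointer addition itself, so `a[b]`, `*(a + b)` and `b[a]` remain interchangeable for any element type. Only the distance in bytes changes. A small sketch with illustrative function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between &b[a] (== a + b) and a. The subscript never
   mentions the element size; pointer addition applies sizeof(*a). */
ptrdiff_t byte_offset_char(const char *a, size_t b) {
    return (const char *)&b[a] - (const char *)a;  /* b * sizeof(char) */
}

ptrdiff_t byte_offset_i64(const int64_t *a, size_t b) {
    return (const char *)&b[a] - (const char *)a;  /* b * sizeof(int64_t) */
}
```

With `b == 5`, the first returns 5 and the second returns 5 * sizeof(int64_t), even though both use the identical `b[a]` spelling.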