Comment by miningape a day ago

Loving this series! I'm currently implementing a Z80 emulator (Game Boy), which is my first real introduction to CISC and is really pushing my assembly / machine code skills, so having these blog posts come from the "other direction" is really interesting and gives me some good context.

I've implemented toy languages and bytecode compilers/VMs before, but seeing it from a professional perspective is just fascinating.

That being said, it was totally unexpected to find out we can use "addresses" for addition on x86.

Joker_vD a day ago

A seasoned C programmer knows that "&arr[index]" is really just "arr + index" :) So in a sense, the optimizer rewrote "x + y" into "(int)&(((char*)x)[y])", which looks scarier in C, I admit.
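A rough C sketch of that equivalence, for anyone who wants to poke at it. The helper names (`same_element_address`, `add_via_index`) are made up for illustration. The integer-to-pointer round trip is implementation-defined, and the indexing is only well-defined when `x` really is the address of an object with at least `y` bytes behind it, so read this as a picture of what the address computation does, not as portable code.

```c
#include <stdint.h>

/* &arr[index] and arr + index name the same address. */
int same_element_address(const int *arr, int index)
{
    return &arr[index] == arr + index;
}

/* x + y spelled as "the address of byte y in a char array that starts at x".
 * Assumption: x is the address of a real object at least y bytes long and
 * the platform maps pointers to uintptr_t as plain flat addresses. */
uintptr_t add_via_index(uintptr_t x, uintptr_t y)
{
    return (uintptr_t)&((char *)x)[y];
}
```

Calling `add_via_index((uintptr_t)buf, 5)` on some `char buf[16]` should give the same value as `(uintptr_t)buf + 5` on typical flat-address-space targets.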

crote a day ago

The horrifying side effect of this is that "arr[idx]" is equal to "idx[arr]", so "5[arr]" is just as valid as "arr[5]".
Your colleagues would probably prefer if you forget this.
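For the curious, a minimal sketch of why this compiles: C defines `E1[E2]` as `(*((E1) + (E2)))`, and the addition does not care which operand is the pointer. The function names here are hypothetical.

```c
#include <stddef.h>

/* arr[idx], idx[arr] and *(arr + idx) are the same expression in C. */
int index_forms_agree(const int *arr, size_t idx)
{
    return arr[idx] == idx[arr] && idx[arr] == *(arr + idx);
}

/* Reads element 5 using the "backwards" spelling. */
int sixth_element_backwards(const int *arr)
{
    return 5[arr];
}
```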

- miningape a day ago
  
  Mom, please come pick me up. These kids are scaring me.
  
- Joker_vD a day ago
  
  > so "5[arr]" is just as valid as "arr[5]"
  This is, I am sure, one of the stupid legacy reasons we still write "lw a0, 4(a1)" instead of the more sensible "lw a0, a1[4]". The other one is that FORTRAN used round parentheses for both array access and function calls, so it stuck somehow.
  
  kragen 16 hours ago
  
  Generally such constant offsets are record fields in intent, not array indices. (If they were array indices, they'd need to be variable offsets obtained from a register, not immediate constants.) It's reasonable to think of record fields as functions:
      .equ car, 0
      .equ cdr, 8
              .globl length
      length: test %rdi, %rdi         # nil?
              jz 1f                   # return 0
              mov cdr(%rdi), %rdi     # recurse on tail of list
              call length
              inc %rax
              ret
      1:      xor %eax, %eax
              ret
  To avoid writing out all the field offsets by hand, ARM's old assembler and I think MASM come with a record-layout-definition thing built in, but gas's macro system is powerful enough to implement it without having it built into the assembler itself. It takes about 13 lines of code: http://canonical.org/~kragen/sw/dev3/mapfield.S
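For comparison, here is roughly what the list-length routine looks like in C, where the compiler works out the field offsets for you. This is only a sketch: `struct cons` and `list_length` are names invented here, and the "cdr at offset 8" layout assumes a target with 8-byte pointers such as x86-64.

```c
#include <stddef.h>

struct cons {
    void *car;          /* offset 0, like ".equ car, 0" */
    struct cons *cdr;   /* offset 8 with 8-byte pointers, like ".equ cdr, 8" */
};

/* Same shape as the assembly: nil gives 0, otherwise 1 + length of the tail. */
size_t list_length(const struct cons *list)
{
    if (list == NULL)
        return 0;
    return list_length(list->cdr) + 1;
}
```

`offsetof(struct cons, cdr)` gives the constant that the assembly version has to spell out by hand.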
  Alternatively, on non-RISC architectures, where the immediate constant isn't constrained to a few bits, it can be the address of an array, and the (possibly scaled) register is an index into it. So you might have startindex(,%rdi,4) for the %rdi'th start index:
              .data
      startindex: .long 1024
              .text
              .globl length
      length: mov (startindex+4)(,%rdi,4), %eax
              sub startindex(,%rdi,4), %eax
              ret
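In C terms, that routine reads two adjacent 32-bit table entries and subtracts them. A minimal sketch follows; the extra table entries and the name `item_length` are assumptions added so the example has something to compute, since the snippet above only shows a single `.long 1024`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table contents: only the first entry (1024) comes from the
 * assembly snippet; the rest are made up for illustration. */
static const int32_t startindex[] = { 1024, 1040, 1100, 1100 };

/* Length of item i = start of item i+1 minus start of item i.
 * The caller must make sure i + 1 is still inside the table. */
int32_t item_length(size_t i)
{
    return startindex[i + 1] - startindex[i];
}
```

On x86-64 a compiler may well turn the two array reads into the same scaled-index addressing modes, though with position-independent code it will usually load the table address into a register first.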
  If the PDP-11 assembler syntax had been defined to be similar to C or Pascal rather than Fortran or BASIC we would, as you say, have used startindex[%rdi,4].
  This is not very popular nowadays both because it isn't RISC-compatible and because it isn't reentrant. AMD64 in particular is a kind of peculiar compromise—the immediate "offset" for startindex and endindex is 32 bits, even though the address space is 64 bits, so you could conceivably make this code fail to link by placing your data segment in the wrong place.
  (Despite stupid factionalist stuff, I think I come down on the side of preferring the Intel syntax over the AT&T syntax.)
  
  beng-nl 19 hours ago
  
  Yes, I find this one of the weird things about assembly: appending (or prepending?) a number means addition?! Even after many, many years of occasionally reading and writing assembly, I’m never completely sure what these instructions do, so I infer it from context.
  
- rocqua a day ago
  
  That depends on sizeof(*arr), no?
  
  unwind a day ago
  
  Not in C no, since arithmetic on a pointer is implicitly scaled by the size of the value being pointed at (this statement is kind of breaking the abstraction ... oh well).
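A quick way to see the implicit scaling is to measure the distance in bytes. This is a small sketch with a made-up helper name, using `int32_t` so the element size is fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between &arr[0] and &arr[idx]. The index is in elements,
 * but the compiler scales it by sizeof *arr when forming the address. */
size_t byte_offset_of_element(const int32_t *arr, size_t idx)
{
    return (size_t)((const char *)(arr + idx) - (const char *)arr);
}
```

So `arr + 3` on an `int32_t` array lands `3 * sizeof(int32_t)` bytes past the start, which is why `idx[arr]` and `arr[idx]` agree whatever the element size is.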
  
  messe a day ago
  
  Nope, a[b] is equivalent to *(a + b) regardless of a and b.
  