Comment by snvzz
Comment by snvzz 2 days ago
Because, unlike RISC-V, x86 has no x0 register.
Comment by snvzz 2 days ago
Because, unlike RISC-V, x86 has no x0 register.
ARM64 assembly has a MOV instruction, but for most of the ways it's used, it's an alias in the assembler to something else. For example, MOV between two registers actually generates ORR rd, rZR, rm, i.e. rd := (zero-register) OR rm. Or, a MOV with a small immediate is ORR rd, rZR, #imm.
If trying to set the stack pointer, or copy the stack pointer, instead the underlying instruction is ADD SP, Xn, #0 i.e. SP = Xn + 0. This is because the stack pointer and zero register are both encoded as register 31 (11111). Some instructions allow you to use the zero register, others the stack pointer. Presumably ORR uses the zero register and ADD the stack pointer.
NOP maps to HINT #0. There are 128 HINT values available; anything not implemented on this processor executes as a NOP.
There are other operations that are aliased like CMP Xm, Xn is really an alias for SUBS XZR, Xm, Xn: subtract Xn from Xm, store the result in the zero register [i.e. discard it], and set the flags. RISC-V doesn't have flags, of course. ARM Ltd clearly considered them still useful.
There are other oddities, things like 'rotate right' is encoded as 'extract register from pair of registers', but it specifies the same source register twice.
Disassemblers do their best to hide this from you. ARM list a 'preferred decoding' for any instruction that has aliases, to map back to a more meaningful alias wherever possible.
There is a `c.mv` instruction in the compressed set, which most RISC-V processors implement.
That, and `add rd, rs, x0` could (like the zeroing idiom on x86), run entirely in the decoding and register-renaming stages of a processor.
RISC-V does actually have quite a few idioms. Some idioms are multi-instruction sequences ("macro ops") that could get folded into single micro-ops ("macro-op fusion"/"instruction fusion"): for example `lui` followed by `addi` for loading a 32-bit constant, and left shift followed by right shift for extracting a bitfield.
The DEC Alpha chip was the same. It also had a hardwired zero register (although IIRC the zero register was r31 instead of r0) and about half the addressing modes and a whole bunch of "assembly instructions" were created by interesting uses of that zero register.
x86 has no architectural zero register, but a x86 CPU could have a microarchitectural zero register.
And when the instruction decoder in such a CPU with register renaming sees `xor eax, eax`, it just makes `eax` point to the zero register for instructions after it. It does not have to put any instruction into the pipeline, and it takes effectively 0 cycles. That is what makes the "zeroing idiom" so powerful.
From your past posting history, I presume that you're implying this makes RISC-V better?
Do we have any data showing that having a dedicated zero register is better than a short and canonical instruction for zeroing an arbitrary register?
The zero register helps RISC-V (and MIPS before it) really cut down on the number of instructions, and hardware complexity.
You don't need a mov instruction, you just OR with $zero. You don't need a load immediate instruction you just ADDI/ORI with $zero. You don't need a Neg instruction, you just SUB with $zero. All your Compare-And-Branch instructions get a compare with $zero variant for free.
I refuse to say this "zero register" approach is better, it is part of a wide design with many interacting features. But once you have 31 registers, it's quite cheap to allocate one register to be zero, and may actually save encoding space elsewhere. (And encoding space is always an issue with fixed width instructions).
AArch64 takes the concept further, they have a register that is sometimes acts as the zero register (when used in ALU instructions) and other times is the stack pointer (when used in memory instructions and a few special stack instructions).
>> The zero register helps RISC-V (and MIPS before it) really cut down on the number of instructions, and hardware complexity.
Which if funny because IMHO RISC-V instruction encoding is garbage. It was all optimized around the idea of fixed length 32-bit instructions. This leads to weird sized immediates (12 bits?) and 2 instructions to load a 32 bit constant. No support for 64 bit immediates. Then they decided to have "compressed" instructions that are 16 bits, so it's somewhat variable length anyway.
IMHO once all the vector, AI and graphics instructions are nailed down they should make RISC-VI where it's almost the same but re-encoding the instructions. Have sensible 16-bit ones, 32-bit, and use immediate constants after the opcodes. It seems like there is a lot they could do to clean it up - obviously not as much as x86 ;-)
ARM64 also has fixed length 32-bit instructions. Yes, immediates are normally small and it's not particularly orthogonal as to how many bits are available.
The largest MOV available is 16 bits, but those 16 bits can be shifted by 0, 16, 32 or 48 bits, so the worst case for a 64-bit immediate is 4 instructions. Or the compiler can decide to put the data in a PC-relative pool and use ADR or ADRP to calculate the address.
ADD immediate is 12 bits but can optionally apply a 12-bit left-shift to that immediate, so for immediates up to 24 bits it can be done in two instructions.
ARM64 decoding is also pretty complex, far less orthogonal than ARM32. Then again, ARM32 was designed to be decodable on a chip with 25,000 transistors, not where you can spend thousands of transistors to decode a single instruction.
There's not a strong case for redoing the RISC-V encoding with a new RISC-VI unless they run out of 32-bit encoding space outright, due to e.g. extensive new vector-like or AI-like instructions. And then they could free up a huge amount of encoding space trivially by moving to a 2-address format throughout with Rd=Rs1 and using a simple instruction fusion approach MOV Rd ← Rs1; OP Rd ← etc. for the former 3-address case.
(Any instruction that can be similarly rephrased as a composition of more restricted elementary instructions is also a candidate for this macro-insn approach.)
IMO the riscv decoding is really elegant (arguably excepting the C extension). Things like 64 bit immediates are almost certainly a bad idea (as opposed to just having a load from memory). Most 64 bit constants in use can be sign extended from much smaller values, and for those that can't, supporting 72 bit (or bigger) instructions just to be able to load a 64 bit immediate will necessarily bloat instruction cache, stall your instruction decoder (or limit parallelism), and will only be 2 cycles faster than a L1 cache load (if the instruction is hot). 32 bit immediate would be kind of nice, but the benefit is pretty small. An x86 instruction with 32 bit immediate is 6 bytes, while the 2 RISC-V instructions are 8 bytes. There have been proposals to add 48 bit instructions, which would let Risc-v have 32 bit immediate support with the same 6 bytes as x86 (and 12 byte 2 instructions 64 bit loads vs 10 bit for x86 in the very rare situations where doing so will be faster than a load).
ISA design is always a tradeoff, https://ics.uci.edu/~swjun/courses/2023F-CS250P/materials/le... has some good details, but the TLDR is that RISC-V makes reasonable choices for a fairly "boring" ISA.
MIPS for example also has one, along with a similar number of registers (~32). So it's not like RISC-V took a radical new position here, they were able to look back at what worked and what didn't, and decided that for their target a zero register was the right tradeoff. It's certainly the more "elegant" solution. A zero register is useful as input or output register for all kinds of operations, not just for zeroing
It's a definite liability on a machine with only 8 general purpose registers. Losing 12% of the register space for a constant would be a waste of hardware.
Ever heard of a loop that needed to keep more than 7 variables live? Register renaming helps with pipelining and out-of-order execution, but instructions in the program can only reference the architectural registers - go beyond that and you end up needing to spill some values to (architectural) memory.
There's a reason why AMD added r8-r15 to the architecture, and why intel is adding r16-r31..
In the real world there is no CISC or RISC anymore. RISC is always extended to some new feature and suddenly becomes more complex. Meanwhile CISC is just a decoder over a RISC processor. Either way you get the best of both worlds: simple hardware (the RISC internals and CSIC instructions that do what you need.
Don't get too carried away in the above, x86 is still a lot more complex than ARM or RISC-V. However the complexity is only a tiny part of a CPU and so it doesn't matter.
I think one could just pick a convention where a particular GP register is zeroed at program startup and just make your own zero register that way, getting all the benefits at very small cost. The microarchitecture AIUI has a dedicated zero register so any processor-level optimizations would still apply.
And the other way around: RISC-V doesn't have a move instruction so that's done as "dst = src + 0", and it doesn't have a nop instruction so that's done as "x0 = x0 + 0". There's like a dozen of them.
It's quite interesting what neat tricks roll out once you've got a guaranteed zero register - it greatly reduces the number of distinct instructions you need for what is basically the same operation.