Comment by sylware 4 days ago

22 replies

This is implemented with instruction fusion. Just need to document properly and publish properly what will end up "standard instruction fusion patterns" (like the div/rem one).

Adding more instructions is kind of counterproductive for a R(educed)ISC ISA. It has to be weighed with extreme care. Compressed instructions went through for the sake of code density (marketing vs. ARM Thumb instructions).

In the end, programs will want probably to stay conservative and will implement only the core ISA, at best giving some love to some instruction fusion patterns and that's it, unless they are knowingly built for a specific RISC-V hardware implementation.

mort96 14 hours ago

> In the end, programs will want probably to stay conservative and will implement only the core ISA

This is probably not the case. The core ISA doesn't include floating point, it doesn't include integer multiply or divide, it doesn't include atomic and fence instructions.

What has happened is that most compilers and programs for "normal desktop/laptop/server/phone class systems" all have some baseline set of extensions. Today, this is more or less what we call the "G" extension collection (which is short-hand for IMAFD_Zicsr_Zifencei). Though what we consider "baseline" in "normal systems" will obviously evolve over time (just like how SSE is considered a part of "baseline amd64" these days but was once a new and exotic extension).
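To make the shorthand concrete, here is a minimal Python sketch that expands "G" inside an ISA string and reports what a given target lacks relative to that baseline. The names `parse_isa` and `missing_from_baseline` and the example strings are made up for illustration, not taken from any toolchain. It assumes lowercase strings without version numbers, with an underscore before every multi-letter extension.

```python
import re

# "G" as described above: IMAFD plus Zicsr and Zifencei.
G_EXPANSION = frozenset({"i", "m", "a", "f", "d", "zicsr", "zifencei"})


def parse_isa(isa):
    """Split e.g. 'rv64gc_zba' into (64, {'i', 'm', ..., 'c', 'zba'})."""
    head, *multi_letter = isa.lower().split("_")
    match = re.fullmatch(r"rv(32|64)([a-z]+)", head)
    if match is None:
        raise ValueError(f"unsupported ISA string: {isa!r}")
    xlen = int(match.group(1))
    extensions = set(multi_letter)
    for letter in match.group(2):
        if letter == "g":
            extensions |= G_EXPANSION  # expand the shorthand
        else:
            extensions.add(letter)
    return xlen, extensions


def missing_from_baseline(isa, baseline=G_EXPANSION):
    """Return the baseline extensions that the given ISA string lacks."""
    _, extensions = parse_isa(isa)
    return set(baseline) - extensions
```

Under those assumptions, `missing_from_baseline("rv64gc")` comes back empty, while `missing_from_baseline("rv32imac")` reports F, D, Zicsr and Zifencei as missing.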

Then lower power use cases like MCUs will have fewer instructions. There will be lots of MCUs without stuff like hardware floating point support that won't run binaries compiled for the G extension collection. In MCU use cases, you typically know at the time of compiling exactly what MCU your code will be running on, so passing the right flags to the compiler to make sure it generates only the supported instructions is not an issue.

And then HPC use cases will probably assume more exotic extensions.

And normal "desktop/phone/laptop/server" style use cases will have runtime detection of things like vector instructions in some situations, just like in aarch64/amd64.

  • int_19h 3 hours ago

    Was there ever a time when SSE was not a part of baseline amd64? Just going off the dates, SSE showed up in Pentium 3, and if I remember correctly AMD picked it up in 32-bit Athlons already.

    • mort96 an hour ago

      I think you're right. I should've said x86 (or maybe IA-32?), not amd64.

  • panick21_ 12 hours ago

    It's not known as "G". The standards targeted by the software ecosystem are the RVA20, RVA22, and RVA23 profiles.

    https://riscv.org/ecosystem-news/2025/04/risc-v-rva23-a-majo...

    • mort96 2 hours ago

      Thanks, seems I'm out of date (or just wrong). G is indeed IMAFD_Zicsr_Zifencei and I've always viewed it as a "reasonable baseline for most normal code", I wasn't up to date on the RVA/B/C stuff.

  • sylware 2 hours ago

    What??

    Ofc, if your program uses floating point calculations you will want to use the hardware machine instructions for that.

    Here, we were talking about all those machine instructions which do not bring much on top of the core ISA. Those would be implemented using fusion, which is appropriate for R(educed)ISC silicon. The trade-off is code density, and on modern silicon code density probably only matters in very specific niches. There, machine code would be generated (BTW, probably hand-written rather than generated for those niches...) with those very specific niches in mind.

    And RISC-V hardware implementations, with proper publishing of the most common and pertinent machine instruction fusion patterns, will be able to "improve" step by step, targeting what they actually run and what would make a real difference. Sure, this will require a bit of coordination to agree on machine instruction fusion patterns.

    • mort96 2 hours ago

      You said "programs will want probably to stay conservative and will implement only the core ISA". I'm saying that the core ISA is very very limited and most programs will want to use more than the core ISA.

      • sylware 2 hours ago

        What???

        Re-read my post, please.

        The issue is those machine instructions that do not bring much more than the core ISA, and which therefore do not require an ISA extension.

        • mort96 2 hours ago

          Integer multiply requires an ISA extension. The core ISA does not have integer multiply.

vardump 17 hours ago

Instruction fusion still means lower code density. You can go overboard, but the newer ARM instruction set(s) are pretty good.

  • duskwuff 14 hours ago

    As an aside: it's only relevant on microcontrollers nowadays, but ARM T32 (Thumb) code density is really good. Most instructions are 2 bytes, and it's got some clever ways to represent commonly used 32-bit values in 12 bits:

    https://developer.arm.com/documentation/ddi0403/d/Applicatio...
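    For the curious, a rough Python sketch of how that 12-bit "modified immediate" expands to 32 bits, following my reading of the ThumbExpandImm pseudocode in the ARM manual. The function name `thumb_expand_imm` is my own, and this models the value only, not the carry-out:

```python
def thumb_expand_imm(imm12):
    """Decode a 12-bit T32 modified immediate into its 32-bit value."""
    if not 0 <= imm12 < (1 << 12):
        raise ValueError("imm12 must fit in 12 bits")
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0:
        # Top two bits clear: replicate the byte according to bits 9:8.
        mode = (imm12 >> 8) & 0b11
        if mode == 0:
            return imm8                          # 0x000000XY
        if imm8 == 0:
            raise ValueError("UNPREDICTABLE encoding (zero byte replicated)")
        if mode == 1:
            return (imm8 << 16) | imm8           # 0x00XY00XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)    # 0xXY00XY00
        return imm8 * 0x01010101                 # 0xXYXYXYXY
    # Otherwise: an 8-bit value with an implied leading 1, rotated right
    # by the 5-bit amount held in bits 11:7 (always 8..31 on this path).
    unrotated = 0x80 | (imm12 & 0x7F)
    rotation = imm12 >> 7
    return ((unrotated >> rotation) | (unrotated << (32 - rotation))) & 0xFFFFFFFF
```

    So one encoding covers small constants, repeated-byte patterns, and any 8-bit window with its top bit set shifted anywhere in the word.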

    • wren6991 2 hours ago

      RISC-V code density is pretty good these days with Zcmp (push, pop, compressed double move) and Zcb (compressed mul, sign/zero-extend, byte load/store). There is also Zcmt but it's kind of cursed. Hopefully density will keep improving once mainstream compilers have full support for Zilsd/Zclsd (load/store pair for RV32).

      T32 is a pretty good encoding but far from perfect. If they had the chance to redo it I doubt they would spend a full 1/32nd of the encoding space on asrs, for example.

  • Findecanor 16 hours ago

    Not necessarily lower density. On ARM you would often need cmp and csel, which are two instructions, eight bytes.

    RISC-V has cmp-and-branch in a single four-byte instruction, which with a two-byte c.mv normally makes six bytes. If the cmp-and-branch instruction tests one of x8..x15 against zero, then that could also be a compressed instruction, making four bytes in total.

    • sylware 3 hours ago

      Compressed instructions only matter for niches (and even in such niches, nowadays, I guess it is very questionable). Here you would not use compressed instructions, just the right instruction pattern for fusion, like div/rem.

  • sylware 3 hours ago

    RISC-V instructions are pretty good, without any IP lock like ARM or x86_64.

Pet_Ant 17 hours ago

Compressed instructions are also for microcontroller use. RISC-V, rightly or wrongly, is trying to be an ISA that can handle the whole stack from embedded microcontrollers to top-end servers.

As such, there are compromises for both aims.

mshockwave 6 hours ago

> In the end, programs will want probably to stay conservative and will implement only the core ISA

Unlikely; as pointed out in sibling comments, the core ISA is too limited. What might prevail are profiles, specifically profiles for application processors like RVA22U64 and RVA23U64, of which the latter makes a lot more sense IMHO.

wren6991 13 hours ago

> publish properly what will end up "standard instruction fusion patterns" (like the div/rem one).

The div/rem one is odd because I saw it suggested in the ISA manual, but I have yet to ever see that pattern crop up in compiled code. Usually it's just in library functions like C stdlib `div()` which returns a quotient and remainder, but why on earth are you calling that library function on a processor that has a divide instruction?
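For reference, the sequence as I remember it from the M-extension chapter is `div[u] rdq, rs1, rs2` immediately followed by `rem[u] rdr, rs1, rs2`, with rdq different from both sources. Below is a toy Python sketch of a checker for that pattern. The `Insn` tuple and `find_div_rem_pairs` are hypothetical names for illustration; a real decoder or compiler pass would look nothing like this.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str   # mnemonic, e.g. "div"
    rd: str   # destination register
    rs1: str  # first source register
    rs2: str  # second source register


# A divide may only pair with the remainder of matching signedness.
FUSIBLE = {"div": "rem", "divu": "remu"}


def find_div_rem_pairs(insns):
    """Return each index i where insns[i] and insns[i + 1] form a div/rem pair."""
    pairs = []
    for i in range(len(insns) - 1):
        first, second = insns[i], insns[i + 1]
        if FUSIBLE.get(first.op) != second.op:
            continue
        same_sources = (first.rs1, first.rs2) == (second.rs1, second.rs2)
        # The divide must not overwrite a source the remainder still reads.
        sources_intact = first.rd not in (first.rs1, first.rs2)
        if same_sources and sources_intact:
            pairs.append(i)
    return pairs
```

Running it over a disassembly would be one way to check how often compilers actually emit the pair.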

  • cpgxiii 10 hours ago

    > but why on earth are you calling that library function on a processor that has a divide instruction?

    Because they rightfully expect that div() compiles down to the fastest div/rem idiom for the target hardware. Mainstream compilers go to great lengths to optimize calls to the core C math functions.

    • wren6991 2 hours ago

      You still have the overhead of a function call. If you just use / % operators then you'll get a call inserted to the libgcc or compiler-rt routine if you don't have the M extension, and those routines are div or mod only. Using stdlib for integer division seems like an odd choice.

      If stdlib div() were promoted to a builtin one day (it currently is not in GCC afaict), and its implementation were inlined, then the compiler would recognise the common case of one side of the struct being dead, and you'd still end up with a single div/rem instruction.