Comment by sylware

Comment by sylware 4 days ago

This is implemented with instruction fusion. Just need to document properly and publish properly what will end up "standard instruction fusion patterns" (like the div/rem one).

Adding more instructions is kind of counterproductive for an R(educed)ISC ISA. It has to be weighed with extreme care. Compressed instructions went through for the sake of code density (marketing vs. ARM Thumb instructions).

In the end, programs will want probably to stay conservative and will implement only the core ISA, at best giving some love to some instruction fusion patterns and that's it, unless being built knowingly for a specific risc-v hardware implementation.

mort96 14 hours ago

> In the end, programs will want probably to stay conservative and will implement only the core ISA

This is probably not the case. The core ISA doesn't include floating point, it doesn't include integer multiply or divide, and it doesn't include atomics or the instruction-fetch fence (fence.i); only the plain fence instruction is in the base ISA.

What has happened is that most compilers and programs for "normal desktop/laptop/server/phone class systems" all have some baseline set of extensions. Today, this is more or less what we call the "G" extension collection (which is short-hand for IMAFD_Zicsr_Zifencei). Though what we consider "baseline" in "normal systems" will obviously evolve over time (just like how SSE is considered a part of "baseline amd64" these days but was once a new and exotic extension).

Then lower power use cases like MCUs will have fewer instructions. There will be lots of MCUs without stuff like hardware floating point support that won't run binaries compiled for the G extension collection. In MCU use cases, you typically know at the time of compiling exactly what MCU your code will be running on, so passing the right flags to the compiler to make sure it generates only the supported instructions is not an issue.
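
A minimal sketch of how code can see which extensions the compiler was told to target (for example via GCC/Clang flags such as `-march=rv32imc -mabi=ilp32`). It assumes the predefined macros from the RISC-V C API (`__riscv_xlen`, `__riscv_flen`, `__riscv_mul`, `__riscv_div`, `__riscv_atomic`, `__riscv_compressed`); the struct and function names are made up for illustration, and the check ignores Zicsr/Zifencei, so it only approximates "G".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the compiler was told to target. */
struct riscv_build_features {
    int  xlen;            /* 32 or 64 on RISC-V, 0 when not building for RISC-V */
    int  flen;            /* 0 (no FPU), 32 (F), 64 (D) or 128 (Q) */
    bool has_mul;         /* integer multiply available */
    bool has_div;         /* integer divide available */
    bool has_atomic;      /* A extension */
    bool has_compressed;  /* C extension */
};

/* Fill the summary from the compiler's predefined macros. */
struct riscv_build_features query_build_features(void)
{
    struct riscv_build_features f = {0, 0, false, false, false, false};
#ifdef __riscv_xlen
    f.xlen = __riscv_xlen;
#endif
#ifdef __riscv_flen
    f.flen = __riscv_flen;
#endif
#ifdef __riscv_mul
    f.has_mul = true;
#endif
#ifdef __riscv_div
    f.has_div = true;
#endif
#ifdef __riscv_atomic
    f.has_atomic = true;
#endif
#ifdef __riscv_compressed
    f.has_compressed = true;
#endif
    return f;
}

/* True when the build covers the I, M, A, F and D parts of "G". */
bool covers_imafd(struct riscv_build_features f)
{
    assert(f.flen == 0 || f.flen == 32 || f.flen == 64 || f.flen == 128);
    return f.has_mul && f.has_div && f.has_atomic && f.flen >= 64;
}
```

On an MCU build without hardware floating point, `covers_imafd(query_build_features())` would come out false, which is the case where a binary compiled for the G collection would not run.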

And then HPC use cases will probably assume more exotic extensions.

And normal "desktop/phone/laptop/server" style use cases will have runtime detection of things like vector instructions in some situations, just like in aarch64/amd64.
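
A rough sketch of what such runtime detection can look like on Linux, assuming the RISC-V convention that bit N of `AT_HWCAP` stands for the single-letter extension `'A' + N`. The helper names are hypothetical, and real code may prefer the finer-grained `riscv_hwprobe` syscall where it is available.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: HWCAP bit N corresponds to single-letter extension 'A' + N. */
bool hwcap_has_ext(unsigned long hwcap, char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    return ((hwcap >> (letter - 'A')) & 1UL) != 0;
}

#if defined(__linux__) && defined(__riscv)
#include <sys/auxv.h>

/* Ask the kernel which single-letter extensions it reports. */
bool cpu_has_vector(void)
{
    return hwcap_has_ext(getauxval(AT_HWCAP), 'V');
}
#else
/* Not Linux on RISC-V: the bit layout above does not apply, report "no". */
bool cpu_has_vector(void)
{
    return false;
}
#endif
```

A program would call `cpu_has_vector()` once at startup and pick a vector or scalar code path accordingly.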

int_19h 3 hours ago

Was there ever a time when SSE was not a part of baseline amd64? Just going off the dates, SSE showed up in Pentium 3, and if I remember correctly AMD picked it up in 32-bit Athlons already.

- mort96 an hour ago
  
  I think you're right. I should've said x86 (or maybe IA-32?), not amd64.
  
panick21_ 12 hours ago

It's not known as "G". The standards targeted by the software ecosystem are the RVA20, RVA22, and RVA23 profiles.
https://riscv.org/ecosystem-news/2025/04/risc-v-rva23-a-majo...

- mort96 2 hours ago
  
  Thanks, seems I'm out of date (or just wrong). G is indeed IMAFD_Zicsr_Zifencei and I've always viewed it as a "reasonable baseline for most normal code", I wasn't up to date on the RVA/B/C stuff.
  
sylware 2 hours ago

What??
Ofc, if your program uses floating point calculations you will want to use the hardware machine instructions for that.
Here, we were talking about all those machine instructions which do not bring much on top of the core ISA. Those would be implemented using fusion, which is appropriate for R(educed)ISC silicon. The trade-off is code density, and on modern silicon code density probably only matters in very specific niches; there, the program's machine instructions would be generated (BTW, probably hand-written instead of generated for those niches...) with those niches in mind.
And RISC-V hardware implementations, with proper publishing of the most common and pertinent machine instruction fusion patterns, will be able to "improve" step by step, targeting what they actually run and what would make a real difference. Sure, this will require a bit of coordination to agree on machine instruction fusion patterns.

- mort96 2 hours ago
  
  You said "programs will want probably to stay conservative and will implement only the core ISA". I'm saying that the core ISA is very very limited and most programs will want to use more than the core ISA.
  
  sylware 2 hours ago
  
  What???
  Re-read my post, please.
  The problem is those machine instructions that do not bring much more than the core ISA and so do not warrant an ISA extension.
  
  mort96 2 hours ago
  
  Integer multiply requires an ISA extension. The core ISA does not have integer multiply.
  
vardump 17 hours ago

Instruction fusion still means lower code density. You can go overboard, but the newer ARM instruction set(s) are pretty good.

duskwuff 14 hours ago

As an aside: it's only relevant on microcontrollers nowadays, but ARM T32 (Thumb) code density is really good. Most instructions are 2 bytes, and it's got some clever ways to represent commonly used 32-bit values in 12 bits:
https://developer.arm.com/documentation/ddi0403/d/Applicatio...
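
A sketch of how that 12-bit "modified immediate" expands to a 32-bit value, written from my reading of the `ThumbExpandImm` pseudocode in the ARMv7-M manual; the function name is made up, and this is an illustration rather than a reference decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a T32 12-bit modified immediate (i:imm3:imm8) to 32 bits. */
uint32_t thumb_expand_imm(uint32_t imm12)
{
    assert(imm12 <= 0xFFFu);
    uint32_t imm8 = imm12 & 0xFFu;

    if ((imm12 >> 10) == 0) {
        /* Top two bits clear: bits 9:8 select a byte-replication pattern.
           (The manual marks patterns 1..3 with imm8 == 0 as UNPREDICTABLE.) */
        switch ((imm12 >> 8) & 3u) {
        case 0:  return imm8;                               /* 0x000000XY */
        case 1:  return (imm8 << 16) | imm8;                /* 0x00XY00XY */
        case 2:  return (imm8 << 24) | (imm8 << 8);         /* 0xXY00XY00 */
        default: return (imm8 << 24) | (imm8 << 16)
                      | (imm8 << 8)  | imm8;                /* 0xXYXYXYXY */
        }
    }

    /* Otherwise: an 8-bit value with an implied leading 1, rotated right
       by the 5-bit amount in bits 11:7 (always 8..31 in this branch). */
    uint32_t value = 0x80u | (imm12 & 0x7Fu);
    uint32_t rot   = imm12 >> 7;
    return (value >> rot) | (value << (32u - rot));
}
```

For instance, `thumb_expand_imm(0x3AB)` gives `0xABABABAB` and `thumb_expand_imm(0x400)` gives `0x80000000`, so masks and repeated-byte constants fit in a single instruction.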

- wren6991 2 hours ago
  
  RISC-V code density is pretty good these days with Zcmp (push, pop, compressed double move) and Zcb (compressed mul, sign/zero-extend, byte load/store). There is also Zcmt but it's kind of cursed. Hopefully density will keep improving once mainstream compilers have full support for Zilsd/Zclsd (load/store pair for RV32).
  T32 is a pretty good encoding but far from perfect. If they had the chance to redo it I doubt they would spend a full 1/32nd of the encoding space on asrs, for example.
  
Findecanor 16 hours ago

Not necessarily lower density. On ARM you would often need cmp and csel, which are two instructions, eight bytes.
RISC-V has cmp-and-branch in a single instruction, which with c.mv normally makes six bytes. If the cmp-and-branch instruction tests one of x8..x15 against zero then that could also be a compressed instruction: making four bytes in total.

- astrange 16 hours ago
  
  ARMv8.7 added some new instructions for int min/max to replace cmp+csel. (I'm surprised it took them so long to add popcnt.)
  https://www.corsix.org/content/arm-cssc
  
- sylware 3 hours ago
  
  Compressed instructions only matter for niche uses (and even in such niches, nowadays, I guess their value is very probably questionable). Here you would not use compressed instructions, just the right instruction pattern for fusion, like div/rem.
  
sylware 3 hours ago

RISC-V instructions are pretty good, without any IP lock like ARM or x86_64.

Pet_Ant 17 hours ago

Compressed instructions are also for microcontroller use. RISC-V, rightly or wrongly, is trying to be an ISA that can handle the whole stack from embedded microcontrollers to a top-end server.

As such, there are compromises for both aims.

sylware 2 hours ago

"sweet spot"

mshockwave 6 hours ago

> In the end, programs will want probably to stay conservative and will implement only the core ISA

Unlikely; as pointed out in sibling comments, the core ISA is too limited. What might prevail is profiles, specifically profiles for application processors like RVA22U64 and RVA23U64, of which the latter makes a lot more sense IMHO.

Reply View 0 replies

wren6991 13 hours ago

> publish properly what will end up "standard instruction fusion patterns" (like the div/rem one).

The div/rem one is odd because I saw it suggested in the ISA manual, but I have yet to ever see that pattern crop up in compiled code. Usually it's just in library functions like C stdlib `div()` which returns a quotient and remainder, but why on earth are you calling that library function on a processor that has a divide instruction?
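
For reference, a small sketch of the kind of source that would produce the pattern: a quotient and a remainder taken from the same operands. The names are hypothetical. On a target with the M extension a compiler would typically emit one `div` and one `rem` here, which is the pair the ISA manual suggests an implementation could fuse; whether a given compiler places them back to back is not guaranteed.

```c
#include <assert.h>

/* Hypothetical pair type holding both results of one division. */
struct quot_rem {
    long quot;
    long rem;
};

/* Same dividend and divisor feed both operators, so the two results
   can come from a div instruction followed by a rem instruction. */
struct quot_rem divide_with_remainder(long n, long d)
{
    assert(d != 0);
    struct quot_rem r;
    r.quot = n / d;   /* truncates toward zero */
    r.rem  = n % d;   /* takes the sign of the dividend */
    return r;
}
```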

cpgxiii 10 hours ago

> but why on earth are you calling that library function on a processor that has a divide instruction?
Because they rightfully expect that div() compiles down to the fastest div/rem idiom for the target hardware. Mainstream compilers go to great lengths to optimize calls to the core C math functions.

- wren6991 2 hours ago
  
  You still have the overhead of a function call. If you just use / % operators then you'll get a call inserted to the libgcc or compiler-rt routine if you don't have the M extension, and those routines are div or mod only. Using stdlib for integer division seems like an odd choice.
  If stdlib div() were promoted to a builtin one day (it currently is not in GCC afaict), and its implementation were inlined, then the compiler would recognise the common case of one side of the struct being dead, and you'd still end up with a single div/rem instruction.
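
To make the comparison concrete, here is a sketch of the two spellings, with made-up function names. Both return the same quotient; the difference discussed above is only in what the compiler can do with them. As long as `div()` is not treated as a builtin, the first keeps a library call and computes a remainder that is never read, while the second can become a single divide instruction on a target with the M extension.

```c
#include <assert.h>
#include <stdlib.h>

/* Quotient via the C library: div() returns both quot and rem,
   but only quot is used, so rem is dead. */
int quotient_via_stdlib(int n, int d)
{
    assert(d != 0);
    return div(n, d).quot;
}

/* Quotient via the operator: no call, no unused remainder. */
int quotient_via_operator(int n, int d)
{
    assert(d != 0);
    return n / d;
}
```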
  