Comment by danhor 14 hours ago

RISC-V is even worse: the Cortex-M series has standardized interrupt handling and is built so you can avoid writing any assembly for the startup code.

Meanwhile, the RISC-V spec only defines very basic interrupt functionality; most MCU vendors add their own external interrupt controllers or change their cores to follow the faster Cortex-M style more closely, where the core itself handles stacking/unstacking registers, returning from the interrupt handler on ret, vectoring for external interrupts, and so on.

The low priority of embedded within RISC-V can be seen in how long it took to specify an extension (Zmmul) that only includes multiplication, not division.

Especially for smaller MCUs the debug situation is unfortunate: in the ARM world you can use any CMSIS-DAP debug probe to debug different MCUs over SWD. RISC-V MCUs either have JTAG or a custom pin-reduced variant (as 4 pins for debugging is quite a lot), which is usually supported by only a few debug probes.

RISC-V simply standardizes a whole lot less than ARM, and not sensibly for small embedded systems.

ryao 12 hours ago

Being customizable is one of RISC-V’s strengths. Multiplication can be easily done in software by doing bit shifts and addition in a loop. If an embedded application does not make heavy use of multiplication, you can omit multiplication from the silicon for cost savings.

That said, ARM’s SWD is certainly nice. It appears to be possible to debug the Hazard3 cores in the RP2350 in the same way as the ARM cores:

https://gigazine.net/gsc_news/en/20241004-raspberry-pi-pico-...

  • magicalhippo 11 hours ago

    > If an embedded application does not make heavy use of multiplication, you can omit multiplication from the silicon for cost savings.

    The problem was that the initial extension that included multiplication also included division[1]. A lot of small microcontrollers have multiplication hardware but not division hardware.

    Thus it would make sense to have a multiplication-only extension.

    IIRC the argument was that the CPU should just trap the division instructions and emulate them, but in the embedded world you want to know your performance envelope, so it's better to know explicitly whether hardware division is available or not.

    [1]: https://docs.openhwgroup.org/projects/cva6-user-manual/01_cv...

    • ryao 11 hours ago

      Software division is often faster than hardware division, so your performance remark seems to be a moot point:

      https://libdivide.com/

      • magicalhippo 10 hours ago

        I don't think that library refutes anything of what I said.

        First off, that library requires you to fundamentally change the code by moving some precomputation outside of loops.

        Of course, I can do a similar trick to move the division outside the loop without that library, using simple fixed-point math, which is a very basic optimization technique. So any performance comparison would have to be against that, not against the original code.

        It is also much, much slower if your denominator changes for each invocation:

        > In terms of processor time, pre-computing the proper magic number and shift is on the order of one to three hardware divides, for non-powers of 2.

        If you care about a fast hardware divider, then you're much more likely to have such code rather than the trivially-optimized code like the library example.

        • ryao 8 hours ago

          Good point. I withdraw my remark.
