Comment by ryao

Comment by ryao 15 hours ago

Being customizable is one of RISC-V’s strengths. Multiplication can be easily done in software by doing bit shifts and addition in a loop. If an embedded application does not make heavy use of multiplication, you can omit multiplication from the silicon for cost savings.
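For anyone who has not seen it written out, the shift-and-add loop looks roughly like this. It is only a sketch: the name `mul_shift_add` is made up for illustration, and it assumes 32-bit unsigned operands with the result wrapping modulo 2^32, like the low word of a hardware multiply.

```c
#include <stdint.h>

/* Multiply two 32-bit unsigned integers using only shifts and adds.
 * The result wraps modulo 2^32.
 * (mul_shift_add is a hypothetical name, not from any library.) */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u) {
            /* Lowest remaining bit of b is set: add the shifted a. */
            result += a;
        }
        a <<= 1;  /* the next bit of b carries twice the weight */
        b >>= 1;  /* move on to the next bit of b */
    }
    return result;
}
```

The loop runs once per significant bit of `b`, so the run time depends on the operand value, which is part of why this is only attractive when multiplications are rare.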

That said, ARM’s SWD is certainly nice. It appears to be possible to debug the Hazard3 cores in the RP2350 in the same way as the ARM cores:

https://gigazine.net/gsc_news/en/20241004-raspberry-pi-pico-...

magicalhippo 15 hours ago

> If an embedded application does not make heavy use of multiplication, you can omit multiplication from the silicon for cost savings.

The problem was that the initial extension that included multiplication also included division[1]. A lot of small microcontrollers have multiplication hardware but not division hardware.

Thus it would make sense to have a multiplication-only extension.

IIRC the argument was that the CPU should just trap the division instructions and emulate them, but in the embedded world you'll want to know your performance envelopes so better to explicitly know if hardware division is available or not.
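To give a feel for what such an emulation routine has to do, here is a minimal sketch of unsigned 32-bit shift-and-subtract (restoring) division. The name `udiv_soft` is hypothetical, and the trap handling itself (decoding the instruction, reading and writing registers) is left out. The divide-by-zero result mirrors what the RISC-V spec defines for DIVU/REMU (quotient all ones, remainder equal to the dividend).

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division using shifts and subtracts only.
 * Returns the quotient and stores the remainder in *rem.
 * (udiv_soft is a hypothetical name used for illustration.) */
uint32_t udiv_soft(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;  /* 64 bits so that r << 1 cannot overflow */

    if (d == 0) {
        /* Same result RISC-V defines for DIVU/REMU by zero. */
        *rem = n;
        return UINT32_MAX;
    }

    /* Always 32 iterations, one per dividend bit, most significant first. */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit */
        if (r >= d) {
            r -= d;                      /* divisor fits: subtract it */
            q |= (uint32_t)1 << i;       /* and set this quotient bit */
        }
    }

    *rem = (uint32_t)r;
    return q;
}
```

Even in this simple form it takes 32 loop iterations per division, before any trap entry and exit overhead, which is the kind of cost you want to know about up front rather than discover at run time.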

[1]: https://docs.openhwgroup.org/projects/cva6-user-manual/01_cv...

  • ryao 14 hours ago

    Software division is often faster than hardware division, so your performance remark seems to be a moot point:

    https://libdivide.com/

    • magicalhippo 13 hours ago

      I don't think that library refutes anything of what I said.

      First off, that library requires you to fundamentally change the code, by moving some precomputation outside loops.

      Of course I can do a similar trick to move the division outside the loop without that library using simple fixed-point math, something which is a very basic optimization technique. So any performance comparison would have to be against that, not the original code.
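A minimal sketch of that fixed-point trick, assuming 16-bit unsigned values and a nonzero divisor that stays constant across the loop. The function names are made up for illustration and this is not libdivide's API. With m = floor(2^32 / d) + 1, the product (n * m) >> 32 equals n / d for every 16-bit n.

```c
#include <stddef.h>
#include <stdint.h>

/* Precompute a 32.32 fixed-point reciprocal: m = floor(2^32 / d) + 1.
 * This is the one real division. Requires d != 0.
 * (All names here are hypothetical.) */
uint64_t recip_u16(uint16_t d)
{
    return (UINT64_C(1) << 32) / d + 1;
}

/* n / d as a multiply and a shift, exact for all 16-bit n. */
uint16_t div_by_recip_u16(uint16_t n, uint64_t m)
{
    return (uint16_t)(((uint64_t)n * m) >> 32);
}

/* Divide every element by the same d: one division hoisted out of
 * the loop, then only multiplies and shifts inside it. */
void divide_all_u16(const uint16_t *in, uint16_t *out, size_t len, uint16_t d)
{
    uint64_t m = recip_u16(d);

    for (size_t i = 0; i < len; i++) {
        out[i] = div_by_recip_u16(in[i], m);
    }
}
```

The saving only exists because `recip_u16` is called once per loop. If `d` changes on every iteration, each element pays for the precomputation as well, and the trick no longer helps.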

      It is also much, much slower if your denominator changes for each invocation:

      > In terms of processor time, pre-computing the proper magic number and shift is on the order of one to three hardware divides, for non-powers of 2.

      If you care about a fast hardware divider, then you're much more likely to have such code rather than the trivially-optimized code like the library example.

      • ryao 11 hours ago

        Good point. I withdraw my remark.

danhor 3 hours ago

> It appears to be possible to debug the Hazard3 cores in the RP2350 in the same way as the ARM cores:

It is, but (as far as I understand it) they're using ARM's SWD IP (which is a fine choice). Since their connection between the SWD IP and the RISC-V DM is custom, though, you're going to need to adjust your debug probe software quite a bit more than you would between different Cortex MCUs.

Other vendors with similar issues (for example WCH) build something similar but incompatible, requiring their own debug probe. This is a solved problem for ARM Cortex.