Comment by brucehoult 4 days ago
> some SiFive cores implement exactly this fusion.
I was not able to open the given link, but it's not true, at least for the U74.
Fusion means that one or more instructions are converted to one internal instruction (µop).
SiFive's optimisation [1] of a short forward conditional branch over exactly one instruction has both instructions executing as normal, the branch in pipe A and the other instruction simultaneously in pipe B. At the final stage if the branch turns out to be taken then it is not in fact physically taken, but is instead implemented by suppressing the register write-back of the 2nd instruction.
There is only a limited set of instructions that can be the 2nd instruction in this optimisation, and loads and stores do not qualify. Only simple register-register or register-immediate ALU operations are allowed, including `lui` and `auipc` as well as C aliases such as `c.mv` and `c.li`.
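To make the distinction concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not SiFive's microarchitecture: the function names, the dict-based register file, and the choice of `beq` followed by a load-immediate are all assumptions made for illustration. It shows only that gating the write-back on the branch outcome gives the same architectural result as actually skipping the instruction.

```python
def run_architectural(regs, rs_a, rs_b, rd, value):
    """Reference behaviour of `beq rs_a, rs_b, skip` followed by `li rd, value`."""
    out = dict(regs)
    if out[rs_a] != out[rs_b]:
        # Branch not taken: the following instruction executes normally.
        out[rd] = value
    # Branch taken: the following instruction is skipped entirely.
    return out


def run_predicated(regs, rs_a, rs_b, rd, value):
    """Toy model: both instructions issue together, write-back is gated late."""
    out = dict(regs)
    branch_taken = out[rs_a] == out[rs_b]  # resolved in "pipe A"
    alu_result = value                     # computed in "pipe B" regardless
    if not branch_taken:
        # Final stage: the result is written back only if the branch
        # was not taken; otherwise the write-back is suppressed.
        out[rd] = alu_result
    return out


if __name__ == "__main__":
    for a, b in [(0, 0), (1, 0), (5, 5), (5, 6)]:
        regs = {"a0": a, "a1": b, "a2": 99}
        same = run_architectural(regs, "a0", "a1", "a2", 42) == \
            run_predicated(regs, "a0", "a1", "a2", 42)
        print(a, b, same)
```

Both instructions remain separate instructions in this model; nothing is merged into a single µop, which is the sense in which it is not fusion.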
> The whole premise of fusion is predicated on the idea that it is valid for a core to transform code similar to the branchy code on the left into code similar to the branch-free code on the right. I wish to cast doubt on this validity: it is true that the two instruction sequences compute the same thing, but details of the RISC-V memory consistency model mean that the two sequences are very much not equivalent, and therefore a core cannot blindly turn one into the other.
The presented code ...
mv rd, x0
beq rs2, x0, skip_next
mv rd, rs1
skip_next:
... vs ...
czero.eqz rd, rs1, rs2
... requires not only that rd != rs2 (as stated) but also that rd != rs1. A better implementation is ...
mv rd, rs1 // safe even if rd and rs1 are the same register
bne rs2, x0, skip
mv rd, x0
skip:
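The register-aliasing claims can be checked with a small Python model. This is a sketch under stated assumptions: the register file is a four-entry list with index 0 standing in for `x0`, the helper names are hypothetical, and `czero.eqz` is modelled as "write zero to rd if rs2 is zero, otherwise write rs1".

```python
def czero_eqz(regs, rd, rs1, rs2):
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0 else rs1."""
    out = list(regs)
    out[rd] = 0 if out[rs2] == 0 else out[rs1]
    return out


def article_sequence(regs, rd, rs1, rs2):
    out = list(regs)
    out[rd] = 0             # mv rd, x0
    if out[rs2] != 0:       # beq rs2, x0, skip_next (falls through if rs2 != 0)
        out[rd] = out[rs1]  # mv rd, rs1
    return out


def better_sequence(regs, rd, rs1, rs2):
    out = list(regs)
    out[rd] = out[rs1]      # mv rd, rs1
    if out[rs2] == 0:       # bne rs2, x0, skip (falls through if rs2 == 0)
        out[rd] = 0         # mv rd, x0
    return out


def matches_czero(sequence, rd, rs1, rs2, values=(0, 1, 7)):
    """True if `sequence` agrees with czero.eqz for every tested input pair."""
    for a in values:
        for b in values:
            regs = [0, 0, 0, 0]  # x0..x3, with x0 fixed at zero
            regs[rd] = 99        # stale destination value (overwritten if aliased)
            regs[rs1] = a
            regs[rs2] = b
            if sequence(regs, rd, rs1, rs2) != czero_eqz(regs, rd, rs1, rs2):
                return False
    return True


if __name__ == "__main__":
    for name, seq in [("article", article_sequence), ("better", better_sequence)]:
        print(name,
              "distinct:", matches_czero(seq, 3, 1, 2),
              "rd==rs1:", matches_czero(seq, 1, 1, 2),
              "rd==rs2:", matches_czero(seq, 2, 1, 2))
```

In this model the article's sequence matches `czero.eqz` only when rd is distinct from both sources, while the reordered sequence also tolerates rd == rs1. Both still need rd != rs2.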
The RISC-V memory consistency model does not come into it, because there are no loads or stores.
Then switching to code involving loads and stores is completely irrelevant:
lw x1, 0(x2)
bne x1, x0, next
next:
sw x3, 0(x4)
First of all, this code is completely crazy because the `bne` is a fancy kind of `nop`, and a core could convert it to a canonical `nop` (or simply drop it).
Even putting the `sw` between the `bne` and the label is ludicrous. There is no branch-free code that does the same thing -- not only in RISC-V but also in arm64 or amd64. SiFive's optimisation will not trigger with a store in that position.
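The point about the `bne` being a kind of `nop` comes down to the branch target and the fall-through address being the same. A minimal Python sketch, assuming a 4-byte branch encoding and a hypothetical helper `next_pc`:

```python
def next_pc(pc, taken, offset, length=4):
    """Next program counter after a conditional branch located at `pc`.

    `offset` is the branch displacement in bytes and `length` is the size
    of the branch instruction itself (4 bytes is assumed here).
    """
    return pc + offset if taken else pc + length


if __name__ == "__main__":
    pc = 0x1000
    # A branch to the very next instruction has offset == length, so the
    # outcome of the comparison cannot change where execution continues.
    print(hex(next_pc(pc, True, 4)), hex(next_pc(pc, False, 4)))
```

Since the branch writes no register and both outcomes continue at the same address, its architectural effect does not depend on the loaded value.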
[1] SiFive materials consistently describe it as an optimisation, not as fusion, e.g. in the description of the chicken bits CSR in the U74 core complex manual.
Thanks for your input. I didn’t know what to make of the article.