brucehoult 2 days ago

Having taken a second look, this article does in fact have a point, but it has nothing at all to do with the conditional operations in RISC-V's Zicond extension -- or with amd64 or arm64 style conditional moves, if those were added at some point.

It is not even about RISC-V, but about instruction fusion in general in any ISA with a memory model at least as strong as RVWMO -- which includes x86. I'm not as familiar with the AArch64 memory model, but I think this probably also applies to it.

The point here is that if an aggressive implementation wants to implement instruction fusion that removes conditional branches (or indirect branches) to make a branch-free µop -- for example, to turn a conditional branch over a move into something similar to the `czero` instruction -- then in order to maintain memory ordering AS SEEN BY A DIFFERENT CORE the fused µop has to also have `fence r,w` properties.
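
To make the "as seen by a different core" part concrete, here is a toy sketch of a load-buffering litmus test. It is not RVWMO: it just enumerates sequentially consistent interleavings, and the flag `load_store_ordered` is a hypothetical knob standing in for "the branch (or a fused µop with `fence r,w` properties) keeps core 0's later store behind its earlier load". Core 0 loads `x`, branches on the result over a move, then stores 1 to `y` unconditionally; core 1 loads `y` and stores that value to `x`.

```python
from itertools import permutations

# Hypothetical names throughout; each op is (core, kind, address).
LOAD_X = (0, "load", "x")    # core 0: r1 = x  (the branch depends on r1)
STORE_Y = (0, "store", "y")  # core 0: y = 1   (after the branch join point)
LOAD_Y = (1, "load", "y")    # core 1: r2 = y
STORE_X = (1, "store", "x")  # core 1: x = r2  (data dependency on r2)


def outcomes(load_store_ordered):
    """Return the set of reachable (r1, r2) pairs in this toy model."""
    results = set()
    for order in permutations([LOAD_X, STORE_Y, LOAD_Y, STORE_X]):
        # Core 1's store needs the loaded value, so it always stays behind.
        if order.index(LOAD_Y) > order.index(STORE_X):
            continue
        # Core 0's store stays behind its load only if the ordering survives.
        if load_store_ordered and order.index(LOAD_X) > order.index(STORE_Y):
            continue
        mem = {"x": 0, "y": 0}
        regs = {}
        for core, kind, addr in order:
            if kind == "load":
                regs[core] = mem[addr]
            elif core == 0:
                mem["y"] = 1
            else:
                mem["x"] = regs[1]
        results.add((regs[0], regs[1]))
    return results


print(sorted(outcomes(True)))   # ordering kept
print(sorted(outcomes(False)))  # ordering dropped by a naive fusion
```

In this model the outcome `(1, 1)` only shows up when the ordering is dropped, which is the thing the other core must never be able to observe.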

That is all.

It is irrelevant to this whether the actual RISC-V instruction set has a conditional move instruction, or the properties it has if it exists.
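
For reference, a minimal sketch of what the Zicond `czero.eqz` / `czero.nez` instructions compute, next to the "conditional branch over a move" they can replace. This models register values only and says nothing about memory ordering; the helper names and the 64-bit mask are my own assumptions for illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: model RV64 registers as unsigned 64-bit


def czero_eqz(rs1, rs2):
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0 else rs1."""
    return 0 if (rs2 & MASK64) == 0 else (rs1 & MASK64)


def czero_nez(rs1, rs2):
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0 else rs1."""
    return 0 if (rs2 & MASK64) != 0 else (rs1 & MASK64)


def select_branchy(cond, a, b):
    """Branchy form: rd = b; skip the move when cond is zero; rd = a."""
    rd = b & MASK64
    if (cond & MASK64) != 0:
        rd = a & MASK64
    return rd


def select_zicond(cond, a, b):
    """Branch-free form: czero.eqz + czero.nez + or."""
    t0 = czero_eqz(a, cond)  # a when cond != 0, else 0
    t1 = czero_nez(b, cond)  # b when cond == 0, else 0
    return t0 | t1


for cond in (0, 1, 5):
    assert select_branchy(cond, 7, 9) == select_zicond(cond, 7, 9)
```

The two forms agree on register results; the point of the comment above is that they differ only in what ordering a later store inherits.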

It is irrelevant to the situation where a human programmer or a compiler might choose to transform branchy code into branch-free code. They have a more global view of the program and can make sure things make sense. A CPU core implementing fusion has only a local view.

Finally, I'll note that instruction fusion is at present hypothetical in RISC-V processors you can buy today, whereas it has been used in both x86 and Arm chips for a long time.

Intel's "Core" µarch had fusion of e.g. `cmp;bCC` sequences in 2006, while AMD added it with Bulldozer in 2011. Arm introduced a limited capability -- `CMP r0, #0; BEQ label` is given as an example -- in A53 in 2012, and A57, A72, etc. expanded its generality.

Upcoming RISC-V cores from companies such as Ventana and Tenstorrent are believed to implement instruction fusion for some cases.

Just for completeness, I'll repeat that SiFive's U74 optimises a conditional branch followed by a simple ALU instruction by executing the two simultaneously in two pipelines, but this is NOT fusion into a single µop.

  • phire 10 hours ago

    > but about instruction fusion in general in any ISA with a memory model at least as strong as RVWMO -- which includes x86

    No... It's kind of an artefact of RISC-V's memory model being weak. x86 side-steps the issue because it insists that stores always occur in program order, allowing it to fuse away conditional branches without issue.

    (Note: the actual hardware implementation of x86 CPUs issues the stores anyway, and then rewinds if it later detects a memory ordering violation.)

    RISC-V ran into this corner case because it wanted the best of both worlds: a weak memory model that still has strong ordering across branches.

    Looks like ARM avoided this issue because its memory model is weaker: branches don't force any ordering, which means the ARM compiler might need to insert a few extra memory barrier instructions.

    ---------

    TBH, I don't think this instruction-fusion edge case is a big deal. For smaller RISC-V cores, you aren't reordering memory operations in the first place.

    And for larger RISC-V cores, you already need a complex mechanism for dealing with store order violations, so you just throw your fused cmov instruction at it. Your core already needs to deal with sync points that aren't proper branches, because non-taken branches also enforce ordering.