Comment by imtringued

>The main argument against using CMOV on OoO cores is that it creates data dependencies. The backend can't start executing downstream dependant μops until after the cmov executes.

That doesn't sound like a very well thought out argument. The moment you are conditional with respect to two independent conditions, you can run both conditional moves in parallel.

>At a first glance, it might seem insane to replace a simple data dependancy with a control-flow dependency, control-flow dependencies are way more expensive as they might lead to a miss-predict and pipeline flush.

The moment you have N parallel branches such as from unrolling a data parallel loop, you have a combinatorial explosion of 2^N possible paths to take. You have to successfully predict through all of them and you certainly can't execute them in parallel anymore.

Also, you're saying there is a miss predict and a pipeline flush, but those are concepts that relate to prefetching instructions and are completely irrelevant to conditional moves that do not change the instruction pointer. If you have nothing to execute, because you're waiting for a dependent instruction that is currently executing, then you're stalling the pipeline, which is equivalent to executing a NOP instruction. It's a waste of a cycle (not really, because you're waiting for a good reason), but it can't be more expensive than that.