Comment by phire
3-4 uops per cycle is more of an average throughput than a typical throughput.
The average is dragged down by the many cycles that don't decode/rename any uops at all, either because the frontend is waiting for bytes to decode (icache miss, etc.) or because rename is blocked on a full ROB (probably stalled on a dcache miss).
So you want quite a wide frontend, so that whenever you are unblocked you can drag the average back up.
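
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a deliberately crude model: every cycle is either fully stalled (zero uops) or delivers the full frontend width. The function names and the 50% stall fraction are made up for illustration, not measurements from any real core.

```python
import math

def average_uops_per_cycle(peak_width, stalled_fraction):
    """Average throughput if a cycle either delivers peak_width uops or none.

    Assumption: unblocked cycles always run at full width, which is optimistic.
    """
    return peak_width * (1.0 - stalled_fraction)

def width_needed(target_average, stalled_fraction):
    """Smallest integer width that reaches target_average under the same model."""
    return math.ceil(target_average / (1.0 - stalled_fraction))

# Hypothetical workload where half of all cycles decode/rename nothing.
stalled = 0.5
print(average_uops_per_cycle(4, stalled))  # a 4-wide frontend
print(average_uops_per_cycle(8, stalled))  # an 8-wide frontend
print(width_needed(4, stalled))            # width needed to average 4 uops/cycle
```

Under those assumptions a 4-wide frontend only averages 2 uops per cycle, and you need 8-wide to average 4. Real frontends rarely hit full width even when unblocked, so if anything this understates the width you need.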