Comment by sylware
Anybody here with deep knowledge of current open-source RISC-V implementations?
Do harts have store-queue and load-queue optimizations, namely some kind of memory request fusion?
I ask because I am writing rv64 assembly, and since rv64 is a load/store architecture, I tend to pack together as many memory-ordered loads and stores as I can.
I suppose everything that isn't a toy implementation has a store queue.
Even the U54 Core Complex (later U54-MC) manual from August 2018 states in Section 3.4 "Stores are pipelined and commit on cycles where the data memory system is otherwise idle. Loads to addresses currently in the store pipeline result in a five-cycle penalty."
The U54 probably inherited this from Rocket.
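To get a feel for what that quoted penalty means when scheduling assembly, here is a rough toy model in Python. It is only a sketch: the five-cycle figure comes from the quoted manual sentence, but `STORE_WINDOW` (how many following instructions a store stays "in the store pipeline") is a made-up assumption, as are the one-cycle base cost per instruction and the exact-address matching. It does not model a real U54 or Rocket pipeline.

```python
from collections import deque

LOAD_HIT_PENALTY = 5  # from the quoted manual sentence
STORE_WINDOW = 2      # ASSUMPTION: instructions during which a store is still pending


def estimate_cycles(ops, window=STORE_WINDOW, penalty=LOAD_HIT_PENALTY):
    """Estimate cycles for a list of ("ld" | "st", address) pairs.

    Toy model: every instruction costs one cycle, and a load whose address
    exactly matches a still-pending store pays `penalty` extra cycles.
    Access sizes and partial overlaps are ignored.
    """
    pending = deque()  # (address, remaining instruction slots)
    cycles = 0
    for kind, addr in ops:
        cycles += 1
        if kind == "ld" and any(a == addr for a, _ in pending):
            cycles += penalty
        # Age the pending stores by one instruction slot and drop expired ones.
        pending = deque((a, n - 1) for a, n in pending if n - 1 > 0)
        if kind == "st":
            pending.append((addr, window))
    return cycles


# Load immediately after a store to the same address: pays the penalty.
back_to_back = [("st", 0x100), ("ld", 0x100)]
# Same store and load, with two unrelated loads scheduled in between.
spaced_out = [("st", 0x100), ("ld", 0x200), ("ld", 0x208), ("ld", 0x100)]

print(estimate_cycles(back_to_back))
print(estimate_cycles(spaced_out))
```

Under these assumptions the spaced-out sequence is cheaper even though it executes more instructions, which is the practical takeaway: on a core like this, it may be worth keeping a reload of a just-stored address a few instructions away from the store. The real window depends on when the data memory system is idle, so measure on hardware before relying on it.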