Comment by sylware

Comment by sylware 3 days ago


Is anybody here with deep knowledge of current open-source RISC-V implementations?

Do harts have store-queue and load-queue optimizations, namely some kind of memory-request fusion?

I ask because I am writing rv64 assembly, and since rv64 is a load/store architecture, I tend to pack loads and stores together, in memory order, as much as I can.
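To make the question concrete, here is a minimal Python sketch of what a fusion-friendly access pattern looks like. The fusion rule itself is hypothetical and does not describe any particular core: it assumes a hart could merge two back-to-back 64-bit accesses of the same kind, off the same base register, at adjacent offsets. `MemOp` and `fusion_candidates` are illustrative names, and register dependencies (e.g. a load overwriting its own base) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str    # "ld" or "sd" (64-bit load / store on rv64)
    base: str    # base register name, e.g. "a0"
    offset: int  # immediate byte offset


WIDTH = 8  # bytes moved by one ld/sd


def fusion_candidates(ops):
    """Return index pairs (i, i + 1) of adjacent ops that a hypothetical
    fusing hart could merge: same kind, same base register, and the second
    offset directly follows the first. Each op joins at most one pair."""
    pairs = []
    i = 0
    while i + 1 < len(ops):
        a, b = ops[i], ops[i + 1]
        if a.kind == b.kind and a.base == b.base and b.offset == a.offset + WIDTH:
            pairs.append((i, i + 1))
            i += 2
        else:
            i += 1
    return pairs


# Loads grouped, then stores grouped, versus the same accesses interleaved.
packed = [MemOp("ld", "a1", 0), MemOp("ld", "a1", 8),
          MemOp("sd", "a0", 0), MemOp("sd", "a0", 8)]
interleaved = [MemOp("ld", "a1", 0), MemOp("sd", "a0", 0),
               MemOp("ld", "a1", 8), MemOp("sd", "a0", 8)]
print(fusion_candidates(packed))       # [(0, 1), (2, 3)]
print(fusion_candidates(interleaved))  # []
```

Under this assumed rule only the grouped ordering exposes adjacent pairs; whether real harts do anything with them is exactly what the question asks.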

brucehoult 2 days ago

I suppose everything that isn't a toy implementation has a store queue.

Even the U54 Core Complex (later U54-MC) manual from August 2018 states in Section 3.4 "Stores are pipelined and commit on cycles where the data memory system is otherwise idle. Loads to addresses currently in the store pipeline result in a five-cycle penalty."

It probably inherited this from Rocket.
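The behaviour quoted from the U54 manual can be sketched as a toy model, assuming word-granular addresses and no store-to-load forwarding: stores wait in a FIFO and commit one at a time on idle cycles, and a load that matches a pending store waits for it to commit and pays a fixed penalty. The class name, the capacity of 4, and the drain-on-full policy are assumptions for illustration; only the five-cycle figure comes from the manual.

```python
from collections import deque


class StoreQueue:
    """Toy model: stores drain on idle memory cycles; a load to an address
    still in the queue is not forwarded and instead costs hit_penalty cycles."""

    def __init__(self, capacity=4, hit_penalty=5):
        self.capacity = capacity        # assumed queue depth
        self.hit_penalty = hit_penalty  # five cycles, per the U54 manual
        self.pending = deque()          # (addr, value), oldest first
        self.memory = {}                # committed state, addr -> value

    def drain_one(self):
        """Commit the oldest pending store, as on an idle memory cycle."""
        if self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value

    def store(self, addr, value):
        if len(self.pending) == self.capacity:
            self.drain_one()  # queue full: oldest store must commit first
        self.pending.append((addr, value))

    def load(self, addr):
        """Return (value, extra_cycles) for a load from addr."""
        if any(a == addr for a, _ in self.pending):
            # No forwarding: let the matching store(s) commit, then read.
            while any(a == addr for a, _ in self.pending):
                self.drain_one()
            return self.memory.get(addr, 0), self.hit_penalty
        return self.memory.get(addr, 0), 0


sq = StoreQueue()
sq.store(0x100, 42)
print(sq.load(0x100))  # (42, 5): the store was still queued
print(sq.load(0x100))  # (42, 0): the store has committed
```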

  • sylware a day ago

    Huh, a load that happens to hit the store queue should be faster than usual, since it does not even need to reach the cache fabric, shouldn't it?

    • brucehoult a day ago

      Nope. This is very common. Making a FIFO also randomly content-addressable adds a lot of complexity, and only code too unoptimised to care about speed reloads a value within half a dozen instructions of storing it. Just use the value directly from the register you stored it from.
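The lookup being discussed (store-to-load forwarding) can be sketched as below, assuming exact-address, full-width matches only; real designs must also handle partial overlaps such as a byte store followed by a doubleword load. The load has to see the youngest older store to the same address, so the scan runs from the tail of the queue. Hardware does all the comparisons at once, which takes one address comparator per queue entry plus a priority select; that is the extra cost of making the FIFO content-addressable. `forward` is an illustrative name.

```python
def forward(pending, addr):
    """Store-to-load forwarding lookup.

    pending: list of (addr, value) in program order, oldest first.
    Returns the value of the youngest store to addr, or None if the load
    has to go to the cache instead.
    """
    for a, value in reversed(pending):
        if a == addr:
            return value
    return None


queue = [(0x100, 1), (0x108, 2), (0x100, 3)]
print(forward(queue, 0x100))  # 3: the youngest matching store wins
print(forward(queue, 0x110))  # None: no pending store, read the cache
```

In code that keeps a just-stored value in its register instead of reloading it, this lookup is never needed for that value, which is the point of the advice above.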

IshKebab 3 days ago

I'm pretty sure XiangShan has a store queue. I expect the other chips mentioned do too; as I understand it, it's a standard optimisation.