Comment by fooker a day ago

Remat can produce a performance boost even when everything has a register.

Admittedly, this comes up more often in non-CPU backends.

pizlonator a day ago

> Remat can produce a performance boost even when everything has a register.

Can you give an example?

  • fooker a day ago

    Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

    Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.
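A minimal sketch of the transformation under discussion, written as plain C++ rather than GPU code. The names `sync_point`, `keep_live`, and `rematerialized` are hypothetical, and `sync_point` is only a stand-in for a real barrier such as a thread sync. The sketch shows that recomputing a cheap, side-effect-free value after the sync gives the same result as keeping it live across the sync. It makes no claim about which variant is faster on any given target.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a barrier or thread sync/wait: an opaque point
// that any value computed earlier has to stay live across.
void sync_point(std::vector<int>& shared) {
    for (int& x : shared) x += 1;
}

// Variant A: `offset` is computed once and kept live across the sync point.
int keep_live(std::vector<int>& shared, int tid, int stride) {
    std::size_t offset = static_cast<std::size_t>(tid * stride + 1);
    shared[offset] = tid;
    sync_point(shared);
    return shared[offset] * 2;  // reuses the value computed before the sync
}

// Variant B: `offset` is rematerialized (recomputed) after the sync point.
// This is only valid because the computation is "safe": it has no side
// effects and its inputs (tid, stride) are unchanged by the sync.
int rematerialized(std::vector<int>& shared, int tid, int stride) {
    {
        std::size_t offset = static_cast<std::size_t>(tid * stride + 1);
        shared[offset] = tid;
    }  // `offset` is dead here, so nothing needs to survive the sync
    sync_point(shared);
    std::size_t offset = static_cast<std::size_t>(tid * stride + 1);
    return shared[offset] * 2;
}
```

Both variants return the same value for the same inputs. The only difference is whether `offset` has to be preserved across `sync_point`.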

    • pizlonator a day ago

      > Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.

      While this is literally "rematerialization", it's such a different case of remat from what I'm talking about that it should be a different phase. It's optimizing for a different goal.

      Also feels very GPU specific. So I'd imagine this being a pass you only add to the pipeline if you know you're targeting a GPU.

      > Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.

      This also feels like it's gotta be GPU specific.

      No chance that doing this on a CPU would be a speed-up unless it saved you reg pressure.