Comment by Philpax
Hence the "if" :-)
ROCm is getting some adoption, especially as some of the world's largest public supercomputers have AMD GPUs.
Some of this is also being solved by working at a different abstraction layer; with PyTorch you can sometimes stay ignorant of the hardware you're running on. It's still leaky, but it's something — see the sketch below.
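For example, a minimal device-agnostic sketch (the model and sizes here are just placeholders): ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, so the same code path covers NVIDIA and AMD, with MPS and CPU as fallbacks.

```python
import torch

# Pick whatever accelerator is available. ROCm/HIP builds of PyTorch
# report AMD GPUs via torch.cuda, so this branch covers NVIDIA and AMD.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():  # Apple Silicon
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# The model/tensor code is identical regardless of backend.
model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
y = model(x)
```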
Look at the state of PyTorch’s CI pipelines and you’ll immediately see that ROCm is a nightmare — especially nowadays, when the TPU and MPS backends, despite missing features, rarely cause cascading failures throughout the stack.