Comment by atemerev
HIP is now somewhat viable (and ROCm is now all HIP).
But — too late. First versions of ROCm were terrible. Too much boilerplate. 1200 lines of template-heavy C++ for a simple FFT. Can't just start hacking around.
Since then, the CUDA way is cemented in minds of developers. Intel now has oneAPI, and it is not too bad, and hackable, but there is no hardware and no one will learn it. And HIP is "CUDA-like", so why not CUDA, unless you _have to_ use AMD hardware.
Tl;dr first versions of ROCm were bad. Now they are better, but it is too late.