Comment by lhl
On the corp side you have FB w/ PyTorch and xformers (still pretty iffy on AMD support, tbh) and MS w/ DeepSpeed. But let's see about some others:
Flash Attention: academia, 2y behind for AMD support
bitsandbytes: academia, 2y behind for AMD support
Marlin: academia, no AMD support
FlashInfer: academia/startup, no AMD support
ThunderKittens: academia, no AMD support
DeepGEMM, DeepEP, FlashMLA: DeepSeek, and ofc nothing coming out of China supports AMD
Without this long tail, AMD will always be in a position where it has to scramble to bolt on second-tier support years later itself, while Nvidia gets all the latest and greatest for free.
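To make the "scramble" concrete, here's a minimal sketch (my own illustration, not code from any of these projects) of the vendor-conditional fallback that downstream users end up writing: try the CUDA-only fast path, and fall back to PyTorch's built-in scaled_dot_product_attention everywhere else. The attention() wrapper and the tensor-layout assumptions are hypothetical:

```python
import torch
import torch.nn.functional as F

# flash_attn wheels are, in practice, built for CUDA; on ROCm builds of
# PyTorch, torch.version.cuda is None, so we skip the fast path there.
try:
    from flash_attn import flash_attn_func
    HAS_FLASH = torch.version.cuda is not None
except ImportError:
    HAS_FLASH = False

def attention(q, k, v, causal=True):
    """Inputs assumed to be (batch, seqlen, nheads, headdim)."""
    if HAS_FLASH:
        # Fast path: FlashAttention kernel, same layout as the inputs.
        return flash_attn_func(q, k, v, causal=causal)
    # Portable fallback: SDPA expects (batch, nheads, seqlen, headdim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    return out.transpose(1, 2)
```

Multiply this pattern by every kernel library in the list above and you get the maintenance burden AMD users are stuck with.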
This is just off the top of my head on the LLM side, where I'm focused, btw. Whenever I look at image/video, it's even grimmer.
Modular says Max/Mojo will change this and make porting between different vendors (and different product lines from the same vendor) less of a showstopper, but that's tbd for now.