Comment by martinpw
> But I think more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using pytorch (or maybe something even more abstracted/multi-backend like keras 3.x) and let the library deal with implementation details related to their GPU.
That would imply that AMD could just focus on implementing good PyTorch support on their hardware and they would be able to start taking market share. Which doesn't sound like much work compared with writing a full CUDA competitor. But that does not seem to be the strategy, which implies it is not so simple?
I am not an ML engineer so don't have first hand experience, but those I have talked to say they depend on a lot more than just one or two key libraries. But my sample size is small. Interested in other perspectives...
> But that does not seem to be the strategy, which implies it is not so simple?
That is exactly what has been happening [1], and not just in pytorch. Geohot has been very dedicated in working with AMD to upgrade their station in this space [2]. If you hang out in the tinygrad discord, you can see this happening in real time.
> those I have talked to say they depend on a lot more than just one or two key libraries.
Theres a ton of libraries out there yes, but if we're talking about python and the libraries in question are talking to GPUs its going to be exceedingly rare that theyre not using one of these under the hood: pytorch, tensorflow, jax, keras, et al.
There are of course exceptions to this, particularly if you're not using python for your ML work (which is actually common for many companies running inference at scale and want better runtime performance, training is a different story). But ultimately the core ecosystem does work just fine with AMD GPUs, provided you're not doing any exotic custom kernel work.
(EDIT: just realized my initial comment unintentionally borrowed the "moat" commentary from geohot's blog. A happy accident in this case, but still very much rings true for my day to day ML dev experience)
[1] https://github.com/pytorch/pytorch/pulls?q=is%3Aopen+is%3Apr...
[2] https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...