hyperbovine a day ago

I'm willing to bet almost nobody you know calls the CUDA API directly. What AMD needs to focus on is getting the ROCm backend going for XLA and PyTorch. That would unlock a big slice of the market right there.
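
To be concrete, the user-visible surface barely changes: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda namespace, so typical training code ports unchanged. A minimal sketch (assuming a working ROCm or CUDA build of PyTorch):

    import torch

    # ROCm builds report AMD GPUs under the cuda namespace, so
    # device-agnostic code like this runs unchanged on either vendor.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    y = model(x)

    # torch.version.hip is set on ROCm builds, None on CUDA builds.
    print(torch.version.hip or torch.version.cuda)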

They should also be dropping free AMD GPUs off helicopters, as Nvidia did a decade or so ago, in order to build up an academic user base. Academia is getting totally squeezed by industry when it comes to AI compute; we're mostly running on hardware that's two or three generations out of date. If AMD came out with a well-supported GPU that cost half what an A100 sells for, voilà: you'd have cohort after cohort of grad students training models on AMD and then taking that know-how into industry.

bwfan123 a day ago

Indeed. The user-facing software stack components - PyTorch and JAX/XLA - are owned by Meta and Google and are open source. Further, the open-source models (Llama/DeepSeek) are largely hardware agnostic. There is really no user or ecosystem lock-in. Also, clouds are highly incentivized to have multiple hardware alternatives.
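
To illustrate the point about XLA: the same program compiles for whichever backend plugin happens to be installed. A minimal sketch (assuming a JAX install with some XLA backend plugin present):

    import jax
    import jax.numpy as jnp

    # XLA hides the backend: the same jitted function compiles for
    # whichever platform the installed plugin targets (CPU, CUDA, ROCm, TPU).
    @jax.jit
    def f(x):
        return jnp.tanh(x) @ x.T

    x = jnp.ones((512, 512))
    print(jax.default_backend(), f(x).shape)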

pjmlp a day ago

HN keeps forgetting that game development and VFX exist.

  • hyperbovine a day ago

    What fraction of Nvidia revenue comes from those applications?

    • akshayt 11 hours ago

      About 0.1% from professional visualization in Q1 this year

    • pjmlp a day ago

      Let's put it this way: they need graphics cards, and CUDA is now relatively common.

      For example, OTOY OctaneRender, one of the key renderers in Hollywood.

aseipp a day ago

There already is ROCm support for PyTorch. Then there's stuff like this: https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...

They have improved since that article was written, by a decent amount from my understanding. But by now it isn't enough to have "a backend". Past efforts have soured the narrative so badly that merely shipping a pytorch-rocm PyPI package won't cut it; some of that flak is unfair, though not entirely unfounded. Frankly, they need to deliver better software, across all their offerings, for multiple successive generations before the bad optics around their software stack start fading. Meanwhile, their competitors have already moved on to their next-generation architecture since that article was published.
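
(For reference, the ROCm wheels today ship from PyTorch's own index rather than PyPI proper, along the lines of the documented pattern below; the exact version tag will vary.)

    pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2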

You are correct that people don't really invoke CUDA APIs much, but that's partially because those APIs actually work and deliver good performance, so things can actually be built on top of them.
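
For a sense of the gap, here's a rough sketch of what calling down to CUDA "directly" looks like from Python versus staying at the framework level; it uses CuPy as a convenient stand-in (not mentioned above, and it assumes an Nvidia GPU with CuPy installed):

    import cupy as cp

    # "Direct" CUDA: hand-written kernel source, manual grid/block sizing.
    add_kernel = cp.RawKernel(r'''
    extern "C" __global__
    void add(const float* a, const float* b, float* out, int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) out[i] = a[i] + b[i];
    }
    ''', 'add')

    n = 1 << 20
    a = cp.random.rand(n, dtype=cp.float32)
    b = cp.random.rand(n, dtype=cp.float32)
    out = cp.empty_like(a)
    add_kernel(((n + 255) // 256,), (256,), (a, b, out, cp.int32(n)))

    # Framework level: the same result, with no CUDA in sight.
    assert cp.allclose(out, a + b)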