tormeh 6 days ago

I find it very hard to justify investing time into learning something that's neither open source nor has multiple interchangeable vendors. Being good at using Nvidia chips sounds a lot like being an ABAP consultant or similar to me. I realize there's a lot of money to be made in the field right now, but IIUC historically this kind of thing has not been a great move.

raincole 5 days ago

Yeah, that's what I told myself a decade ago when I skipped the CUDA class in college.

the__alchemist 5 days ago

The principles of parallel computing, and how they play out at the hardware and driver levels, are broadly applicable. Some parts are provincial (a strong province, though...), and others are more general.

It's hard to find skills that don't have some degree of provincialism. It's not a great feeling, but you move on. IMO, don't over-idealize the idea of general knowledge to your own detriment.

I think we can also untangle the open-source question from the general-versus-provincial one. There is more to the world worth exploring.

physicsguy 5 days ago

It really isn't that hard to pivot. It's worth saying that if you were already writing OpenMP and MPI code, then learning CUDA wasn't particularly difficult to get started, and learning to write more performant CUDA code would also help you write faster CPU-bound code. It's an evolution of existing models of compute, not a revolution.

  • Q6T46nT668w6i3m 5 days ago

    I agree that “learning CUDA wasn’t particularly difficult to get started,” but there are Grand Canyon-sized chasms between CUDA and its alternatives when you're trying to crank performance.

    • physicsguy 5 days ago

      Well, I think to a degree that depends what you're targeting.

      Single socket 8 core CPU? Yes.

      If you've spent some time trying to eke out performance on Xeon Phi, written NUMA-aware code for multi-socket boards, and optimised for the L1/L2/L3 memory hierarchy, then it really isn't that different.

    • j45 5 days ago

      It will improve for sure but this shouldn’t be downplayed.

saagarjha 6 days ago

Sure, but you can make money in the field and retire before it becomes irrelevant. FWIW, none of the ideas here are novel or non-transferable; it's just the specific design that is proprietary. Understanding how to do an AllReduce has been of theoretical interest for decades and will probably remain worth doing far into the future.
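
For anyone curious, here's a toy ring all-reduce in plain Python. It only simulates the neighbour-to-neighbour pattern that libraries like NCCL or MPI_Allreduce implement across real devices; each rank's data is one chunk per rank to keep the bookkeeping short.

    def ring_allreduce(per_rank):
        # per_rank[r] is rank r's local vector, one chunk per rank.
        # After the two phases below, every rank holds the element-wise sum.
        n = len(per_rank)
        data = [list(v) for v in per_rank]

        # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
        for step in range(n - 1):
            snap = [list(v) for v in data]      # sends within a step are simultaneous
            for r in range(n):
                c = (r - step) % n              # chunk rank r passes to its neighbour
                data[(r + 1) % n][c] += snap[r][c]

        # All-gather: circulate the completed chunks until everyone has every sum.
        for step in range(n - 1):
            snap = [list(v) for v in data]
            for r in range(n):
                c = (r + 1 - step) % n
                data[(r + 1) % n][c] = snap[r][c]
        return data

    print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
    # every rank ends up with [12, 15, 18]

Each rank only ever talks to its neighbour and moves 1/n of the data per step, which is a big part of why the pattern has stayed interesting for decades.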

  • j45 5 days ago

    Tech is always like this.

    You move from one thing to the next.

    You bring transferable skills, experience, and thinking that go beyond any one programming language.

    Even Apple is simply exporting to CUDA now.

  • tormeh 5 days ago

    Only in Silicon Valley. But if you can, definitely do.

hackrmn 5 days ago

I grew up learning programming on a genuine IBM PC running MS-DOS, neither of which was FOSS but taught me plenty that I routinely rely on today in one form or another.

  • j45 5 days ago

    Very true.

    Stories of exploring DOS often ended up in hex editing and assembly.

    Best to learn with whatever options are accessible; plenty is transferable.

    While there is an embarrassment of options to learn from today, the greatest gaffe is to overlook learning altogether.

Philpax 6 days ago

There's more in common with other GPU architectures than there are differences, so a CUDA consultant should be able to pivot if/when the other players become a going concern. It's more about the mindset than the specifics.

  • dotancohen 6 days ago

    I've been hearing that for over a decade. I can't even name any CUDA competitors offhand, and none of them are likely to gain enough traction to upset CUDA in the coming decade.

    • Philpax 6 days ago

      Hence the "if" :-)

      ROCm is getting some adoption, especially as some of the world's largest public supercomputers have AMD GPUs.

      Some of this is also being solved by working at a different abstraction layer; with PyTorch you can sometimes be ignorant of the hardware you're running on. It's still leaky, but it's something.

      • Q6T46nT668w6i3m 5 days ago

        Look at the state of PyTorch’s CI pipelines and you’ll immediately see that ROCm is a nightmare. Especially nowadays when TPU and MPS, while missing features, rarely create cascading failures throughout the stack.

      • physicsguy 5 days ago

        I still don't see ROCm as that serious a threat; they're still a long way behind in library support.

        I used to use ROCFFT as an example: it was missing core functionality that cuFFT has had since something like 2008. It looks like they've finally caught up now, but that's one library among many.

      • j45 5 days ago

        Waiting just adds more dust to the skills pile.

        Programming languages are groups of syntax.

    • einpoklum 5 days ago

      Talking about hardware rather than software, you have AMD and Intel. And if your platform is not x86_64, NVIDIA is probably not even one of the competitors; there you have ARM, Qualcomm, Apple, Samsung, and probably some others.

    • sdenton4 5 days ago

      ...Well, the article compares GPUs to TPUs, made by a competitor you probably know the name of...

qwertox 6 days ago

It's a valid point of view, but I don't see the value in sharing it.

There are enough people for whom it's worth it, even if just for tinkering, and I'm sure you are aware of that.

It reads a bit like "You shouldn't use it because..."

Learning about Nvidia GPUs will teach you a lot about other GPUs as well, and there are a lot of tutorials about the former, so why not use it if it interests you?

  • woooooo 5 days ago

    It's a useful bit of caution to remember the transferable fundamentals; I remember when Oracle wizards were in high demand.

    • sigbottle 5 days ago

      There are tons of ML compilers right now, FlashAttention brought the cache-aware model back to parallel programming (see the sketch below), Moore's law has hit its limit, and heterogeneous hardware is taking off.

      Just some fundamentals I can think of off the top of my head. I'm surprised people are saying that the lower-level systems/hardware stuff is untransferable. These things are used everywhere. If anything, it's the AI itself that's potentially a bubble, but the fundamental need for understanding the performance of systems & design is always there.
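
      A tiny NumPy illustration of that cache-aware idea, just a blocked matmul with an arbitrary tile size: accumulate tile-sized products so the operands get reused while they're still hot in cache, the same blocking notion FlashAttention applies to attention.

          import numpy as np

          def blocked_matmul(a, b, tile=64):
              # Process the matrices tile by tile so each operand block is
              # reused many times while it is resident in fast memory.
              n, k = a.shape
              _, m = b.shape
              out = np.zeros((n, m), dtype=a.dtype)
              for i in range(0, n, tile):
                  for j in range(0, m, tile):
                      for p in range(0, k, tile):
                          out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
              return out

          a, b = np.random.rand(256, 256), np.random.rand(256, 256)
          print(np.allclose(blocked_matmul(a, b), a @ b))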

      • woooooo 5 days ago

        I'm actually doing a ton of research in the area myself; the caution was against becoming a narrow Nvidia expert rather than a general low-level programmer with Nvidia skills included.

    • NikolaNovak 5 days ago

      I mean, I'm in Toronto, Canada, a fairly big city and market, and I have an open seat for a couple of good senior Oracle DBAs pretty much constantly. The market may have shrunk over the decades, but there's still more demand than supply. And the core DBA skills are transferable to other RDBMS as well. While I agree that some niche technologies are fleeting, it's perhaps not the best example :-)

      • woooooo 5 days ago

        That's actually interesting! My experience is different: especially compared to the late 90s and early 00s, most people avoid Oracle if they can. But yes, it's always worth having someone whose job is to think about the database if it's your linchpin.

pornel 5 days ago

There are two CUDAs – a hardware architecture, and a software stack for it.

The software is proprietary, and easy to ignore if you don't plan to write low-level optimizations for NVIDIA.

However, the hardware architecture is worth knowing. All GPUs work roughly the same way (especially on the compute side), and the CUDA architecture is still fundamentally the same as it was in 2007 (just with more of everything).

It dictates how shader languages and GPU abstractions work, regardless of whether you're using proprietary or open implementations. It's very helpful to understand peculiarities of thread scheduling, warps, different levels of private/shared memory, etc. There's a ridiculous amount of computing power available if you can make your algorithms fit the execution model.
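
A minimal sketch of that execution model, written with Numba's CUDA dialect so it stays in Python (assumes an NVIDIA GPU and a working CUDA toolkit): a grid of blocks runs the kernel, each block cooperates through fast shared memory with barrier syncs, and underneath, threads execute in warps of 32.

    import numpy as np
    from numba import cuda, float32

    THREADS = 256  # threads per block; the hardware runs them as warps of 32

    @cuda.jit
    def block_sum(x, partial):
        tile = cuda.shared.array(THREADS, float32)   # per-block shared memory
        tid = cuda.threadIdx.x
        i = cuda.blockIdx.x * cuda.blockDim.x + tid
        tile[tid] = x[i] if i < x.size else 0.0
        cuda.syncthreads()                           # barrier for the whole block
        step = THREADS // 2
        while step > 0:                              # tree reduction in shared memory
            if tid < step:
                tile[tid] += tile[tid + step]
            cuda.syncthreads()
            step //= 2
        if tid == 0:
            partial[cuda.blockIdx.x] = tile[0]       # one partial sum per block

    x = np.random.rand(1 << 20).astype(np.float32)
    blocks = (x.size + THREADS - 1) // THREADS
    partial = np.zeros(blocks, dtype=np.float32)
    block_sum[blocks, THREADS](x, partial)           # launch: grid of blocks x threads per block
    print(partial.sum(), x.sum())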

deltaburnt 5 days ago

This is an article about JAX, a parallel-computation library that's meant to abstract away vendor-specific details. Obviously, if you want the most performance you need to know the specifics of your hardware, but learning at a high level how a GPU and a TPU work seems like useful knowledge regardless.
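
That's roughly the pitch: a jitted function is compiled by XLA for whatever backend JAX finds, so a toy sketch like this runs unchanged on CPU, an NVIDIA GPU, or a TPU (the function and shapes here are arbitrary).

    import jax
    import jax.numpy as jnp

    @jax.jit
    def predict(w, x):
        # XLA compiles this for whichever device JAX discovered at startup
        return jnp.tanh(x @ w)

    x = jnp.ones((8, 128))
    w = jnp.ones((128, 16))
    print(jax.devices())        # e.g. CPU, CUDA, or TPU devices; no code changes
    print(predict(w, x).shape)  # (8, 16)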

  • behnamoh 5 days ago

    > abstract away vendor-specific details

    Sounds good on paper, but unfortunately I've had numerous issues with these "abstractors". For example, PyTorch had serious problems on Apple Silicon even though technically it should "just work" by hiding the implementation details.

    In reality, what ends up happening is that some features in JAX, PyTorch, etc. are designed with CUDA in mind, and Apple Silicon is an afterthought.
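
    In practice that tends to mean device-selection boilerplate like the sketch below, plus an escape hatch (PyTorch's PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable) for ops the MPS backend still doesn't implement.

        import torch

        # Prefer Apple's MPS backend when present, otherwise CUDA, otherwise CPU.
        if torch.backends.mps.is_available():
            device = torch.device("mps")
        elif torch.cuda.is_available():
            device = torch.device("cuda")
        else:
            device = torch.device("cpu")

        x = torch.randn(4, 4, device=device)
        print(device, (x @ x).sum().item())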

bee_rider 5 days ago

I think I'd rather get familiar with CuPy or JAX or something. BLAS/LAPACK wrappers will never go out of style. It's a subset of the sort of stuff you can do on a GPU, but it seems like a nice ratio of effort to functionality.
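
For what it's worth, that route can be as small as this CuPy sketch (assuming CuPy is installed against your CUDA toolkit); the NumPy-style calls dispatch to the GPU's BLAS/LAPACK-style libraries underneath.

    import cupy as cp

    a = cp.random.rand(1024, 1024)
    b = cp.random.rand(1024)
    x = cp.linalg.solve(a, b)            # LAPACK-style solve, run on the GPU
    print(bool(cp.allclose(a @ x, b)))   # verify the solution on-device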

moralestapia 5 days ago

It's money. You would do it for money.

  • j45 5 days ago

    Generally, earning an honest living seems to be a requirement of this world that individual words and beliefs won’t change.

    Work keeps us humble enough to stay open to learning.

augment_me 5 days ago

You can write software for the hardware in a cross-compiled language like Triton. The hardware reality stays the same: a company like Cerebras might have the superior architecture, but server rooms are filled with H100s, A100s, and MI300s whether you believe in the hardware or not.
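
For illustration, a minimal Triton kernel looks like this (the canonical vector add; assumes triton and torch with a CUDA device). The same tile-level Python is lowered by Triton's compiler for the backends it supports, rather than hand-written CUDA C.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n                          # guard the ragged last block
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x + y, mask=mask)

    n = 1 << 20
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
    print(torch.allclose(out, x + y))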

WithinReason 6 days ago

What's in this article would apply to most other hardware, just with slightly different constants.

j45 5 days ago

Nvidia also trotted along with a low share price for a long time, financing and supporting what they believed in.

When CUDA rose to prominence, were there any viable alternatives?

rvz 5 days ago

> I find it very hard to justify investing time into learning something that's neither open source nor has multiple interchangeable vendors.

Better not learn CUDA then.

amelius 6 days ago

I mean, it is similar to investing time in learning assembly language.

For most IT folks, it doesn't make much sense.