tormeh 6 days ago

I find it very hard to justify investing time into learning something that's neither open source nor has multiple interchangeable vendors. Being good at using Nvidia chips sounds a lot like being an ABAP consultant or similar to me. I realize there's a lot of money to be made in the field right now, but IIUC historically this kind of thing has not been a great move.

raincole 5 days ago

Yeah, that's what I told myself a decade ago when I skipped the CUDA class in college.

the__alchemist 5 days ago

The principles of parallel computing, and how they play out at the hardware and driver levels, are broadly applicable. Some parts are provincial (a strong province, though...), and others are more general.

It's hard to find skills that don't have some degree of provincialism. It's not a great feeling, but you move on. IMO, don't over-idealize the idea of general knowledge to your own detriment.

I think we can also untangle the open-source question from the general-versus-provincial one. There is more to the world worth exploring.

physicsguy 5 days ago

It really isn't that hard to pivot. It's worth saying that if you were already writing OpenMP and MPI code, then learning CUDA wasn't particularly difficult to get started, and learning to write more performant CUDA code would also help you write faster CPU-bound code. It's an evolution of existing models of compute, not a revolution.

  • Q6T46nT668w6i3m 5 days ago

    I agree that “learning CUDA wasn’t particularly difficult to get started,” but there are Grand Canyon-sized chasms between CUDA and its alternatives when you're trying to crank performance.

    • physicsguy 5 days ago

      Well, I think to a degree that depends what you're targeting.

      Single socket 8 core CPU? Yes.

      If you've spent some time trying to eke out performance on Xeon Phi, written NUMA-aware code for multi-socket boards, and optimised for the L1/L2/L3 memory hierarchy, then it really isn't that different.

    • j45 5 days ago

      It will improve for sure but this shouldn’t be downplayed.

saagarjha 6 days ago

Sure, but you can make money in the field and retire before it becomes irrelevant. FWIW, none of the ideas here are novel or non-transferable; it's just the specific design that is proprietary. Understanding how to do an AllReduce has been of theoretical interest for decades and will probably remain worth doing far into the future.
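
For anyone curious, here's a toy ring all-reduce in plain Python. It only simulates the neighbour-to-neighbour pattern that libraries like NCCL or MPI_Allreduce implement across real devices; each rank's data is one chunk per rank to keep the bookkeeping short.

    def ring_allreduce(per_rank):
        # per_rank[r] is rank r's local vector, one chunk per rank.
        # After the two phases below, every rank holds the element-wise sum.
        n = len(per_rank)
        data = [list(v) for v in per_rank]

        # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
        for step in range(n - 1):
            snap = [list(v) for v in data]      # sends within a step are simultaneous
            for r in range(n):
                c = (r - step) % n              # chunk rank r passes to its neighbour
                data[(r + 1) % n][c] += snap[r][c]

        # All-gather: circulate the completed chunks until everyone has every sum.
        for step in range(n - 1):
            snap = [list(v) for v in data]
            for r in range(n):
                c = (r + 1 - step) % n
                data[(r + 1) % n][c] = snap[r][c]
        return data

    print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
    # every rank ends up with [12, 15, 18]

Each rank only ever talks to its neighbour and moves 1/n of the data per step, which is a big part of why the pattern has stayed interesting for decades.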

  • j45 5 days ago

    Tech is always like this.

    You move from one thing to the next.

    You bring transferable skills, experience, and thinking that go beyond any one programming language.

    Even Apple is simply exporting to CUDA now.

  • tormeh 5 days ago

    Only in Silicon Valley. But if you can, definitely do.

hackrmn 5 days ago

I grew up learning programming on a genuine IBM PC running MS-DOS, neither of which was FOSS but taught me plenty that I routinely rely on today in one form or another.

  • j45 5 days ago

    Very true.

    Stories of exploring DOS often ended up in hex editing and assembly.

    Best to learn with whatever options are accessible; plenty is transferable.

    While there is an embarrassment of options to learn from today, the greatest gaffe is to overlook learning altogether.

Philpax 6 days ago

There's more in common with other GPU architectures than there are differences, so a CUDA consultant should be able to pivot if/when the other players become a going concern. It's more about the mindset than the specifics.

  • dotancohen 6 days ago

    I've been hearing that for over a decade. I can't even name any CUDA competitors offhand, and none of them are likely to gain enough traction to upset CUDA in the coming decade.

    • Philpax 6 days ago

      Hence the "if" :-)

      ROCm is getting some adoption, especially as some of the world's largest public supercomputers have AMD GPUs.

      Some of this is also being solved by working at a different abstraction layer; with PyTorch you can sometimes be ignorant of the hardware you're running on. It's still leaky, but it's something.

      • Q6T46nT668w6i3m 5 days ago

        Look at the state of PyTorch’s CI pipelines and you’ll immediately see that ROCm is a nightmare. Especially nowadays when TPU and MPS, while missing features, rarely create cascading failures throughout the stack.

      • physicsguy 5 days ago

        I still don't see ROCm as that serious a threat; they're still a long way behind in library support.

        I used to use ROCFFT as an example: it was missing core functionality that cuFFT has had since something like 2008. It looks like they've finally caught up now, but that's one library among many.

      • j45 5 days ago

        Waiting just adds more dust to the skills pile.

        Programming languages are groups of syntax.

    • einpoklum 5 days ago

      Talking about hardware rather than software, you have AMD and Intel. And if your platform is not x86_64, NVIDIA is probably not even one of the competitors; there you have ARM, Qualcomm, Apple, Samsung, and probably some others.

    • sdenton4 5 days ago

      ...Well, the article compares GPUs to TPUs, made by a competitor you probably know the name of...

qwertox 6 days ago

It's a valid point of view, but I don't see the value in sharing it.

There are enough people for whom it's worth it, even if just for tinkering, and I'm sure you are aware of that.

It reads a bit like "You shouldn't use it because..."

Learning about Nvidia GPUs will teach you a lot about other GPUs as well, and there are a lot of tutorials about the former, so why not use it if it interests you?

  • woooooo 5 days ago

    It's a useful bit of caution to remember the transferable fundamentals; I remember when Oracle wizards were in high demand.

    • sigbottle 5 days ago

      There are tons of ML compilers right now, FlashAttention brought the cache-aware model back to parallel programming (see the sketch below), Moore's law has hit its limit, and heterogeneous hardware is taking off.

      Just some fundamentals I can think of off the top of my head. I'm surprised people are saying that the lower-level systems/hardware stuff is untransferable. These things are used everywhere. If anything, it's the AI itself that's potentially a bubble, but the fundamental need for understanding the performance of systems & design is always there.
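
      A tiny NumPy illustration of that cache-aware idea, just a blocked matmul with an arbitrary tile size: accumulate tile-sized products so the operands get reused while they're still hot in cache, the same blocking notion FlashAttention applies to attention.

          import numpy as np

          def blocked_matmul(a, b, tile=64):
              # Process the matrices tile by tile so each operand block is
              # reused many times while it is resident in fast memory.
              n, k = a.shape
              _, m = b.shape
              out = np.zeros((n, m), dtype=a.dtype)
              for i in range(0, n, tile):
                  for j in range(0, m, tile):
                      for p in range(0, k, tile):
                          out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
              return out

          a, b = np.random.rand(256, 256), np.random.rand(256, 256)
          print(np.allclose(blocked_matmul(a, b), a @ b))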

      • woooooo 5 days ago

        I'm actually doing a ton of research in the area myself; the caution was against becoming a narrow Nvidia expert rather than a general low-level programmer with Nvidia skills included.

    • NikolaNovak 5 days ago

      I mean, I'm in Toronto, Canada, a fairly big city and market, and I have an open seat for a couple of good senior Oracle DBAs pretty much constantly. The market may have shrunk over the decades, but there's still more demand than supply. And the core DBA skills are transferable to other RDBMS as well. While I agree that some niche technologies are fleeting, it's perhaps not the best example :-)

      • woooooo 5 days ago

        That's actually interesting! My experience is different: especially compared to the late 90s and early 00s, most people avoid Oracle if they can. But yes, it's always worth having someone whose job is to think about the database if it's your linchpin.

pornel 5 days ago

There are two CUDAs – a hardware architecture, and a software stack for it.

The software is proprietary, and easy to ignore if you don't plan to write low-level optimizations for NVIDIA.

However, the hardware architecture is worth knowing. All GPUs work roughly the same way (especially on the compute side), and the CUDA architecture is still fundamentally the same as it was in 2007 (just with more of everything).

It dictates how shader languages and GPU abstractions work, regardless of whether you're using proprietary or open implementations. It's very helpful to understand peculiarities of thread scheduling, warps, different levels of private/shared memory, etc. There's a ridiculous amount of computing power available if you can make your algorithms fit the execution model.
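
A minimal sketch of that execution model, written with Numba's CUDA dialect so it stays in Python (assumes an NVIDIA GPU and a working CUDA toolkit): a grid of blocks runs the kernel, each block cooperates through fast shared memory with barrier syncs, and underneath, threads execute in warps of 32.

    import numpy as np
    from numba import cuda, float32

    THREADS = 256  # threads per block; the hardware runs them as warps of 32

    @cuda.jit
    def block_sum(x, partial):
        tile = cuda.shared.array(THREADS, float32)   # per-block shared memory
        tid = cuda.threadIdx.x
        i = cuda.blockIdx.x * cuda.blockDim.x + tid
        tile[tid] = x[i] if i < x.size else 0.0
        cuda.syncthreads()                           # barrier for the whole block
        step = THREADS // 2
        while step > 0:                              # tree reduction in shared memory
            if tid < step:
                tile[tid] += tile[tid + step]
            cuda.syncthreads()
            step //= 2
        if tid == 0:
            partial[cuda.blockIdx.x] = tile[0]       # one partial sum per block

    x = np.random.rand(1 << 20).astype(np.float32)
    blocks = (x.size + THREADS - 1) // THREADS
    partial = np.zeros(blocks, dtype=np.float32)
    block_sum[blocks, THREADS](x, partial)           # launch: grid of blocks x threads per block
    print(partial.sum(), x.sum())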

deltaburnt 5 days ago

This is an article about JAX, a parallel-computation library that's meant to abstract away vendor-specific details. Obviously, if you want the most performance you need to know the specifics of your hardware, but learning at a high level how a GPU and a TPU work seems like useful knowledge regardless.
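
That's roughly the pitch: a jitted function is compiled by XLA for whatever backend JAX finds, so a toy sketch like this runs unchanged on CPU, an NVIDIA GPU, or a TPU (the function and shapes here are arbitrary).

    import jax
    import jax.numpy as jnp

    @jax.jit
    def predict(w, x):
        # XLA compiles this for whichever device JAX discovered at startup
        return jnp.tanh(x @ w)

    x = jnp.ones((8, 128))
    w = jnp.ones((128, 16))
    print(jax.devices())        # e.g. CPU, CUDA, or TPU devices; no code changes
    print(predict(w, x).shape)  # (8, 16)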

  • behnamoh 5 days ago

    > abstract away vendor-specific details

    Sounds good on paper, but unfortunately I've had numerous issues with these "abstractors". For example, PyTorch had serious problems on Apple Silicon even though technically it should "just work" by hiding the implementation details.

    In reality, what ends up happening is that some features in JAX, PyTorch, etc. are designed with CUDA in mind, and Apple Silicon is an afterthought.
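
    In practice that tends to mean device-selection boilerplate like the sketch below, plus an escape hatch (PyTorch's PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable) for ops the MPS backend still doesn't implement.

        import torch

        # Prefer Apple's MPS backend when present, otherwise CUDA, otherwise CPU.
        if torch.backends.mps.is_available():
            device = torch.device("mps")
        elif torch.cuda.is_available():
            device = torch.device("cuda")
        else:
            device = torch.device("cpu")

        x = torch.randn(4, 4, device=device)
        print(device, (x @ x).sum().item())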

bee_rider 5 days ago

I think I'd rather get familiar with CuPy or JAX or something. BLAS/LAPACK wrappers will never go out of style. It's a subset of the sort of stuff you can do on a GPU, but it seems like a nice ratio of effort to functionality.
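
For what it's worth, that route can be as small as this CuPy sketch (assuming CuPy is installed against your CUDA toolkit); the NumPy-style calls dispatch to the GPU's BLAS/LAPACK-style libraries underneath.

    import cupy as cp

    a = cp.random.rand(1024, 1024)
    b = cp.random.rand(1024)
    x = cp.linalg.solve(a, b)            # LAPACK-style solve, run on the GPU
    print(bool(cp.allclose(a @ x, b)))   # verify the solution on-device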

moralestapia 5 days ago

It's money. You would do it for money.

  • j45 5 days ago

    Generally, earning an honest living seems to be a requirement of this world that individual words and beliefs won’t change.

    Work keeps us humble enough to stay open to learning.

augment_me 5 days ago

You can write software for the hardware in a cross-compiled language like Triton. The hardware reality stays the same: a company like Cerebras might have the superior architecture, but server rooms are filled with H100s, A100s, and MI300s whether you believe in the hardware or not.
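
For illustration, a minimal Triton kernel looks like this (the canonical vector add; assumes triton and torch with a CUDA device). The same tile-level Python is lowered by Triton's compiler for the backends it supports, rather than hand-written CUDA C.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n                          # guard the ragged last block
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x + y, mask=mask)

    n = 1 << 20
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
    print(torch.allclose(out, x + y))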

WithinReason 6 days ago

What's in this article would apply to most other hardware, just with slightly different constants.

j45 5 days ago

Nvidia also trotted along with a low share price for a long time, financing and supporting what they believed in.

When CUDA rose to prominence, were there any viable alternatives?

rvz 5 days ago

> I find it very hard to justify investing time into learning something that's neither open source nor has multiple interchangeable vendors.

Better not learn CUDA then.

amelius 6 days ago

I mean, it is similar to investing time in learning assembly language.

For most IT folks, it doesn't make much sense.