Comment by throwaway81523 8 days ago

I looked at the CUDA code for Leela Chess Zero and found it pretty understandable, though that was back when Leela used a DCNN (deep convolutional neural network) instead of transformers. DCNNs are fairly simple and are explained in the fast.ai videos I watched a few years ago, so navigating the Leela code wasn't too difficult. Transformers are more complicated and I want to bone up on them, but I haven't yet found the time to study them.
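For anyone wondering what the core of a DCNN layer looks like, here is a minimal C++ sketch of one 3x3 convolution with zero padding followed by ReLU. It assumes a single input channel, a single output channel, and a row-major flat vector; the name `conv3x3_relu` and the 8x8 plane in the test are illustrative assumptions, not Leela's code. Real layers sum this over many input channels, produce many output channels, and run on tuned GPU kernels instead of nested loops.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// One "same"-padded 3x3 convolution (cross-correlation, as most DL frameworks
// define it) over a single-channel h x w plane stored row-major, plus ReLU.
// Hypothetical helper for illustration; multi-channel bookkeeping is omitted.
std::vector<float> conv3x3_relu(const std::vector<float>& in, int h, int w,
                                const std::array<float, 9>& k, float bias) {
    assert(static_cast<int>(in.size()) == h * w);
    std::vector<float> out(in.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = bias;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sy = y + dy, sx = x + dx;
                    // Zero padding: neighbours outside the plane contribute nothing.
                    if (sy < 0 || sy >= h || sx < 0 || sx >= w) continue;
                    acc += k[(dy + 1) * 3 + (dx + 1)] * in[sy * w + sx];
                }
            }
            out[y * w + x] = std::max(acc, 0.0f);  // ReLU activation
        }
    }
    return out;
}
```

With an all-ones kernel over an all-ones plane, an interior cell sees 9 neighbours, an edge cell 6, and a corner cell 4, which is a quick way to sanity-check the padding.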

CUDA itself is only a minor departure from C++, so the language is no big deal if you've used C++ before. That said, if you're trying to get hired programming CUDA, what that really means is that they want you implementing AI stuff (unless it's game dev). AI programming is a much wider and deeper subject than CUDA, so be ready to spend a good amount of time studying and hacking to come up to speed. But if you do, you will be in high demand. As mentioned, the fast.ai videos are a great introduction.

In the case of games, that means 3D graphics, which these days is another rabbit hole. I knew a bit about it back in the day, but it is fantastically more sophisticated now and I have no idea where to even start.
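To make the "minor departure from C++" point concrete: the CUDA kernel in the comment below is ordinary C++ plus the `__global__` qualifier, the built-in `blockIdx`/`blockDim`/`threadIdx` indices, and the `<<<grid, block>>>` launch syntax. The compilable part is a plain C++ sketch (no GPU needed) that emulates that launch with serial loops so the kernel body stays the same. `ThreadCoords`, `saxpy_body`, `launch_saxpy`, and `saxpy` are hypothetical names for illustration; on real hardware the threads run in parallel, not in order.

```cpp
#include <cassert>
#include <vector>

// For reference, the CUDA version (needs nvcc and a GPU, so it is only a comment):
//
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) y[i] = a * x[i] + y[i];
//   }
//   // launch: saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);

// Hypothetical stand-in for CUDA's built-in blockIdx.x / blockDim.x / threadIdx.x.
struct ThreadCoords {
    int block_idx;
    int block_dim;
    int thread_idx;
};

// Kernel body: each "thread" handles one element, guarded against overrun.
void saxpy_body(ThreadCoords c, int n, float a, const float* x, float* y) {
    int i = c.block_idx * c.block_dim + c.thread_idx;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Serial CPU emulation of saxpy<<<grid, block>>>(n, a, x, y).
void launch_saxpy(int grid, int block, int n, float a, const float* x, float* y) {
    for (int b = 0; b < grid; ++b)
        for (int t = 0; t < block; ++t)
            saxpy_body({b, block, t}, n, a, x, y);
}

// Convenience wrapper: rounds the grid size up so every element gets a thread.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y, int block = 256) {
    assert(x.size() == y.size());
    int n = static_cast<int>(x.size());
    int grid = (n + block - 1) / block;
    launch_saxpy(grid, block, n, a, x.data(), y.data());
}
```

Because the grid size is rounded up, some emulated threads land past the end of the array; the `if (i < n)` guard keeps them from writing out of bounds, just as in the CUDA version.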

upmind 8 days ago

This is a great idea! This is the code, right? https://github.com/leela-zero/leela-zero

I have two beginner (and probably very dumb) questions. First, why do they use C++/CUDA so heavily rather than only PyTorch/TensorFlow? Are those too slow for training Leela? Second, why is there TensorFlow code at all?

henrikf 8 days ago

That's Leela Zero (it plays Go instead of chess). It was good for its time (~2018), but it's quite outdated now. It also uses OpenCL instead of CUDA. I wrote a lot of that code, including the Winograd convolution routines.
Leela Chess Zero (https://github.com/LeelaChessZero/lc0) has a much more optimized CUDA backend targeting modern GPU architectures, and it's written by people much more knowledgeable than me. That would be a much better source to learn from.
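For context on the Winograd convolution routines mentioned above: Winograd's minimal filtering trades multiplies for additions. Below is a C++ sketch of the smallest 1D case, F(2,3), which computes two outputs of a 3-tap filter with 4 input-dependent multiplies instead of 6, checked against the direct version. The function names are hypothetical and this is not the Leela Zero code; practical convolution backends typically use 2D tiles (for example F(2x2, 3x3), which needs 16 multiplies per tile instead of 36) and batch the element-wise products across channels.

```cpp
#include <array>
#include <cassert>

// Direct 1D correlation: 2 outputs of a 3-tap filter over 4 inputs (6 multiplies).
std::array<float, 2> direct_2x3(const std::array<float, 4>& d,
                                const std::array<float, 3>& g) {
    return {d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
}

// Winograd F(2,3): the same 2 outputs with 4 multiplies in the element-wise stage.
// The filter transform depends only on the weights, so a backend can precompute
// it once per layer; the input and output transforms use only additions.
std::array<float, 2> winograd_2x3(const std::array<float, 4>& d,
                                  const std::array<float, 3>& g) {
    // Filter transform (G g).
    const float gt[4] = {g[0], (g[0] + g[1] + g[2]) * 0.5f,
                         (g[0] - g[1] + g[2]) * 0.5f, g[2]};
    // Input transform (B^T d): additions and subtractions only.
    const float dt[4] = {d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]};
    // Element-wise products: the only multiplies that touch the input.
    const float m[4] = {dt[0] * gt[0], dt[1] * gt[1], dt[2] * gt[2], dt[3] * gt[3]};
    // Output transform (A^T m).
    return {m[0] + m[1] + m[2], m[1] - m[2] - m[3]};
}
```

Expanding the products by hand shows `m[0] + m[1] + m[2]` collapses to `d0*g0 + d1*g1 + d2*g2` and `m[1] - m[2] - m[3]` to `d1*g0 + d2*g1 + d3*g2`, which is why the two functions agree.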

throwaway81523 8 days ago

As I remember, the CUDA code was about 3x faster than the TensorFlow code. The TensorFlow code is there for non-Nvidia GPUs. This was in the era of the GTX 1080 or RTX 2080; no idea about now.

- upmind 8 days ago
  
  Ah I see, thanks a lot!
  
robotnikman 8 days ago

>But if you do, you will be in high demand

So I'm guessing that finding a job as a CUDA programmer is nowhere near as big a headache as finding other software engineering jobs right now? I'm thinking learning CUDA and more about AI might be a good pivot from my current position as a Java middleware developer.

randomNumber7 7 days ago

It is likely much more focused on mathematics than what a typical Java dev does.