Comment by dist-epoch 8 days ago
As they typically say: Just Do It (tm).
Start writing some CUDA code to sort an array or find the maximum element.
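As a concrete starting point for the "find the maximum element" exercise mentioned above, here is a minimal sketch. The array size, block size, grid-stride loop, and the use of atomicMax are my own choices for illustration, not anything prescribed in the thread.

```cuda
#include <cstdio>
#include <climits>
#include <cuda_runtime.h>

__global__ void maxKernel(const int *in, int n, int *result) {
    __shared__ int cache[256];                        // one slot per thread in the block
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int local = INT_MIN;

    // Grid-stride loop: each thread folds several elements into a local max.
    for (int i = tid; i < n; i += blockDim.x * gridDim.x)
        local = max(local, in[i]);
    cache[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] = max(cache[threadIdx.x], cache[threadIdx.x + stride]);
        __syncthreads();
    }

    // One atomic per block combines the block maxima into the final result.
    if (threadIdx.x == 0)
        atomicMax(result, cache[0]);
}

int main() {
    const int n = 1 << 20;
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = i % 1000;

    int *d_in, *d_out, init = INT_MIN;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, &init, sizeof(int), cudaMemcpyHostToDevice);

    maxKernel<<<256, 256>>>(d_in, n, d_out);

    int result;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("max = %d\n", result);

    cudaFree(d_in); cudaFree(d_out); delete[] h;
    return 0;
}
```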
I'd rather learn to use a library that works on any brand of GPU.
If that is not an option, I'll wait!
Then learn PyTorch.
The hardware between brands is fundamentally different. There isn't a standard like x86 for CPUs.
So while you can use something like HIPIFY to translate your code between APIs, with GPU programming it makes sense to learn how the APIs differ from each other, or just pick one and work with it, knowing that the others are variations of the same idea.
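To make the HIPIFY point concrete, here is a rough before/after sketch of the kind of 1:1 renaming it performs. The exact output depends on the tool variant (hipify-perl vs hipify-clang) and version; the translation shown in the comments is illustrative, not copied from real tool output.

```cuda
// CUDA original:
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

void run(float *d_x, int n) {
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();
}

// Typical HIP translation: the kernel body stays unchanged, only the header
// and runtime API calls are renamed (HIP also accepts the <<<>>> launch syntax):
//   #include <cuda_runtime.h>    ->  #include <hip/hip_runtime.h>
//   cudaDeviceSynchronize()      ->  hipDeviceSynchronize()
//   cudaMalloc / cudaMemcpy      ->  hipMalloc / hipMemcpy
```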
The jobs requiring CUDA experience exist, most of the time, because Torch is not good enough.
Isn't this basically what Mojo is attempting? "Vendor independent GPU programmability", according to Modular.
This is continuously a point of frustration! Vulkan compute is... suboptimal. I use Cuda because it feels like the only practical option. I want Vulkan or something else to compete seriously, but until that happens, I will use Cuda.
Is https://github.com/KomputeProject/kompute + https://shader-slang.org/ getting there?
Runs on anything + auto-differentiation.
It took until Vulkanised 2025 to acknowledge that Vulkan became the same mess as OpenGL, and to put a plan into action to try to correct this.
Had it not been for Apple's initial OpenCL contribution (regardless of how it went from there), AMD's Mantle as the starting point for Vulkan, and NVidia's Vulkan-Hpp and Slang, the ecosystem of Khronos standards would be much worse.
Also, Vulkan tooling isn't as bad as OpenGL's, because LunarG exists and someone pays them to maintain the whole Vulkan SDK.
The attitude of "we publish paper standards" and the community should step in with implementations and tooling hardly comes close to the productivity of proprietary API tooling.
All GPU vendors, including Intel and AMD, would also rather push their own compute APIs, even if they are built on top of Khronos ones.
K, bud.
Perhaps you haven't noticed, but you're in a thread that asked about CUDA, explicitly.
I concur with this. Then supplement with resources A/R. Ideally, find some tasks in your programs that are parallelizable (learning to spot these is important too!) and switch them to CUDA. If you don't have any, make a toy case, e.g. an n-body simulation.
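For the n-body toy case, a minimal sketch of the core kernel follows. The all-pairs O(N^2) approach, unit masses, float3 layout, and the softening constant are my own choices for illustration; a real simulation would also need an integration step.

```cuda
#include <cuda_runtime.h>

__global__ void accelKernel(const float3 *pos, float3 *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float soft2 = 1e-6f;                 // softening term to avoid division by zero
    float3 a = make_float3(0.f, 0.f, 0.f);

    // Each thread owns one body and loops over all others: the outer loop of the
    // serial version becomes the thread index, which is the parallelization step.
    for (int j = 0; j < n; ++j) {
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r2 = dx * dx + dy * dy + dz * dz + soft2;
        float invR3 = rsqrtf(r2 * r2 * r2);    // 1 / r^3, assuming unit masses
        a.x += dx * invR3;
        a.y += dy * invR3;
        a.z += dz * invR3;
    }
    acc[i] = a;
}

// Launch sketch: accelKernel<<<(n + 255) / 256, 256>>>(d_pos, d_acc, n);
```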