Comment by cogman10 6 hours ago

The problem is the OpenCL development model is just garbage.

Compare the hello world of OpenCL [1] vs CUDA [2]. OpenCL needs so much boilerplate and low-level complexity, whereas the CUDA example is just a few simple lines built with the CUDA compiler.

And what really sucks is that it's pretty hard to get away from that complexity given the way OpenCL is structured. You simply have to know WAY too much about the hardware of the machine you are running on, which means having Intel/AMD/NVIDIA-specific code paths in your application logic when trying to make an OpenCL app.

Meanwhile, CUDA, because it's unapologetically just for NVIDIA cards, completely does away with that complexity in the happy path.
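
To make "happy path" concrete, here is a rough sketch of a complete CUDA vector add. It is not the code from [2]; the names `vector_add` and `run_vector_add` are made up for illustration, and it assumes a CUDA-capable GPU, unified memory support, and building with something like `nvcc add.cu`. Note that there is no platform or device enumeration, no context or command queue setup, and no runtime kernel compilation step.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two n-element vectors on the GPU and returns the
// largest absolute error against the expected sum, or -1 on any CUDA error.
float run_vector_add(int n) {
    float *a = nullptr, *b = nullptr, *out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float max_err = -1.0f;

    // Unified memory: the same pointers are usable on host and device.
    if (cudaMallocManaged(&a, bytes) == cudaSuccess &&
        cudaMallocManaged(&b, bytes) == cudaSuccess &&
        cudaMallocManaged(&out, bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n
        vector_add<<<blocks, threads>>>(a, b, out, n);

        // Check both the launch itself and the kernel's execution.
        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess) {
            max_err = 0.0f;
            for (int i = 0; i < n; ++i) {
                max_err = fmaxf(max_err, fabsf(out[i] - 3.0f));
            }
        }
    }

    // cudaFree accepts null pointers, so this is safe on partial failure.
    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return max_err;
}

int main() {
    float max_err = run_vector_add(1 << 20);
    printf("max error: %f\n", max_err);
    return max_err == 0.0f ? 0 : 1;
}
```

The equivalent OpenCL host program typically has to pick a platform and device, create a context and queue, build the kernel source at runtime, and set each kernel argument by index before it can launch anything.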

For something to be competitive with CUDA, the standard needs something like a platform-agnostic bytecode to target, so that a common acceleration layer can scoop up the bytecode and run it on whatever platform it finds itself on.

[1] https://github.com/intel/compute-samples/blob/master/compute...

[2] https://github.com/premprakashp/cuda-hello-world

winwang 4 hours ago

Yeah, not just OpenCL, but even "newer" standards like WebGPU. I considered making a blog post where I just put the two hello worlds side-by-side and say nothing else.

I was severely disappointed after seeing people praise WebGPU (I believe for being better than OpenGL).

As for the platform-agnostic bytecode, that's where something like MLIR would work too (kind of). But we could also just start by transpiling that bytecode into CUDA/PTX.

For better UX with wider platform compatibility, there's CuPy and Triton.