Comment by alecco 8 days ago

Ignore everybody else. Start with CUDA Thrust. Study its examples carefully and see how other projects use Thrust. After a year or two, go deeper into CUB.

Do not implement algorithms by hand. On recent architectures it is extremely hard to reach decent occupancy and the like with hand-written kernels. Thrust and CUB cover about 80% of cases with reasonable trade-offs, and they do most of the work for you.
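To give a rough idea of what "let the library do the work" looks like, here is a minimal sketch of a sort followed by a sum using Thrust. The input values and variable names are made up for illustration, and it assumes a CUDA toolkit with the bundled Thrust headers, built with something like `nvcc example.cu -o example`. Treat it as a starting point, not a tuned implementation.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

#include <cstdio>

int main() {
    // Hypothetical input data, chosen only for illustration.
    const int values[] = {5, 3, 8, 1, 9, 2};
    thrust::host_vector<int> h(values, values + 6);

    // Assigning a host_vector to a device_vector copies the data to the GPU.
    thrust::device_vector<int> d = h;

    // Sort on the device; Thrust chooses the kernels and launch configuration.
    thrust::sort(d.begin(), d.end());

    // Sum on the device; the result comes back to the host as a plain int.
    int sum = thrust::reduce(d.begin(), d.end(), 0, thrust::plus<int>());

    // Copy the sorted data back to the host for printing.
    thrust::copy(d.begin(), d.end(), h.begin());

    for (size_t i = 0; i < h.size(); ++i) {
        std::printf("%d ", h[i]);
    }
    std::printf("\nsum = %d\n", sum);
    return 0;
}
```

There is no hand-written kernel, block size, or shared-memory code here, which is the point: those decisions are left to the library.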

https://developer.nvidia.com/thrust

bee_rider 8 days ago

It looks quite nice just from skimming the link.

But, I don’t understand the comparison to TBB. Do they have a version of TBB that runs on the GPU natively? If the TBB implementation is on the CPU… that’s just comparing two different pieces of hardware. Which would be confusing, bordering on dishonest.

alecco 7 days ago

The TBB comparison is a marketing leftover from 10 years ago when they were trying to convince people that NVIDIA GPUs were much faster than Intel CPUs for parallel problems.
