Comment by SonOfLilit 8 days ago
Prefix scan is a great intro to GPU programming:
https://developer.download.nvidia.com/compute/cuda/2_2/sdk/w...
After this you should be able to tell whether you enjoy this kind of work.
If you do, try writing a reasonably optimized GEMM, then work through the FlashAttention paper and implement a basic version of its approach.
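
To make the prefix scan suggestion concrete, here is a rough CPU-side sketch in Python of the up-sweep/down-sweep (Blelloch-style) exclusive scan that GPU implementations are commonly built on. The function name `blelloch_exclusive_scan` is made up for illustration, and the power-of-two length restriction is an assumption to keep it short. Each inner `for` loop has independent iterations, which is the part that would become one parallel step on a GPU.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via the up-sweep/down-sweep pattern.

    Illustrative sketch only: it runs sequentially, but every inner
    loop touches disjoint indices, so each pass could be executed as
    one parallel step (one thread per index i).
    Assumes len(values) is a power of two.
    """
    n = len(values)
    if n == 0:
        return []
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    data = list(values)

    # Up-sweep (reduce): build partial sums in place, tree-style.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


if __name__ == "__main__":
    print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

Porting this to a real kernel means mapping `i` to a thread index, putting a barrier between passes, and then worrying about shared memory and arrays larger than one block, which is where most of the learning happens.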