Comment by qeternity
Second line of the post:
> The main objective is to learn writing attention in CUDA C++, since many features are not available in Triton, such as MXFP8 / NVFP4 MMA for sm120.
How many PRs have you landed in Triton that you can just blithely say "contribute it"?
Yes… I read it. If the feature is missing, why not contribute it instead?