FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it
(github.com) 71 points by mmastrac 2 hours ago
I much prefer this over those AI-generated commit messages that just say "refactored X" every single commit.
This was discussed before at the time the PR was created and there's nothing new that I can see.
Someone really needs to learn to use `git commit --amend`. Almost 100 commits with pointless commit messages like "wip" or "x"? Be kinder to your reviewers...
It really depends on details.
If they're intentionally slowing non-CUTLASS shaders, sure, pitchfork time.
If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.
That can be for all kinds of reasons - straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract and so being able to take a faster path, or even bugs in apps that need workarounds.
Though piggybacking onto these without understanding them can be extremely fragile - you don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or something like a crash. And possibly in rather unexpected, unpredictable situations.
https://github.com/triton-lang/triton/pull/7298#discussion_r...
> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").
> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
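
For anyone wondering what a check like the quoted `strstr(kernel_name, "cutlass")` amounts to, here is a rough Python sketch of name-gated path selection. This is not ptxas code: the function names and the "aggressive"/"default" labels are made up for illustration, and the only thing taken from the quote is the substring test itself.

```python
def matches_name_gate(kernel_name: str, marker: str = "cutlass") -> bool:
    """Case-sensitive substring test, analogous to strstr(kernel_name, marker) != NULL in C."""
    return marker in kernel_name


def pick_optimization_path(kernel_name: str) -> str:
    # Hypothetical labels: what the gated optimization actually does is not
    # stated in the thread, only that it is keyed on the kernel name.
    return "aggressive" if matches_name_gate(kernel_name) else "default"


if __name__ == "__main__":
    for name in ["fp8_gemm_kernel", "cutlass_fp8_gemm_kernel", "CUTLASS_fp8_gemm"]:
        print(f"{name}: {pick_optimization_path(name)}")
```

Note that a substring test like this is case-sensitive and matches anywhere in the name, so renaming a kernel is enough to flip which path it gets. That is exactly why relying on it is fragile.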