nulld3v an hour ago

https://github.com/triton-lang/triton/pull/7298#discussion_r...

> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").

> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
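
To make the quoted claim concrete, here is a minimal sketch of what a name-based gate of that shape could look like. The function name `name_selects_special_path` is hypothetical and used only for illustration. Nothing here is taken from ptxas itself beyond the `strstr(kernel_name, "cutlass")` call quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when the kernel name contains the
 * substring "cutlass". This mirrors only the quoted strstr() check; what
 * the real tool does with the result is not known from the quote. */
bool name_selects_special_path(const char *kernel_name)
{
    if (kernel_name == NULL) {
        return false; /* no name, no match */
    }
    /* strstr() is case-sensitive and matches anywhere in the string. */
    return strstr(kernel_name, "cutlass") != NULL;
}
```

With a check like this, a kernel named `my_cutlass_gemm` would match while `my_gemm` or `CUTLASS_gemm` would not, which is why simply renaming a kernel could change which path it takes.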

  • temp0826 an hour ago

    Thanks for a little context, this is not my wheelhouse at all (never even heard of this project) and I could not make heads or tails of the title or the linked PR.

kilpikaarna 35 minutes ago

Keeping it real with the commit msgs

  • RestartKernel 19 minutes ago

    I much prefer this over those AI-generated commit messages that just say "refactored X" every single commit.

    • rkomorn 15 minutes ago

      So AI really does learn from humans...

  • IshKebab a few seconds ago

    I think it's fine if you squash it. I have no idea why they didn't squash it before pushing to GitHub though.

globular-toast 14 minutes ago

Someone really needs to learn to use `git commit --amend`. Almost 100 commits with pointless commit messages like "wip" or "x"? Be kinder to your reviewers...

  • Zardoz84 4 minutes ago

    And `git commit --fixup` followed by `git rebase -i --autosquash`.

fooker 43 minutes ago

When Intel did it, the pitchforks came out.

Nvidia seems to get a pass. Why's that?

  • kimixa 19 minutes ago

    It really depends on details.

    If they're intentionally slowing non-CUTLASS shaders, sure, pitchfork time.

    If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.

    That can be for all kinds of reasons: straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract and so being able to take a faster path, or even bugs in apps that need workarounds.

    Piggybacking on these without understanding them can be extremely fragile, though. You don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or with something like a crash, and possibly in rather unexpected, unpredictable situations.

  • m00x 27 minutes ago

    This isn't the same thing

  • whatevaa 41 minutes ago

    Intel did this to consumers; Nvidia does this to enterprises.