Comment by cedws
Around the time DeepSeek R1 was released, there was chatter about how DeepSeek had used an “undocumented” PTX instruction to squeeze as much performance as possible out of their hardware. My understanding is that it wasn’t any kind of secret instruction, just a novel way of putting the instruction together.
Would Luminal be capable of rediscovering this trick?
hopefully! i don't know the exact trick they used, but the idea is to design the search space so that a trick like that is discoverable.