Comment by rkangel
Shipping binary artifacts isn't inherently a bad thing - that's what (most) Linux Distros do after all! The important distinction is how that binary package was arrived at. If it's a mystery and the source isn't available then that's bad. If it's all in the open source repo and part of the Python package build and is completely reproducible then that's great.
GP's solution is the one that I (and I think most people) ultimately wind up using. But it's not a nice time in the usual "oh we'll just compile it" sense of a typical package.
flash-attn in particular has its build so badly misconfigured and is so heavy that it will lock up a modern Zen 5 machine with 128GB of DDR5 if you don't re-nice ninja (assuming of course that you remembered it just won't work without a pip-visible ninja). It can't build a wheel (at least not obviously) that will work correctly on Ampere and Hopper both, it incorrectly declares it's dependencies so it will demand torch even if torch is in your pyproject.toml and you end up breaking build isolation.
So now you've got your gigabytes of fragile wheel that won't run on half your cards, let's make a wheel registry. Oh, and machine learning everything needs it: half of diffusers crashes at runtime without it. Runtime.
The dirty little secret of these 50MM offers at AI companies is that way more people understand the math (which is actually pretty light compared to say graduate physics) than can build and run NVIDIA wheels at scale. The wizards who Zuckerberg will fellate are people who know some math and can run Torch on a mixed Hopper/Blackwell fleet.
And this (I think) is Astral's endgame. I think pyx is going to fix this at scale and they're going to abruptly become more troublesome to NVIDIA than George Hotz or GamersNexus.