Comment by jafioti 3 days ago

4 replies

yup! we build a search space by iteratively applying rewrite rules in every possible order (using e-graphs to do this efficiently). the rewrites include structural changes like looping / tiling, as well as algebraic rewrites like softmax to online softmax (and then flash attention).
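for anyone who hasn't seen the softmax to online softmax rewrite, here's a minimal sketch in plain Python. this is not Luminal's code, and the function names `softmax` and `online_softmax` are made up for illustration. it only shows why the two forms are algebraically equivalent, which is what makes the rewrite legal.

```python
import math

def softmax(xs):
    # Reference form: one pass for the max, one for the sum, one to normalize.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def online_softmax(xs):
    # Online form: track a running max and a running sum in a single pass.
    # Whenever the max grows, the sum accumulated so far is rescaled so it
    # stays expressed relative to the new max.
    m = float("-inf")
    s = 0.0
    for x in xs:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    # Final normalization pass using the max and sum found above.
    return [math.exp(x - m) / s for x in xs]
```

both functions should agree up to floating point rounding. the online version matters because the max and sum can be computed block by block, which is the property flash attention relies on.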

yes, optimized kernels for one system will work on other systems with the same hardware. it's fine to take a long time compiling if you just compile once and run a lot.

_0ffh 3 days ago

Is/will it be possible to just write a model component with Luminal and then use that as a building block in e.g. Torch or JAX?

almostgotcaught 3 days ago

> take a long time compiling

Lol NP-hard is still NP-hard no matter how you slice it (especially given vague objective functions).