helltone 3 days ago

I have a background in program analysis, but I'm less familiar with the kind of kernels you are optimising.

- Can you give some more insight on why 12 ops suffice for representing your input program?

- With such a small number of ops, isn't your search space full of repeated patterns? I understand the desire to avoid predefined heuristics, but it seems that learning some heuristics/patterns would massively shrink the space.

jafioti 3 days ago

we're just optimizing linear algebra, which is mostly made up of patterns of simple ops. for instance, matmul is just broadcasted multiply -> sum reduce.
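
a minimal numpy sketch of that decomposition (illustrative, not our actual kernel code):

    import numpy as np

    A = np.random.rand(4, 5)
    B = np.random.rand(5, 3)

    # broadcast A to (4, 5, 1) and B to (1, 5, 3), multiply elementwise,
    # then sum-reduce over the shared dimension
    C = (A[:, :, None] * B[None, :, :]).sum(axis=1)

    assert np.allclose(C, A @ B)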

the search does common subexpression elimination by default. if two patterns are unioned in the search space, that union applies to every occurrence of the pattern at the same time, so using e-graphs helps keep the search space small.
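
here's a toy e-graph showing why one union covers every occurrence (hypothetical code, not our implementation; a real e-graph also needs congruence closure/rebuilding). hash-consing in add() gives the CSE, and because parents reference e-class ids, union() rewrites all uses at once:

    # toy e-graph: hash-consed nodes + union-find (no rebuilding)
    class EGraph:
        def __init__(self):
            self.parent = {}  # union-find over e-class ids
            self.table = {}   # (op, child e-classes) -> e-class id

        def find(self, x):
            while self.parent[x] != x:
                x = self.parent[x]
            return x

        def add(self, op, *children):
            key = (op, tuple(self.find(c) for c in children))
            if key not in self.table:  # CSE: repeated patterns share one node
                new_id = len(self.parent)
                self.parent[new_id] = new_id
                self.table[key] = new_id
            return self.table[key]

        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    eg = EGraph()
    x = eg.add("x")
    mul2 = eg.add("*", x, eg.add("2"))  # every x * 2 hash-conses to this id
    shl1 = eg.add("<<", x, eg.add("1"))
    eg.union(mul2, shl1)                # one union rewrites all occurrences
    assert eg.find(mul2) == eg.find(shl1)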

  • helltone 3 days ago

    Right, I think I see it.

    This is insanely cool.

    But then there are performance tradeoffs between reusing intermediates and recomputing them that I think you can't represent.

    Some of these rewrites may affect numerical stability btw. See e.g. https://herbie.uwplse.org/
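
    A concrete example of the kind of rewrite Herbie targets (my example, not from this project): sqrt(x+1) - sqrt(x) and 1/(sqrt(x+1) + sqrt(x)) are algebraically equal, but the first cancels catastrophically in float32:

      import numpy as np

      x = np.float32(1e8)
      naive = np.sqrt(x + 1) - np.sqrt(x)         # cancellation: prints 0.0
      stable = 1 / (np.sqrt(x + 1) + np.sqrt(x))  # algebraically identical
      print(naive, stable)                        # 0.0 vs ~5e-05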

    There is so much potential in this project.

    • jafioti 3 days ago

      ah i see the confusion. we do common subexpression elimination of the terms in the search space (which lets a single rewrite application cover many repeated patterns), but the search can still choose to re-use or recompute patterns of terms when we extract dags after the search space is built. so various levels of recomputation are searched.
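
      for intuition, a toy extraction cost model (made-up numbers, not our actual cost model) choosing between the two plans for a shared term used twice:

        FLOP_COST = 1e-12  # s per flop (assumed)
        BYTE_COST = 5e-12  # s per byte moved (assumed)

        t_flops, t_bytes, uses = 2e6, 8e6, 2  # shared term t, two consumers

        # reuse: compute t once, but pay memory traffic to reload it per use
        reuse = t_flops * FLOP_COST + uses * t_bytes * BYTE_COST
        # recompute: pay t's flops in each consumer, no materialisation
        recompute = uses * t_flops * FLOP_COST

        print("reuse" if reuse < recompute else "recompute")
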

      right now since we're profiling kernels, and we have a reference output of the unoptimised version, we can directly measure deviation of profiled outputs "for free" since we're already computing them for runtime. tbh this isn't what i want long term, i want to bake numerical stability natively into the search space to only extract dags that would produce stable outputs. hopefully that'll be solved soon.