Comment by warangal

Pretty cool project! I have also been trying to do something similar with a very limited set of (abstract) ops, akin to fundamental computer instructions. I'm just using the NumPy backend for now to test the theory, but the neat thing is that most of the complexity lies in the abstract space, like deciding which memory accesses could be coalesced, even before generating the final code for a specific backend! As far as I know, most DL compilers struggle to generate optimal code as models get bigger and bigger. Halide was/is a very cool project that sped up many kernels just by finding better cache/memory access patterns. If you happen to share more insights about your project through blog posts or a whitepaper, that would be really helpful.