ZML - High performance AI inference stack
(github.com) 35 points by msoad 3 days ago
ZML the Zig library is mostly a wrapper around StableHLO/PJRT. But it's a high-quality wrapper, and the tagged tensor syntax is really helpful for writing complex ops like dot or gather.
And ZML the framework also resolves issues with the complex dependency graph of StableHLO/PJRT.
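To illustrate why tagged axes help with ops like dot, here is a minimal sketch in plain Python. It is not ZML's API: the function `einsum_spec` and the tag names are hypothetical, made up for this example. It only shows the idea of naming the axis to contract instead of tracking its position.

```python
def einsum_spec(a_tags, b_tags, contract):
    """Build an einsum-style spec for a dot over the axis tagged `contract`.

    Hypothetical helper for illustration, not part of ZML.
    Tags shared by both inputs (other than `contract`) act as batch axes.
    """
    if contract not in a_tags or contract not in b_tags:
        raise ValueError(f"tag {contract!r} must appear in both operands")
    for tags in (a_tags, b_tags):
        if len(set(tags)) != len(tags):
            raise ValueError("tags must be unique within one tensor")

    # Assign one letter per distinct tag, in order of first appearance.
    names = list(dict.fromkeys(tuple(a_tags) + tuple(b_tags)))
    letter = {tag: chr(ord("a") + i) for i, tag in enumerate(names)}

    # The contracted tag disappears from the output; all others are kept.
    out_tags = tuple(tag for tag in names if tag != contract)
    spec = "{},{}->{}".format(
        "".join(letter[t] for t in a_tags),
        "".join(letter[t] for t in b_tags),
        "".join(letter[t] for t in out_tags),
    )
    return spec, out_tags


# The caller names the axis; its position in each operand does not matter.
spec, out_tags = einsum_spec(("batch", "seq", "dim"), ("dim", "hidden"), "dim")
spec_t, out_tags_t = einsum_spec(("batch", "seq", "dim"), ("hidden", "dim"), "dim")
```

The resulting string is in the subscript format accepted by `numpy.einsum`, so the same call works whether the weight is stored as `(dim, hidden)` or `(hidden, dim)`.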
First of all, great job! I think inference will become more and more important.
That being said, I have a question regarding ease of use. How difficult is it for someone with a Python/C++ background to get used to Zig and (re)write a model to use with ZML?
Hi, co-author here. Zig is way simpler than C++. Simple as in: within an afternoon I was able to get up to speed on the language, rewrite the core of a C++ algorithm, and see speed gains (fastBPE, for reference).
Coming from Python, the hardest part is learning memory management. What helps with ZML is that the model code is mostly metaprogramming, so we can be a bit flexible there.
We have a high-level API that should feel familiar to PyTorch users (like myself), but improves on it in a few ways.
It's a bit early to compare directly to TensorRT because we don't have a full-blown equivalent.
Note that our focus is on being platform agnostic, easy to deploy/integrate, good performance all around, and ease of tweaking. We are using the same compiler as JAX, so our performance is on par. But generally we believe we can gain on overall "tok/s/$" by having a shorter startup time, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.
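As a rough illustration of how startup time and hardware price feed into a "tok/s/$" figure, here is a small sketch. The formula (tokens per wall-clock second, divided by an hourly price) is one possible reading of the metric, and the function name and all numbers are hypothetical, not measurements of ZML or any other stack.

```python
def tok_per_s_per_dollar(tokens, startup_s, generate_s, dollars_per_hour):
    """Tokens per wall-clock second, normalized by the hourly hardware price.

    Assumed definition for illustration: startup time counts as paid time
    during which no tokens are produced.
    """
    wall_clock_s = startup_s + generate_s
    return (tokens / wall_clock_s) / dollars_per_hour


# Hypothetical numbers: same job, same machine, with and without startup cost.
slow_start = tok_per_s_per_dollar(36_000, startup_s=60, generate_s=300, dollars_per_hour=2.0)
fast_start = tok_per_s_per_dollar(36_000, startup_s=0, generate_s=300, dollars_per_hour=2.0)
```

Under this reading, cutting startup time or moving to cheaper hardware with the same throughput both raise the figure without touching the compiler.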
my dreams have come true. hardware-agnostic ml primitives in a typed, compiled language.
my only question is: is zig stable enough to base such a project on?
Zig has been relatively stable for the past few years for the main Zig code. What has changed the most is the `build.zig` build system (which we aren't using).
We are also looking ahead at the Zig roadmap, trying to anticipate upcoming breaking changes and isolate our users from them.
What would be the benefit of using ZML instead of relying on StableHLO/PJRT, given that the cost of porting models is certainly high?