Comment by gwenzek

Comment by gwenzek 2 days ago

0 replies

It' s a bit early to compare directly to TensorRT because we don't have a full-blown equivalent.

Note that our focus is being platform agnostic, easy to deploy/integrate, good performance all-around, and ease of tweaking. We are using the same compiler than Jax, so our performances are on par. But generally we believe we can gain on overall "tok/s/$" by having shorter startup time, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.