Comment by zackangelo a day ago
Their inference server is written in Rust using Hugging Face's Candle crate. One of the Moshi authors is also the primary author of Candle.
We’ve also been building our inference stack on top of Candle; I’m really happy with it.
Super interested. Do you have an equivalent of vLLM? Did you have to rewrite batching, paged attention…?