Comment by zackangelo a day ago
Yeah, I’ve had to rewrite continuous batching and other scheduling logic. That and multi-GPU inference have been the hardest things to build.
I’ll need to get paged attention working as well, but I think I can launch without it.
This is awesome. Are you contributing this to candle, or is it a standalone package?