Comment by zackangelo 10 months ago
Yeah, I’ve had to rewrite continuous batching and other scheduling logic. That and multi-GPU inference have been the hardest things to build.
I’ll need to get paged attention working as well, but I think I can launch without it.
Are you aiming for Nvidia hardware with rust-cuda, or looking to integrate with non-Nvidia hardware?