Comment by embedding-shape 20 hours ago
> If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size
Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most of ML: as long as it's possible to do batching at all, you can increase the batch size and utilize more of the GPU, until you saturate some part of the pipeline.
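
It's easy to see this for yourself. Here's a minimal sketch (assuming PyTorch and ideally a CUDA device; the layer sizes are arbitrary placeholders, not tied to any particular model) that times a stand-in for one transformer MLP block at a few batch sizes:

```python
import time
import torch

# Stand-in for a single transformer MLP block; the sizes are
# arbitrary placeholders, not taken from any real model.
d_model, d_ff = 4096, 16384
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

layer = torch.nn.Sequential(
    torch.nn.Linear(d_model, d_ff),
    torch.nn.GELU(),
    torch.nn.Linear(d_ff, d_model),
).to(device=device, dtype=dtype)

@torch.no_grad()
def throughput(batch_size: int, iters: int = 50) -> float:
    """Rows processed per second at the given batch size."""
    x = torch.randn(batch_size, d_model, device=device, dtype=dtype)
    layer(x)  # warm-up so lazy init/compile doesn't skew timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels
    return batch_size * iters / (time.perf_counter() - start)

for bs in (1, 8, 64, 512):
    print(f"batch={bs:>4}  rows/s={throughput(bs):,.0f}")
```

On a typical GPU the rows/sec figure climbs steeply at first (at batch 1 the GEMMs are memory-bandwidth-bound, so the compute units sit mostly idle) and then flattens once the ALUs are saturated, which is exactly the scaling behaviour in question, MoE or not.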