Comment by embedding-shape 20 hours ago
> If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size
Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most of ML: as long as it's possible to do batching at all, you can increase the batch size and utilize more of the GPU, until you saturate some part of the pipeline.
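
It's easy to see this for yourself. Here's a minimal sketch (assuming PyTorch and ideally a CUDA device; the layer sizes are arbitrary placeholders, not tied to any particular model) that times a stand-in for one transformer MLP block at a few batch sizes:

```python
import time
import torch

# Stand-in for a single transformer MLP block; the sizes are
# arbitrary placeholders, not taken from any real model.
d_model, d_ff = 4096, 16384
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

layer = torch.nn.Sequential(
    torch.nn.Linear(d_model, d_ff),
    torch.nn.GELU(),
    torch.nn.Linear(d_ff, d_model),
).to(device=device, dtype=dtype)

@torch.no_grad()
def throughput(batch_size: int, iters: int = 50) -> float:
    """Rows processed per second at the given batch size."""
    x = torch.randn(batch_size, d_model, device=device, dtype=dtype)
    layer(x)  # warm-up so lazy init/compile doesn't skew timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels
    return batch_size * iters / (time.perf_counter() - start)

for bs in (1, 8, 64, 512):
    print(f"batch={bs:>4}  rows/s={throughput(bs):,.0f}")
```

On a typical GPU the rows/sec figure climbs steeply at first (at batch 1 the GEMMs are memory-bandwidth-bound, so the compute units sit mostly idle) and then flattens once the ALUs are saturated, which is exactly the scaling behaviour in question, MoE or not.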