Comment by arcanemachiner 3 days ago
The easiest way would be to quantize the model and serve different quants depending on current demand. Higher volume == more aggressive (lower-quality) quant == more customers served per GPU.
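
For illustration, here is a rough sketch of what demand-based quant selection could look like. Everything in it is an assumption made up for the example: the tier names (`fp16`, `int8`, `int4`), the load thresholds, and the `pick_quant` helper. It is not how any particular provider does it, just the idea above written out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantTier:
    name: str        # hypothetical label for a quantized build of the model
    max_load: float  # serve this tier while load <= max_load


# Ordered from highest quality to most aggressive quantization.
# The thresholds are illustrative guesses, not measured values.
TIERS = [
    QuantTier("fp16", 0.50),
    QuantTier("int8", 0.80),
    QuantTier("int4", float("inf")),
]


def pick_quant(active_requests: int, capacity: int) -> str:
    """Return the name of the quant tier to serve at the current load.

    `capacity` is the number of concurrent requests the fleet handles
    comfortably at full precision (an assumed, operator-chosen number).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    load = active_requests / capacity
    for tier in TIERS:
        if load <= tier.max_load:
            return tier.name
    return TIERS[-1].name


if __name__ == "__main__":
    for active in (10, 70, 150):
        print(active, pick_quant(active, capacity=100))
```

With these made-up thresholds, light load gets the full-precision build and the quant gets more aggressive as load climbs, which is the "higher volume == worse quant" trade-off in code form.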