Comment by arcanemachiner 3 days ago

The easiest way would be to quantize the model and serve different quants based on current demand. Higher volume == lower-precision quant == more customers served per GPU, at some cost in output quality.
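
A rough sketch of how that selection could work, assuming load is measured as active requests divided by some capacity figure. The tier names, thresholds, and `pick_quant` function are all hypothetical and not taken from any real serving stack:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantTier:
    name: str        # hypothetical label for a quantized build of the model
    max_load: float  # serve this tier while load (0..1) is at or below this value


# Ordered from highest quality to most aggressive quantization.
# The thresholds are made-up illustration values, not measured numbers.
TIERS = (
    QuantTier("fp16", 0.50),
    QuantTier("int8", 0.80),
    QuantTier("int4", 1.00),
)


def pick_quant(active_requests: int, capacity: int) -> str:
    """Return the name of the tier to serve at the current load."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Clamp so an overloaded pool still maps to the most aggressive tier.
    load = min(active_requests / capacity, 1.0)
    for tier in TIERS:
        if load <= tier.max_load:
            return tier.name
    return TIERS[-1].name


if __name__ == "__main__":
    for active in (10, 70, 150):
        print(active, pick_quant(active, capacity=100))
```

In practice you would probably want some hysteresis so the tier does not flap when load sits right at a threshold, but that is beyond this sketch.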