Comment by tracker1
At what cost though? Most AI operations are losing money and using a lot of power, with massive infrastructure costs on top of the hardware costs just to get started. And that doesn't even cover the level of usage many/most people want, which they certainly aren't going to pay the hundreds of dollars per month per person that it currently costs to operate.
This is a really basic way to look at the unit economics of inference.
I did some napkin math on this.
32 H100s rent at 'retail' prices for about $2/hr each. I would hope that the big AI companies get them cheaper than this at their scale.
Those 32 H100s can probably do something on the order of >40,000 tok/s on a frontier-scale model (~700B params) with proper batching. Potentially a lot more (I'd love to hear from anyone with better numbers on this).
So that's $64/hr, or just under $50k/month running 24/7.
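For anyone who wants to poke at the numbers, here's the cost side as a quick Python sketch. Every input is one of the assumptions above ($2/hr retail rental, 32 GPUs, 40k tok/s), not measured data:

    # Cost side of the napkin math; all inputs are assumptions, not data.
    gpu_hourly = 2.00                 # $/hr per H100, retail rental
    num_gpus = 32
    tok_per_sec = 40_000              # aggregate cluster throughput

    hourly_cost = gpu_hourly * num_gpus            # $64/hr
    monthly_cost = hourly_cost * 24 * 30           # $46,080/month

    tok_per_month = tok_per_sec * 3600 * 24 * 30   # ~104B tokens
    cost_per_mtok = monthly_cost / (tok_per_month / 1e6)

    print(f"${hourly_cost:.0f}/hr, ${monthly_cost:,.0f}/month")
    print(f"~${cost_per_mtok:.2f} per million tokens")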
40k tok/s is a lot of throughput, at least for non-agentic use cases. At the implied ~$0.44 per million tokens, a $20/month subscriber would need to burn through roughly 45M tokens before the GPU rental alone ate the subscription. There is no way you are losing money on paid ChatGPT users at $20/month on these.
You'd still roughly break even supporting ~200 Claude Code-esque agentic users who were hammering it at full tilt 40% of the day at $200/month.
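Same style of sketch for the agentic break-even check. The 200 users, $200/month price, and 40% duty cycle are guesses at heavy Claude Code-style usage, not real usage data:

    # Break-even sketch for heavy agentic users; cluster cost carried
    # over from the sketch above, usage profile guessed.
    monthly_cost = 64 * 24 * 30   # $46,080/month from the cost sketch
    tok_per_sec = 40_000          # aggregate cluster throughput
    users = 200
    price = 200                   # $/month per agentic seat
    duty_cycle = 0.40             # fraction of the day at full tilt

    revenue = users * price
    concurrent = users * duty_cycle
    print(f"revenue ${revenue:,} vs cost ${monthly_cost:,}")
    print(f"per-seat cost ${monthly_cost / users:.0f} vs ${price} price")
    print(f"~{tok_per_sec / concurrent:.0f} tok/s per active session")

On these inputs you're actually slightly underwater on pure rental cost ($40k of revenue against ~$46k of rental), but that's well within napkin-math error of break-even, and ~500 tok/s per active session is comfortably above what a single coding agent typically streams, so the real ceiling on seats is plausibly higher.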
Now, this doesn't include training or staff costs, but on a pure 'opex' basis I don't think inference is anywhere near as unprofitable as people make it out to be.