Comment by jl6
If the explanation really is, as many comments here suggest, that prompts can be run in parallel in batches at low marginal additional cost, then that feels like bad news for the democratization and/or local running of LLMs. If it’s only cost-effective to run a model for ~thousands of people at the same time, it’s never going to be cost-effective to run on your own.
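The batching economics jl6 alludes to can be sketched with a toy model: each decoding step has to stream the model's weights from memory once, and that fixed cost gets split across every request in the batch. All numbers below are illustrative, not measurements:

```python
# Toy model of why batched LLM serving has low marginal cost per request.
# Both cost figures are made-up illustrative units, not real measurements.
weight_read_cost = 1.0   # fixed cost of streaming model weights once per step
per_request_cost = 0.01  # extra compute for each additional request in the batch

def cost_per_request(batch_size: int) -> float:
    """Fixed weight-streaming cost is amortized over the batch."""
    return weight_read_cost / batch_size + per_request_cost

print(cost_per_request(1))     # a solo user pays the full weight-streaming cost
print(cost_per_request(1000))  # a thousand batched users split it
```

With a batch of one you pay the whole fixed cost yourself; with a thousand concurrent users it nearly vanishes, which is why the per-request price can be so low for a big provider and so high for a lone local user.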
Sure, but that's how most of human society works already.
It's more cost-effective to farm eggs from a hundred thousand chickens than it is for individuals to have chickens in their yard.
You CAN run a GPT-class model on your own machine right now, for several thousand dollars of machine... but you can get massively better results if you spend those thousands of dollars on API credits over the next five years or so.
Some people will choose to do that. I have backyard chickens, they're really fun! Most expensive eggs I've ever seen in my life.
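The hardware-vs-API-credits comparison above can be put in back-of-envelope terms. Every figure here is a hypothetical assumption for illustration (the comment itself names no prices or usage levels):

```python
# Back-of-envelope cost comparison. All numbers are hypothetical
# assumptions, not quotes from any real provider or hardware vendor.
local_machine_cost = 4_000           # assumed one-off hardware spend ($)
years = 5                            # amortization period from the comment
local_cost_per_year = local_machine_cost / years

api_price_per_million_tokens = 3.0   # assumed blended API price ($/1M tokens)
tokens_per_year = 100_000_000        # assumed heavy personal usage

api_cost_per_year = (tokens_per_year / 1_000_000) * api_price_per_million_tokens

print(f"local machine: ${local_cost_per_year:,.0f}/year, fixed regardless of usage")
print(f"API credits:   ${api_cost_per_year:,.0f}/year at this usage level")
```

Under these assumed numbers the API comes out cheaper even at heavy usage, and the API dollars also buy access to frontier models rather than whatever fits on one box, which is the "massively better results" point.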