Comment by NBJack 2 hours ago

This assumes your model is static and never needs to be improved or updated.

Inference is cheap because the final model, despite its size, is vastly less resource-intensive to use than it is to produce.
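
A rough way to see that asymmetry is the common back-of-envelope FLOP approximations for transformers: about 6*N*D FLOPs to train (N parameters, D training tokens) versus about 2*N FLOPs per generated token at inference. A minimal sketch, where the parameter and token counts are made-up illustrative numbers, not figures for any real model:

    # Rough training-vs-inference compute comparison using the common
    # transformer approximations: training ~ 6*N*D FLOPs, inference ~ 2*N
    # FLOPs per generated token. N and D below are illustrative assumptions.
    N = 70e9    # assumed parameter count (70B)
    D = 1.4e12  # assumed training tokens (1.4T)

    train_flops = 6 * N * D      # one full training run
    flops_per_token = 2 * N      # serving one output token

    # Tokens you could serve for the compute cost of one training run.
    tokens_per_run = train_flops / flops_per_token  # works out to 3 * D
    print(f"training run:      {train_flops:.1e} FLOPs")
    print(f"per output token:  {flops_per_token:.1e} FLOPs")
    print(f"tokens served for one run's compute: {tokens_per_run:.1e}")

Under these assumptions, a single training run costs as much compute as serving roughly 4.2 trillion output tokens, which is why the marginal cost of inference looks cheap until you fold retraining back in.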

ChatGPT in its latest form isn't bad by any means, but it is falling behind. Keeping up requires significant overhead, both to train new models and to iterate on model architecture, and that overhead is often a variable cost as well.