Comment by Zababa 4 days ago

>To get to the point of executing a successful training run like that, you have to count every failed experiment and experiment that gets you to the final training run.

I get the sentiment, but then, do you count all the other experiments that were done by that company before specifically trying to train this model? All the experiments done by people in that company while they were at other companies, since they rely on that experience to train models?

You could say "count everything that has been done since the last model release", but then for the same amount of effort/GPU, if you release 3 models, does that divide each model's cost by 3?

Genuinely curious about how you think about this. I think saying "the model cost is the final training run" is fine, as it seems to have been standard ever since DeepSeek V3, but I'd be interested if you have alternatives. Possibly "actually don't even talk about model cost, as it will always be misleading and you can never really spend the same amount of money to get the same model"?