Comment by chermi 16 hours ago

It's pretty well accepted now that for pre-training LLMs the curve is an S-curve, not an exponential, right? Maybe the gains are all in RL post-training now, but my understanding (?) is that post-training isn't nearly as expensive as pre-training. I don't think 3-6 months is the time to a 10x improvement anymore (however that's measured); it seems closer to a year and growing, assuming the plateau is real. I'd love to know if there are solid estimates on "doubling times" these days.
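
For concreteness, here's the conversion between "time to 10x" and doubling time, assuming smooth exponential growth (which is exactly the assumption in question here):

    # Back-of-envelope: the doubling time implied by a given time-to-10x.
    # Assumes smooth exponential growth; the S-curve argument above says
    # that assumption is breaking down.
    import math

    def doubling_time_months(months_to_10x: float) -> float:
        return months_to_10x / math.log2(10)  # 10x = log2(10) ~= 3.32 doublings

    print(doubling_time_months(4.5))   # ~1.4 months, if 10x took 3-6 months
    print(doubling_time_months(12.0))  # ~3.6 months, if 10x now takes a year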

With marginal gains diminishing, do we really think they (all of them) are going to keep spending that much more on each generation? Even the big players with money, like Google, can't justify increasing spending forever given this. The models are already good enough for a lot of useful tasks for a lot of people. With all due respect to the amazing science and engineering, OpenAI (and probably the rest) arrived at their performance with at least half the credit going to brute-force compute, hence the cost. I don't think they'll continue that in the face of diminishing returns. Someone will ramp down and get much closer to making money, focusing on maximizing token cost efficiency to serve and utility to users with a fixed model (or models). GPT-5, with its auto-routing between models of different capability, seems like a clear move in this direction. I bet their cost to serve the same performance as, say, Gemini 2.5 is much lower.
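
As a toy sketch of the routing idea (the difficulty heuristic and model names are made up; OpenAI hasn't published how their router actually works):

    # Hypothetical cost-based router: send easy prompts to a cheap model,
    # hard ones to an expensive model. Heuristic and names are invented.
    def estimate_difficulty(prompt: str) -> float:
        # crude proxy: longer prompts are assumed harder
        return min(1.0, len(prompt) / 2000)

    def route(prompt: str) -> str:
        # the cheap model serves the bulk; the big model is the exception
        return "big-expensive-model" if estimate_difficulty(prompt) > 0.5 else "small-cheap-model"

    print(route("What's the capital of France?"))             # small-cheap-model
    print(route("Write a detailed migration plan... " * 60))  # big-expensive-model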

Naively, my view is that there's some threshold of raw performance that's good enough for 80% of users, and we're near it. There will always be demand for the bleeding edge, but the money is in the mass market. So if you hit that threshold, you ramp down training costs and focus on tooling + ease of use and token-generation efficiency to match 80% of use cases. Those 80% of users will be happy with slowly increasing performance past the threshold, like iPhone updates. Except they probably won't be able to charge that much more, since the competition is still there. But anyway, now they're spending way less on R&D and training, and the cost to serve tokens at the same performance continues to drop.

All of this is to say, I don't think they're in that dreadful a position. I can't even remember why I chose you to reply to; I think the "10x cheaper models in 3-6 months" caught my eye. I'm not saying they can drop R&D/training to 0. You wouldn't want to miss out on the efficiency of distillation (toy sketch below), or whatever the latest innovations I don't know about are. Oh, and also: I'm confident that whatever the real number N is for "NX cheaper in 3-6 months," a large fraction of that will come from hardware gains that are common to all the labs.
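
For reference, the distillation objective in one toy snippet (standard Hinton-style knowledge distillation with made-up numbers, not any lab's actual recipe):

    # Toy knowledge-distillation loss: the student matches the teacher's
    # softened output distribution at temperature T. Logits are made up.
    import math

    def softmax(logits, T):
        exps = [math.exp(z / T) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    teacher_logits, student_logits, T = [4.0, 1.0, 0.2], [2.5, 1.5, 0.5], 2.0
    p, q = softmax(teacher_logits, T), softmax(student_logits, T)
    kd_loss = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # KL(p || q)
    print(kd_loss)  # minimizing this drives the student toward the teacher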

necovek 9 hours ago

Someone brought up an interesting point: to get the latest data (news, scientific breakthroughs...) into the model, you need to constantly retrain it.

  • Ianjit 3 hours ago

    The incremental compute cost will scale with the incremental data added, so training costs will grow at a much slower rate than when training was GPU-limited.

  • fennecbutt 3 hours ago

    Or, you know, use RAG, which is far better and more accurate than regurgitating compressed training knowledge.
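
    Roughly this shape, as a toy sketch (keyword overlap stands in for the embedding search a real system would use; the documents and prompt here are made up):

        # Toy RAG: retrieve fresh documents, stuff them into the prompt.
        # Real systems use embedding/vector search, not keyword overlap.
        def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
            def overlap(doc: str) -> int:
                return len(set(query.lower().split()) & set(doc.lower().split()))
            return sorted(docs, key=overlap, reverse=True)[:k]

        docs = [  # stand-ins for freshly indexed news/documents
            "2025-08-01: lab X releases a new open-weights model",
            "2025-08-02: major outage at a large cloud provider",
            "2024 tax filing deadlines for small businesses",
        ]
        context = "\n".join(retrieve("what new model was released?", docs))
        prompt = f"Answer using only this context:\n{context}\n\nQ: What new model was released?"
        # `prompt` then goes to the model as-is; no retraining needed for fresh facts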

Spooky23 15 hours ago

Google has the best story, IMO. Gemini > Azure; it will accelerate GCP growth.