Comment by MikeTheGreat 14 hours ago
(My apologies if this was already asked - this thread is huge and Find-In-Page-ing for variations of "pre-train", "pretrain", and "train" turned up nothing about this. If this was already asked I'd super-appreciate a pointer to the discussion :) )
Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?
I understand it's very difficult, but they've already successfully done this and they have a ton of incredibly skilled, knowledgeable, and well-paid employees.
I get that there's some randomness involved but it seems like they should be able to (at a minimum) just re-run the pre-training from 2024, yes?
Maybe the process is more ad-hoc (and less reproducible?) than I'm assuming? Is the newer data causing problems for the process that worked in 2024?
Any thoughts or ideas are appreciated, and apologies again if this was asked already!
> Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?
The same way everyone else fails at it.
Change some hyperparameters to match the new hardware (more params), maybe implement the latest improvements from papers after they were validated in a smaller model run. Start training the big boy, loss looks good, 2 months and millions of dollars later loss plateaus, do the whole SFT/RL shebang, run benchmarks.
It's not much better than the previous model, very tiny improvements, oops.
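To make the "loss looks good, then plateaus" part concrete, here's a toy sketch. It assumes the loss follows a power law decaying toward a fixed floor; the function names and every constant (`floor`, `scale`, `alpha`) are made up for illustration and don't come from any real training run.

```python
def loss_at(step, floor=1.8, scale=6.0, alpha=0.3):
    """Toy loss curve: a fixed floor plus a power-law term that decays with steps.

    All constants are hypothetical, chosen only to show the shape of the curve.
    """
    return floor + scale * step ** -alpha


def relative_gain(start, end):
    """Fraction by which the toy loss drops between two step counts."""
    before, after = loss_at(start), loss_at(end)
    return (before - after) / before


# Compare what a 10x increase in steps buys early in the run vs. late in the run.
early = relative_gain(1_000, 10_000)
late = relative_gain(100_000, 1_000_000)

print(f"early 10x of steps: {early:.1%} lower loss")
print(f"late 10x of steps:  {late:.1%} lower loss")
```

Under these assumptions the same 10x of extra steps buys much less late in the run than early on, which is roughly why a curve can look great for weeks and still end up barely ahead of the previous model.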