Comment by MikeTheGreat 14 hours ago

(My apologies if this was already asked - this thread is huge and Find-In-Page-ing for variations of "pre-train", "pretrain", and "train" turned up nothing about this. If this was already asked I'd super-appreciate a pointer to the discussion :) )

Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?

I understand it's very difficult, but they've already done this successfully, and they have a ton of incredibly skilled, knowledgeable, and well-paid employees.

I get that there's some randomness involved but it seems like they should be able to (at a minimum) just re-run the pre-training from 2024, yes?

Maybe the process is more ad-hoc (and less reproducible?) than I'm assuming? Is the newer data causing problems for the process that worked in 2024?

Any thoughts or ideas are appreciated, and apologies again if this was asked already!

nodja 12 hours ago

> Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?

The same way everyone else fails at it.

Change some hyperparameters to match the new hardware (more params), maybe implement the latest improvements from papers after they've been validated in a smaller model run. Start training the big boy, loss looks good, two months and millions of dollars later the loss plateaus, do the whole SFT/RL shebang, run benchmarks.

It's not much better than the previous model, very tiny improvements, oops.
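
To make the "loss looks good, then plateaus" part concrete, here's a toy sketch of that monitoring loop. This is nothing like the real training stack; the model, data, window size, and threshold are all invented stand-ins:

    import torch
    from torch import nn

    # Stand-in for "the big boy": in reality a huge transformer with the
    # new hyperparameters; here a toy MLP so the sketch actually runs.
    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    window = []          # trailing window of recent losses
    PLATEAU_EPS = 1e-3   # hypothetical "stopped improving" threshold

    for step in range(10_000):
        x = torch.randn(32, 64)                      # stand-in for a pretraining batch
        loss = nn.functional.mse_loss(model(x), x)   # stand-in for the LM objective
        opt.zero_grad()
        loss.backward()
        opt.step()

        window.append(loss.item())
        if len(window) > 200:
            window.pop(0)
            # Call it a plateau when the trailing window stops moving;
            # in real life this is roughly where the SFT/RL stage takes over.
            if max(window) - min(window) < PLATEAU_EPS:
                print(f"loss plateaued at step {step}: {loss.item():.4f}")
                break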

  • yalok 6 hours ago

    Add to that multiple iterations of having to restart pretraining from an earlier checkpoint when the loss plateaus too early or starts increasing due to bugs…
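
    A loose illustration of that rollback pattern, with invented paths and thresholds (real runs would also rewind the data stream, not just the weights):

      import torch

      def save_good(model, opt, path="last_good.pt"):
          # Periodically snapshot a known-good state.
          torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, path)

      def maybe_rollback(model, opt, losses, path="last_good.pt"):
          # Hypothetical divergence test: loss jumped well above its recent floor.
          if losses[-1] > 1.5 * min(losses[-200:]):
              state = torch.load(path)
              model.load_state_dict(state["model"])
              opt.load_state_dict(state["opt"])
              return True   # caller should also rewind the data stream
          return False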

  • thefourthchime 11 hours ago

    Isn't that what GPT-4.5 was?

    • wrsh07 10 hours ago

      That was a large model that iiuc was too expensive to serve profitably

      Many people thought it was an improvement though

encomiast 13 hours ago

I’m not sure what ‘successfully’ means in this context. If it means training a model that is noticeably better than previous models, it’s not hard to see how that is challenging.

  • MikeTheGreat 11 hours ago

    Ah. Thanks for posting - this makes a lot of sense.

    I can totally see how they're able to pre-train models no problem, but are having trouble with the "noticeably better" part.

    Thanks!

  • mudkipdev 13 hours ago

    OpenAI allegedly has not completed a successful pretraining run since 4o

cherioo 13 hours ago

GPT-4.5 was allegedly such a pre-train. It just didn't perform well enough to announce and productize it as such.

  • htrp 13 hours ago

    It wasn't economical to deploy, but I expect it wasn't wasted; expect the OpenAI team to pick that back up at some point.

    • mips_avatar 12 hours ago

      The scoop Dylan Patel got was that partway through the GPT-4.5 pretraining run the results were very, very good, but they leveled off and they ended up with a huge base model that really wasn't any better on their evals.

octoberfranklin 13 hours ago

You don't train the next model by starting with the previous one.

A company's ML researchers are constantly improving model architecture. When it's time to train the next model, the "best" architecture is totally different from the last one. So you have to train from scratch (mostly... you can keep some small stuff like the embeddings).
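
A hedged sketch of the "keep the embeddings, retrain everything else" idea (shapes and names are invented; this only works when the vocab and embedding width carry over between generations):

    import torch
    from torch import nn

    VOCAB, DIM = 50_000, 512

    old_model = nn.ModuleDict({
        "embed": nn.Embedding(VOCAB, DIM),  # trained last generation
        "body":  nn.Linear(DIM, DIM),       # stand-in for the old stack
    })

    # New architecture: everything reinitialized from scratch...
    new_model = nn.ModuleDict({
        "embed": nn.Embedding(VOCAB, DIM),
        "body":  nn.Linear(DIM, DIM),       # imagine a very different stack here
    })

    # ...except the token embeddings, which can carry over.
    new_model["embed"].load_state_dict(old_model["embed"].state_dict())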

The implication here is that they screwed up bigly on the model architecture, and the end result was significantly worse than the mid-2024 model, so they didn't deploy it.

  • MikeTheGreat 11 hours ago

    Huh - I did not know that, and that makes a lot of sense.

    I guess "Start software Vnext off the current version (or something pretty close)" is such a baseline assumption of mine that it didn't occur to me that they'd be basically starting over each time.

    Thanks for posting this!