Comment by octoberfranklin 13 hours ago
You don't train the next model by starting with the previous one.
A company's ML researchers are constantly improving the model architecture. By the time the next model is trained, the "best" architecture looks totally different from the last one, so you have to train from scratch (mostly... you can carry over some small pieces, like the embeddings).
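A rough sketch of what "train from scratch but keep the embeddings" can look like, using plain numpy and made-up parameter names (this is illustrative only, not any lab's actual code): the new architecture gets a fresh random init, but the old embedding table is copied over wherever its shape still matches.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 1000, 64  # embeddings transfer only if vocab and dim are unchanged

# Old model: hypothetical params, 4 layers
old_params = {
    "embeddings": rng.normal(size=(vocab, d_model)),
    "layers": [rng.normal(size=(d_model, d_model)) for _ in range(4)],
}

def init_new_model(old):
    # New architecture: different depth, freshly initialized from scratch
    new = {
        "embeddings": rng.normal(size=(vocab, d_model)),
        "layers": [rng.normal(size=(d_model, d_model)) for _ in range(8)],
    }
    # Carry over the embedding table where the shape still matches;
    # everything else (the bulk of the network) starts over.
    if old["embeddings"].shape == new["embeddings"].shape:
        new["embeddings"] = old["embeddings"].copy()
    return new

new_params = init_new_model(old_params)
assert np.array_equal(new_params["embeddings"], old_params["embeddings"])
assert len(new_params["layers"]) != len(old_params["layers"])
```

In a real framework this is usually done by partially loading a checkpoint (e.g. matching parameter names/shapes and skipping the rest), but the principle is the same: only shape-compatible, architecture-independent pieces survive.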
The implication here is that they screwed up bigly on the model architecture, and the end result was significantly worse than the mid-2024 model, so they didn't deploy it.
Huh - I did not know that, and that makes a lot of sense.
I guess "start software Vnext from the current version (or something pretty close)" is such a baseline assumption of mine that it didn't occur to me that they'd basically be starting over each time.
Thanks for posting this!