Comment by mr_00ff00
What is a pre-training run?
If pre-training is just training, then how on earth can OpenAI not have "a successful pre-training run"? The word successful indicates that they tried, but failed.
It might be me misunderstanding how this works, but I assumed that the training phase was fairly reproducible. You might get different results on each run, due to changes in the input, but not massively so. If OpenAI can't continuously and reliably train new models, then they are even more overvalued than I previously assumed.
Want to also add that the model doesn’t know how to respond in a user -> assistant style conversation after its pretraining; it’s a pure text predictor (look at the open source base models)
There’s also what is being called mid-training, where the model is trained on high(er) quality traces; it acts as a bridge between pre- and post-training
If you've an hour to spare this Karpathy video is good at explaining how it all works https://youtu.be/7xTGNNLPyMI
Pre-training is just training; it got the name because most models also have a post-training stage, so to differentiate the two, people call the first stage pre-training.
Pre-training: You train on a vast amount of data, as varied and high quality as possible. This determines the distribution the model can operate within, so LLMs are usually trained on a curated dataset of (roughly) the whole internet. The output of pre-training is usually called the base model.
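To make "the base model is a pure text predictor" concrete, here's a toy sketch (my own illustration, not how an LLM actually works internally): a bigram table learned from text plays the role of the next-token predictor.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count, for each word, which words tend to follow it in the training text.
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # "Inference": return the most frequently seen next word, or None.
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat the cat ran"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # -> cat
```

A real base model is a neural network predicting over a huge vocabulary with long context, but the interface is the same: given text so far, emit the likely continuation. That's all pre-training gives you.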
Post-training: You narrow down the task by training on the specific behaviors you want. This can be done in several ways:
- Supervised Finetuning (SFT): Training on a strict high quality dataset of the task you want. For example if you wanted a summarization model, you'd finetune the model on high quality text->summary pairs and the model would be able to summarize much better than the base model.
- Reinforcement Learning (RL): You train a separate reward model that scores outputs, then use its scores on the main model's generations as the training signal.
- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to align the model toward/away from the kinds of responses you want.
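For the DPO case, the loss for one good/bad pair is simple enough to show directly. A minimal sketch (scalar log-probabilities stand in for what you'd sum over a full response; `beta` is the usual temperature hyperparameter):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO for one preference pair: push the policy to prefer the chosen
    # response more than a frozen reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy widened the chosen-vs-rejected gap relative to the reference: low loss.
good = dpo_loss(-5.0, -9.0, ref_logp_chosen=-6.0, ref_logp_rejected=-8.0)
# Policy narrowed it: higher loss.
bad = dpo_loss(-9.0, -5.0, ref_logp_chosen=-8.0, ref_logp_rejected=-6.0)
print(good < bad)  # -> True
```

The neat part is there's no separate reward model and no RL loop: the preference pairs plus the reference model define the whole objective.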
Post-training is what makes the models easy to use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything: if you want a translation model that always translates a certain way, or a model that knows how to use tools, etc., you'd achieve all that through post-training. Post-training is where most of the secret sauce in current models is nowadays.
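On "teaches the model to talk in turns": mechanically this is just serializing the conversation into one string with special markers, which the model is finetuned to continue. A sketch with made-up markers (real models each have their own chat template; `<|user|>` etc. here are illustrative, not any specific model's tokens):

```python
def render_chat(messages):
    # Flatten user/assistant turns into a single text string, then leave an
    # open assistant marker so the model's "next token prediction" produces
    # the assistant's reply.
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>" for m in messages]
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

chat = [{"role": "user", "content": "Summarize: the meeting moved to 3pm."}]
print(render_chat(chat))
```

Instruction tuning is SFT on lots of text in exactly this shape, which is why a base model rambles while a post-trained model answers in turns.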