Comment by mr_00ff00
What is a pre-training run?
If pre-training is just training, then how on earth can OpenAI not have "a successful pre-training run"? The word successful indicates that they tried, but failed.
It might be me misunderstanding how this works, but I assumed that the training phase was fairly reproducible. You might get different results on each run, due to changes in the input, but not massively so. If OpenAI can't continuously and reliably train new models, then they are even more overvalued than I previously assumed.
Want to also add that the model doesn’t know how to respond in a user -> assistant style conversation after its pretraining; it’s a pure text predictor (look at the open source base models)
There’s also what is being called mid-training, where the model is trained on high(er) quality traces; it acts as a bridge between pre- and post-training
If you've an hour to spare this Karpathy video is good at explaining how it all works https://youtu.be/7xTGNNLPyMI
Pre-training is just training; it got the name because most models also have a post-training stage, so to differentiate the two, people call the first stage pre-training.
Pre-training: You train on a vast amount of data, as varied and high quality as possible. This determines the distribution the model can operate within, so LLMs are usually trained on a curated dataset of (roughly) the whole internet. The output of pre-training is usually called the base model.
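To make "the base model is a pure text predictor" concrete, here's a toy sketch (my own illustration, not how an LLM actually works internally): a bigram table learned from text plays the role of the next-token predictor.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count, for each word, which words tend to follow it in the training text.
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # "Inference": return the most frequently seen next word, or None.
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat the cat ran"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # -> cat
```

A real base model is a neural network predicting over a huge vocabulary with long context, but the interface is the same: given text so far, emit the likely continuation. That's all pre-training gives you.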
Post-training: You narrow down the task by training on the specific behaviors you want. This can be done in several ways:
- Supervised Finetuning (SFT): Training on a strict high quality dataset of the task you want. For example if you wanted a summarization model, you'd finetune the model on high quality text->summary pairs and the model would be able to summarize much better than the base model.
- Reinforcement Learning (RL): You train a separate reward model that scores outputs, then use its scores on the main model's generations as the training signal.
- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to align the model toward/away from the kinds of responses you want.
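For the DPO case, the loss for one good/bad pair is simple enough to show directly. A minimal sketch (scalar log-probabilities stand in for what you'd sum over a full response; `beta` is the usual temperature hyperparameter):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO for one preference pair: push the policy to prefer the chosen
    # response more than a frozen reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy widened the chosen-vs-rejected gap relative to the reference: low loss.
good = dpo_loss(-5.0, -9.0, ref_logp_chosen=-6.0, ref_logp_rejected=-8.0)
# Policy narrowed it: higher loss.
bad = dpo_loss(-9.0, -5.0, ref_logp_chosen=-8.0, ref_logp_rejected=-6.0)
print(good < bad)  # -> True
```

The neat part is there's no separate reward model and no RL loop: the preference pairs plus the reference model define the whole objective.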
Post-training is what makes the models easy to use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything: if you want a translation model that always translates a certain way, or a model that knows how to use tools, etc., you'd achieve all that through post-training. Post-training is where most of the secret sauce in current models is nowadays.
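On "teaches the model to talk in turns": mechanically this is just serializing the conversation into one string with special markers, which the model is finetuned to continue. A sketch with made-up markers (real models each have their own chat template; `<|user|>` etc. here are illustrative, not any specific model's tokens):

```python
def render_chat(messages):
    # Flatten user/assistant turns into a single text string, then leave an
    # open assistant marker so the model's "next token prediction" produces
    # the assistant's reply.
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>" for m in messages]
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

chat = [{"role": "user", "content": "Summarize: the meeting moved to 3pm."}]
print(render_chat(chat))
```

Instruction tuning is SFT on lots of text in exactly this shape, which is why a base model rambles while a post-trained model answers in turns.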