Comment by scaredginger 3 days ago
Bit of a nitpick, but I think his terminology is wrong. Like RL, pretraining is also a form of *un*supervised learning
SL and SSL are very similar "algorithmically": both use gradient descent on a loss function for predicting labels, whether human-provided (SL) or auto-generated (SSL). Since LLMs are pretrained on human texts, you might say that the labels (i.e., the next token to predict) were in fact human-provided. So I see how pretraining LLMs blurs the line between SL and SSL.
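To make the point concrete (my own toy sketch, not from the thread): SL and SSL can share the exact same loss mechanics, cross-entropy against a target label, and differ only in where the label comes from. The label-shifting trick below is the standard next-token setup; the numbers are made up.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the correct label."""
    return -math.log(probs[label])

# SL: a human attached the label to the input.
human_labeled = {"probs": [0.1, 0.7, 0.2], "label": 1}  # label annotated by a person
sl_loss = cross_entropy(human_labeled["probs"], human_labeled["label"])

# SSL (next-token prediction): the "label" is auto-generated by shifting
# the sequence -- token t+1 is the target for the prefix ending at t.
tokens = [5, 2, 9, 4]
ssl_pairs = [(tokens[:i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]
# ssl_pairs: [([5], 2), ([5, 2], 9), ([5, 2, 9], 4)]
```

Either way the gradient step is the same; only the provenance of the label differs.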
In modern RL, we also train deep nets on some (often non-trivial) loss function. And RL generates its own training data, so it blurs the line with SSL. I'd say, however, it's more complex and more computationally expensive: you need many / long rollouts to find a signal to learn from. All of this is automated, so from that perspective it blurs the line with UL too :-) Though its dependence on the reward is what makes the difference.
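A hedged sketch of that point: RL generates its own data via rollouts and then turns it into a supervised-looking loss by weighting log-probabilities with the reward (a minimal REINFORCE-style objective). The environment and one-parameter policy here are toy placeholders of my own invention.

```python
import math
import random

def rollout(policy, steps=5, rng=None):
    """Sample actions from the policy; toy reward: +1 for action 1, else 0."""
    rng = rng or random.Random(0)
    traj = []
    for _ in range(steps):
        p1 = policy["p_action_1"]
        action = 1 if rng.random() < p1 else 0
        logp = math.log(p1 if action == 1 else 1.0 - p1)
        reward = 1.0 if action == 1 else 0.0
        traj.append((action, logp, reward))
    return traj

def reinforce_loss(traj):
    """Negative reward-weighted log-likelihood of the sampled trajectory."""
    total_reward = sum(r for _, _, r in traj)
    return -sum(logp * total_reward for _, logp, _ in traj)

policy = {"p_action_1": 0.8}
loss = reinforce_loss(rollout(policy))
```

Note that the gradient step on this loss looks just like supervised learning, but the "dataset" had to be sampled from the policy itself, which is where the extra rollout cost comes from.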
Overall, going from more structured to less structured, I'd order the learning approaches: SL, SSL (pretraining), RL, UL.
A “pretrained” ResNet could easily have been trained through a supervised signal like ImageNet labels.
“Pretraining” is not a correlate of the learning paradigms, it is a correlate of the “fine-tuning” process.
Also, LLM pretraining is unsupervised. Dwarkesh is wrong.
You could think of supervised learning as learning against a known ground truth, which pretraining certainly is.
A large number of breakthroughs in AI come from turning unsupervised learning into supervised learning (AlphaZero-style MCTS as a policy improver is also like this). So the confusion is kind of intrinsic.
Usual terminology for the three main learning paradigms:
- Supervised learning (e.g. matching labels to pictures)
- unsupervised learning / self-supervised learning (pretraining)
- reinforcement learning
Now the confusing thing is that Dwarkesh Patel instead calls pretraining "supervised learning," while you call reinforcement learning a form of unsupervised learning.