Comment by intalentive 3 days ago
A “pretrained” ResNet could easily have been trained with a supervised signal like ImageNet labels.
“Pretraining” is not a correlate of the learning paradigm; it is a correlate of the subsequent “fine-tuning” process.
Also, LLM pretraining is unsupervised (self-supervised next-token prediction, with no human labels). Dwarkesh is wrong.