Comment by brcmthrowaway 2 days ago
Do LLMs still use backprop?
Whether it's plain gradient descent doesn't matter here: second-order and higher-order methods still need the lower-order derivatives, and those are what backpropagation computes.
Backpropagation is reverse-mode automatic differentiation; they are the same thing.
And for those who don't know what backpropagation is: it's just an efficient way to compute the gradient of the loss with respect to all of the parameters at once.
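Concretely, here's a minimal sketch in JAX (the toy linear model and loss are made up for illustration, not anything from an actual LLM):

    import jax
    import jax.numpy as jnp

    # Toy "model": one linear layer with a squared-error loss.
    def loss(params, x, y):
        w, b = params
        pred = x @ w + b
        return jnp.mean((pred - y) ** 2)

    params = (jnp.ones((3,)), jnp.array(0.0))
    x = jnp.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
    y = jnp.array([1.0, 0.0])

    # jax.grad is reverse-mode autodiff, i.e. backpropagation: one backward
    # pass returns the gradient of the scalar loss with respect to every
    # parameter at once.
    grads = jax.grad(loss)(params, x, y)   # -> (dL/dw, dL/db), same shapes as params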
Yes. Pretraining and fine-tuning use the standard Adam optimizer (usually with decoupled weight decay, i.e. AdamW). Reinforcement learning has historically been the odd one out, but these days almost all RL algorithms also use backprop and gradient descent.
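A single training step with that setup might look like the sketch below (using optax's AdamW; the hyperparameters and toy loss are placeholders, not what any real training run uses):

    import jax
    import jax.numpy as jnp
    import optax

    # Same toy loss as before, just to have something to differentiate.
    def loss(params, x, y):
        w, b = params
        pred = x @ w + b
        return jnp.mean((pred - y) ** 2)

    # Adam with decoupled weight decay (AdamW); values are illustrative.
    optimizer = optax.adamw(learning_rate=3e-4, weight_decay=0.1)

    params = (jnp.ones((3,)), jnp.array(0.0))
    opt_state = optimizer.init(params)

    @jax.jit
    def train_step(params, opt_state, x, y):
        grads = jax.grad(loss)(params, x, y)                  # backprop
        updates, opt_state = optimizer.update(grads, opt_state, params)  # Adam + weight decay
        return optax.apply_updates(params, updates), opt_state

The gradient call is the same backprop as above; the optimizer only decides how those gradients get turned into parameter updates.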