Comment by sva_
> From my understanding, RL is a tuning approach on LLMs,
What you're referring to is actually just one application of RL (RLHF). RL itself is much more than that.
Actually I didn't. Correct me if I am wrong, but my understanding is that RL is still an LLM tuning approach, i.e. an optimization of its parameter set, no matter if it's done at scale or via HF.