Comment by ACCount37 3 days ago

RL is very important, because while it's inefficient and bad at creating entirely new behaviors or features in LLMs, it excels at bringing existing features together and tuning them to perform well.

It's a bit like LLM glue. The glue isn't the main material, but it's what holds it all together.
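To make that concrete, here's a rough REINFORCE-style sketch of what "tuning existing behaviors" looks like mechanically. Everything in it is a made-up stand-in: the tiny one-token-context policy plays the role of a pretrained LLM, and reward_fn is a placeholder for whatever reward or preference model you'd actually use, not any real RLHF pipeline.

    # Minimal REINFORCE sketch: RL as a tuning pass over an existing policy.
    # The "LLM" is a toy softmax policy over a 16-token vocab, conditioned only
    # on the previous token; reward_fn is a hypothetical stand-in reward.
    import torch

    vocab_size, hidden = 16, 32
    policy = torch.nn.Sequential(torch.nn.Embedding(vocab_size, hidden),
                                 torch.nn.Linear(hidden, vocab_size))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def reward_fn(tokens):
        # Placeholder reward: count occurrences of token 3 in the sample.
        return float((tokens == 3).sum())

    for step in range(200):
        tok = torch.tensor([0])          # start token
        log_probs, tokens = [], []
        for _ in range(8):               # sample a short completion from the current policy
            logits = policy(tok)[-1]
            dist = torch.distributions.Categorical(logits=logits)
            tok = dist.sample().unsqueeze(0)
            log_probs.append(dist.log_prob(tok.squeeze()))
            tokens.append(tok.item())
        reward = reward_fn(torch.tensor(tokens))
        # REINFORCE update: scale the log-probs of what was sampled by the reward.
        loss = -reward * torch.stack(log_probs).sum()
        opt.zero_grad(); loss.backward(); opt.step()

Nothing new is being built here: the policy already assigns some probability to the rewarded behavior, and RL just shifts mass toward it.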

elchananHaas 3 days ago

RL before LLMs could very much learn new behaviors. Take a look at AlphaGo for that. It can also learn to drive in simulated environments. RL in LLMs is not learning in the same way, so it can't create its own behaviors.
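For contrast with the fine-tuning sketch above, classic RL of the AlphaGo / simulated-driving kind builds the behavior from reward alone. Here's a tabular Q-learning sketch on a made-up 1-D gridworld (purely illustrative, not any particular environment): the agent starts with no policy at all and the "walk right to the goal" behavior emerges from scratch.

    # Tabular Q-learning on a toy 1-D gridworld: start at cell 0, reward only
    # for reaching cell 4, so the behavior is learned from nothing.
    import random

    n_states, actions = 5, [-1, +1]      # move left / move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    alpha, gamma, eps = 0.5, 0.9, 0.1

    for episode in range(500):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < eps else max(range(2), key=lambda i: Q[s][i])
            s2 = min(max(s + actions[a], 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    # Learned greedy policy for the non-terminal states: all 1s (always go right).
    print([max(range(2), key=lambda i: Q[s][i]) for s in range(n_states - 1)])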