Comment by ACCount37 3 days ago

RL is very important, because while it's inefficient and bad at creating entirely new behaviors or features in LLMs, it excels at bringing existing features together and tuning them to perform well.

It's a bit like LLM glue. The glue isn't the main material, but it's what holds it all together.
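To make that concrete, here's a rough REINFORCE-style sketch of what "tuning existing behaviors" looks like mechanically. Everything in it is a made-up stand-in: the tiny one-token-context policy plays the role of a pretrained LLM, and reward_fn is a placeholder for whatever reward or preference model you'd actually use, not any real RLHF pipeline.

    # Minimal REINFORCE sketch: RL as a tuning pass over an existing policy.
    # The "LLM" is a toy softmax policy over a 16-token vocab, conditioned only
    # on the previous token; reward_fn is a hypothetical stand-in reward.
    import torch

    vocab_size, hidden = 16, 32
    policy = torch.nn.Sequential(torch.nn.Embedding(vocab_size, hidden),
                                 torch.nn.Linear(hidden, vocab_size))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def reward_fn(tokens):
        # Placeholder reward: count occurrences of token 3 in the sample.
        return float((tokens == 3).sum())

    for step in range(200):
        tok = torch.tensor([0])          # start token
        log_probs, tokens = [], []
        for _ in range(8):               # sample a short completion from the current policy
            logits = policy(tok)[-1]
            dist = torch.distributions.Categorical(logits=logits)
            tok = dist.sample().unsqueeze(0)
            log_probs.append(dist.log_prob(tok.squeeze()))
            tokens.append(tok.item())
        reward = reward_fn(torch.tensor(tokens))
        # REINFORCE update: scale the log-probs of what was sampled by the reward.
        loss = -reward * torch.stack(log_probs).sum()
        opt.zero_grad(); loss.backward(); opt.step()

Nothing new is being built here: the policy already assigns some probability to the rewarded behavior, and RL just shifts mass toward it.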

elchananHaas 3 days ago

RL before LLMs could very much learn new behaviors. Take a look at AlphaGo for that. It can also learn to drive in simulated environments. RL in LLMs is not learning in the same way, so it can't create its own behaviors.
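For contrast with the fine-tuning sketch above, classic RL of the AlphaGo / simulated-driving kind builds the behavior from reward alone. Here's a tabular Q-learning sketch on a made-up 1-D gridworld (purely illustrative, not any particular environment): the agent starts with no policy at all and the "walk right to the goal" behavior emerges from scratch.

    # Tabular Q-learning on a toy 1-D gridworld: start at cell 0, reward only
    # for reaching cell 4, so the behavior is learned from nothing.
    import random

    n_states, actions = 5, [-1, +1]      # move left / move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    alpha, gamma, eps = 0.5, 0.9, 0.1

    for episode in range(500):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < eps else max(range(2), key=lambda i: Q[s][i])
            s2 = min(max(s + actions[a], 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    # Learned greedy policy for the non-terminal states: all 1s (always go right).
    print([max(range(2), key=lambda i: Q[s][i]) for s in range(n_states - 1)])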