Comment by ACCount37
RL is very important: while it's inefficient and sucks at creating entirely new behaviors or features in LLMs, it excels at bringing existing features together and tuning them to perform well.
It's a bit like LLM glue. The glue isn't the main material, but it's what holds everything together.
RL before LLMs can very much learn new behaviors. Take a look at AlphaGo for that. It can also learn to drive in simulated environments. RL as applied to LLMs isn't learning in the same way, so it can't create its own behaviors.
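A toy sketch of the "tuning, not creating" point (this is illustrative only, not anyone's actual training setup): a REINFORCE-style policy-gradient update over a fixed set of hypothetical "behaviors" can only shift probability mass between options the policy can already emit. The behavior names and rewards below are made up for the example.

```python
# Toy illustration: REINFORCE on a softmax policy over pre-existing "behaviors".
# RL here only re-weights behaviors the policy can already produce; a behavior
# with zero support never appears.
import numpy as np

rng = np.random.default_rng(0)

behaviors = ["chain-of-thought", "direct answer", "refuse", "ramble"]  # hypothetical
reward = np.array([1.0, 0.6, 0.1, -0.5])    # made-up reward per behavior
logits = np.zeros(len(behaviors))           # "pretrained" preferences (uniform)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for step in range(500):
    probs = softmax(logits)
    a = rng.choice(len(behaviors), p=probs)   # sample one behavior
    r = reward[a]
    # Policy-gradient of log pi(a): onehot(a) - probs
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * r * grad_logp

print({b: round(float(p), 3) for b, p in zip(behaviors, softmax(logits))})
# The highest-reward existing behavior comes to dominate; nothing new was invented.
```

The analogy to RL on LLMs is loose (real setups use sequence-level rewards, KL penalties, etc.), but it captures the gluing/tuning role described above.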