Comment by Legend2440 2 days ago
It doesn’t appear that anyone at OpenAI sat down and thought “let’s make our model more sycophantic so that people engage with it more”.
Instead, it emerged as a side effect of RLHF, because users rated agreeable responses more highly.
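Roughly how that preference signal gets baked in, as a hypothetical sketch (not OpenAI's actual pipeline; `RewardModel` here stands in for any network mapping a token sequence to a scalar score): pairwise ratings train a reward model, and if raters consistently favor agreeable answers, agreeableness is what the reward model learns to reward.

```python
# Illustrative sketch only; the reward model interface is an assumption.
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry pairwise loss: raise the score of the response the
    user preferred above the score of the one they passed over."""
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```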
Not precisely RLHF; more likely a reward model trained on logged user ratings.
RL works on responses sampled from the model you're currently training, which is not the one in production, so it can't directly use responses from previous models. Ratings collected in production can only feed in indirectly, via the reward model.
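To make the on-policy point concrete, a minimal sketch (illustrative names like `policy.generate` and `policy.log_prob` are assumptions, and plain REINFORCE stands in for PPO): the logged production ratings only enter through the reward model, while the RL gradient itself is computed on fresh samples from the model being trained.

```python
# Sketch of one RL step under the assumptions above; not any real API.
def rlhf_step(policy, reward_model, prompts, optimizer):
    # 1. On-policy: sample responses from the model being trained,
    #    not from the production model whose outputs users rated.
    responses = policy.generate(prompts)
    # 2. Score them with the reward model distilled from old user ratings.
    rewards = reward_model(prompts, responses)
    # 3. Policy-gradient update (REINFORCE for brevity; PPO in practice).
    logps = policy.log_prob(prompts, responses)
    loss = -(rewards.detach() * logps).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```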