Comment by Legend2440

Comment by Legend2440 2 days ago

It doesn’t appear that anyone at OpenAI sat down and thought “let’s make our model more sycophantic so that people engage with it more”.

Instead it emerged automatically from RLHF, because users rated agreeable responses more highly.

astrange 2 days ago

Not precisely RLHF, probably a policy model trained on user responses.

RL works on responses from the model you're training, which is not the one you have in production. It can't directly use responses from previous models.

Reply View 0 replies

tsunamifury 2 days ago

I can tell you’ve never worked in big tech before.

Dark patterns are often “discovered” and very consciously not shut off because the reverse cost would be too high to stomach. Esp in a delicate growth situation.

See Facebook at its adverse mental health studies

Reply View 0 replies