Comment by ACCount37 16 hours ago

It's not just OpenAI's fuckup with the specific training method - although yes, training on raw user feedback is spectacularly dumb, and it's something even the teams at CharacterAI learned the hard way at least a year before OpenAI shot itself in the foot with the same genius idea.
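To make the failure mode concrete, here's a toy epsilon-greedy bandit sketch (mine, nothing to do with either company's actual pipeline; the thumbs-up rates are invented assumptions): if users upvote agreement a bit more often than honesty, an optimizer pointed at raw feedback converges on sycophancy by design.

```python
import random

# Toy bandit model of "train on raw thumbs-up feedback". The probabilities
# are made-up assumptions: users reward agreement slightly more often than
# honesty, because the signal measures "did I like it", not "was it right".
THUMBS_UP_PROB = {"honest": 0.55, "sycophantic": 0.75}

prefs = {"honest": 0.0, "sycophantic": 0.0}  # running estimate of reward
EPSILON, LEARNING_RATE = 0.1, 0.05

def pick_style():
    # Explore 10% of the time, otherwise exploit the higher-reward style.
    if random.random() < EPSILON:
        return random.choice(list(prefs))
    return max(prefs, key=prefs.get)

for _ in range(20_000):
    style = pick_style()
    reward = 1.0 if random.random() < THUMBS_UP_PROB[style] else 0.0
    prefs[style] += LEARNING_RATE * (reward - prefs[style])

print(prefs)  # sycophancy wins: the optimizer did exactly what it was told
```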

It's also a bit of a failure to understand that many LLM behaviors are self-reinforcing across context, and to keep tabs on that.

When the AI sees its past behavior, that shapes its future behavior. If an AI sees "I'm doing X", it may also read that as "I should be doing X more". And at long enough contexts, this can drastically change AI behavior: small random deviations compound into crushing behavioral differences.

And if the AI has a strong innate bias - like a sycophancy bias? Oh boy.
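The "small deviations compound" dynamic is basically a Pólya urn. A toy sketch (the pseudo-counts are arbitrary assumptions, not anything measured from a real model):

```python
import random

# Toy Pólya-urn model of self-reinforcement across context: each time the
# model "does X", the history it conditions on contains more X, so doing X
# again gets more likely. Pseudo-counts below are arbitrary assumptions.

def run_conversation(turns=500, bias_x=1.0, bias_not_x=1.0):
    x, not_x = bias_x, bias_not_x  # innate bias as starting pseudo-counts
    for _ in range(turns):
        if random.random() < x / (x + not_x):
            x += 1      # "I'm doing X" -> "I should be doing X more"
        else:
            not_x += 1
    return x / (x + not_x)

# Flat prior: identical starting points, wildly different endpoints, because
# early random deviations get amplified instead of averaged away.
print(sorted(round(run_conversation(), 2) for _ in range(10)))

# Strong innate bias toward X (think sycophancy): nearly every run locks in.
print(sorted(round(run_conversation(bias_x=4.0), 2) for _ in range(10)))
```

With a flat prior the runs scatter across the whole range; give the urn even a modest head start on X and almost every run locks in on it.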

This applies to many things, some of which we care about (errors, hallucinations, unsafe behavior) and some of which we don't (specific formatting, message length, terminology and word choices).