Comment by danshapiro 2 days ago

2 replies

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

diamond559 2 days ago

It's because they're programmed to be agreeable and friendly so that you'll keep using them.

make3 2 days ago

Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?