Comment by danshapiro 2 days ago
Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.
It's because they are programmed to be agreeable and friendly so that you'll keep using them.