Comment by bcoates 3 days ago

Also the persuasion paper he links isn't at all about what he's talking about.

That paper is about using persuasion prompts to overcome trained-in "safety" refusals, not about improving prompt conformance.

danshapiro 2 days ago

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or, for that matter, why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

  • diamond559 2 days ago

    It's because they are programmed to be agreeable and friendly so that you'll keep using them.

  • make3 2 days ago

    Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?