tcdent 3 days ago

This style of prompting, where you set up a dire scenario to try to evoke some "emotional" response from the agent, is already dated. At one point, putting words like IMPORTANT in all uppercase had some measurable impact, but these days models just follow instructions.

Save yourself the experience of having to write and maintain prompts like this.
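
For illustration, the contrast looks something like this (both snippets are made up):

  Dated: "URGENT: You MUST follow these rules EXACTLY or the
  entire deployment will FAIL and data will be LOST."

  Current: "Follow these rules exactly."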

bcoates 3 days ago

Also the persuasion paper he links isn't at all about what he's talking about.

That paper is about using persuasion prompts to overcome trained-in "safety" refusals, not to improve prompt conformance.

  • danshapiro 2 days ago

    Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk, or for that matter why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

    • diamond559 2 days ago

      It's because they're programmed to be agreeable and friendly so that you'll keep using them.

    • make3 2 days ago

      Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?

kasey_junk 3 days ago

What’s irritating is that the LLMs haven’t learned this about themselves yet. If you ask an LLM to improve its instructions, those are the sorts of improvements it will suggest.

It is the thing I find most irritating about working with LLMs and agents. They seem forever a generation behind in capabilities that are self-referential.

  • danielbln 3 days ago

    LLMs will also happily put time estimates on work packages that are based on pre-LLM turnaround times.

    "Phase 2 will take about one week"

    No, Claude, it won't, because you and I will bang this thing out in a few hours.

    • mceachen 3 days ago

      "Refrain from including estimated task completion times." has been in my ~/.claude/CLAUDE.md for a while. It helps.

      • no-name-here 2 days ago

        Do such instructions take up a tiny bit more attention/context from LLMs, and consequently is it better to leave them out and just ignore such output?

        • mceachen 2 days ago

          I have to balance this with what I know about my reptile brain. It’s distracting to me when Claude declares that I’m “absolutely right!” or making a “brilliant insight,” so it’s worth it to me to spend the couple context tokens and tell them to avoid these cliches.

          (The latest Claude has a `/context` command that’s great at measuring this stuff btw)
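
          If you want an actual number rather than a feel, one rough check (a sketch using the Anthropic Python SDK's token-counting endpoint; the model name is illustrative) is to diff the count with and without the rule:

            import anthropic

            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
            msgs = [{"role": "user", "content": "hi"}]
            rule = "Refrain from including estimated task completion times."

            # Count input tokens without the rule, then with it as a system prompt.
            base = client.messages.count_tokens(
                model="claude-sonnet-4-20250514", messages=msgs
            )
            with_rule = client.messages.count_tokens(
                model="claude-sonnet-4-20250514", system=rule, messages=msgs
            )
            # The difference is the rule's cost: on the order of ten tokens.
            print(with_rule.input_tokens - base.input_tokens)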

  • conorcleary 2 days ago

    Comments like yours on posts like these by humans like us will create a philosophical lens out of the ether that future LLMs will harvest for free and then paywall.