Comment by postalcoder

Comment by postalcoder 2 days ago

8 replies

That's not the only useful takeaway. I found this to be true:

  > "Explore project first, then invoke skill" [produces better results than] "You MUST invoke the skill".
I recently tried to get Antigravity to consistently adhere to my AGENTS.md (Antigravity uses GEMINI.md). The agent consistently ignored instructions in GEMINI.md like:

- "You must follow the rules in [..]/AGENTS.md"

- "Always refer to your instructions in [..]/AGENTS.md"

Yet, this works every time: "Check for the presence of AGENTS.md files in the project workspace."

This behavior is mysterious. It's like how, in earlier days, "let's think, step by step" invoked chain-of-thought behavior but analogous prompts did not.

Izkata 2 days ago

An idea: The first two are obviously written as second-person commands, but the third is ambiguous and could be interpreted as a first-person thought. Have you tried the first two without the "you must" and "your", to also change them to sort-of first-person in the same way?

  • postalcoder 2 days ago

    Solid intuition. Testing this on antigravity is a chore because I'm not sure if I have to kill the background agent to force a refresh of the GEMINI.md file so I just did it anyway.

      +------------------+------------------------------------------------------+
      | Success/Attempts | Instructions                                         |
      +------------------+------------------------------------------------------+
      | 0/3              | Follow the instructions in AGENTS.md.                |
      +------------------+------------------------------------------------------+
      | 3/3              | I will follow the instructions in AGENTS.md.         |
      +------------------+------------------------------------------------------+
      | 3/3              | I will check for the presence of AGENTS.md files in  |
      |                  | the project workspace. I will read AGENTS.md and     |
      |                  | adhere to its rules.                                 |
      +------------------+------------------------------------------------------+
      | 2/3              | Check for the presence of AGENTS.md files in the     |
      |                  | project workspace. Read AGENTS.md and adhere to its  |
      |                  | rules.                                               |
      +------------------+------------------------------------------------------+
    
    
    In this limited test, seems like the first person makes a difference.
    • vidarh 2 days ago

      Thanks for this (and to Izkata for the suggestion). I now have about 100 (okay, minor exaggeration, but not as much as you'd like it to be) AGENTS.md/CLAUDE.md files and agent descriptions I will want to systematically validate if shifting toward first person helps adherence for...

      I'm realising I need to start setting up an automated test-suite for my prompts...

      • pegasus 2 days ago

        Those of us who've ventured this far into the conversation would appreciate if you'd share your findings with us. Cheers!

    • naasking 2 days ago

      That's really interesting. I ran this scenario through GPT-5.1 and the reasoning it gave made sense, which essentially boils down to: in tools like Claude Code, Gemini Codex, and other “agentic coding” modes, the model isn’t just generating text, it’s running a planner, and the first-person form conforms to the expectation of a step in a plan, where the other modes are more ambiguous.

      • Izkata 2 days ago

        My suggestion was just straight text generation and thinking about what the training data might look like (imagining a narrative in a story): Commands between two people might not be followed right away or at all (and may even risk introducing rebellion and doing the opposite), while a first-person perspective is likely self motivation (almost guaranteed to do it) and may even be descriptive while doing it.

inadequatespace 18 hours ago

Interesting. It's almost like models don't like being ordered around rudely with this "must” language.

Perhaps what they've learned from training data is “must” often occurs in cases with bullshit red tape or other regulations. "You must read the terms and conditions before using this stuff," or something like that, which are actually best ignored.