Comment by athrowaway3z

> One of the weird things I found out about agents is that they actually give up on fixing test failures and just disable tests. They’ll try once or twice and then give up.

It's important not to think in terms of generalities like this. How they approach this depends on your test framework, and even on the language you use. If disabling tests is easy and common in that language/framework, the agent is more likely to do it.

For testing a CLI, I currently use run_tests.sh, and never once has it tried to disable a test. Though that can be its own problem when it hits one it can't debug.

# run_tests.sh
# Handle multiple script arguments or default to all .sh files
scripts=("${@/#/./examples/}")
[ $# -eq 0 ] && scripts=(./examples/*.sh)
for script in "${scripts[@]}"; do
    # Set LOUD to any non-empty value to print each script name as it runs
    [ -n "$LOUD" ] && echo "$script"
    # Capture the full trace; show it only if the script exits non-zero
    output=$(bash -x "$script" 2>&1) || {
        echo ""
        echo "Error in $script:"
        echo "$output"
        exit 1
    }
done
echo " OK"
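
For illustration, here is a rough sketch of what one script under ./examples/ might look like. The runner only checks the exit status, so any script that exits non-zero on a mismatch works. The names `mycli` and `assert_eq` and the file name greet.sh are made up for this example; `mycli` is a stand-in function where the real CLI call would go.

```shell
#!/usr/bin/env bash
# examples/greet.sh (hypothetical file name): one test script for the runner.
set -euo pipefail

# Stand-in for the real CLI under test; "mycli" is an assumed name.
mycli() { printf 'hello %s\n' "$1"; }

# Return non-zero (and print both values) when the output differs.
# assert_eq is a helper invented for this sketch, not part of any framework.
assert_eq() {
    local expected=$1 actual=$2
    if [ "$expected" != "$actual" ]; then
        echo "expected: $expected" >&2
        echo "actual:   $actual" >&2
        return 1
    fi
}

# With set -e, a failed assertion ends the script with a non-zero status,
# which is what the runner treats as a test failure.
assert_eq "hello world" "$(mycli world)"
```

Saved as examples/greet.sh, it would run via `./run_tests.sh greet.sh`, since the runner prefixes each argument with ./examples/. `LOUD=1 ./run_tests.sh` prints each script name as it goes. Because the runner uses `bash -x`, the error output includes the trace leading up to the failing command.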

----

Another tip: for a specific task, don't bother with "please read file x.md". Claude Code (and others) accept the @file syntax, which puts that file into context right away.