Comment by lukebechtel a day ago

1. Start with a plan. Get AI to help you make it, then edit it yourself.

2. Part of the plan should be automated tests. AI can make these for you too, but you should spot check for reasonable behavior.

3. Use Claude 4.5 Opus

4. Use Git, get the AI to check in its work in meaningful chunks, on its own git branch.

5. Ask the AI to keep an append-only developer log as a markdown file, and to update it whenever its state significantly changes, or it makes a large discovery, or it is "surprised" by anything.
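For point 5, here is a rough sketch of what "append-only" could look like if you hand the agent a small helper instead of letting it edit the file freely. The function name `append_log_entry`, the entry kinds, and the heading layout are all made up for illustration; nothing here is a standard format.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log_entry(log_path, kind, message):
    """Append one timestamped markdown entry; earlier entries are never rewritten.

    `kind` is a free-form label (hypothetical examples: "state",
    "discovery", "surprise").
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"\n## {timestamp} [{kind}]\n\n{message.strip()}\n"
    # Mode "a" creates the file if needed and only ever writes at the end.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Calling `append_log_entry("DEVLOG.md", "surprise", "Tests pass locally but fail in CI.")` adds a new section at the bottom of the file. The point of routing writes through something like this is that the agent cannot quietly tidy away an earlier entry.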

baal80spam a day ago

> Use Claude 4.5 Opus

In my org we are experimenting with agentic flows, and we've noticed that model choice matters especially for autonomy.

GPT-5.2 performed much better for long-running tasks. It stayed focused, followed instructions, and completed work more reliably.

Opus 4.5 tended to stop earlier and take shortcuts to hand control back sooner.

  • 8note 18 hours ago

    a ralph loop can make claude go til the end, or to a rate limit at least.

    opus closes the task and ralph opens it right back up again.

    i imagine there's something to the harness for that, too

  • lukebechtel a day ago

    Interesting! Was kinda disappointed with Codex last time I tried it ~2m ago, but things change fast.
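For anyone unfamiliar with the "ralph loop" 8note mentions: as described above, the idea is just to reopen the task with the same prompt every time the agent closes it, until the work is actually done or a budget runs out. A minimal sketch, assuming you supply the two callables yourself: `run_agent` and `is_done` are hypothetical placeholders (for example, a wrapper around whatever agent you use and a check that the test suite passes), not part of any real tool.

```python
from typing import Callable


def ralph_loop(
    run_agent: Callable[[str], object],
    is_done: Callable[[], bool],
    prompt: str,
    max_iterations: int = 20,
) -> int:
    """Re-issue the same prompt until is_done() is true or the budget is spent.

    Returns the number of agent runs that were started.
    """
    runs = 0
    # Check completion before each run so finished work is not reopened.
    while runs < max_iterations and not is_done():
        run_agent(prompt)  # the agent may stop early; we just start it again
        runs += 1
    return runs
```

The `max_iterations` cap stands in for the rate limit mentioned above; without some bound, a task the agent can never finish would loop forever.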