Comment by sirwhinesalot a day ago

5 replies

The only approach I've tried that seems to work reasonably well, and consistently, was the following:

Make a commit.

Give Claude a task that's not particularly open-ended. The closer the task is to pure "monkey work" boilerplate nonsense, the better (which is also the sort of code I don't want to deal with myself).

Preferably it should be something that only touches a file or two in the codebase, unless it is a trivial refactor (like changing the same method call all over the place).

Make sure it is set to planning mode and let it come up with a plan.

Review the plan.

Let it implement the plan.

If it works, great, move on to review. I've seen it one-shot some pretty annoying tasks like porting code from one platform to another.

If there are obvious mistakes (program doesn't build, tests don't pass, etc.) then a few more iterations usually fix the issue.

If there are subtle mistakes, make a branch and have it try again. If it fails again, then the task is beyond what it can do: abort the branch and solve the issue myself.

Review and clean up the code it wrote; it's usually a lot messier than it needs to be. This also allows me to take ownership of the code. I now know what it does and how it works.
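The branching in the steps above can be written down as a small decision function. This is only a sketch: the names `Outcome` and `next_step` and the action strings are made up for illustration, and sending obvious mistakes on the retry branch back to "iterate" is an assumption the comment does not spell out.

```python
from enum import Enum


class Outcome(Enum):
    WORKS = "works"
    OBVIOUS_MISTAKES = "obvious mistakes"  # program doesn't build, tests don't pass, etc.
    SUBTLE_MISTAKES = "subtle mistakes"


def next_step(outcome: Outcome, on_retry_branch: bool = False) -> str:
    """Return the next action after the model has implemented the plan."""
    if outcome is Outcome.WORKS:
        return "review and clean up"
    if outcome is Outcome.OBVIOUS_MISTAKES:
        # A few more iterations usually fix these.
        return "iterate with the errors"
    # Subtle mistakes: one retry on a fresh branch, then give up.
    if not on_retry_branch:
        return "make a branch and retry"
    return "abort the branch and do it by hand"
```

The initial commit is what makes the last case cheap: aborting the branch returns the tree to a known good state.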

I don't bother giving it guidelines or guardrails or anything of the sort; it can't follow them reliably. Even something as simple as "This project uses CMake, build it like this" was repeatedly ignored as it kept trying to invoke the makefile directly and in the wrong folder.
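For contrast with invoking the makefile directly, here is a rough sketch of driving a build through CMake itself. The comment does not give the project's actual build instructions, so the source directory `.`, the build directory `build`, and the helper names `cmake_commands` and `build` are all assumptions.

```python
import subprocess


def cmake_commands(source_dir: str = ".", build_dir: str = "build") -> list[list[str]]:
    """Commands for an out-of-source configure and build via CMake."""
    return [
        ["cmake", "-S", source_dir, "-B", build_dir],  # configure into build_dir
        ["cmake", "--build", build_dir],  # build with whichever generator was configured
    ]


def build(source_dir: str = ".", build_dir: str = "build") -> None:
    """Run configure and build, stopping at the first failing command."""
    for command in cmake_commands(source_dir, build_dir):
        subprocess.run(command, check=True)
```

Going through `cmake --build` means nobody has to know which folder the generated makefile lives in, which is exactly the detail that kept going wrong.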

This doesn't save me all that much time since the review and cleanup can take a long time, but it serves as a great unblocker.

I also use it as a rubber duck that can talk back and as a documentation source. It's pretty good for that.

This idea of having an army of agents all working together on the codebase is hilarious to me. Replace "agents" with "juniors I hired on Fiverr with anterograde amnesia" and it's about how well it goes.

dwd 12 hours ago

+1 for the Rubber duck, and as an unblocker.

My personal use is very much one function at a time. I know what I need something to do, so I get it to write the function which I then piece together.

It can even come back with alternatives I may not have considered.

I might give it some context, but I'm mainly offloading a bunch of typing. I usually debug and fix its code myself rather than trying to get it to do better.

crq-yml a day ago

TBH I think the greatest benefit is on the documentation/analysis side. The "write the code" part is fine when it sits in the envelope of things that are 100% conventional boilerplate. Like, as a frontend to ffmpeg you can get a ton of value out of LLMs. As soon as things go open-ended and design-centric, brace yourself.
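To give a sense of what "100% conventional boilerplate" looks like for ffmpeg, here is a sketch of a thin wrapper that assembles a transcode command. The function name `ffmpeg_transcode_command` and the choice of H.264 video with AAC audio are illustrative assumptions, and `libx264` is only available in ffmpeg builds that include it.

```python
def ffmpeg_transcode_command(src: str, dst: str, crf: int = 23) -> list[str]:
    """Build an ffmpeg command that re-encodes src to H.264 video and AAC audio."""
    return [
        "ffmpeg",
        "-i", src,          # input file
        "-c:v", "libx264",  # video codec (assumes a build with libx264)
        "-crf", str(crf),   # constant rate factor; lower means higher quality
        "-c:a", "aac",      # audio codec
        dst,                # output file
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Nothing here is a design decision, which is why this kind of code sits comfortably inside the envelope.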

I get the sense that the application of armies of agents is actually a scaled-up Lisp curse - Gas Town's entire premise is coding wizardry, the emphasis on abstract goals and values, complete with cute, impenetrable naming schemes. There's some corollary with "programs are for humans to read and computers to incidentally execute" here. Ultimately the program has to be a person addressing another person, or nature, and as such it has to evolve within the whole.

theshrike79 8 hours ago

> I don't bother giving it guidelines or guardrails or anything of the sort

Where do you give these guardrails? In the chat or CLAUDE.md?

Basic information like how to build and test the project belongs in CLAUDE.md; it knows to re-check that now and then.

  • sirwhinesalot 6 hours ago

    Yeah, CLAUDE.md. Sometimes it just ignores what was in there after the context window gets big enough (as it tends to with planning mode).