Comment by egorfine
I have tried full-on agentic coding twice in the last month.
1) I needed a tool to consolidate *.dylib on macOS into the app bundle. I wanted this tool to be in JS because of some additional minor logic which would be a hassle to implement in pure bash.
2) I needed a simple C wrapper to parallelize /usr/bin/codesign across cores: split the list of binaries into batches and run X codesign processes in parallel, one batch each (rough sketch below).
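To give an idea of the shape of tool 2, here is an illustrative sketch only, not my actual tool (which is at [1]): the signing identity, the batch count, the codesign flags, and the round-robin batching are all placeholders/assumptions.

    /* Sketch: split the binaries given on the command line into PARALLEL
     * round-robin batches and sign each batch in its own child process,
     * running codesign sequentially within a batch. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define PARALLEL 4  /* number of concurrent batches; tune to core count */

    int main(int argc, char **argv) {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <signing-identity> <binary>...\n", argv[0]);
            return 1;
        }
        const char *identity = argv[1];
        char **files = &argv[2];
        int nfiles = argc - 2;

        /* One child per batch; batch b handles files b, b+PARALLEL, b+2*PARALLEL, ... */
        int batches = 0;
        for (int b = 0; b < PARALLEL && b < nfiles; b++) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return 1; }
            if (pid == 0) {
                for (int i = b; i < nfiles; i += PARALLEL) {
                    pid_t sign = fork();
                    if (sign < 0) _exit(1);
                    if (sign == 0) {
                        execl("/usr/bin/codesign", "codesign", "--force",
                              "--sign", identity, files[i], (char *)NULL);
                        _exit(127);  /* exec failed */
                    }
                    int st;
                    waitpid(sign, &st, 0);
                    if (!WIFEXITED(st) || WEXITSTATUS(st) != 0)
                        _exit(1);  /* propagate the first failure in this batch */
                }
                _exit(0);
            }
            batches++;
        }

        /* Parent: wait for every batch and report overall success/failure. */
        int failed = 0, st;
        while (batches-- > 0) {
            wait(&st);
            if (!WIFEXITED(st) || WEXITSTATUS(st) != 0)
                failed = 1;
        }
        return failed;
    }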
Arguably, both tools are junior-level tasks.
I used Claude Code with Opus 4.5. I used AskUserTool to have it interview me and produce a detailed SPEC.md. I manually reviewed and edited the final spec, then had the same model build the tool according to that very detailed spec.
The first tool, the dylib consolidation one, was horrendously broken. It recursed into subdirs where no folder structure was expected or needed, and did not recurse into folders where it was needed. It created a lot of in-memory structures that were never read. Unused parameters in functions. Unused functions. Incredible, illogical code that is impossible to understand. Quirks, "clever code". Variable scope all over the place. It appeared to work, but only in one single case on my dev workstation, and it failed on almost every requirement in the spec. I ended up rewriting it from scratch, because the only things worth saving from the generated code were one-liners for string parsing.
The second tool did not even work. You know that quirk of AI models where, once they find a wrong solution, they keep coming back to it because the context has been poisoned? So, this. Completely random code, not even close. I rewrote the thing from scratch [1].
Curiously, the second tool took far more time and tokens to create, despite being considerably simpler.
So yeah. We're definitely at most 6 months away from replacing programmers with AI.
After you created the spec, did you ask Claude to break it down into epics and tasks?
I've found that helps a lot.