Comment by manmal 12 hours ago

2 replies

My experience with both Opus and GPT-codex is that they just forget to implement big chunks of a spec unless you give them the means to self-validate their spec conformance. I sometimes find myself spending more time building the tooling for that than on the actual work.

wenc 10 hours ago

The key is generating a task list from the spec. Kiro IDE (not the CLI) generates tasks.md automatically. This is a checklist that Opus has to check off.
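For anyone who wants the checklist idea without a specific IDE, here is a minimal sketch of a checker an agent could run to see what is still open. It assumes the task list uses GitHub-style Markdown checkboxes (`- [ ]` / `- [x]`); that format, the sample tasks, and the function names are illustrative assumptions, not Kiro's documented tasks.md layout.

```python
import re

# Assumed format: Markdown checklist items such as "- [ ] Add endpoint" or "* [x] Done".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def parse_tasks(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, title) pairs for every checklist line in the text."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match:
            tasks.append((match.group(1) != " ", match.group(2)))
    return tasks


def unchecked(markdown: str) -> list[str]:
    """Return the titles of all tasks that are not checked off yet."""
    return [title for done, title in parse_tasks(markdown) if not done]


# Hypothetical task list, used only for illustration.
SAMPLE = """\
# Tasks
- [x] 1. Parse the config file
- [ ] 2. Validate required fields
  - [x] 2.1 Check types
  - [ ] 2.2 Report missing keys
"""

if __name__ == "__main__":
    for title in unchecked(SAMPLE):
        print("TODO:", title)
```

Wired into a pre-commit hook or a final "are we done?" step, a non-empty result gives the model something concrete to validate against instead of its own recollection of the spec.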

Try Kiro. It's just an all-round excellent spec-driven IDE.

You can still use Claude Code to implement code from the spec, but Kiro is far better at generating the specs.

P.S. If you don't use Kiro (though I recommend it), there’s another way too: Yegge’s beads. After you install it, prompt Claude Code to `write the plan in epics, stories and tasks in beads`. Opus will, through tool use, ensure every bead is implemented. This is a higher-variance approach, though, whereas Kiro is much more systematic.

  • manmal an hour ago

    I’ve even built my own todo tool in Zig, which is backed by SQLite and allows arbitrary levels of todo hierarchy. Those clankers just start ignoring tasks, or checking them off with a wontfix comment, the first time they hit adversity. Codex is better at this because it keeps going at hard problems. But then it compacts its context so many times that it forgets the todo instructions.