Comment by manmal 21 hours ago

My experience is that such one-shotted projects never survive the collision with reality. Even with extremely detailed specs, the end result will not be what people had in mind, because human minds cannot fully anticipate the complexity of software, and all the edge cases it needs to handle. "Oh, I didn't think that this scheduled alarm is super annoying, I'd actually expect this other alarm to supersede it. It's great we've built this prototype, because this was hard to anticipate on paper."

I'm not saying I don't believe your report - maybe you're working in a domain where everything is super deterministic. Anyway, I'm not.

wenc 19 hours ago

I've been doing spec-driven development for the past 2 months, and it's been a game changer (especially with Opus 4.5).

Writing a spec is akin to "working backwards" (or future backwards thinking, if you like) -- this is the outcome I want, how do I get there?

The process of writing the spec actually exposes the edge cases I didn't think of. It's very much in the same vein as "writing as a tool of thought". Just getting your thoughts and ideas onto a text file can be a powerful thing. Opus 4.5 is amazing at pointing out the blind spots and inconsistencies in a spec. The spec generator that I use also does some reasoning checks and adds property-based test generation (Python Hypothesis -- similar to Haskell's QuickCheck), which anchors the generated code to reality.
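To make the property-based-testing point concrete, here's a minimal sketch of the kind of Hypothesis test such a generator might emit. The `merge_intervals` function and the specific property are my own illustrative assumptions, not the commenter's actual generator output:

```python
# Illustrative sketch: a property-based test with Hypothesis.
# merge_intervals and the disjointness property are hypothetical examples,
# not output of the spec generator described above.
from hypothesis import given, strategies as st

def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Generate arbitrary lists of well-formed (lo, hi) pairs.
interval = st.tuples(st.integers(), st.integers()).map(lambda t: (min(t), max(t)))

@given(st.lists(interval))
def test_merged_intervals_are_sorted_and_disjoint(intervals):
    merged = merge_intervals(intervals)
    # Property: consecutive output intervals never touch or overlap.
    for (_, e1), (s2, _) in zip(merged, merged[1:]):
        assert e1 < s2
```

Instead of hand-picking cases, the property states an invariant from the spec ("outputs are disjoint and ordered") and Hypothesis searches for counterexamples -- which is exactly the kind of anchor-to-reality check described above.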

Also, I took to heart Grant Slatton's "Write everything twice" [1] heuristic -- write your code once, solve the problem, then stash it in a branch and write the code all over again.

> Slatton: A piece of advice I've given junior engineers is to write everything twice. Solve the problem. Stash your code onto a branch. Then write all the code again. I discovered this method by accident after the laptop containing a few days of work died. Rewriting the solution only took 25% of the time of the initial implementation, and the result was much better. So you get maybe 2x higher quality code for 1.25x the time — this trade is usually a good one to make on projects you'll have to maintain for a long time.

This is effective because initial mental models of a new problem are usually wrong.

With a spec, I can get a version 1 out quickly and (mostly) correctly, poke around, and then see what I'm missing. Need a new feature? I tell Opus to first update the spec, then code it.

And here's the thing -- if you don't like version 1 of your code, throw it away but keep the spec (those are your learnings and insights). Then generate a version 2 free of any sunk-cost bias, which, as humans, we're terrible at resisting.

Spec-driven development lets you "write everything twice" (throwaway prototypes) faster, which improves the quality of your insights into the actual problem. I find this technique lets me 2x the quality of my code, through sheer mental model updating.

And this applies not just to coding, but most knowledge work, including certain kinds of scientific research (s/code/LaTeX/).

[1] https://grantslatton.com/software-pathfinding

  • manmal 12 hours ago

My experience with both Opus and GPT-codex is that they both just forget to implement big chunks of specs, unless you give them the means to self-validate their spec conformance. I sometimes find myself spending more time coming up with tooling to enable this than on the actual work.

    • wenc 10 hours ago

The key is generating a task list from the spec. Kiro IDE (not the CLI) generates tasks.md automatically. This is a checklist that Opus has to check off.

      Try Kiro. It's just an all-round excellent spec-driven IDE.

      You can still use Claude Code to implement code from the spec, but Kiro is far better at generating the specs.

p.s. if you don't use Kiro (though I recommend it), there's a new way too -- Yegge's beads. After you install it, prompt Claude Code to `write the plan in epics, stories and tasks in beads`. Opus will -- through tool use -- ensure every bead is implemented. But this is a higher-variance approach -- Kiro is much more systematic.

      • manmal an hour ago

        I’ve even built my own todo tool in zig, which is backed by SQLite and allows arbitrary levels of todo hierarchy. Those clankers just start ignoring tasks or checking them off with a wontfix comment the first time they hit adversity. Codex is better at this because it keeps going at hard problems. But then it compacts so many times over that it forgets the todo instructions.

nl 21 hours ago

I think there's a difference between people getting a system and realising it isn't actually what they wanted, and it "never surviving collision with reality".

They survive by being modified and I don't think that invalidates the process that got them in front of people faster than would otherwise have been possible.

This isn't a defence of waterfall though. It's really about increasing the pace of agile and the size of the loop that is possible.

  • manmal 12 hours ago

    I think I agree with what you’re saying? But that’s not the waterfall approach GP pitched.