Comment by 8note

try other harnesses than codex.

i've had more success with review tools than with trying to get the agent to produce good code on the first pass.

current workflow

1. specs/requirements/design, outputting tasks
2. implementation, outputting code and tests
3. run review scripts/debug loops, outputting tasks
4. implement tasks
5. go back to 3
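a rough sketch of that loop, in case it helps to see the shape. this is not any particular harness's API: `run_workflow`, `implement`, and `review` are made-up names, and the two callables stand in for "hand a task to the agent" and "run the review scripts and collect new tasks". the `max_rounds` cap is my own assumption so the loop can't spin forever.

```python
from typing import Callable, List, Tuple


def run_workflow(
    initial_tasks: List[str],
    implement: Callable[[str], None],   # hypothetical: hands one task to the agent
    review: Callable[[], List[str]],    # hypothetical: runs review scripts, returns new tasks
    max_rounds: int = 5,                # assumed safety cap, not part of the original workflow
) -> Tuple[List[Tuple[int, List[str]]], List[str]]:
    """Implement tasks, review the result, and repeat until review finds nothing."""
    tasks = list(initial_tasks)
    history = []
    for round_no in range(1, max_rounds + 1):
        if not tasks:
            break  # review came back clean, nothing left to do
        for task in tasks:
            implement(task)
        history.append((round_no, list(tasks)))
        tasks = review()  # step 3: review output becomes the next task list
    # leftover tasks are non-empty only if the round cap was hit
    return history, tasks
```

the point of the structure is that the review step produces tasks in the same format as the spec step, so the agent never needs a separate "fix mode".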

the quality of specs, tasks, and review scripts makes a big difference

one of the biggest improvements comes from getting a feedback loop from what the app actually does back to the agent: good logs, being able to interact with the app and take screenshots (a la playwright), etc.
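for the logs half of that, a minimal sketch of what i mean, assuming the app can be started from a command line. `capture_run` and the shape of the returned dict are hypothetical, only `subprocess.run` is real. it runs the command and keeps just the tail of the output so the agent gets the part that usually matters.

```python
import subprocess
import sys


def capture_run(cmd, timeout=60, tail_lines=40):
    """Run a command and return a small feedback dict for the agent (hypothetical format)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "exit_code": None, "log_tail": f"timed out after {timeout}s"}
    # stdout followed by stderr; only the last lines are kept to limit context size
    combined = proc.stdout + proc.stderr
    tail = "\n".join(combined.splitlines()[-tail_lines:])
    return {"ok": proc.returncode == 0, "exit_code": proc.returncode, "log_tail": tail}


if __name__ == "__main__":
    # stand-in for "start the app": a tiny script that logs a line and then fails
    feedback = capture_run([sys.executable, "-c", "import sys; print('started'); sys.exit(3)"])
    print(feedback)
```

screenshots would follow the same pattern: produce an artifact from the running app and hand it back to the agent alongside the logs.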

guidelines and guardrails are best if they're tools that the agent runs, or that run automatically to give feedback.
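as an illustration of a guideline turned into a tool: instead of writing "don't use bare except" in a prompt, a small check can report it back with a file and line. the rule itself is just an example i picked, and `find_bare_excepts` and the message format are hypothetical. it only uses python's standard `ast` module.

```python
import ast


def find_bare_excepts(source, filename="<agent-output>"):
    """Return one feedback line per bare `except:` clause found in the source."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        # an ExceptHandler with no exception type is a bare `except:`
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(
                f"{filename}:{node.lineno}: bare 'except:' hides errors, "
                "catch a specific exception"
            )
    return findings
```

an empty list means the check passed; anything else can go straight back to the agent as tasks.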