Comment by zingar
Personally, I've seen great gains from small personal tools built on top of CLIs, and from my Emacs config. I'm also able to deliver PRs that demonstrate a general principle to someone I manage, even when the work is outside my most comfortable stacks. But that's not what you asked about.
I don't have direct evidence of exactly what you're looking for (particularly the part about "someone responsible for the architecture to sign off"). Sticking strictly to that last caveat may prevent you from receiving some important signal.
> the claim that we should move from “validating architecture” to “validating behavior.”
I think these people are on the right track, and the reason is how I work with people right now.
I manage the work of ~10 developers pretty closely and am called on to advise another ~10, while also juggling demanding stakeholders. For a while now, I've only been able to do spot checks on PRs myself; I don't consider that a major part of my job anymore. The management that is most valuable is:
1) Teaching developers about quality, so that they start with better code and give better reviews to each other.
2) Teaching people to focus and move in small steps.
3) Investing in guardrails.
4) Watching metrics: it doesn't matter what code is merged, and it doesn't matter whether a "task" is "shipped"; what matters is whether the metrics show the result we expected.
As I acknowledge how flimsy my review process is, my impulse is to worry about architecture and security. But metrics and guardrails exist for those things too. Opinionated stacks help: SQL injection opportunities look different enough from "normal" Rails code that linters can catch many of them, and the linters are better at that job than I am.
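To make that concrete, here's the kind of pattern I mean. This is my own minimal sketch, not something from the thread; Brakeman is one such Rails security linter, and `params[:name]` just stands in for arbitrary user input:

```ruby
# Interpolating user input directly into a SQL fragment. This reads
# differently from idiomatic Rails, and a security linter such as
# Brakeman will flag it as a possible SQL injection.
User.where("name = '#{params[:name]}'")

# The idiomatic alternatives that pass the linter:
User.where(name: params[:name])          # hash conditions
User.where("name = ?", params[:name])    # bind parameter
```

The point is that in an opinionated stack the dangerous pattern is syntactically distinct, so a tool can police it mechanically.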
Some of these tools are available for agents just as they are for humans. Some of them are woefully bad or lack good ergonomics for agents, but I wouldn't bet against them becoming better.
I agree that agentic coding changes code review, but I don't think that inevitably means worse review in the long term.
> half of my time went into fixing the subtle mistakes it made or the duplication it introduced
A cold hard evaluation of the effectiveness of agentic coding doesn't care about what percentage of time went into fixing bad code; it cares about the total time.
That said, I find that making an agent move in many small steps (just as I would advise a human) creates far less rework.