Ask HN: Do you have any evidence that agentic coding works?

320 points by terabytest a day ago

I've been trying to get agentic coding to work, but the dissonance between what I'm seeing online and what I'm able to achieve is doing my head in.

Is there real evidence, beyond hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share (in detail) how you did it?

By "getting it to work" I mean: * creating more value than technical debt, and * producing code that’s structurally sound enough for someone responsible for the architecture to sign off on.

Lately I’ve seen a push toward minimal or nonexistent code review, with the claim that we should move from “validating architecture” to “validating behavior.” In practice, this seems to mean: don’t look at the code; if tests and CI pass, ship it. I can’t see how this holds up long-term. My expectation is that you end up with "spaghetti" code that works on the happy path but accumulates subtle, hard-to-debug failures over time.

When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced.

Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then, I worked with it to write a spec describing what should be implemented and how.

The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend getting Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I made it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up.

I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality.

fhd2 5 hours ago

Bear in mind that there is a lot of money riding on LLMs leading to cost savings, and development (seen as expensive and a common bottleneck) is a huge opportunity. There are paid (micro) influencer campaigns going on and what not.

Also bear in mind that a lot of folks want to be seen as being on the bleeding edge, including famous people. They get money from people booking them for courses and consulting, buying their books, products and stuff. A "personal brand" can have a lot of value. They can't be seen as obsolete. They're likely to talk about what could or will be, more than about what currently is. Money isn't always the motive for sure, people also want to be considered useful, they want to genuinely play around and try and see where things are going.

All that said, I think your approach is fine. If you don't inspect what the agent is doing, you're down to faith. Is it the fastest way to get _something_ working? Probably not. Is it the best way to build an understanding of the capabilities and pit falls? I'd say so.

This stuff is relatively new, I don't think anyone has truly figured out how to best approach LLM assisted development yet. A lot of folks are on it, usually not exactly following the scientific method. We'll get evidence eventually.

Reply View 9 replies

embedding-shape 3 hours ago

> There are paid (micro) influencer campaigns going on and what not.
Extremely important to keep in mind when you read about LLMs, agents and what not both here, on reddit and elsewhere.
Just the other day I got offered 200 USD if I posted about some new version of a "agentic coding platform" on HN, which obviously is too little for me to compromise my ethics and morals, but makes it very clear how much of this must be going on, if me, some random user, gets offered money to just post about their platform. If I was offered that 15-20 years ago when I was broke and cleaning hotels, I'd probably take them up on their offer.

Reply View | 3 replies
- jsksdkldld 3 hours ago
  
  [flagged]
  
  Reply View | 2 replies
  
  fhd2 3 hours ago
  
  Parent didn't mention Simon Willinson, and neither me nor parent appear to imply that _all_ people posting positively about LLMs are paid influencers, that'd be a ridiculous claim. It's just that there _are_ paid influencers, at every level, down to non-famous people getting a few bucks, and that's worth knowing.
  Here's one thing I quickly found on one of Anthropic's campaigns on LinkedIn: https://www.favikon.com/blog/inside-anthropic-influencer-mar...
  
  Reply View | 0 replies
  
  embedding-shape 3 hours ago
  
  Stop what? Posting on HN? :| What does this have to do with me?
  
  Reply View | 0 replies
dust42 4 hours ago

> This stuff is relatively new, I don't think anyone has truly figured out how to best approach LLM assisted development yet.
Exactly. But as you say, there are so many people riding the hype wave that it is difficult to come to a sober discussion. LLMs are a new tool that is a quantum leap but they are not a silver bullet for fully autonomous development.
It can be a joy to work with LLMs if you have to write the umpteenth javascript CRUD boilerplate. And it can be deeply frustrating once projects are more complex.
Unfortunately I think benchmaxxing and lm-arena are currently pushing into the wrong direction. But trillions of VC money are at stake and leaning back, digesting, reflecting and discussing things is not an option right now.

Reply View | 3 replies
- closewith 3 hours ago
  
  > But as you say, there are so many people riding the hype wave that it is difficult to come to a sober discussion. LLMs are a new tool that is a quantum leap but they are not a silver bullet for fully autonomous development.
  While I agree with the latter, I actually think on former point - that hype is making sober discussion impossible - is actually directionally incorrect. Like a lot of people I speak to privately, I'm making a lot of money directly from software largely written by LLMs (roadmaps compressed from 1-2 years to months since Claude Code was released), but the company has never mentioned LLMs or AI in any marketing, client communications, or public releases. We all very aware that we need to be able to retire before LLMs swamp or obsolete our niche, and don't want to invite competition.
  Outside of tech companies, I think this is extremely common.
  > It can be a joy to work with LLMs if you have to write the umpteenth javascript CRUD boilerplate.
  There is so much latent demand for slightly customised enterprise CRUD apps. An enormous swathe of corporate jobs are humans performing CRUD and task management. Even if LLMs top out here, the economic disruption from this alone is going to be immense.
  
  Reply View | 2 replies
  
  fatherwavelet an hour ago
  
  It is delusional to believe the current frontier models can only write CRUD apps.
  I would think someone would have to only write CRUD apps themselves to believe this.
  It doesn't matter anyway what a person "believes". If anything, I am having the opposite experience that conversing with people is becoming a bigger and bigger waste of time instead of just talking to Gemini. It is not Gemini that is hallucinating all kinds of nonsense vs the average person. It is the opposite.
  
  Reply View | 1 reply
  
  bsenftner 21 minutes ago
  
  There is a critical failure in education - people are not being taught how to debate without degenerating into a dominance debate or merely an echo chamber of talking points. It's a real problem, people literally do not understand that a question is not an opportunity to show off or to dominate, but is a request for an exchange of information.
  And that problem is not just between people, this lack of communication skill continues with people's internal self conversations. Many a bully personality is that way because they bully themselves and terrorize themselves.
  It's no wonder that people can use AI at all, with how poorly people communicate. So the cluster of nonsense that is all the shallow thinkers directing people down incorrect paths is completely understandable. They are learning by doing, which with any other technology would be fine, but with AI to learn how to use it by using it can serious damage one's cognitive ability, as well as leave a junkyard of failed projects behind.
  
  Reply View | 0 replies
[removed] 5 hours ago

[deleted]

Reply View | 0 replies

sockopen 9 minutes ago

https://arxiv.org/abs/2507.09089

“Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%--AI tooling slowed developers down.”