benrutter 2 days ago

I'm so curious about what people's median experience is with AI coding tools.

I've tried agents every now and then, recently for something very simple: add an option to request csv format in a data API.

The results were, well, not good... I ended up undoing literally all the changes, because writing from scratch was a lot easier than trying to refactor the total mess it had made of what I'd have thought was a trivial feature.
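
For scale, the whole feature was roughly this kind of change (a hand-wavy, Express-style sketch with invented names; the real API differs, but not by much):

    import express from "express";

    const app = express();

    // Hypothetical stand-in for the existing data access layer.
    async function fetchRows(): Promise<Record<string, unknown>[]> {
      return [{ id: 1, value: 42 }, { id: 2, value: 7 }];
    }

    // Serialise the same rows the JSON path already returns.
    function toCsv(rows: Record<string, unknown>[]): string {
      if (rows.length === 0) return "";
      const headers = Object.keys(rows[0]);
      const lines = rows.map((row) =>
        headers.map((h) => JSON.stringify(row[h] ?? "")).join(",")
      );
      return [headers.join(","), ...lines].join("\n");
    }

    // The only real change: honour ?format=csv on the existing endpoint.
    app.get("/data", async (req, res) => {
      const rows = await fetchRows();
      if (req.query.format === "csv") {
        res.type("text/csv").send(toCsv(rows));
      } else {
        res.json(rows);
      }
    });

    app.listen(3000);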

I haven't done loads of prompt engineering etc.; in all honesty it seems like a lot of work when I haven't yet seen promise in the tool.

I see articles like this, and I always wonder, am I the outlier or is the writer? My experience of agentic AI is so hugely different to what some people are finding.

aydyn 2 days ago

Think of it this way: what's the likelihood that what you're asking for would be found in some public GitHub repo? If it's high, then you're good to go.

  • a123b456c 2 days ago

    I think you're pointing in the right direction, but I would rephrase it as:

    what's the likelihood that the solution exists in a public GitHub repo in a way that the machine can recognize as relevant to your prompt?

    If many versions of the solution exist, due to the problem's common occurrence, and if you can evaluate the LLM's output, then you're good to go.

sfn42 2 days ago

As someone who was fairly negative towards AI until recently, I'd say the problem is how you use it.

If you just tell it some vague feature to make, it's gonna do whatever it's gonna do and maybe it will be good, maybe it won't. It probably won't. The more specific you are the better it will do.

Instead of trying to 100x or 1000x your effort, try to just 2x or 3x it. Give it small specific tasks and check the work thoroughly, use it as an extension of yourself rather than a separate "agent".

I can tell it to write a function and it'll do pretty well. I can ask it to fix things if it doesn't do it the way I want. This is all easy. Maybe I can even get it to write a whole class at once or maybe I can get it to write a class in a few iterations.

The key here is I'm in control: I'm doing the design, I'm making the decisions. I can ask it how I should approach a problem and often it'll have great suggestions. I can ask it to improve a function I've written and it'll do pretty well. Sometimes really well.

The point is I'm using it as a tool; I'm not using it to do my job for me. I use it to help me think; I don't use it to think for me. I don't let it run away from me and edit a whole bunch of files, etc. I keep it on a tight leash.

I'm sold now. I am, indisputably, a better software developer with LLMs in my toolbelt. They help me write better code, faster, while learning things more quickly and easily; it's really good. Reliability isn't a problem when I keep a close eye on it. It's only a problem if you try to get it to do a whole big task on its own.

sothatsit 2 days ago

Agent performance depends massively on the work you do.

For example, I have found Claude Code and Codex to be tremendously helpful for my web development work. But my results for writing Zig are much worse. The gap in usefulness of agents between tasks is very big.

The skill ceiling for using agents is also surprisingly high. Planning before coding, learning agent capabilities, environment setup, and context engineering can make a pretty massive difference to results. This can all be a big time sink though, and I'm not sure if it's really worth it if agents don't already work decently well for the work you do.

But with the performance gaps between domains, and the skill curve, I can definitely understand why there is such a divide between people claiming agents are ridiculously overhyped, and people who claim coding is fundamentally changing.

  • mattmanser 2 days ago

    I feel there's a third reason.

    When I see a pro-AI person insisting that they are fully automated, I often scour their recent comments to find code or git repos they have shared. You find something every now and again.

    My thinking is that I want to use this stuff, but don't find the agentic AI at all effective. I must be doing something wrong! So I should learn from the real world success of others.

    A regular pattern is they say they're using vibe coding for complex problems. You check, and they're trivial features.

    One egregious example was a basic randomizer that picks a string from a predetermined set and saves that value into an existing table to re-use later.

    To me that's a trivial feature, a 15-30 minute task in a codebase I'm familiar with.
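
    For a sense of scale, essentially the entire feature is something like this (a sketch; the table and names are invented):

        // Pick a random string from a fixed set and persist it for later re-use.
        const LABELS = ["alpha", "bravo", "charlie", "delta"];

        // Minimal stand-in for whatever DB client the real codebase uses.
        interface Db {
          query(sql: string, params: unknown[]): Promise<void>;
        }

        function pickLabel(): string {
          return LABELS[Math.floor(Math.random() * LABELS.length)];
        }

        async function assignLabel(db: Db, recordId: number): Promise<string> {
          const label = pickLabel();
          // "records" and "label" are invented names for the existing table and column.
          await db.query("UPDATE records SET label = $1 WHERE id = $2", [label, recordId]);
          return label;
        }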

    For this extremely AI-bullish developer it was described as a major feature. The prompts were timestamped, and it took them half a day using coding agents.

    They were sharing their .claude folder. It had 50-odd .md files in it. I sampled a bunch of them and most of them boiled down to:

    'You are an expert [dev/QA/architect/PM/tester]. Ultrathink. Be good'.

    Worse, I looked at their LinkedIn, and on paper they looked experienced. Seeing their code, they were not.

    There's a subset of the "fully automated" coders who are just bad. They are incapable of judging how bad AI code is, but vocally, and often aggressively, advocate for it.

    Some are good, but I just can't replicate their success. And they're clearly also still hand-writing a lot of the code.

    • sothatsit 2 days ago

      Yeah, I definitely see this as well. These are the people with seven MCP servers, 5000-line AGENTS.md files, their own "memory systems" for the agents, and who try to hit the rate limits on all their agents every 5 hours (regardless of whether or not they are actually getting useful work done). Having tried some of this stuff when I was learning about agents, I found it almost always made the agents' performance worse...

      In web development, where I get the most out of agents, I am still only using them for implementing basic things. I will write anything even moderately complex myself, as agents often make the wrong assumptions somewhere. And then there's also manual work required to review and tidy up agent output. But there's just so much grunt work in web development: adding to a DB schema, writing a migration, adding the data to your model, exposing it in an API endpoint, and finally showing it on a page. None of that is complicated, so agents are pretty good at it.
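
      As a concrete example of that chain, the shape is usually something like this (a knex-flavoured sketch; the field and names are made up, and the page part is omitted):

          import express from "express";
          import type { Knex } from "knex";

          // 1. Schema/migration: add the new column (knex-style, assumed).
          export async function up(knex: Knex): Promise<void> {
            await knex.schema.alterTable("users", (table) => {
              table.string("display_name").nullable();
            });
          }

          // 2. Model: expose the field on the application type.
          interface User {
            id: number;
            email: string;
            displayName: string | null; // the new field
          }

          // Hypothetical stand-in for the real data access code.
          async function loadUser(id: number): Promise<User> {
            return { id, email: "someone@example.com", displayName: null };
          }

          // 3. API: include the field in the endpoint response.
          const app = express();
          app.get("/api/users/:id", async (req, res) => {
            const user = await loadUser(Number(req.params.id));
            res.json({ id: user.id, email: user.email, displayName: user.displayName });
          });

          // 4. UI: render displayName on the relevant page (omitted here).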

    • theshrike79 2 hours ago

      Yeah, these are the NFT/crypto bros of the AI world. They don't really understand anything.

      The best of them are rediscovering basic software project management and posting about it on every social media site and their Substack like they discovered something brand new :)

      "Turns out if you plan first, then iterate on the plan and split the plan into manageable chunks, development is a lot smoother!!!11 (subscribe to my AI podcast)"

      No shit, Sherlock. I wish they'd read a book once or twice.

danielbarla 2 days ago

I think a lot of it comes down to the domain, language and frameworks, your expectations, as well as prompt engineering. Having said that, I have had a number of excellent experiences in the past few weeks:

- Case 1 was troubleshooting what turned out to be a complex and messy dependency injection issue. I got pulled in to unblock a team member, who was struggling with the issue. My efforts were a dead-end, but Claude (Code) managed to spot a very odd configuration issue. The codebase is a large, legacy one.

- Case 2 was the same codebase: I again got pulled in to unblock a teammate, this time investigating why some integration tests would run individually, but not when run as a group. Clearly there was a pretty obvious smoking gun, and I managed to isolate the issue after about 15-30 minutes of debugging. I had set Claude on the goose chase as well, and as I closed the call with my teammate, I noticed it had found the exact same two lines that were causing the issue.

Clearly, it occasionally does insane stuff, or lies its little pants off. The number of times it has "got me" is fairly low, however, and its usefulness to me is extreme. In the cases above, it outdid a teammate who has at least 10 years of experience, and it equalled me in one case and outdid me in the other, and I have over 25 years now. I have a similar wonderment to yours, but in the opposite direction: "how are people NOT finding value in this?"

lazarus01 2 days ago

AI coding works amazingly well.

But only on micro tasks that come with explicit instructions, inside a very well-documented architecture.

Give AI freedom of expression and it will never find first principles in its training data. You will receive code that is not performant, and when analyzing the output, the AI will try to convince you that it is. If the task goes beyond your domain, you may believe the wrong principles are OK.

cosmodust 2 days ago

It's very use-case specific. I find them really good at simple, repetitive tasks as long as you guide them at a low level, although you do need to keep a close eye on them, as they can easily spoil your existing work.

x0x0 2 days ago

I'm the same, with the same question of whether it's me.

I've had success with, e.g., spitting out templated HTML; sometimes with CSS; sometimes with writing tests where I'm very specific about what I want (set up these structures, test this condition), etc. It's mediocre (a good start, but very far from production) at writing screens in React Native. It does slightly better on Rails, but is still far from production-ready.

After that, it kinda works, but my effort level to turn the output into working code is higher than just writing it myself.
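
To give a sense of what "very specific" means for the tests: I spell out the structures to set up and the condition to check, and what comes back is close to this (a vitest-style sketch; the function and numbers are invented):

    import { describe, it, expect } from "vitest";

    interface Order {
      subtotal: number;
      discountPercent: number;
    }

    // Invented code under test, included only so the sketch stands alone.
    function applyDiscount(order: Order): number {
      const capped = Math.min(order.discountPercent, 50);
      return order.subtotal * (1 - capped / 100);
    }

    describe("applyDiscount", () => {
      // "Set up this structure, test this condition" spelled out up front.
      it("caps the discount at 50%", () => {
        const order: Order = { subtotal: 15, discountPercent: 80 };
        expect(applyDiscount(order)).toBeCloseTo(7.5);
      });
    });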

vijucat 2 days ago

They're great at creating test cases out of code and/or log file excerpts. They're good at run-of-the-mill tasks whose answer one can reasonably expect to find on StackOverflow. I'm using GPT-4.1 and Claude Sonnet 3.7 Thinking with VS Code + GitHub Copilot.

ekidd 2 days ago

> I'm so curious around what people's median experience is of AI coding tools.

My experience is that coding agents work best either for absolute beginners or for lead engineers who have experience building and training teams. Getting good results out of coding agents is a lot like getting good results out of interns: you need to explain clearly what you want, ask them to explain what they plan to do, give feedback on the plan, and then very carefully review the results. You need to write up your preferred coding style, you need a document that explains "how to work on this project", you need to establish rigorous automated quality checks, etc. Using a coding agent heavily is a lot like being promoted to "technical lead", with all the tradeoffs that entails.

Here's a recent discussion of a good blog post on the subject: https://news.ycombinator.com/item?id=45503867

I have gotten some very nice results out of Sonnet 4.5 this past week. But it required using my "technical management" skills very heavily, and it required lots of extremely careful code review. Clear documentation, robust QA, and code review are the main bottlenecks.

I mean, the time I spent writing AGENTS.md wasn't wasted. I'm writing down a lot of stuff I used to teach in pairing sessions.