kranner 19 hours ago

> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.

This, but also for code. I just don't trust new code, especially generated code; I need time to sit with it. I can't make the "if it passes all the tests" crowd understand and I don't even want to. There are things you think of to worry about and test for as you spend time with a system. If I'm going to ship it and support it, it will take as long as it will take.

  • jdjdjssh 17 hours ago

    Yep, this is the big sticking point. Reviewing code properly is and was the bottleneck. However, with humans I trusted, I could ignore most of their work and focus on where they knew they needed a review. That kind of trust is worth a lot of money and lets you move really fast.

    > I need time to sit with it

    Everyone knows doing the work yourself is faster than reviewing somebody else’s if you don’t trust them. I’d argue if AI ever gets to the point where you fully trust it, all white-collar jobs are gone.

  • layer8 18 hours ago

    Yes, regression tests are not enough. One generally has to think through code repeatedly, with different aspects in mind, to convince oneself that it is correct under all circumstances. Tests only point-check; they don’t ensure correct behavior under all conceivable scenarios.
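
    To make the point concrete, a minimal toy sketch (my own, in Python with the Hypothesis library): a buggy run-length encoder sails through hand-picked point checks, while a property-based test catches it.

        from hypothesis import given, strategies as st

        def encode(s):
            # Run-length encode: "aaab" -> [("a", 3), ("b", 1)]
            if not s:
                return []
            out, prev, count = [], s[0], 1
            for ch in s[1:]:
                if ch == prev:
                    count += 1
                else:
                    out.append((prev, count))
                    prev = ch  # bug: count is never reset to 1 here
            out.append((prev, count))
            return out

        def decode(pairs):
            return "".join(ch * n for ch, n in pairs)

        def test_point_checks():
            # Hand-picked cases: all pass despite the bug.
            assert encode("") == []
            assert encode("abc") == [("a", 1), ("b", 1), ("c", 1)]

        @given(st.text())
        def test_round_trip(s):
            # Property: decoding must invert encoding for any input.
            assert decode(encode(s)) == s  # Hypothesis finds e.g. "aab"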

    • doug_durham 15 hours ago

      Unless you are in the business of writing flight control software, OS kernels, or critical financial software, I don't think your own code will reach the standards you mention. The only way we get "correct under all conceivable scenarios" software is to have a large team with long time horizons and large funding working on a small piece of software. It is beyond an individual to reach that standard for anything beyond code at the function level.

  • slfreference 18 hours ago

    I think what LLMs do with words is similar to what artists do with software like Cinema 4D.

    We have control points (prompts + context) and we ask LLMs to draw a 3D surface which passes through those points satisfying some given constraints. Subsequent chats are like edit operations.

    https://youtu.be/-5S2qs32PII

    • catdog 18 hours ago

      An LLM is an impressive, yet still imperfect and unpredictable translation machine. The code it outputs can only be as good as your prompt is precise, minus the often blatant mistakes it makes.

  • simianwords 18 hours ago

    Honest question: why is this not enough?

    If the code passes tests, and also works at the functionality level - what difference does it make if you’ve read the code or not?

    You could come up with pathological cases like: it passed the tests by deleting them. And the code written by it is extremely messy.

    But we know that LLMs are way smarter than this. There’s a very, very low chance of this happening, and even if it does, a quick glance at the code can fix it.

    • kranner 17 hours ago

      You can't test everything. The input space may be infinite. The app may feel janky. You can't even be sure you're testing all that can be tested.

      The code may seem to work functionally on day 1. Will it continue to seem to work on day 30? Most often it doesn't.

      And in my experience, the chances of LLMs fucking up are hardly very, very low. Maybe it's a skill issue on my part, but it's also the case that the spec is sometimes discovered as the app is being built. I'm sure this is not the case if you're essentially summoning up code that exists in the training set, even if the LLM has to port it from another language, and they can be useful in parts here and there. But turning the controls over to the infinite monkey machine has not worked out for me so far.

      • CuriouslyC 16 hours ago

        If you care about performance, test it (stress test).

        If you care about security, test it (red teaming).

        If you care about maintainability, test it (advanced code analysis).

        Your eyeballs are super fallible; this is why bad engineers exist. Get rigorous.
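
        To show what I mean by rigor, a rough sketch of the stress-test flavor (the endpoint, worker count, and budgets are made-up placeholders):

            import time
            import urllib.request
            from concurrent.futures import ThreadPoolExecutor

            URL = "http://localhost:8000/health"  # hypothetical service under test

            def hit(_):
                # One request: record success and latency.
                start = time.monotonic()
                try:
                    with urllib.request.urlopen(URL, timeout=5) as resp:
                        ok = resp.status == 200
                except OSError:
                    ok = False
                return ok, time.monotonic() - start

            with ThreadPoolExecutor(max_workers=50) as pool:
                results = list(pool.map(hit, range(2000)))

            error_rate = sum(1 for ok, _ in results if not ok) / len(results)
            p99 = sorted(t for _, t in results)[int(len(results) * 0.99)]
            assert error_rate < 0.01, "error budget blown under load"
            assert p99 < 0.5, "p99 latency over 500 ms under load"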

    • gengstrand 14 hours ago

      Good question. Several reasons.

      1. Since the same AI writes both the code and the unit tests, it stands to reason that both could be influenced by the same hallucinations (see the sketch after this list).

      2. Having a dev on call reduces time to restore service because the dev is familiar with the code. If developers stop reviewing code, they won't be familiar with it and won't be as effective. I am currently unaware of any viable agentic AI substitute for a dev on call capability.

      3. There may be legal or compliance standards regarding due diligence which won't get met if developers are no longer familiar with the code.
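
      A made-up toy example of point 1: the same wrong belief shows up in both the code and the test, so the test passes and nothing flags it.

          def days_in_february(year):
              # Hallucinated rule: "every year divisible by 4 is a leap year"
              # (misses the divisible-by-100-but-not-400 exception).
              return 29 if year % 4 == 0 else 28

          def test_days_in_february():
              # A test derived from the same wrong rule, so it passes.
              assert days_in_february(2024) == 29
              assert days_in_february(1900) == 29  # wrong: 1900 had 28 days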

      I have blogged about this recently at https://www.exploravention.com/blogs/soft_arch_agentic_ai/

    • throwup238 17 hours ago

      It depends on the scale of complexity you’re working at and who your users are going to be. I’ve found that it’s trivial to have Claude Code spit out so much functionality that even just properly verifying it manually becomes a gargantuan task. I end up just manually testing the pieces I’m familiar with, which is fine if there’s a QA department who can do a full run-through of the feature and are prepared to deal with vibe-coding pitfalls, but not so much on open source projects where slop gets shipped and unfamiliar users get stuck with bugs they can’t possibly troubleshoot. Writing the code from scratch The Old Way™ leaves a lot less room for shipping convincing but non-functional slop, because the dev has to work through it before shipping.

      The most immediate example I can think of is the beans LLM workflow tracker. It’s insane that it’s measured in the hundreds of thousands of LoC, and getting that thing set up in a repo is a mess. I had to use GitHub Copilot to investigate the repo to figure out the latest setup method. This wouldn’t fly at my employer, but a lot of projects are going to be a lot less scrupulous.

      You can see the effects in popular consumer-facing apps too: Anthropic has drunk way too much of its own Kool-Aid, and now I get 10-50% failure rates on messages in their iOS app depending on the day. Some of their devs have publicly said that Claude writes 100% of their code, and it’s starting to show. Intermittent network failures and retries have been a solved problem for decades, ffs!

    • jdjdjssh 17 hours ago

      > If the code passes tests, and also works at the functionality level

      Why doesn’t outsourcing work if this is all that is needed?

      • jmathai 17 hours ago

        We haven’t fully proven that it is any different. Not at scale anyway. It took a decade for the seams of outsourcing to break.

        But I have a hypothesis.

        The quality of the output, when you don’t own the long term outcome or maintenance, is very poor.

        This is not the case with AI in the same sense it is with human contractors.

      • simianwords 17 hours ago

        Why do we have managers if managers don’t have accountability?

        • jdjdjssh 17 hours ago

          I’m not sure what you’re getting at. I’m saying there’s a lot more to creating useful software than “tests pass / limited functionality checks work” from a purely technical perspective.

  • CuriouslyC 16 hours ago

    You're countering vibes with vibes.

    If the tests aren't good enough, break them. Red team your own software. Exploit your systems. "Sitting with the code" is some Henry David Thoreau bullshit, because it provides exactly 0 value to anyone else, whereas red teamed exploits are objective.

    • kranner 15 hours ago

      The way you come up with ideas on how to break, red team and exploit; when to do this and how to stop: that part is not objective. The machine can't do this for you sufficiently well. There is a subjective process in there that you're not acknowledging.

      It's a good approach! It's just more 'negative space' than direct.

      • CuriouslyC 15 hours ago

        People who pentest spend more time running a playbook than puzzling over the logical problem of how to break a piece of software. Even a lot of zero days are more about knowing a pattern and mass scanning for it across a lot of code than playing chess vs a codebase and winning.

        • kranner 15 hours ago

          Fine, but is that the entirety of software development? It even seems a waste of time by your own reasoning if it's so automatable already.

    • nkohari 14 hours ago

      You're over-rotating on security. Not that it isn't important, but there are other dimensions to software that benefit heavily from the author having a deep understanding of the code that's being created.

randusername 16 hours ago

No more AI thought pieces until you tell us what you build!

AI is a general-purpose tool, but that doesn't mean best practices and wisdom are generalizable. Web dev is different from compilers, which is different from embedded, and all the differences of opinion in the comments never explain who does what.

That said, I would take this up a notch:

> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.

Writing _is_ the thinking. It's a critical input in developing good taste. I think we all ought to consider a maintenance dose. Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles. Best practices are a moving train, not something that you learned once and you're done.

  • bodge5000 16 hours ago

    > No more AI thought pieces until you tell us what you build!

    Absolutely agree with this; the ratio of talk to output is insane, especially when the talk is all about how much better the output is. So far the only example I've seen is Claude Code, which is mired in its own technical problems and is literally built by an AI company.

    > Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles

    This is the one thing that concerns me, for the same reason as "AI writes the code, humans review it" does. The fact of the matter is, most people will get lazy and complacent pretty quickly, and the depth to which they review the code / the frequency with which they "go it alone" will get less and less until eventually it just stops happening. We all (most of us anyway) do it, it's just part of being human, for the same reason that thousands of people start going to the gym in January and stop by March.

    Arguably, AI coding was at its best when it was pretty bad, because you HAD to review it frequently and there were immediate incentives to just take the keyboard and do it yourself sometimes. Now there are still some serious faults; they're just not as immediate, which will lead to complacency for a lot of people.

    Maybe one day AI will be able to reliably write 100% of the code without review. The worry is that we stop paying attention first, which all in all looks quite likely.

    • Kerrick 14 hours ago

      > Absolutely agree with this, the ratio of talk to output is insane, especially when the talk is all about how much better output is.

      Those of us building are having so much fun we aren't slowing down to write think pieces.

      I don't mean this flippantly. I'm a blogger. I love writing! But since a brief post on December 22 I haven't blogged because I have been too busy implementing incredible amounts of software with AI.

      Since you'll want receipts, here they are:

      - https://git.sr.ht/~kerrick/ratatui_ruby/tree/trunk/item/READ...

      - https://git.sr.ht/~kerrick/rooibos/tree/trunk/item/README.rd...

      - https://git.sr.ht/~kerrick/tokra/tree

      Between Christmas and New Year's Day I was on vacation, so I had plenty of time. Since then, it's only been nights & weekends (and some early mornings and lunch breaks).

      • yojat661 7 hours ago

        Is this software popular? Is it maintainable long-term? Are you getting feedback from users?

    • g8oz 15 hours ago

      But simonw said that dark factories are the way forward /s

      • malfist 12 hours ago

        Lights-out manufacturing is always the boogeyman that's being built or coming tomorrow. Never seems to happen, though. The Wikipedia article for it only cites two such factories, and at least one of them still requires humans and isn't fully lights-out.

  • bandrami 8 hours ago

    I'm really losing patience with AI fluff pieces, mostly because I'm a sysadmin and the guys in dev cannot stop talking about how much they're producing with their agents or MCPs or whatever they're called this week, and yet they're simply not getting me software to put into production any faster (it's actually slowed down slightly, but that's consistent with previous product lifecycles). Neither are any of our vendors, at least as far as I can tell. Even Agile a quarter of a century ago actually produced software at a somewhat faster rate than the team had before (at the cost of more breakage, but there's a business case to be made for that), but I'm literally just not seeing an uptick here.

  • bartread 14 hours ago

    > No more AI thought pieces until you tell us what you build!

    Let me fix that for you:

    No more AI thought pieces until you SHOW us what you build!

    And I think it can safely be generalised to:

    No more thought pieces until you show us what you build!

    And that goes double for posts on LinkedIn.

    • strange_quark 13 hours ago

      I’d take it even further: I want to see the prompts! I’ll concede we’ve arrived at a point where the coding agents work, but I’m still unconvinced they’re actually saving anyone time. From my AI-pilled coworkers, it appears they’re all spending half an hour creating a plan with hyper-specific prompts, like “make this specific change to lines XXX-YYY in this file”, “move this function over here and change the args”, etc. As far as I can tell, this is current best practice, and I’ve tried it, but I’m always disappointed with either the quality or the speed, usually both.

      • dullcrisp 12 hours ago

        Look, learning vim is hard; some people just want to describe their edits using natural language.

      • tempodox 12 hours ago

        Yea, if I were interested in that level of micromanagement, I would have become a manager.

  • gchamonlive 15 hours ago

    Even if writing is thinking (which I don't think is the case, since writing is also feeling), thinking isn't exclusively writing. Form is very important, and AI can very well help you materialize an insight in a way that makes sense for a wider audience. The thinking is still all yours; only the aesthetics are refined by the AI, with your guidance, depending on how you use AI.

  • fourthrigbt 13 hours ago

    The product they build is literally mentioned in the post? It’s one of the more popular personal finance/budgeting apps, and it’s a pretty good one in my opinion as someone who has used a variety of them.

  • Ozzie_osman 16 hours ago

    > No more AI thought pieces until you tell us what you build!

    We build a personal finance tool (referenced in the article). It's a web/mobile/backend stack (mostly React and Python). That said, I think a lot of the principles are generalizable.

    > Writing _is_ the thinking. It's a critical input in developing good taste.

    Agree, but I'll add that _good_ prompt-writing actually requires a lot of thought (which is what makes it so easy to write bad prompts, which are much more likely to produce slop).

  • [removed] 15 hours ago
    [deleted]
  • doug_durham 15 hours ago

    Ample evidence of production software being produced with the aid of AI tools has been provided here on HN over the last year or more. This is a tiresome response. A later response says exactly what they produce.

    • daveguy 14 hours ago

      Most of what I see are toys. Could you point us to examples of production software from AI? I feel like I see more "stop spamming us with AI slop" stories from open source than production software from AI. Would love some concrete examples, specifically of major refactors or ground-up projects. Not "we just started using AI in our production software," because it can take a while to change the quality (better or worse) of a whole existing code base.

      • doug_durham 10 hours ago

        "Us"??? Most of "us" don't need to be convinced that AI as a software development tool has merit. The comment literally two below my comment says that they develop banking software. At this point you can be confident that most of the software that you use that has had recent updates has been developed with the aid of AI. Its use is ubiquitous.

        • daveguy 7 hours ago

          I didn't say AI as a software development tool doesn't have merit. I asked what production software was being produced from scratch or predominantly with AI tools. I just see a lot more examples of "stop the slop" than I do of positive stories about AI being used to build something from scratch. I was hoping you had a concrete example in all of the hay. Are my expectations, based on the hype, too high?

          That wasn't supposed to be an opportunity for you to get defensive, but an opportunity for you to show off awesome projects.

      • willtemperley 13 hours ago

        I imagine people who are shipping with AI aren’t talking about it. Doing so makes no business sense.

        Those not shipping are talking about it.

      • jama211 14 hours ago

        Sounds like you’ve got multiple ways to write off any example you’re given charged up and at the ready.

        • daveguy 12 hours ago

          I was just asking for a non-confounded example of what was claimed. But okay.

  • jama211 13 hours ago

    “Writing is the thinking” is a controversial take that’s open to interpretation, so everyone’s gonna argue about it. You muddied the water with that one.

  • CuriouslyC 16 hours ago

    I don't get why people think AI takes the thought out of writing. I write a lot, and when I start hitting keys on keyboards, I already know what I'm going to say 100% of the time, the struggle is just getting the words out in a way that maximizes reader comprehension/engagement.

    I'd go so far as to say that for the vast majority of people, if you don't know what you're going to say when you sit down to write, THAT is your problem. Writing is not thinking, thinking is thinking, and you didn't think. If you're trying to think when you should be writing, that's a process failure. If you're not Stephen King or Dean Koontz, trying to be a pantser with your writing is a huge mistake.

    What AI is amazing for is taking a core idea/thesis you provide it, and asking you a ton of questions to extract your knowledge/intent, then crystallizing that into an outline/rough draft.

    • kranner 15 hours ago

      And how do you know you're not as good as Stephen King or Dean Koontz if you never even try? What AI seems to be amazing for is persuading people to freeze themselves in place and accept their overlord-assigned stations in life.

      You're free to adopt this cynical and pessimistic outlook if you like, but you're going a bit far trying to force it on others. Gawd.

      • CuriouslyC 15 hours ago

        "How do you know Michael Phelps's training routine isn't right for you if you don't spend 6 months following it?"

        If your argument starts with "how do you know you're not the best in the world unless you try" you fucked up.

simianwords 18 hours ago

This is one of the more true and balanced articles.

On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined and easy-to-process verification hook.

A lot of software tasks are “migrate X to Y” and this is a perfect job for AI.

The workflow is generally straightforward - map the old thing to the new thing and verify that the new thing works the same way. Most of this can be automated using AI.

Wanna migrate a codebase from C to Rust? I definitely think it should be possible autonomously if the codebase is small enough. You do have to ask the AI to intelligently come up with extensive ways to verify that the two work the same: maybe a UI check, sample input and output checks on the API, and functionality checks.
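
A rough sketch of the kind of differential harness I mean, with placeholder binary names and a placeholder input generator:

    import random
    import subprocess

    def run(binary, stdin_text):
        # Run one build on the given input; capture exit code and output.
        proc = subprocess.run([binary], input=stdin_text, text=True,
                              capture_output=True, timeout=10)
        return proc.returncode, proc.stdout

    for _ in range(10_000):
        case = f"{random.randint(-10**9, 10**9)}\n"  # swap in a real input generator
        old = run("./legacy_c_tool", case)
        new = run("./rust_rewrite", case)
        assert old == new, f"divergence on input {case!r}: {old} vs {new}"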

  • akiselev 18 hours ago

    > On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined and easy-to-process verification hook.

    It's scary how good it's become with Opus 4.5. I've been experimenting with giving it access to Ghidra and a debugger [1] for reverse engineering and it's just been plowing through crackmes (from sites like crackmes.one where new ones are released constantly). I haven't bothered trying to have it crack any software but I wouldn't be surprised if it was effective at that too.

    I'm also working through reverse engineering several file formats by just having it write CLI scripts to export them to JSON then recreate the input file byte by byte with an import command, using either CLI hex editors or custom diff scripts (vibe coded by the agent).
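
    The feedback loop itself is dead simple, roughly the sketch below (export.py and import.py stand in for the agent-written CLI scripts):

        import subprocess
        import sys

        original = sys.argv[1]

        # Round-trip: parse to JSON, rebuild, and require byte-identical output.
        # Both scripts are placeholders for the vibe-coded CLIs.
        subprocess.run(["python", "export.py", original, "dump.json"], check=True)
        subprocess.run(["python", "import.py", "dump.json", "rebuilt.bin"], check=True)

        a = open(original, "rb").read()
        b = open("rebuilt.bin", "rb").read()

        if a == b:
            print("round-trip OK: byte-identical")
        else:
            # The first divergent offset points at the part of the format
            # that is still misunderstood.
            idx = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y),
                       min(len(a), len(b)))
            print(f"mismatch at byte {idx} (sizes {len(a)} vs {len(b)})")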

    I still get routinely frustrated trying to use it for anything complicated but whole classes of software development problems have been reduced to vibe coding that feedback loop and then blowing through Claude Max rate limits.

    [1] Shameless plug: https://github.com/akiselev/ghidra-cli https://github.com/akiselev/debugger-cli

    • rubenflamshep 13 hours ago

      I'm in the same loop: I find that the more access I give it to systems and feedback mechanisms, the more powerful it is. There's a lot of leverage in building those feedback systems. With the obvious caveat about footguns :P

      Gave one of the repos a star as it's a cool example of what people are building with AI. Most common question on HN seems to be "what are people building". Well, stuff like this.

      • akiselev 13 hours ago

        > Most common question on HN seems to be "what are people building". Well, stuff like this.

        Hear, hear! I’ve got my altium-cli repo open source on GitHub as well, which is a vibe-coded CLI for editing vibe reverse engineered Altium PCB projects. It’s not yet ready for primetime (I’m finishing up the file format reverse engineering this weekend) and the code quality is probably something twelve-year-old me would have been embarrassed by, but I can already use it and Claude/Gemini to automate a lot of the tedious parts of PCB design like part selection and footprints. I’m almost to the point where Claude Code can use it for the entire EE workflow from part selection to firmware, minus the PCB routing which I still do by hand.

        I just ain’t wasting time blogging about it so unless someone stumbles onto it randomly by lurking on HN, they won’t know that Claude Code can now work on PCBs.

willtemperley 20 hours ago

I'm very happy with the chat interface, thanks.

* The interface is near identical across bots

* I can switch bots whenever I like. No integration points and vendor lock-in.

* It's the same risk as any big-tech website.

* I really don't need more tooling in my life.

  • simianwords 18 hours ago

    I think the agents are also becoming fungible at the integration layer.

    Any coding agent should be easy to plug into whatever IDE or workflow you need.

    The agents are not fully fungible, though. Each has its own characteristics.

srinath693 14 hours ago

This resonates a lot. I’ve found that staying slightly behind the bleeding edge with AI tools actually leads to more consistent productivity. The early-stage tools often look impressive in demos but add cognitive overhead and unpredictability in real workflows.

Waiting for patterns to stabilize, UX to improve, failure modes to become clearer, and community best practices to emerge tends to give a much better long-term payoff.

dehrmann 13 hours ago

This is a very sound take:

> Will AI replace my job?

> If you consider your job to be “typing code into an editor”, AI will replace it (in some senses, it already has). On the other hand, if you consider your job to be “to use software to build products and/or solve problems”, your job is just going to change and get more interesting.

andrethegiant 14 hours ago

Love Monarch. I would love to see the team apply AI to adding missing institutions (Robinhood CC, Accrue) faster.

OsamaJaber 18 hours ago

The hardest part of staying a step behind is knowing when something has crossed over. How do you decide when a tool is mature enough to adopt?

  • simianwords 17 hours ago

    This requires good intuition and being unemotional.

    Lots of people who become successful are the ones who can get this prediction correct.

    • OsamaJaber 12 hours ago

      Agreed, but being unemotional is easier said than done :D

Animats 9 hours ago

The distance between manager-speak and LLM-speak is very small. Who wrote that?

piker 19 hours ago

> “Their (ie the document’s) value stems from the discipline and the thinking the writer is forced to impose upon himself as [she] identifies and deals with trouble spots”.

Real quote:

> "Hence their value stems from the discipline and the thinking the writer is forced to impose upon himself as he identifies and deals with trouble spots in his presentation."

I mean seriously?

satisfice 18 hours ago

The one thing I disagree with is having the AI do its own verification. I explicitly instruct it never to check anything unless I ask it to.

This is better because I use my own testing as a forcing function to learn and understand what the AI has done. Only after primary testing might I tell it to do checking for itself.

578_Observer 15 hours ago

Your point about avoiding the "bleeding edge" touches on a fundamental principle of endurance that is often ignored in the current AI gold rush. This philosophy is a calculated defense of a legacy—the invisible ledger of trust built over generations.

As a former local banker in Japan who spent decades appraising the intangible assets of businesses that have survived for centuries, I’ve learned that true mastery is found in stability, not novelty. In an era of rapid AI acceleration, the real risk is gambling your institutional reputation on unproven, volatile tools.

By 2026, when every “How” is a cheap commodity, the only thing that commands a premium is the “Why”—the core of human judgment. Staying a step behind the hype allows you to keep your hands on the steering wheel while the rest of the market is consumed by the noise. Stability is the ultimate luxury.