kranner 19 hours ago

> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.

This, but also for code. I just don't trust new code, especially generated code; I need time to sit with it. I can't make the "if it passes all the tests" crowd understand and I don't even want to. There are things you think of to worry about and test for as you spend time with a system. If I'm going to ship it and support it, it will take as long as it will take.

  • jdjdjssh 17 hours ago

    Yep, this is the big sticking point. Reviewing code properly is and was the bottleneck. However, with humans I trusted, I could ignore most of their work and focus on where they knew they needed a review. That kind of trust is worth a lot of money and lets you move really fast.

    > I need time to sit with it

    Everyone knows doing the work yourself is faster than reviewing somebody else’s if you don’t trust them. I’d argue if AI ever gets to the point where you fully trust it, all white-collar jobs are gone.

  • layer8 18 hours ago

    Yes, regression tests are not enough. One generally has to think through code repeatedly, with different aspects in mind, to convince oneself that it is correct under all circumstances. Tests only point-check; they don’t ensure correct behavior under all conceivable scenarios.
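
    To make the point concrete, a minimal toy sketch (my own, in Python with the Hypothesis library): a buggy run-length encoder sails through hand-picked point checks, while a property-based test catches it.

        from hypothesis import given, strategies as st

        def encode(s):
            # Run-length encode: "aaab" -> [("a", 3), ("b", 1)]
            if not s:
                return []
            out, prev, count = [], s[0], 1
            for ch in s[1:]:
                if ch == prev:
                    count += 1
                else:
                    out.append((prev, count))
                    prev = ch  # bug: count is never reset to 1 here
            out.append((prev, count))
            return out

        def decode(pairs):
            return "".join(ch * n for ch, n in pairs)

        def test_point_checks():
            # Hand-picked cases: all pass despite the bug.
            assert encode("") == []
            assert encode("abc") == [("a", 1), ("b", 1), ("c", 1)]

        @given(st.text())
        def test_round_trip(s):
            # Property: decoding must invert encoding for any input.
            assert decode(encode(s)) == s  # Hypothesis finds e.g. "aab"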

    • doug_durham 15 hours ago

      Unless you are in the business of writing flight control software, OS kernels, or critical financial software, I don't think your own code will reach the standards you mention. The only way we get "correct under all conceivable scenarios" software is to have a large team with long time horizons and large funding working on a small piece of software. It is beyond an individual to reach that standard for anything beyond code at the function level.

  • slfreference 18 hours ago

    I think what LLMs do with words is similar to what artists do with software like Cinema 4D.

    We have control points (prompts + context) and we ask LLMs to draw a 3D surface which passes through those points satisfying some given constraints. Subsequent chats are like edit operations.

    https://youtu.be/-5S2qs32PII

    • catdog 18 hours ago

      An LLM is an impressive, yet still imperfect and unpredictable translation machine. The code it outputs can only be as good as your prompt is precise, minus the often blatant mistakes it makes.

  • simianwords 18 hours ago

    Honest question: why is this not enough?

    If the code passes tests, and also works at the functionality level - what difference does it make if you’ve read the code or not?

    You could come up with pathological cases like: it passed the tests by deleting them. And the code written by it is extremely messy.

    But we know that LLMs are way smarter than this. There’s a very, very low chance of this happening, and even if it does, a quick glance at the code can fix it.

    • kranner 17 hours ago

      You can't test everything. The input space may be infinite. The app may feel janky. You can't even be sure you're testing all that can be tested.

      The code may seem to work functionally on day 1. Will it continue to seem to work on day 30? Most often it doesn't.

      And in my experience, the chances of LLMs fucking up are hardly very, very low. Maybe it's a skill issue on my part, but it's also the case that the spec is sometimes discovered as the app is being built. I'm sure this is not the case if you're essentially summoning up code that exists in the training set, even if the LLM has to port it from another language, and they can be useful in parts here and there. But turning the controls over to the infinite monkey machine has not worked out for me so far.

      • CuriouslyC 16 hours ago

        If you care about performance, test it (stress test).

        If you care about security, test it (red teaming).

        If you care about maintainability, test it (advanced code analysis).

        Your eyeballs are super fallible; this is why bad engineers exist. Get rigorous.
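
        To show what I mean by rigor, a rough sketch of the stress-test flavor (the endpoint, worker count, and budgets are made-up placeholders):

            import time
            import urllib.request
            from concurrent.futures import ThreadPoolExecutor

            URL = "http://localhost:8000/health"  # hypothetical service under test

            def hit(_):
                # One request: record success and latency.
                start = time.monotonic()
                try:
                    with urllib.request.urlopen(URL, timeout=5) as resp:
                        ok = resp.status == 200
                except OSError:
                    ok = False
                return ok, time.monotonic() - start

            with ThreadPoolExecutor(max_workers=50) as pool:
                results = list(pool.map(hit, range(2000)))

            error_rate = sum(1 for ok, _ in results if not ok) / len(results)
            p99 = sorted(t for _, t in results)[int(len(results) * 0.99)]
            assert error_rate < 0.01, "error budget blown under load"
            assert p99 < 0.5, "p99 latency over 500 ms under load"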

    • gengstrand 14 hours ago

      Good question. Several reasons.

      1. Since the same AI writes both the code and the unit tests, it stands to reason that both could be influenced by the same hallucinations (see the sketch after this list).

      2. Having a dev on call reduces time to restore service because the dev is familiar with the code. If developers stop reviewing code, they won't be familiar with it and won't be as effective. I am currently unaware of any viable agentic AI substitute for a dev on call capability.

      3. There may be legal or compliance standards regarding due diligence which won't get met if developers are no longer familiar with the code.
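
      A made-up toy example of point 1: the same wrong belief shows up in both the code and the test, so the test passes and nothing flags it.

          def days_in_february(year):
              # Hallucinated rule: "every year divisible by 4 is a leap year"
              # (misses the divisible-by-100-but-not-400 exception).
              return 29 if year % 4 == 0 else 28

          def test_days_in_february():
              # A test derived from the same wrong rule, so it passes.
              assert days_in_february(2024) == 29
              assert days_in_february(1900) == 29  # wrong: 1900 had 28 days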

      I have blogged about this recently at https://www.exploravention.com/blogs/soft_arch_agentic_ai/

    • throwup238 17 hours ago

      It depends on the scale of complexity you’re working at and who your users are going to be. I’ve found that it’s trivial to have Claude Code spit out so much functionality that even just properly verifying it manually becomes a gargantuan task. I end up just manually testing the pieces I’m familiar with, which is fine if there’s a QA department who can do a full run-through of the feature and are prepared to deal with vibe-coding pitfalls, but not so much on open source projects where slop gets shipped and unfamiliar users get stuck with bugs they can’t possibly troubleshoot. Writing the code from scratch The Old Way™ leaves a lot less room for shipping convincing but non-functional slop, because the dev has to work through it before shipping.

      The most immediate example I can think of is the beans LLM workflow tracker. It’s insane that it’s measured in the hundreds of thousands of LoC, and getting that thing set up in a repo is a mess. I had to use GitHub Copilot to investigate the repo to figure out the latest setup method. This wouldn’t fly at my employer, but a lot of projects are going to be a lot less scrupulous.

      You can see the effects in popular consumer-facing apps too: Anthropic has drunk way too much of its own Kool-Aid, and now I get 10-50% failure rates on messages in their iOS app depending on the day. Some of their devs have publicly said that Claude writes 100% of their code, and it’s starting to show. Intermittent network failures and retries have been a solved problem for decades, ffs!

    • jdjdjssh 17 hours ago

      > If the code passes tests, and also works at the functionality level

      Why doesn’t outsourcing work if this is all that is needed?

      • jmathai 17 hours ago

        We haven’t fully proven that it is any different. Not at scale anyway. It took a decade for the seams of outsourcing to break.

        But I have a hypothesis.

        The quality of the output, when you don’t own the long term outcome or maintenance, is very poor.

        This is not the case with AI in the same sense it is with human contractors.

      • simianwords 17 hours ago

        Why do we have managers if managers don’t have accountability?

        • jdjdjssh 17 hours ago

          I’m not sure what you’re getting at. I’m saying there’s a lot more to creating useful software than “tests pass / limited functionality checks work” from a purely technical perspective.

  • CuriouslyC 16 hours ago

    You're countering vibes with vibes.

    If the tests aren't good enough, break them. Red team your own software. Exploit your systems. "Sitting with the code" is some Henry David Thoreau bullshit, because it provides exactly 0 value to anyone else, whereas red teamed exploits are objective.

    • kranner 15 hours ago

      The way you come up with ideas on how to break, red team and exploit; when to do this and how to stop: that part is not objective. The machine can't do this for you sufficiently well. There is a subjective process in there that you're not acknowledging.

      It's a good approach! It's just more 'negative space' than direct.

      • CuriouslyC 15 hours ago

        People who pentest spend more time running a playbook than puzzling over the logical problem of how to break a piece of software. Even a lot of zero days are more about knowing a pattern and mass scanning for it across a lot of code than playing chess vs a codebase and winning.

        • kranner 15 hours ago

          Fine, but is that the entirety of software development? It even seems a waste of time by your own reasoning if it's so automatable already.

    • nkohari 14 hours ago

      You're over-rotating on security. Not that it isn't important, but there are other dimensions to software that benefit heavily from the author having a deep understanding of the code that's being created.

randusername 16 hours ago

No more AI thought pieces until you tell us what you build!

AI is a general-purpose tool, but that doesn't mean best practices and wisdom are generalizable. Web dev is different from compilers, which is different from embedded, and all the differences of opinion in the comments never explain who does what.

That said, I would take this up a notch:

> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.

Writing _is_ the thinking. It's a critical input in developing good taste. I think we all ought to consider a maintenance dose. Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles. Best practices are a moving train, not something that you learned once and you're done.

  • bodge5000 16 hours ago

    > No more AI thought pieces until you tell us what you build!

    Absolutely agree with this; the ratio of talk to output is insane, especially when the talk is all about how much better the output is. So far the only example I've seen is Claude Code, which is mired in its own technical problems and is literally built by an AI company.

    > Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles

    This is the one thing that concerns me, for the same reason as "AI writes the code, humans review it" does. The fact of the matter is, most people will get lazy and complacent pretty quickly, and the depth to which they review the code / the frequency with which they "go it alone" will get less and less until eventually it just stops happening. We all (most of us anyway) do it, it's just part of being human, for the same reason that thousands of people start going to the gym in January and stop by March.

    Arguably, AI coding was at its best when it was pretty bad, because you HAD to review it frequently and there were immediate incentives to just take the keyboard and do it yourself sometimes. Now there are still some serious faults; they're just not as immediate, which will lead to complacency for a lot of people.

    Maybe one day AI will be able to reliably write 100% of the code without review. The worry is that we stop paying attention first, which all in all looks quite likely.

    • Kerrick 14 hours ago

      > Absolutely agree with this, the ratio of talk to output is insane, especially when the talk is all about how much better output is.

      Those of us building are having so much fun we aren't slowing down to write think pieces.

      I don't mean this flippantly. I'm a blogger. I love writing! But since a brief post on December 22 I haven't blogged because I have been too busy implementing incredible amounts of software with AI.

      Since you'll want receipts, here they are:

      - https://git.sr.ht/~kerrick/ratatui_ruby/tree/trunk/item/READ...

      - https://git.sr.ht/~kerrick/rooibos/tree/trunk/item/README.rd...

      - https://git.sr.ht/~kerrick/tokra/tree

      Between Christmas and New Year's Day I was on vacation, so I had plenty of time. Since then, it's only been nights & weekends (and some early mornings and lunch breaks).

      • yojat661 7 hours ago

        Is this software popular? Is it maintainable long-term? Are you getting feedback from users?

    • g8oz 15 hours ago

      But simonw said that dark factories are the way forward /s

      • malfist 12 hours ago

        Lights-out manufacturing is always the boogeyman that's being built or coming tomorrow. Never seems to happen, though. The Wikipedia article for it only cites two such factories, and at least one of them still requires humans and isn't fully lights-out.

  • bandrami 8 hours ago

    I'm really losing patience with AI fluff pieces, mostly because I'm a sysadmin and the guys in dev cannot stop talking about how much they're producing with their agents or MCPs or whatever they're called this week, and yet they're simply not getting me software to put into production any faster (it's actually slowed down slightly, but that's consistent with previous product lifecycles). Neither are any of our vendors, at least as far as I can tell. Even Agile a quarter of a century ago actually produced software at a somewhat faster rate than the team had before (at the cost of more breakage, but there's a business case to be made for that), but I'm literally just not seeing an uptick here.

  • bartread 14 hours ago

    > No more AI thought pieces until you tell us what you build!

    Let me fix that for you:

    No more AI thought pieces until you SHOW us what you build!

    And I think it can safely be generalised to:

    No more thought pieces until you show us what you build!

    And that goes double for posts on LinkedIn.

    • strange_quark 13 hours ago

      I’d take it even further: I want to see the prompts! I’ll concede we’ve arrived at a point where the coding agents work, but I’m still unconvinced they’re actually saving anyone time. From my AI-pilled coworkers, it appears they’re all spending half an hour creating a plan with hyper-specific prompts, like “make this specific change to lines XXX-YYY in this file”, “move this function over here and change the args”, etc. As far as I can tell, this is current best practice, and I’ve tried it, but I’m always disappointed with either the quality or the speed, usually both.

      • dullcrisp 12 hours ago

        Look, learning vim is hard; some people just want to describe their edits using natural language.

      • tempodox 12 hours ago

        Yea, if I were interested in that level of micromanagement, I would have become a manager.

  • gchamonlive 15 hours ago

    Even if writing is thinking (which I don't think is the case, since writing is also feeling), thinking isn't exclusively writing. Form is very important, and AI can very well help you materialize an insight in a way that makes sense for a wider audience. The thinking is still all yours; only the aesthetics are refined by the AI, with your guidance, depending on how you use AI.

  • fourthrigbt 13 hours ago

    The product they build is literally mentioned in the post? It’s one of the more popular personal finance/budgeting apps, and it’s a pretty good one in my opinion as someone who has used a variety of them.

  • Ozzie_osman 16 hours ago

    > No more AI thought pieces until you tell us what you build!

    We build a personal finance tool (referenced in the article). It's a web/mobile/backend stack (mostly React and Python). That said, I think a lot of the principles are generalizable.

    > Writing _is_ the thinking. It's a critical input in developing good taste.

    Agree, but I'll add that _good_ prompt-writing actually requires a lot of thought (which is what makes it so easy to write bad prompts, which are much more likely to produce slop).

  • [removed] 15 hours ago
    [deleted]
  • doug_durham 15 hours ago

    Ample evidence of production software being produced with the aid of AI tools has been provided here on HN over the last year or more. This is a tiresome response. A later response says exactly what they produce.

    • daveguy 14 hours ago

      Most of what I see are toys. Could you point us to examples of production software from AI? I feel like I see more "stop spamming us with AI slop" stories from open source than production software from AI. Would love some concrete examples, specifically of major refactors or ground-up projects. Not "we just started using AI in our production software," because it can take a while to change the quality (better or worse) of a whole existing code base.

      • doug_durham 10 hours ago

        "Us"??? Most of "us" don't need to be convinced that AI as a software development tool has merit. The comment literally two below my comment says that they develop banking software. At this point you can be confident that most of the software that you use that has had recent updates has been developed with the aid of AI. Its use is ubiquitous.

        • daveguy 7 hours ago

          I didn't say AI as a software development tool doesn't have merit. I asked what production software was being produced from scratch or predominantly with AI tools. I just see a lot more examples of "stop the slop" than I do of positive stories about AI being used to build something from scratch. I was hoping you had a concrete example in all of the hay. Are my expectations, based on the hype, too high?

          That wasn't supposed to be an opportunity for you to get defensive, but an opportunity for you to show off awesome projects.

      • willtemperley 13 hours ago

        I imagine people who are shipping with AI aren’t talking about it. Doing so makes no business sense.

        Those not shipping are talking about it.

      • jama211 14 hours ago

        Sounds like you’ve got multiple ways to write off any example you’re given charged up and at the ready.

        • daveguy 12 hours ago

          I was just asking for a non-confounded example of what was claimed. But okay.

  • jama211 13 hours ago

    “Writing is the thinking” is a controversial take that’s open to interpretation, so everyone’s gonna argue about it. You muddied the water with that one.

  • CuriouslyC 16 hours ago

    I don't get why people think AI takes the thought out of writing. I write a lot, and when I start hitting keys on keyboards, I already know what I'm going to say 100% of the time, the struggle is just getting the words out in a way that maximizes reader comprehension/engagement.

    I'd go so far as to say that for the vast majority of people, if you don't know what you're going to say when you sit down to write, THAT is your problem. Writing is not thinking, thinking is thinking, and you didn't think. If you're trying to think when you should be writing, that's a process failure. If you're not Stephen King or Dean Koontz, trying to be a pantser with your writing is a huge mistake.

    What AI is amazing for is taking a core idea/thesis you provide it, and asking you a ton of questions to extract your knowledge/intent, then crystallizing that into an outline/rough draft.

    • kranner 15 hours ago

      And how do you know you're not as good as Stephen King or Dean Koontz if you never even try? What AI seems to be amazing for is persuading people to freeze themselves in place and accept their overlord-assigned stations in life.

      You're free to adopt this cynical and pessimistic outlook if you like, but you're going a bit far trying to force it on others. Gawd.

      • CuriouslyC 15 hours ago

        "How do you know Michael Phelps's training routine isn't right for you if you don't spend 6 months following it?"

        If your argument starts with "how do you know you're not the best in the world unless you try" you fucked up.

simianwords 18 hours ago

This is one of the more true and balanced articles.

On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined and easy-to-process verification hook.

A lot of software tasks are “migrate X to Y” and this is a perfect job for AI.

The workflow is generally straightforward - map the old thing to the new thing and verify that the new thing works the same way. Most of this can be automated using AI.

Wanna migrate a codebase from C to Rust? I definitely think it should be possible autonomously if the codebase is small enough. You do have to ask the AI to intelligently come up with extensive ways to verify that the two work the same: maybe a UI check, sample input and output checks on the API, and functionality checks.
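
A rough sketch of the kind of differential harness I mean, with placeholder binary names and a placeholder input generator:

    import random
    import subprocess

    def run(binary, stdin_text):
        # Run one build on the given input; capture exit code and output.
        proc = subprocess.run([binary], input=stdin_text, text=True,
                              capture_output=True, timeout=10)
        return proc.returncode, proc.stdout

    for _ in range(10_000):
        case = f"{random.randint(-10**9, 10**9)}\n"  # swap in a real input generator
        old = run("./legacy_c_tool", case)
        new = run("./rust_rewrite", case)
        assert old == new, f"divergence on input {case!r}: {old} vs {new}"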

  • akiselev 18 hours ago

    > On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined and easy-to-process verification hook.

    It's scary how good it's become with Opus 4.5. I've been experimenting with giving it access to Ghidra and a debugger [1] for reverse engineering and it's just been plowing through crackmes (from sites like crackmes.one where new ones are released constantly). I haven't bothered trying to have it crack any software but I wouldn't be surprised if it was effective at that too.

    I'm also working through reverse engineering several file formats by just having it write CLI scripts to export them to JSON then recreate the input file byte by byte with an import command, using either CLI hex editors or custom diff scripts (vibe coded by the agent).
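
    The feedback loop itself is dead simple, roughly the sketch below (export.py and import.py stand in for the agent-written CLI scripts):

        import subprocess
        import sys

        original = sys.argv[1]

        # Round-trip: parse to JSON, rebuild, and require byte-identical output.
        # Both scripts are placeholders for the vibe-coded CLIs.
        subprocess.run(["python", "export.py", original, "dump.json"], check=True)
        subprocess.run(["python", "import.py", "dump.json", "rebuilt.bin"], check=True)

        a = open(original, "rb").read()
        b = open("rebuilt.bin", "rb").read()

        if a == b:
            print("round-trip OK: byte-identical")
        else:
            # The first divergent offset points at the part of the format
            # that is still misunderstood.
            idx = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y),
                       min(len(a), len(b)))
            print(f"mismatch at byte {idx} (sizes {len(a)} vs {len(b)})")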

    I still get routinely frustrated trying to use it for anything complicated but whole classes of software development problems have been reduced to vibe coding that feedback loop and then blowing through Claude Max rate limits.

    [1] Shameless plug: https://github.com/akiselev/ghidra-cli https://github.com/akiselev/debugger-cli

    • rubenflamshep 13 hours ago

      I'm in the same loop: I find that the more access I give it to systems and feedback mechanisms, the more powerful it is. There's a lot of leverage in building those feedback systems. With the obvious caveat about footguns :P

      Gave one of the repos a star as it's a cool example of what people are building with AI. Most common question on HN seems to be "what are people building". Well, stuff like this.

      • akiselev 13 hours ago

        > Most common question on HN seems to be "what are people building". Well, stuff like this.

        Hear, hear! I’ve got my altium-cli repo open source on GitHub as well, which is a vibe-coded CLI for editing vibe reverse engineered Altium PCB projects. It’s not yet ready for primetime (I’m finishing up the file format reverse engineering this weekend) and the code quality is probably something twelve-year-old me would have been embarrassed by, but I can already use it and Claude/Gemini to automate a lot of the tedious parts of PCB design like part selection and footprints. I’m almost to the point where Claude Code can use it for the entire EE workflow from part selection to firmware, minus the PCB routing which I still do by hand.

        I just ain’t wasting time blogging about it so unless someone stumbles onto it randomly by lurking on HN, they won’t know that Claude Code can now work on PCBs.

willtemperley 20 hours ago

I'm very happy with the chat interface, thanks.

* The interface is near identical across bots

* I can switch bots whenever I like. No integration points and vendor lock-in.

* It's the same risk as any big-tech website.

* I really don't need more tooling in my life.

  • simianwords 18 hours ago

    I think the agents are also becoming fungible at the integration layer.

    Any coding agent should be easy to plug into whatever IDE or workflow you need.

    The agents are not fully fungible, though. Each has its own characteristics.

srinath693 14 hours ago

This resonates a lot. I’ve found that staying slightly behind the bleeding edge with AI tools actually leads to more consistent productivity. The early-stage tools often look impressive in demos but add cognitive overhead and unpredictability in real workflows.

Waiting for patterns to stabilize, UX to improve, failure modes to become clearer, and community best practices to emerge tends to give a much better long-term payoff.

dehrmann 13 hours ago

This is a very sound take:

> Will AI replace my job?

> If you consider your job to be “typing code into an editor”, AI will replace it (in some senses, it already has). On the other hand, if you consider your job to be “to use software to build products and/or solve problems”, your job is just going to change and get more interesting.

andrethegiant 14 hours ago

Love Monarch. I would love to see the team apply AI to adding missing institutions (Robinhood CC, Accrue) faster.

OsamaJaber 18 hours ago

The hardest part of staying a step behind is knowing when something has crossed over. How do you decide when a tool is mature enough to adopt?

  • simianwords 17 hours ago

    This requires good intuition and being unemotional.

    Lots of people who become successful are the ones who can get this prediction correct.

    • OsamaJaber 12 hours ago

      Agreed, but being unemotional is easier said than done :D

Animats 9 hours ago

The distance between manager-speak and LLM-speak is very small. Who wrote that?

piker 19 hours ago

> “Their (ie the document’s) value stems from the discipline and the thinking the writer is forced to impose upon himself as [she] identifies and deals with trouble spots”.

Real quote:

> "Hence their value stems from the discipline and the thinking the writer is forced to impose upon himself as he identifies and deals with trouble spots in his presentation."

I mean seriously?

satisfice 18 hours ago

The one thing I disagree with is having the AI do its own verification. I explicitly instruct it never to check anything unless I ask it to.

This is better because I use my own testing as a forcing function to learn and understand what the AI has done. Only after primary testing might I tell it to do checking for itself.

578_Observer 15 hours ago

Your point about avoiding the "bleeding edge" touches on a fundamental principle of endurance that is often ignored in the current AI gold rush. This philosophy is a calculated defense of a legacy—the invisible ledger of trust built over generations.

As a former local banker in Japan who spent decades appraising the intangible assets of businesses that have survived for centuries, I’ve learned that true mastery is found in stability, not novelty. In an era of rapid AI acceleration, the real risk is gambling your institutional reputation on unproven, volatile tools.

By 2026, when every “How” is a cheap commodity, the only thing that commands a premium is the “Why”—the core of human judgment. Staying a step behind the hype allows you to keep your hands on the steering wheel while the rest of the market is consumed by the noise. Stability is the ultimate luxury.