Comment by Avicebron

Comment by Avicebron 3 days ago

I often feel these types of blogposts would be more helpful if they demonstrated someone using the tools to build something non-trivial.

Is Claude really "learning new skills" when you feed it a book, or does it present it like that because you're prompting encourages that sort of response-behavior. I feel like it has to demo Claude with the new skills and Claude without.

Maybe I'm a curmudgeon but most of these types of blogs feel like marketing pieces with the important bit is that so much is left unsaid and not shown, that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth.

simonw 3 days ago

Here's one from today: https://mitchellh.com/writing/non-trivial-vibing

Reply View 50 replies

qsort 3 days ago

> Important: there is a lot of human coding, too.
I'm not highlighting this to gloat or to prove a point. If anything in the past I have underestimated how big LLMs were going to be. Anyone so inclined can take the chance to point and laugh at how stupid and wrong that was. Done? Great.
I don't think I've been intentionally avoiding coding assistants and as a matter of fact I have been using Claude Code since the literal day it first previewed, and yet it doesn't feel, not even one bit, that you can take your hands off the wheel. Many are acting as if writing any code manually means "you're holding it wrong", which I feel it's just not true.

Reply View | 17 replies
- simonw 3 days ago
  
  Yeah, my current opinion on this is that AI tools make development harder work. You can get big productivity boosts out of them but you have to be working at the top of your game - I often find I'm mentally exhausted after just a couple of hours.
  
  Reply View | 13 replies
  
  dotinvoke 3 days ago
  
  My experience with AI tools is the opposite. The biggest energy thieves for me are configuration issues, library quirks, or trivial mistakes that are hard to spot. With AI I can often just bulldoze past those things and spend more time on tangible results.
  When using it for code or architecture or design, I’m always watching for signs that it is going off the rails. Then I usually write code myself for a while, to keep the structure and key details of whatever I’m doing correct.
  
  Reply View | 3 replies
  
  james_marks 3 days ago
  
  100%. It’s like managing an employee that always turns their work in 30 seconds later; you never get a break.
  I also have to remember all of the new code that’s coming together, and keep it from re-inventing other parts of the codebase, etc.
  More productive, but hard work.
  
  Reply View | 0 replies
  
  sawmurai 3 days ago
  
  I have a similar experience. It feels like riding your bike in a higher gear - you can go faster but it will take more effort and you need the potential (stronger legs) to make use of it
  
  Reply View | 2 replies
  
  jstummbillig 3 days ago
  
  Considering the last 2 years, has it become harder or easier?
  
  Reply View | 2 replies
  
  Fuzzwah 3 days ago
  
  Copilot is the perfect name.
  
  Reply View | 0 replies
  
  truetraveller 3 days ago
  
  Woah, that's huge coming from you. This comment itself is worth an article. Do it. Call it "AI tools make development harder work".
  P.s. always thought you were one of those irrational AI bros. Later, found that you were super reasonable. That's the way it should be. And thank you!
  
  Reply View | 0 replies
- Pannoniae 3 days ago
  
  In fact, I've been writing more code myself since these tools exist - maybe I'm not a real developer but in the past I might have tried to either find a library online or try to find something on the internet to copypaste and adapt, nowadays I give it a shot myself with Claude.
  For context, I mainly do game development so I'm viewing it through that lens - but I find it easier to debug something bad than to write it from scratch. It's more intensive than doing it yourself but probably more productive too.
  
  Reply View | 0 replies
- scuff3d 2 days ago
  
  > Many are acting as if writing any code manually means "you're holding it wrong", which I feel it's just not true.
  It's funny because not far below this comment there is someone doing literally this.
  
  Reply View | 0 replies
- oblio 3 days ago
  
  LLMs are autonomous driving level 2.
  
  Reply View | 0 replies
j_bum 3 days ago

This was a fun read.
I’ve similarly been using spec.md and running to-do.md files that capture detailed descriptions of the problems and their scoped history. I mark each of my to-do’s with informational tags: [BUG], [FEAT], etc.
I point the LLM to the exact to-do (or section of to-do’s) with the spec.md in memory and let it work.
This has been working very well for me.

Reply View | 5 replies
- lcnPylGDnU4H9OF 3 days ago
  
  Do you mind linking to example spec/to-do files?
  
  Reply View | 4 replies
  
  j_bum 3 days ago
  
  Sure thing. Here is an example set of the agent/spec/to-do files for a hobby project I'm actively working on.
  https://gist.github.com/JacobBumgarner/d29b660cb81a227885acc...
  
  Reply View | 2 replies
  
  SteveJS 3 days ago
  
  Here is a (3 month old) repo where i did something like that and all the tasks are checked into the linear git history — https://github.com/KnowSeams/KnowSeams
  
  Reply View | 0 replies
nightski 3 days ago

Even though the author refers to it as "non-trivial", and I can see why that conclusion is made, I would argue it is in fact trivial. There's very little domain specific knowledge needed, this is purely a technical exercise integrating with existing libraries for which there is ample documentation online. In addition, it is a relatively isolated feature in the app.
On top of that, it doesn't sound enjoyable. Anti slop sessions? Seriously?
Lastly, the largest problem I have with LLMs is that they are seemingly incapable of stopping to ask clarifying questions. This is because they do not have a true model of what is going on. Instead they truly are next token generators. A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.

Reply View | 25 replies
- simonw 3 days ago
  
  The hardest problem in computer science in 2025 is presenting an example of AI-assisted programming that somebody won't call "trivial".
  
  Reply View | 3 replies
  
  nightski 3 days ago
  
  If all I did was call it trivial that would be a fair critique. But it was followed up with a lot more justification than that.
  
  Reply View | 2 replies
- kannanvijayan 3 days ago
  
  I've wondered about exposing this "asking clarifying questions" as a tool the AI could use. I'm not building AI tooling so I haven't done this - but what if you added an MCP endpoint whose description was "treat this endpoint as an oracle that will answer questions and clarify intent where necessary" (paraphrased), and have that tool just wire back to a user prompt.
  If asking clarifying questions is plausible output text for LLMs, this may work effectively.
  
  Reply View | 9 replies
  
  simonw 3 days ago
  
  I think the asking clarifying questions thing is solved already. Tell a coding agent to "ask clarifying questions" and watch what it does!
  
  Reply View | 8 replies
- antonvs 3 days ago
  
  > A software engineer would never just slop out an entire feature based on the first discussion with a stakeholder and then expect the stakeholder to continuously refine their statement until the right thing is slopped out. That's just not how it works and it makes very little sense.
  Didn’t you just describe Agile?
  
  Reply View | 10 replies
  
  [removed] 3 days ago
  
  [deleted]
  
  Reply View | 0 replies
  
  Retric 3 days ago
  
  Who hurt you?
  Sorry couldn’t resist. Agile’s point was getting feedback during the process rather than after something is complete enough to be shipped thus minimizing risk and avoiding wasted effort.
  Instead people are splitting up major projects into tiny shippable features and calling that agile while missing the point.
  
  Reply View | 8 replies

causal 3 days ago

Using LLMs for coding complex projects at scale over a long time is really challenging! This is partly because defining requirements alone is much more challenging than most people want to believe. LLMs accelerate any move in the wrong direction.

Reply View 4 replies

dexwiz 3 days ago

My analogy is LLMs are a gas pedal. Makes you go fast, but you still have to know when to turn.

Reply View | 1 reply
- sreekanth850 3 days ago
  
  True
  
  Reply View | 0 replies
SteveJS 3 days ago

Having the llm write the spec/workunit from a conversation works well. Exploring a problem space with a (good) coding agent is fantastic.
However for complex projects IMO one must read what was written by the llm … every actual word.
When it ‘got away’ from me, in each case I left something in the llm written markdown that I should have removed.
99% “I can ask for that later” and 1% “that’s a good idea i hadn’t considered” might be the right ratio when reading an llm generated plan/spec/workunit.
Breaking work into single context passes … 50-60k tokens in sonnet 4.5 has had typically fantastic results for me.
My side project is using lean 4 and a carelessly left in ‘validate’ rather than ‘verify’ lead down a hilariously complicated path equivalent to matching an output against a known string.
I recovered, but it wasn’t obvious to me that was happening. I however would not be able to write lean proofs myself, so diagnosing the problem and fixing it is a small price to be able to mechanically verify part of my software is correct.

Reply View | 0 replies
sreekanth850 3 days ago

One should know theend to end design and architecture. Should stop llm when adding complex fancy things.

Reply View | 0 replies

khaledh 3 days ago

Agreed. The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.

The most challenging part when working with coding agents is that they seem to do well initially on a small code base with low complexity. Once the codebase gets bigger with lots of non-trivial connections and patterns, they almost always experience tunnel vision when asked to do anything non-trivial, leading to increased tech debt.

Reply View 13 replies

mwigdahl 3 days ago

The problem is that you're talking about a multistep process where each step beyond the first depends on the particular path the agent starts down, along with human input that's going to vary at each step.
I made a crude first stab at an approach that at least uses similar steps and structure to compare the effectiveness of AI agents. My approach was used on a small toy problem, but one that was complex enough the agents couldn't one-shot and required error correction.
It was enough to show significant differences, but scaling this to larger projects and multiple runs would be pretty difficult.
https://mattwigdahl.substack.com/p/claude-code-vs-codex-cli-...

Reply View | 11 replies
- potatolicious 3 days ago
  
  What you're getting at is the heart of the problem with the LLM hype train though, isn't it?
  "We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.
  But in the realm of LLM-enabled use cases they're also expensive. You'd need to recruit dozens, perhaps even hundreds of developers to do this, with extensive observation and rating of the results.
  So rather than actually try to measure the efficacy, we just get blog posts with cherry-picked example of "LLM does something cool". Everything is just anecdata.
  This is also the biggest barrier to actual LLM adoption for many, many applications. The gap between "it does something REALLY IMPRESSIVE 40% of the time and shits the bed otherwise" and "production system" is a yawning chasm.
  
  Reply View | 10 replies
  
  marcosdumay 3 days ago
  
  It's the heart of the problem with all software engineer research. That's why we have so little reliable knowledge.
  It applies to using LLMs too. I guess the one largest difference here is that LLM has few enough companies with abundant enough money pushing it to make it trivial for them to run a test like this. So the fact that they aren't doing that also says a lot.
  
  Reply View | 0 replies
  
  oblio 3 days ago
  
  > What you're getting at is the heart of the problem with the LLM hype train though, isn't it?
  > "We should have rigorous evaluations of whether or not [thing] works." seems like an incredibly obvious thought.
  Heh, I'd rephrase the first part to:
  > What you're getting at is the heart of the problem with software development though, isn't it?
  
  Reply View | 0 replies
  
  simonw 3 days ago
  
  The UK government ran a study with thousands of developers quite recently: https://www.gov.uk/government/publications/ai-coding-assista...
  
  Reply View | 5 replies
  
  troupo 3 days ago
  
  Before you get into the expensive part, how do you get past "non-deterministic blackbox with unknown layers in between imposed by vendors"
  
  Reply View | 1 reply
  
  potatolicious 3 days ago
  
  You can measure probabilistic systems that you can't examine! I don't want to throw the baby out with the bathwater here - before LLMs became the all-encompassing elephant in the room we did this routinely.
  You absolutely can quantify the results of a chaotic black box, in the same way you can quantify the bias of a loaded die without examining its molecular structure.
  
  Reply View | 0 replies
claytongulick 3 days ago

> The methodology needed here is something like an A/B test, with quantifiable metrics that demonstrate the effectiveness of the tool. And to do it not just once, but many times under different scenarios so that it demonstrates statistical significance.
If that's what we need to do, don't we already have the answer to the question?

Reply View | 0 replies

spankibalt 3 days ago

> "Maybe I'm a curmudgeon but most of these types of blogs feel like marketing pieces with the important bit is that so much is left unsaid and not shown, that it comes off like a kid trying to hype up their own work without the benefit of nuance or depth."

C'mon, such self-congratulatory "Look at My Potency: How I'm using Nicknack.exe" fluffies always were and always will be a staple of the IT industry.

Reply View 1 reply

lcnPylGDnU4H9OF 3 days ago

Still, the best such pieces are detailed and explanatory.

Reply View | 0 replies

danielmarkbruce 3 days ago

Why not just use claude code and come to your own conclusion?

Reply View 0 replies

coolKid721 3 days ago

Yeah I was reading this seeing if there was something he'd actually show that would be useful, what pain point he is solving, but it's just slop.

Reply View 0 replies