Comment by majormajor 2 days ago

They're quite good at algorithm bugs, a lot less good at concurrency bugs, IME. That's still very valuable; it's just where I've seen the limits so far.

They're also better at making tests for algorithmic things than for concurrency situations, though they can get pretty close. They just usually don't have great out-of-the-box ideas for "how to ensure these two different things run in the desired order" (something like the event-based sketch below).
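For concreteness, here's a minimal sketch of the kind of ordering scaffold I mean, in Python using threading.Event; the writer/reader steps are made up for illustration, not from any real codebase:

```python
import threading

def test_writer_runs_before_reader():
    wrote = threading.Event()
    results = []

    def writer():
        results.append("written")  # the step that must happen first
        wrote.set()                # signal that the write finished

    def reader():
        # Block until the writer signals; wait() returns False on timeout.
        assert wrote.wait(timeout=1)
        results.append("read")

    # Start the reader first to show the event, not luck, enforces the order.
    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert results == ["written", "read"]
```

Pinning the order with an explicit event, rather than timing assumptions, is what makes a test like this deterministic.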

Everything that I dislike about generating non-greenfield code with LLMs isn't relevant to the "make tests" or "debug something" usage. (Weird or bad choices about when to duplicate code vs. refactor, lack of awareness of the desired "shape" of the codebase for long-term maintainability, sometimes limited depth of search for impacted/related existing code, and running off the rails into almost-but-not-quite stuff that ends up entirely the wrong thing.)

bongodongobob 2 days ago

Well, if you know it's wrong, tell it, and why. I don't get the expectation of one-shotting everything 100% of the time. It's no different from bouncing ideas off a colleague.

  • majormajor 2 days ago

    I don't care about one-shotting; the stuff it's bad at debugging is the stuff where, even when you tell it "that's not it," it just makes up another plausible-but-wrong idea.

    For code modifications in a large codebase, the problem with multi-shot is that it doesn't take many iterations before I've spent more time on it than doing it myself. At least for tasks where I'm trying to be lazy or save time.

    • Klathmon 2 days ago

      > For code modifications in a large codebase the problem with multi-shot is that it doesn't take too many iterations before I've spent more time on it.

      I've found voice input to completely change the balance there.

      For stuff that isn't urgent, I can just fire off a hosted codex job by saying what I want done out loud. It's not super often that it completely nails it, but it almost always helps give me some info on where the relevant files might be and a first pass on the change.

      Plus it has the nice side effect of being a todo list of quick stuff that I didn't want to get distracted by while working on something else, and often helps me gather my thoughts on a topic.

      It's turned out to be a shockingly good workflow for me.

  • nicklaf 2 days ago

    It's painfully apparent when you've reached the limits of an LLM on a problem it's ill-suited for (like a concurrency bug), because it will just keep spitting out nonsense, eventually going in circles or running totally off the rails.

    • solumunus a day ago

      And then one jumps in and solves the problem themselves, like they've done for their entire career. Or maybe they haven't, and that's who we hear complaining so much? I'm not talking about you specifically, just in general.

  • ewoodrich 2 days ago

    The weak points raised by the parent comment are specifically cases where the problem lives outside the model's "peripheral vision" in its context window, and, speaking from personal experience, they aren't as simple as adding a line to the CLAUDE.md saying "do this / don't do this".

    I agree that the popular "one-shot at all costs / end the chat at the first whiff of a mistake" advice is much too reductive. But unlike with a colleague, after you put in all that effort to develop a shared mental model of the desired outcome, you hit the max context and all that nuanced understanding instantly evaporates. You then have to hope the lossy compression into text instructions will actually steer it where you want next time, and from experience that is far from certain.

  • hitarpetar 18 hours ago

    Except it's not a colleague: it's not capable of ideation; it takes your words and generates new ones based on them. That can maybe be useful sometimes, but yeah, it's not really the same as bouncing ideas off a colleague.