Comment by flohofwoe a day ago

This is exactly what I'd want from an 'AI coding companion'.

Don't write or fix the code for me (thanks but I can manage that on my own with much less hassle), but instead tell me which places in the code look suspicious and where I need to have a closer look.

When I ask Claude to find bugs in my 20kloc C library it more or less just splits the file(s) into smaller chunks and greps for specific code patterns and in the end just gives me a list of my own FIXME comments (lol), which tbh is quite underwhelming - a simple bash script could do that too.
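
For reference, the "simple bash script" equivalent is roughly this (a minimal sketch, in Python for brevity; the file extensions and markers are assumptions):

```python
# Walk a source tree and list FIXME/TODO/XXX comments - essentially what
# the chunk-and-grep approach ends up doing. Extensions are assumptions.
import re
from pathlib import Path

MARKERS = re.compile(r"\b(FIXME|TODO|XXX)\b")

for path in Path(".").rglob("*"):
    if path.suffix not in {".c", ".h"}:
        continue
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if MARKERS.search(line):
            print(f"{path}:{lineno}: {line.strip()}")
```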

ChatGPT is even less useful since it basically just spends a lot of time telling me 'everything looking great yay good job high-five!'.

So far, traditional static code analysis has been much more helpful in finding actual bugs, but static analysis being clean doesn't mean there are no logic bugs, and this is exactly where LLMs should be able to shine.

If getting more useful potential-bugs information from LLMs requires an extensively customized setup, the whole idea becomes much less useful - it's a similar situation to how static code analysis isn't used if it requires extensive setup or manual build-system integration, instead of just being a button or menu item in the IDE, or enabled by default for each build.

Sharlin a day ago

This is a point I see discussed surprisingly little. Given that many (most?) programmers like designing and writing code (excluding boilerplate) and don't particularly enjoy reviewing it, it certainly feels backwards to make the AI write the code and relegate the programmer to reviewing it. (I know, of course, that the whole thing is being sold to stakeholders as "LoC machine goes brrrr" – code review? what's that?)

  • SAI_Peregrinus 21 hours ago

    Creativity is fun. AIs automate that away. I want an AI that can do my laundry, fold it, and put it away. I don't need an AI to write code for me. I don't mind AI code review; it sometimes has a valid suggestion, and it's easy enough to ignore most of the rest of the time.

    • collingreen 20 hours ago

      I was thinking this again just yesterday. Do my laundry correctly and get it put away. Organize my storage. Clean the bathroom. Do the dishes. Catalog my pantry, give me recipes, and keep it correctly stocked. Maybe I'm just a simple creature, but these are the obvious problems in my life I'll pay to have go away - so why are we taking away the fun stuff instead?

      • ghiculescu 5 hours ago

        You can already pay to have all those issues go away.

    • diggan 21 hours ago

      > Creativity is fun. AIs automate that away.

      I've been developing with LLMs at my side for months - about a year now - and it feels like it's allowing me to be more creative, not less. But I'm not doing any "vibe-coding", maybe that's why?

      The creative part (for me) is coming up with the actual design of the software - how it all fits together, what it should do and how - and I get to do that more than ever now.

      • victorbjorklund 5 hours ago

        Same. I think there are two types of devs: those who love designing the individual building blocks, and those who want to stack the blocks together to make something new.

        At this point AI is best at the first and less good at the second. I like stacking blocks together. If I build a beautiful UI, I don't enjoy writing the individual CSS for every button; I'd rather be composing the big picture.

        Not saying either is better or worse. But I can imagine that the people who love building the individual blocks like AI less, because it takes away something they enjoy. For me it just takes away a step I had to do before getting to compose the big picture.

        • skydhash 4 hours ago

          The thing is, I love doing both. But there's an actual rush of enjoyment when I finally figure out one of the tenets of a system. It's like solving a puzzle for me.

          After that, it all becomes routine work, as easy as drinking water. You explain the problem and I can quickly find the solution. Using AI at this point would be like herding cats: I already know what code to write, and having a handful of suggestions thrown at me is distracting. Like feeling a tune, and someone playing a melody other than the one you know.

      • chrischen 7 hours ago

        Exactly. I loved doing novel implementations or abstractions… and the AI excels at the part where it modifies it slightly for different contexts… aka the boring stuff.

      • dingnuts 21 hours ago

        I'm still faster than the cheap bots.

        The creative part for me includes both the implementation and the design, because the implementation also matters. The bots get in the way.

        Maybe I would be faster if I paid for Claude Code. It's too expensive to evaluate.

        If you like your expensive AI autocomplete, fine. But I have not seen any demonstrable and maintainable productivity gains from it, and I find that understanding my whole implementation is faster, more fun, and produces better software.

        Maybe that will change, but people told me three years ago that by today I wouldn't be able to outdo the bot.

        With all due respect, I am John Henry and I am still swinging my hammer. The steam drill is still too unpredictable!

    • nl 2 hours ago

      > Creativity is fun. AIs automate that away.

      This is the complete opposite of my experience using AI coding tools heavily.

    • JeremyHerrman 17 hours ago

      Depends on what abstraction level you enjoy being creative at.

      Some people like creative coding; others like being creative with apps and features without much care for how it's implemented under the hood.

      I like both, but IMO there is a much larger crowd for higher-level creativity, and in those cases AIs don't automate the creativity away - they enable it!

    • CuriouslyC 19 hours ago

      Is AI automating creativity away if you come up with an idea and have it actually implement it?

      • Sharlin 18 hours ago

        Yes, because ideas aren't worth much, if anything. If you have an idea for a book or a painting and have someone else implement it, you have not done creative work. Literally, you have not created the work - brought it into existence. Whoever made it has done the creating.

    • grantWilliams 21 hours ago

      Most software is developer tools and frameworks to manage electrical state in machines.

      These state-management messes use up a lot of resources getting copied around.

      As an EE doing QA on future chips, with the goal of compressing away developer syntax art to preserve the least amount of state management possible while achieving maximum utility: sorry, self-selecting biology of SWEs, but also not sorry.

      Above all, this is capitalism, not honorific obligationism. If hardware engineers can claim more of the tech economy for our shareholders, we must.

      There are plenty of other creative outlets that are much less resource intensive. Rich first-world programmers are a small subset of the population and can branch out and explore life, rather than believing everyone else has an obligation to preserve the personal story of a generation of future dead.

  • dylan604 18 hours ago

    To me, it's the natural result of gaining popularity: enough people started using it after the hype train rolled through and are now giving honest feedback. Real honest feedback can feel like a slap in the face when all you've had is overwhelmingly positive feedback from those aboard the hype train.

    The writing has been on the wall with so-called hallucinations, where LLMs just make stuff up, that the hype was way out over its skis. Stories like lawyers being fined for presenting unchecked LLM output as fact will continue to take the shine off, and hopefully some of the raw gung-ho nature will slow down a bit.

    • zdragnar 15 hours ago

      I saw an article today from the BBC where travellers are using LLMs to plan their vacations and getting into trouble going places (sometimes dangerously remote ones) to visit landmarks that don't even exist:

      https://www.bbc.com/travel/article/20250926-the-perils-of-le...

      I'm mildly bearish on humanity's capacity to learn from its mistakes, and I have a feeling in my gut that we've taken a massive step backwards as a civilization.

      • dylan604 15 hours ago

        I could almost understand a lawyer working late the night before a brief is due and just running out of time to review the LLM's output. But how do you not look up travel destinations before heading out? That's something I can't wrap my head around, no matter how I try to be kind and see the other side.

      • alickz 14 hours ago

        People have blindly followed GPS routes into lakes and rivers, but that should hardly be a point against GPS.

        With 8 billion people on the planet, you could write a "man bites dog" story about any invention popular enough.

        "You never read about a plane that did not crash"

  • alexchantavy 21 hours ago

    There are a lot of good AI code reviewers out there that learn project conventions from prior PRs and derive rules from them - things like cubic.dev or greptile, etc. I've found they definitely save time and catch things I would have missed. They're especially helpful for running an open source project, where code quality can have high variance and as a maintainer you may feel hesitant to be direct with someone - the machine has no feelings, so it is what it is :)

  • jes5199 20 hours ago

    Codex can actually do useful reviews on pull requests, as of the last few weeks.

  • ratelimitsteve 19 hours ago

    Honestly? This, but zoom out. Machines are supposed to do the grunt work so that people can spend their time being creative and doing intangible, satisfying things, but we seem to have built machines to make art, music, and literature in order to free ourselves up to stack bricks and shovel manure.

CharlesW 21 hours ago

> When I ask Claude to find bugs in my 20kloc C library it more or less just splits the file(s) into smaller chunks and greps for specific code patterns and in the end just gives me a list of my own FIXME comments (lol), which tbh is quite underwhelming - a simple bash script could do that too.

Here's a technique that often works well for me: When you get unexpectedly poor results, ask the LLM what it thinks an effective prompt would look like, e.g. "How would you prompt Claude Code to create a plan to effectively review code for logic bugs, ignoring things like FIXME and TODO comments?"

The resulting prompt is too long to quote, but you can see the raw result here: https://gist.github.com/CharlesWiltgen/ef21b97fd4ffc2f08560f...

From there, you can make any needed improvements, turn it into an agent, etc.
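
Wired up, the two-step flow looks roughly like this (a minimal sketch using the OpenAI Python client; the model name and file path are assumptions):

```python
# Sketch of the "ask the model for a better prompt, then use it" flow.
# Model name and file path are assumptions; substitute your own.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumption: any capable chat model

# Step 1: ask the model to write the review prompt.
meta = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Write a prompt that instructs a coding agent to review "
                   "C code for logic bugs, explicitly ignoring FIXME/TODO "
                   "comments and style nits.",
    }],
)
review_prompt = meta.choices[0].message.content

# Step 2: use the generated prompt (plus the code) for the actual review.
code = open("src/mylib.c").read()  # assumption: illustrative path
review = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": review_prompt},
        {"role": "user", "content": code},
    ],
)
print(review.choices[0].message.content)
```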

  • OGWhales 18 hours ago

    I've found this a really useful strategy in many situations when working with LLMs. It seems odd that it works, since one would think that its ability to give a good reply to such a question means it already "understands" your intent in the first place - but that's just projecting human ability onto LLMs. I would guess this technique is similar to how reasoning modes seem to improve output quality, though I may misunderstand how reasoning modes work.

    • ako 18 hours ago

      Works the same for humans, no? Even if you know how to do a complex project, it helps to first document the approach and then follow it.

  • einarfd 17 hours ago

    This is a great idea, and worth doing. Another option in Claude Code that can be worth trying is planning mode, which you toggle with shift+tab. Have it plan out what it's going to do, and keep iterating until the plan seems sound. Tbh I wish I'd found planning mode earlier - it's been such a great help.

  • alickz 14 hours ago

    I have also had some success with this method.

    I asked ChatGPT to analyze its weaknesses and give me a pre-prompt to best help mitigate them and it gave me this: https://pastebin.com/raw/yU87FCKp

    I've found it very useful for avoiding sycophancy and increasing skepticism/precision in the replies it gives me.
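
    Applying it is just a matter of making it the system message (a minimal sketch; the file path and model name are assumptions):

    ```python
    # Sketch: load a saved pre-prompt (e.g. the pastebin one above) and
    # use it as the system message for every request. The path and model
    # name are assumptions.
    from openai import OpenAI

    client = OpenAI()
    pre_prompt = open("anti_sycophancy.txt").read()  # hypothetical file

    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat model
        messages=[
            {"role": "system", "content": pre_prompt},
            {"role": "user", "content": "Poke holes in this design: ..."},
        ],
    )
    print(resp.choices[0].message.content)
    ```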

fhd2 2 hours ago

My thoughts exactly. So many genuinely useful tools could be built on top of LLMs, but most of the resources go into the no-code space.

I get it, though: non-programmers and weak programmers don't scrutinise the results and are more likely to be happy to pay. Still, a bit of a shame.

Maybe these tools exist, but at least to me, they don't surface among all the noise.

walthamstow a day ago

Cursor BugBot is pretty good for this. We did the free trial and it was so popular with our devs that we ended up keeping it. Occasional false positives aside, it's very useful. It saves time for both the PR submitter and the reviewer.

flaviolivolsi a day ago

I found GPT-5 to be much less sycophantic than other models when it comes to this stuff, so your mention of 'everything looking great yay good job high-five' surprises me. Used via Codex CLI, it often questions things. Gemini 2.5 Pro is also good on this.

KronisLV 8 hours ago

> When I ask Claude to find bugs in my 20kloc C library it more or less just splits the file(s) into smaller chunks and greps for specific code patterns and in the end just gives me a list of my own FIXME comments (lol), which tbh is quite underwhelming - a simple bash script could do that too.

I explicitly asked it to read all the code (within Cline) and it did so, giving me a dozen action items by the end, on a Django project. Most were a bit nitpicky, but two or three issues were more serious. I found it pretty useful!

freedomben 17 hours ago

I've had reasonably good success asking Claude things like: "There's a bug somewhere that is causing slow response times on several endpoints, including <xyz>. Sometimes response times get to several seconds long, and they don't look correlated with CPU or memory usage. Database CPU and memory also don't seem to correlate. What is the issue?" I have to iterate a few times, but it's pointed me to a few really tricky issues that would probably have taken hours to find on my own.

Definitely optimistic about this way of using AI.

mhitza 21 hours ago

In an application I'm working on, I use gpt-oss-20B. Into the prompt I dump the OWASP Top 10 web vulnerabilities, plus a note that it should only comment on "definitive vulnerabilities". It has been pretty effective at finding vulnerabilities in the code I write (and it's one of the poorest-rated models if you look at some comments).

Where I still need to extend this is to introduce function calling into the flow: when "it has doubts" during reasoning would be the right time to call a tool that expands the context it's working with (pulling in other files, etc.).
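
The basic pass, for reference, looks something like this (a minimal sketch; the base_url and model name are assumptions for a local OpenAI-compatible server, and the category list is abbreviated):

```python
# Sketch of the OWASP-focused review pass. The base_url and model name
# are assumptions for a local OpenAI-compatible server (llama.cpp,
# Ollama, ...); the category list here is abbreviated.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

SYSTEM = (
    "You are a security reviewer. Check the code below against the "
    "OWASP Top 10 (injection, broken access control, cryptographic "
    "failures, ...). Only comment on definitive vulnerabilities."
)

def review(source: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-oss-20b",  # assumption: however your server names it
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content
```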

  • diggan 21 hours ago

    > (and it's one of the poorest-rated models if you look at some comments).

    Yeah, don't listen to the "wisdom of the crowd" when it comes to LLMs; there seems to be a ton of FUD going around, especially on subreddits.

    GPT-OSS was piled on for being dumb in its first week of release, yet none of the software properly supported it at launch. Once it was working properly in llama.cpp, it was clear how strong the model was, but by that point the popular sentiment had already spread and solidified.

  • airstrike 21 hours ago

    Tool calling is the best lever for getting value out of LLMs

vidarh a day ago

I've "worked" with Claude Code to find a long-standing set of complex bugs over the last couple of days, and it can do so much more. It's come up with hypotheses, tested them, used gdb in batch mode when the hypotheses failed in order to trace what happened at the assembly level, and compared that with the asm dump of the code in question.

It still needs guidance, but it quashed bugs yesterday that I've previously spent many days on without finding a solution for.

It can be tricky to guide, but these tools can definitely be a significant aid even for very complex bugs.

trenchpilgrim a day ago

I use Zed's "Ask" mode for this all the time. It's a read-only mode where the LLM focuses on figuring out the codebase instead of modifying it. You can toggle it freely mid-conversation.

llleeeooo 20 hours ago

Indeed, in machine learning, classification is generally easier than generation. Maybe that's consistent with ChatGPT's intelligence level.

notatoad 20 hours ago

I've had great success with both ChatGPT and Claude using prompts like "tell me how this sucks" or "why is this shit". Being a bit more crass seems to bump it out of sycophantic mode, and being more open-ended about the type of problems you want it to find seems to yield better results.

But I've been limiting it to a lot less than 20k LoC - I'm sticking to stuff I can just paste into the chat window.

simonw a day ago

Suggestion: run a regex to remove those FIXME comments first, then try the experiment again.

I often use Claude/GPT-5/etc to analyze existing repositories while deliberately omitting the tests and documentation folders because I don't want them to influence the answers I'm getting about the code - because if I'm asking a question it's likely the documentation has failed to answer it already!
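
Something like this does both in one pass (a quick sketch; the extensions, directory names, and markers are assumptions):

```python
# Sketch of the preprocessing step: skip tests/docs folders and strip
# FIXME/TODO comment lines before handing the code to the model.
# Extensions, directory names, and markers are assumptions.
import re
from pathlib import Path

SKIP_DIRS = {"tests", "test", "docs", "doc"}
MARKER = re.compile(r"\b(FIXME|TODO)\b")

def collect(root: str) -> str:
    chunks = []
    for path in Path(root).rglob("*.c"):
        if SKIP_DIRS & {p.name for p in path.parents}:
            continue  # omit tests and docs entirely
        kept = [line for line in path.read_text(errors="ignore").splitlines()
                if not MARKER.search(line)]
        chunks.append(f"// {path}\n" + "\n".join(kept))
    return "\n\n".join(chunks)
```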