Comment by systemf_omega 3 days ago

> B2B SaaS

Perhaps that's part of it.

People here work in all kinds of industries. Some of us are implementing JIT compilers, mission-critical embedded systems, or distributed databases. In code bases like these you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

sunrunner 3 days ago

> People here work in all kinds of industries.

Yes, it would be nice to have a lot more context (pun intended) when people post how many LoC they introduced.

B2B SaaS? Then can I assume that a browser is involved and that a big part of that 200k LoC is the verbose styling DSL we all use? On the other hand, Nginx, a production-grade web server, is 250k LoC (251,232 to be exact [1]). These two things are not comparable.

The point being that, as I'm sure we all agree, LoC is not a helpful metric for comparison without more context, and different projects have vastly different amounts of information/feature density per LoC.

[1] https://openhub.net/p/nginx

  • Fr0styMatt88 2 days ago

    I primarily work in C# during the day but have been messing around with simple Android TV dev on occasion at night.

    I’ve been blown away sometimes at what Copilot puts out in the context of C#, but using ChatGPT (paid) to get me started on an Android app - totally different experience.

    Stuff like giving me code that’s using a mix of different APIs and sometimes just totally non-existent methods.

    With Copilot I find it's sometimes brilliant, but it seems so random as to when that will be.

    • motorest 2 days ago

      > Stuff like giving me code that’s using a mix of different APIs and sometimes just totally non-existent methods.

      That has been my experience as well. We can rein in surprising API choices with basic prompt files that clarify what to use in your project and how. However, when using less-than-popular tools whose source code is not available, the hallucinations are unbearable and a complete waste of time.
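      To make that concrete, a repo-level instructions file is one way to pin down which APIs an agent should reach for. The file name follows Copilot's custom-instructions convention; the contents are purely illustrative:

```markdown
<!-- .github/copilot-instructions.md (illustrative example) -->
# Project conventions
- Target .NET 8 and C# 12.
- Prefer `System.Text.Json` over `Newtonsoft.Json`.
- Use `ILogger<T>` for logging; do not suggest `Console.WriteLine`.
- If an API is not in the project's dependencies, say so instead of guessing.
```

      A file like this can't stop hallucinations about undocumented tools, but it does steer the model away from the "mix of different APIs" problem within a stack it already knows.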

      The lesson to be learned is that LLMs depend heavily on their training set; in a simplistic way, they at best only interpolate between the data they were fed. If an LLM is not trained on a corpus covering a specific domain, then you can't expect usable results from it.

      This brings up some unintended consequences. Companies like Microsoft will be able to create incentives to use their tech stack by training their LLMs with a very thorough and complete corpus on how to use their technologies. If Copilot does miracles outputting .NET whereas Java is unusable, developers have one more reason to adopt .NET to lower their cost of delivering and maintaining software.

  • godelski 2 days ago

      > when people post how many LoC they introduced.
    
    Pretty ironic you and the GP talk about lines of code.

    From the article:

      Garman is also not keen on another idea about AI – measuring its value by what percentage of code it contributes at an organization.
    
      “It’s a silly metric,” he said, because while organizations can use AI to write “infinitely more lines of code” it could be bad code.
    
      “Often times fewer lines of code is way better than more lines of code,” he observed. “So I'm never really sure why that's the exciting metric that people like to brag about.”
    
    I'm with Garman here. There's no clean metric for how productive someone is when writing code. At best, this metric is naive, but usually it is just idiotic.

    Bureaucrats love LoC, commits, and/or Jira tickets because they are easy to measure, but here's the truth: to measure the quality of code you have to be capable of producing said code at (approximately) said quality or better.

    Data isn't just "data" that you can treat as a black box and throw into algorithms. Data requires interpretation, and there's no "one size fits all" solution. Data is nothing without its context. It is always biased, and if you avoid nuance you'll quickly convince yourself of falsehoods. Even with expertise it is easy to convince yourself of falsehoods. Without expertise it is hopeless.

    Just go look at Reddit or any corner of the internet where there are armchair experts confidently talking about things they know nothing about. It is always void of nuance and vastly oversimplified. But humans love simplicity. We need to recognize our own biases.

    • sunrunner 2 days ago

      > Pretty ironic you and the GP talk about lines of code.

      I was responding specifically to the comment I replied to, not the article, and mentioning LoC as a specific example of things that don't make sense to compare.

      • godelski 2 days ago

          > the comment I replied to
        
        Which was the "GP", or "grand parent" (your comment is the parent of my comment), that I was referring to.
    • darkwater 2 days ago

      > Bureaucrats love LoC

      Looks like vibe-coders love them too, now.

      • overfeed 2 days ago

        ...but you repeat yourself (c:

        • godelski 2 days ago

          Made me think of a post from a few days ago where Pournelle's Iron Law of Bureaucracy was mentioned[0]. I think vibe coders are the second group. "dedicated to the organization itself" as opposed to "devoted to the goals of the organization". They frame it as "get things done" but really, who is not trying to get things done? It's about what is getting done and to what degree is considered "good enough."

          [0] https://news.ycombinator.com/item?id=44937893

drusepth 3 days ago

On the other hand, fault-intolerant codebases are also often highly specified and almost always have rigorous automated tests already, which are two contexts where coding agents specifically excel.

JambalayaJimbo 3 days ago

I work on brain dead crud apps much of my time and get nothing from LLMs.

  • benjaminwootton 2 days ago

    Try Claude Code. You’ll literally be able to automate 90% of the coding part of your job.

    • dns_snek 2 days ago

      We really need to add some kind of risk to people making these claims to make it more interesting. I listened to the type of advice you're giving here on more occasions than I can remember, at least once for every major revision of every major LLM and always walked away frustrated because it hindered me more than it helped.

      > This is actually amazing now, just use [insert ChatGPT, GPT-4, 4.5, 5, o1, o3, Deepseek, Claude 3.5, 3.9, Gemini 1, 1.5, 2, ...] it's completely different from Model(n-1) you've tried.

      I'm not some mythical 140 IQ 10x developer and my work isn't exceptional so this shouldn't happen.

      • ramesh31 2 days ago

        The dark secret no one from the big providers wants to admit is that Claude is the only viable coding model. Everything else descends into a mess of verbose spaghetti full of hallucinations pretty quickly. Claude is head and shoulders above the rest and it isn't even remotely close, regardless of what any benchmark says.

    • delta_p_delta_x 2 days ago

      I've been working on macOS and Windows drivers. Can't help but disagree.

      Because of the absolute dearth of high-quality open-source driver code and the huge proliferation of absolutely bottom-barrel general-purpose C and C++, the result is... Not good.

      On the other hand, I asked Claude to convert an existing, short-ish Bash script to idiomatic PowerShell with proper cmdlet-style argument parsing, and it returned a decent result that I barely had to modify or iterate on. I was quite impressed.

      Garbage in, garbage out. I'm not altogether dismissive of AI and LLMs but it is really necessary to know where and what their limits are.

      • Sharlin 2 days ago

        I'm pretty sure the GP referred to GGP's "brain dead CRUD apps" when they talked about automating 90% of the work.

  • murukesh_s 2 days ago

    I found the opposite: I get about a 50% improvement in productivity for day-to-day coding (a mix of backend and frontend), mostly in JavaScript, though it has helped in other languages too. You have to review carefully, though, and have extremely well-written test cases if you want to blindly generate or replace existing code.

motorest 2 days ago

> In code bases like this you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

This is a false premise. LLMs themselves don't force you to introduce breaking changes into your code.

In fact, coding agents were lauded at their inception as a major improvement to the developer experience precisely because they let LLMs automatically react to feedback from test suites, speeding up implementation while preventing regressions.

If tweaking your code can result in breaking a million things, that is a problem with your code and with how little you did to make it resilient. LLMs are only able to introduce regressions if your automated tests are unable to catch any of those million things breaking. If that is the case, then your problems are far greater than LLMs existing, and at best LLMs only point out the elephant in the room.
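As a sketch of that safety net, this is the kind of pinned-down characterization test that lets an agent (or a human) tweak code without silently breaking things. `parse_rate` is a hypothetical helper, not from any project discussed here:

```python
# Characterization test: pins down current behavior so that any edit,
# human- or LLM-authored, that changes it fails fast in the test run.

def parse_rate(raw: str) -> float:
    """Parse a percentage string like '12.5%' into a fraction."""
    return float(raw.strip().rstrip("%")) / 100.0

def test_parse_rate_known_inputs():
    # Pinned expectations are what let an agent iterate safely:
    # a regression here surfaces immediately instead of in production.
    assert parse_rate("12.5%") == 0.125
    assert parse_rate(" 100% ") == 1.0
    assert parse_rate("0%") == 0.0

test_parse_rate_known_inputs()
```

The point is not this particular function but the feedback loop: an agent that runs such a suite after every change gets the "breaking a million things" signal for free.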