Comment by godelski 2 days ago

  > rewarding people for the volume ... rather than the quality.
I suspect this is a major part of the appeal of LLMs themselves. They produce lines very fast, so it appears as if work is being done fast. But that's very hard to know, because the number of lines carries essentially zero signal about the quality of the code, or even of a commit. It's a bit insane that we use line and commit counts as measures in the first place. They're trivial to hack. You even end up rewarding that annoying dude who keeps changing the file so the diff is the entire file and not the 3 lines they edited...

I've been thinking we're living in "Goodhart's Hell", where metric hacking has become the intent: we've decided that metrics are all that matter and that they're perfectly aligned with our goals.

But hey, who am I to critique? I'm just a math nerd. I don't run a multi-trillion-dollar business that lays off tons of workers because the current ones are so productive due to AI that they created one of the largest outages in the history of their platform (and you don't even know which of the two I'm referencing!). Maybe when I run a multi-trillion-dollar business I'll have the right to an opinion about data.

slashdave 2 days ago

I think you will discover that few organizations use the size or number of edits as a metric of effort. Instead, you might be judged by some measure of productivity (such as resolving issues). Fortunately, language agents are actually useful at coding, when applied judiciously.

  • godelski a day ago

    Yet it's common enough that we see it. Your point also brings up a 10x engineer joke: there are two types of 10x engineers, those who do 10x the work and those who solve 10x the Jira tickets but are the cause of 100x of them.

    The point is that people metric hack, and very bureaucratic structures tend to incentivize metric hacking, not discourage it. See Pournelle's Iron Law of Bureaucracy.

      > Fortunately, language agents are actually useful at coding, when applied judiciously.
    
    I'm not sure anyone doubts this. By definition it really must be true. The problem is that they're not being used judiciously but haphazardly, and that people in large organizations are more concerned with politics than with the product they make.

    If you cannot see how quality is decreasing then I'm not sure what to tell you. Yes, there are metrics where it's getting better, but at the same time user frustration is increasing. AWS and Azure both recently had major outages. CrowdStrike took down much of the world's network over an avoidable mistake. Microsoft is fumbling the Windows upgrade. Apple Intelligence was a disaster. YouTube search is beyond infuriating. Google search is so bad we turn to LLMs now. These are major and obvious issues. We don't even have the time to talk about the million minor issues, like YouTube captions covering captions embedded in the video, which is not a majorly complicated problem to solve with AI, yet they're instead pushing AI upscaling that is getting a lot of backlash.

    So you can claim things are being used judiciously all you want, but I'm not convinced when looking at the results. I'm not happy that every device I use is buggy as shit and simultaneously getting harder to fix myself.