Comment by xmprt 6 days ago

> Using apis I am familiar with but don't have memorized

I think you have to be careful here even with a typed language. For example, I generated some Go code recently which exec'd a shell command and captured the output. The generated code used CombinedOutput, which is easier to use but doesn't allow proper error handling. Everything ran fine until I tested a few error cases and then realized the problem. Other times I asked the agent to write test cases too, and while it scaffolded code to handle error cases, it didn't actually write any test cases to exercise them - so if you were only doing a cursory review, you would think it was properly tested when in reality it wasn't.

tptacek 6 days ago

You always have to be careful. But worth calling out that using CombinedOutput() like that is also a common flaw in human code.

  • dingnuts 6 days ago

    The difference is that humans learn. I got bit by this behavior of CombinedOutput once ten years ago, and no longer make this mistake.

    • csallen 6 days ago

      This applies to AI, too, albeit in different ways:

      1. You can iteratively improve the rules and prompts you give to the AI when coding. I do this a lot. My process is constantly improving, and the AI makes fewer mistakes as a result.

      2. AI models get smarter. Just in the past few months, the LLMs I use to code are making significantly fewer mistakes than they were.

      • th0ma5 6 days ago

        That you don't know when it will make a mistake and that it is getting harder to find them are not exactly encouraging signs to me.

      • gf000 5 days ago

But my gripe with your first point is that by the time I've written an exact, detailed, step-by-step prompt, I could have written the code by hand. There's a reason we don't use fuzzy human language in math/coding: it's ambiguous. I always feel like I'm in one of those funny videos where you have to write exact instructions for making a peanut butter sandwich and they get deliberately misinterpreted. Except it's not fun at all when you're the one writing the instructions.

2. It's very questionable that they will get any smarter; we have hit the plateau of diminishing returns. They will get more optimized, and we can run them more times with more context (e.g. chain of thought), but they fundamentally won't get better at reasoning.

      • kasey_junk 6 days ago

        And you can build automatic checks that reinforce correct behavior for when the lessons haven’t been learned, by bot or human.