Comment by vmg12 6 days ago
Here are the cases where it helps me (I promise this isn't AI-generated even though I'm using a list...)
- Formulaic code. It basically obviates the need for macros / code gen. The downside is that LLMs are slower, and you can't just update the macro and regenerate. The upside is that it works for code that is mostly formulaic but differs slightly across implementations in ways that make macros impossible to use.
- Using APIs I am familiar with but don't have memorized. It saves me the effort of doing the Google search and scouring the docs. I use typed languages, so if it hallucinates, the type checker will catch it. I'll need to manually test and set up automated tests anyway, so there are plenty of steps where I can catch it if it's doing something really wrong.
- Planning: I think this is actually a very underrated part of LLMs. If I need to make changes across 10+ files, it really helps to have the LLM go through all the files and plan out the changes I'll need to make in a markdown doc. Sometimes the plan is good enough that with a few small tweaks I can tell the LLM to just do it, but even when it gets some things wrong, it's useful for me to follow the plan partially while fixing what it got wrong.
Edit: Also, one thing I really like about LLM-generated code is that it maintains the style / naming conventions of the code in the project. When I'm tired I often stop caring about that kind of thing.
> Using apis I am familiar with but don't have memorized
I think you have to be careful here even with a typed language. For example, I generated some Go code recently which execed a shell command and got the output. The generated code used CombinedOutput, which is easier to use but doesn't do proper error handling. Everything ran fine until I tested a few error cases and realized the problem. At other times I asked the agent to write test cases too, and while it scaffolded code to handle error cases, it didn't actually write any test cases to exercise them. So if you were only doing a cursory review, you would think it was properly tested when in reality it wasn't.
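To make the CombinedOutput point concrete, here is a minimal sketch of one way to keep the streams apart. It is a guess at the kind of problem described, not the commenter's actual code: CombinedOutput returns stdout and stderr merged into a single byte slice, so on failure the error text is mixed into whatever the caller wanted to parse. The `run` helper and `Result` type are hypothetical names introduced for illustration, and the example assumes a POSIX `sh` is on the PATH.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// Result is a hypothetical type that keeps the two streams separate.
type Result struct {
	Stdout   string
	Stderr   string
	ExitCode int // -1 if the process could not be started at all
}

// run is a hypothetical helper: it captures stdout and stderr in separate
// buffers instead of merging them the way CombinedOutput does.
func run(name string, args ...string) (Result, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()
	res := Result{Stdout: stdout.String(), Stderr: stderr.String()}

	// A non-zero exit status surfaces as *exec.ExitError.
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		res.ExitCode = exitErr.ExitCode()
		return res, fmt.Errorf("%s exited with code %d: %w", name, res.ExitCode, err)
	}
	if err != nil {
		// Any other error, e.g. the binary was not found.
		res.ExitCode = -1
		return res, fmt.Errorf("running %s: %w", name, err)
	}
	return res, nil
}

func main() {
	// Assumes a POSIX shell; the command writes to both streams and fails.
	res, err := run("sh", "-c", "echo partial; echo boom >&2; exit 3")
	fmt.Printf("stdout=%q stderr=%q code=%d failed=%t\n",
		res.Stdout, res.Stderr, res.ExitCode, err != nil)
}
```

With this shape, the failing command's stdout, stderr, and exit code are reported separately, which is also what makes the error path testable: a test can assert on a non-zero exit and on the stderr text rather than only on the happy path.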