Comment by cheema33 2 days ago

This is how you do things if you are new to this game.

Get two other, different LLMs to thoroughly review the code. If you don’t have an automated way to do all of this, you will struggle and eventually put yourself out of a job.

If you do use this approach, you will get code that is better than what most software devs put out. And that gives you a good base to work with if you need to add polish to it.
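
A minimal sketch of what the automated version might look like, assuming the Claude Code and Gemini CLIs are installed and take a prompt via -p (the flags and prompt wording are illustrative, not any product's documented interface):

    // review.ts -- fan the current branch's diff out to two model families.
    // Assumes the `claude` and `gemini` CLIs are on PATH and accept a prompt
    // via -p; swap in whatever tools you actually use.
    import { execFileSync } from "node:child_process";

    const diff = execFileSync("git", ["diff", "main...HEAD"], { encoding: "utf8" });

    const prompt =
      "Review this diff for bugs, security issues, and patterns that will " +
      "break later (shared state, nondeterministic tests, missing error " +
      "handling). Reply NO FINDINGS if it is fine.\n\n" + diff;

    for (const cli of ["claude", "gemini"]) {
      const review = execFileSync(cli, ["-p", prompt], { encoding: "utf8" });
      console.log(`=== ${cli} ===\n${review}`);
    }

The point is just that the same diff goes to two different model families and a human reads both reviews.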

overgard a day ago

I actually have used other LLMs to review the code in the past (not today, but in the past). It's fine, but it doesn't tend to catch things like "this technically works, but it's loading a footgun." For example, in the Redux test I mentioned in my original post, the tests were reusing a single global store variable. It technically worked, the tests ran, and since these were the first tests I introduced in the code base there weren't any issues, even though this made the tests non-deterministic... but it was a pattern that was easily going to break further down the line.
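
For concreteness, the pattern looked roughly like this (a sketch assuming Redux Toolkit and Jest; the counter slice is invented for illustration):

    import { configureStore, createSlice } from "@reduxjs/toolkit";

    // Invented slice, just to make the example self-contained.
    const counter = createSlice({
      name: "counter",
      initialState: { value: 0 },
      reducers: { increment: (state) => { state.value += 1; } },
    });
    const makeStore = () =>
      configureStore({ reducer: { counter: counter.reducer } });

    // Footgun: a single store shared by every test in the suite. Each test
    // mutates state that the next test silently inherits, so results depend
    // on execution order.
    export const store = makeStore();

    // Fix: build a fresh store per test so each starts from known state.
    let testStore: ReturnType<typeof makeStore>;
    beforeEach(() => { testStore = makeStore(); });

    test("increment starts from zero on every run", () => {
      testStore.dispatch(counter.actions.increment());
      expect(testStore.getState().counter.value).toBe(1);
    });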

To me, the solution isn't "more AI", it's "how do I use AI in a way that doesn't screw me over a few weeks/months down the line", and for me that's by making sure I understand the code it generated and trim out the things that are bad/excessive. If it's generating things I don't understand, then I need to understand them, because I have to debug it at some point.

Also, in this case it was just some unit tests, so who cares, but if this was a service that was publicly exposed on the web? I would definitely want to make sure I had a human in the loop for anything security related, and I would ABSOLUTELY want to make sure I understood it if it were handling user data.

  • cstejerean a day ago

    How long ago was this? A review with the latest models should absolutely catch the issue you describe, in my experience.

    • t_mahmood a day ago

      Ah, "It's work on my computer" edition of LLM.

    • overgard 16 hours ago

      December. Previous job had cursor and copilot automatically reviewing PRs.

timcobb a day ago

> you will struggle and eventually put yourself out of a job.

We can have a discussion without the stakes being so high.

summerlight a day ago

The quality of the generated code does not matter. The problem is when it breaks at 2 AM and you're burning thousands of dollars every minute. You don't own the code you don't understand, but unfortunately that doesn't mean you don't own the responsibility. Good luck writing the postmortem; your boss will have lots of questions for you.

  • icedchai 19 hours ago

    Frequently the boss is encouraging use of AI for efficiency without understanding the implications.

    And we'll just have the AI write the postmortem, so no big deal there. ;)

  • charcircuit a day ago

    AI can help you understand code faster than you could without it. It lets me investigate problems where I have little context and still write fixes effectively.

lelanthran a day ago

> If you do use this approach, you will get code that is better than what most software devs put out. And that gives you a good base to work with if you need to add polish to it.

If you do use this approach, you'll find that it descends into recursive madness. Due to the way these models are trained, they are never going to look at the output of two other models and go "Yeah, this is fine as it is; don't change a thing."

Before you know it you'll have change amplification, where a tiny change by one model triggers other models (or even the same one) to make further changes, which trigger further changes, and so on ad nauseam.

The easy part is getting the models to spit out working code. The hard part is getting them to stop.
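
If you automate cross-review anyway, one mitigation is making the stop condition explicit: cap the rounds and exit as soon as the reviewer reports nothing, instead of letting the models volley changes indefinitely. A sketch, where Agent stands in for whatever invokes your coder and reviewer models (hypothetical, not any particular framework):

    type Agent = (prompt: string) => Promise<string>;

    const MAX_ROUNDS = 3; // hard cap so review can't amplify changes forever

    async function reviewLoop(code: string, coder: Agent, reviewer: Agent): Promise<string> {
      for (let round = 0; round < MAX_ROUNDS; round++) {
        const review = await reviewer(
          "Review for real defects only. Reply NO FINDINGS if acceptable.\n\n" + code);
        if (review.includes("NO FINDINGS")) return code; // explicit stop condition
        code = await coder(
          "Apply only these fixes; change nothing else.\n\n" + review + "\n---\n" + code);
      }
      return code; // cap reached: hand off to a human, not another round
    }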

3kkdd a day ago

I'm sick and tired of these empty posts.

SHOW AN EXAMPLE OF YOU ACTUALLY DOING WHAT YOU SAY!

  • alt187 a day ago

    There's no example because OP has never done this, and never will. People lie on the internet.

    • timcobb a day ago

      I've never done this because I haven't felt compelled to; I want to review my own code. But I imagine it works okay, and it isn't hard to set up by asking Claude to set it up for you...

    • senordevnyc 12 hours ago

      What? People do this all the time. Sometimes manually, by invoking another agent with a different model and asking it to review the changes against the original spec. I just set up some reviewer/verifier subagents in Cursor that I can invoke with a slash command. I use Opus 4.5 as my daily driver, but I have reviewer subagents running Gemini 3 Pro and GPT-5.2-codex, and they each review the plan as well, and then the final implementation against the plan. Both sometimes identify issues, and Opus then integrates that feedback.

      It’s not perfect, so I still review the code myself, but it helps decrease the number of defects I then have to have the AI correct.

  • Foreignborn a day ago

    these two posts (the parent and then the OP) seem equally empty?

    by level of compute spend, it might look like:

    - ask an LLM in the same query/thread to write code AND tests (not good)

    - ask the LLM in different threads (meh)

    - ask the LLM in a separate thread to critique said tests (too brittle, ignoring testing guidelines, testing implementation rather than behavior, etc.). fix those. (decent)

    - ask the LLM to spawn multiple agents to review the code and tests. Fix those. Spawn agents to critique again. Fix again.

    - Do the same as above, but spawn agents from different families (so Claude calls Gemini and Codex).

    ---

    these are usually set up as /slash commands like /tests or /review so you aren’t doing this manually. since this can take some time, people might work on multiple features at once.
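
    in Claude Code, for instance, a custom slash command is just a markdown prompt file: dropping something like the below into .claude/commands/review.md gives you /review (the contents here are illustrative, not a canonical prompt):

        Review the uncommitted changes in this repository. Flag real defects
        only: shared mutable state, nondeterministic tests, missing error
        handling, security issues in anything publicly exposed. Do not
        restyle working code. For each finding, cite the file and explain
        why it will break later. If nothing is worth fixing, say so and stop.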