Comment by Aurornis 4 days ago

AI code reviews are the worst place to introduce AI, in my experience. They can find a few things quickly, but they can also send people down unnecessary paths, and they are easily persuaded by comments or even the slightest pushback. They're quick to cave and agree with any input.

It can also encourage laziness: if the AI reviewer didn't spot anything, it's easier to justify skimming the commit. Everyone says they won't do that, but it happens.

For anything AI related, having manual human review as the final step is key.

aozgaa 4 days ago

Agreed.

LLMs are fundamentally text generators, not verifiers.

They might spot some typos and stylistic discrepancies based on their corpus, but they do not reason. It’s just not what the basic building blocks of the architecture do.

In my experience, you need a lot of coaxing and guardrails to keep them even roughly on track. (Maybe the LLM companies will build this into the products they sell, but it's demonstrably not there today.)

  • CharlesW 4 days ago

    > LLMs are fundamentally text generators, not verifiers.

    In practice, they work quite well for analysis too, both of text and of numbers (via tools). I've found them to be powerful tools for "linting" a codebase against adequately documented standards and architectural guidance, especially when they can use type checkers, static analysis tools, etc.

    • skydhash 4 days ago

      The value of an analysis lies in the decision you make after getting the result. So will you actually fix the codebase, or is it just a nice report to frame and hang on the wall?

      • CharlesW 4 days ago

        > So will you actually fix the codebase…

        Improving code quality is the reason to do it, so *yes*. Of course, anyone using AI for the analysis is probably using AI for the "fix" part too (I certainly am).

pnathan 4 days ago

That's a fantastic counterpoint. I've found AI reviewers to be useful on a first pass, at a small-pieces level. But I hear your opinion!

chuckadams 4 days ago

I find the summary that Copilot generates more useful than its review comments most of the time. That said, I have seen it make some good catches. It's a matter of expectations: the AI won't have hurt feelings if you reject all its suggestions, so I feel even freer to reject its feedback with the briefest of dismissals.

moomoo11 3 days ago

What about something like this?

Give it a link to the ticket. Hopefully your team cares enough to write good tickets.

If the problem is well defined in the ticket, do the code changes actually address it?

For a bug fix, for example, it can check whether the PR's tests cover the conditions that caused the bug, and whether the changed code fits the requirements.

I think the goal of AI for creative work should be to make things more efficient, not necessarily to replace people. Whoever does the review can get up to speed quickly. I've been on teams where people reviewed sections of code they weren't very familiar with.

In that case, if it saves them 30 minutes, great!

kmacdough 3 days ago

I agree and disagree. I think it's important to make it visually very clear that it is not really a PR review, but rather an advanced style checker. They can be very useful for checking rote or repetitive standards that go a bit beyond what standard linters and static analysis can provide: institutional standards, lessons learned, etc. But if it uses the normal PR review pipeline rather than the checker pipeline, it gives the false impression that it is a PR review, which it is not.