Code review can be better
(tigerbeetle.com)
387 points by sealeck 4 days ago
> and that isn't something I ever encountered in the wild (in any formal sense)
Because in the software engineering world there is very little engineering involved.
That being said, I also think that the industry is unwilling to accept the slowness of a proper engineering process, for various reasons, including the non-criticality of most software and the possibility of amending bugs and errors on the fly.
Other engineering fields enjoy no such luxuries: the bridge either holds the train or it doesn't, you either get the manufacturing plant right or there's little room for fixing it, the plane's engine either works or it doesn't.
Different stakes and patching opportunities lead to different practices.
Summoning Hillel Wayne....
There is plenty of it on large-scale enterprise projects, but then that whole area is looked down on by "real developers".
Also, in many countries, for one to call themselves a Software Engineer, they actually have to hold a proper degree from a certified university or professional college, validated by the country's engineering order.
Because naturally a 5-year (or 3-year, depending on the country) degree in Software Engineering is the same as a six-week bootcamp.
I never finished my degree, but I believe I'm a very good developer (my employers agree). In my time, most good programmers were self-taught.
I don't mind (hypothetically) not being allowed to call myself "engineer", but I do mind the false dichotomy of "5-year course" vs "six-week bootcamp". In the IT world it's entirely possible to learn everything yourself, and learn it better than a one-size-fits-all course ever could.
It still is engineering; you're only mistaking what the design phase is.
Writing code is the design phase.
You don’t need a separate design phase to do design.
Will drop link to relevant video later.
I see there has been a “spirited discussion” on this. We can get fairly emotionally invested into our approaches.
In my experience (and I have quite a bit of it, in some fairly significant contexts), “It Depends” is really where it’s at. I’ve learned to take an “heuristic” approach to software development.
I think of what I do as “engineering,” but not because of particular practices or educational credentials. Rather, it has to do with the Discipline and Structure of my approach, and a laser focus on the end result.
I have learned that things don’t have to be “set in stone,” but can be flexed and reshaped, to fit a particular context and development goal, and that goals can shift, as the project progresses.
When I have worked in large, multidisciplinary teams (like supporting hardware platforms), the project often looked a lot more “waterfall” than when I have worked in very small teams (or alone), on pure software products. I’ve also seen small projects killed by overstructure, and large projects killed by too much flexibility. I’ve learned to be very skeptical of “hard and fast” rules that are applied everywhere.
Nowadays, I tend to work alone, or on small teams, achieving modest goals. My work is very flexible, and I often start coding early, with an extremely vague upfront design. Having something on the breadboard can make all the difference.
I’ve learned that everything that I write down, “ossifies” the process (which isn’t always a bad thing), so I avoid writing stuff down, if possible. It still needs to be tracked, though, so the structure of my code becomes the record.
Communication overhead is a big deal. Everything I have to tell someone else, or that they need to tell me, adds rigidity and overhead. In many cases, it can’t be avoided, but we can figure out ways to reduce the burden of this crucial component.
It’s complicated, but then, if it were easy, everyone would be doing it.
Googler, but opinions are my own.
I disagree. The design phase of a substantial change should be done beforehand with the help of a design doc. That forces you to put in writing (and in a way that is understandable by others) what you are envisioning. This exercise is really helpful in forcing you to think about alternatives, pitfalls, pros & cons, etc. This way, once stakeholders (your TL, other team members) have agreed, the reviews related to that change become only code related (style, use this standard library function that does it, etc.), but the core idea is there.
I also read this series of blog posts recently where the author, Hillel Wayne, talked to several "traditional" engineers that had made the switch to software. He came to a similar conclusion and while I was previously on the fence of how much of what software developers do could be considered engineering, it convinced me that software engineer is a valid title and that what we do is engineering. First post here: https://www.hillelwayne.com/post/are-we-really-engineers/
This is the talk on real software engineering:
> Writing code is the design phase.
Rich Hickey agrees it's a part of it, yes. https://www.youtube.com/watch?v=c5QF2HjHLSE
> Writing code is the design phase.
No, it really isn't. I don't know which amateur operation you've been involved with, but that is really not how things work in the real world.
In companies that are not entirely dysfunctional, each significant change to the system involves a design phase, which often includes reviews from stakeholders and involved parties, such as security reviews and data protection reviews. These tend to happen before any code is even written. This doesn't rule out spikes, but their role is to verify and validate requirements and approaches, and to allow new requirements to emerge that feed back into the actual design process.
The only place where cowboy coding has a place is in small refactoring, features and code fixes.
> Because in the software engineering world there is very little engineering involved.
I can count on one hand the number of times I've been given the time to do a planning period for something less than a "major" feature in the past few years. Oddly, the only time I was able to push good QA, testing, and development practices was at an engineering firm.
I work in a small team where we are essentially 4-6 core developers. When I develop a feature I usually talk about it with my coworkers once I've made a rough draft in my head of how I'd approach it. They do the same, so our code reviews are mostly only about minor code smells etc., but we usually decide on the architecture together (2-3 people usually).
I find this to be one of the most important things in our team. Once people don't agree on code it all kinda goes downhill with nobody wanting to interact with code they didn't write for various reasons.
In bigger orgs I believe it's still doable this way as long as responsibilities are shared properly and it's not just 4 guys who know it all and 40 others depend on them.
> It always seems as if the code review is the only time when all stakeholders really gets involved and starts thinking about a change.
That is a problem with your organization, not with Git or any version control system. PRs are orthogonal to it.
If you drop by a PR without being aware of the ticket that made the PR happen and the whole discussion and decision process that led to the creation of said tickets, you are out of the loop.
Your complaint is like a book publisher complaining that the printing process is flawed because seeing the printed book coming out of the production line is the only time when all stakeholders really get involved. Only if you work in a dysfunctional company.
I saw this in many places, so I read that original statement as a complaint about a widespread problem, not an exception in one company.
Sometimes it's not even about a PR, it's about an entire project. I always do reviews (design and code, separate stages) for projects where the code is almost complete when people come for design reviews, and by the time we get to code reviews it is usually too late to fix problems other than showstoppers. I've worked in small companies and huge companies (over 100k employees); some are better, most are bad, in my experience. YMMV, of course.
> Not that I know how to fix that. You can't have everyone in the entire company spend time looking at every possible thing that might be developed in the near future. Or can you?
You don't need to. I've seen this generally work with some mix of the following:
1. Try to decouple systems so that it's less likely for someone in a part of the org to make changes that negatively impact someone in a more distant part of the org.
2. Some design review process: can be formal "you will write a design doc, it will be reviewed and formally approved in a design committee" if you care more about integrity and less about speed, or can be "write a quick RFC document and share it to the relevant team(s)".
3. Some group of people that have broad context on the system/code-base (usually more senior or tenured engineers). Again, can be formal: "here is the design review committee" or less formal: "run it by this set of folks who know their stuff". If done well, I'd say you can get pretty broad coverage from a group like this. Definitely not "everyone in the entire company". That group can also redirect or pull others in.
4. Accept that the process will be a bit lossy. Not just because you may miss a reviewer, but also, because sometimes once you start implementing the reality of implementation is different than what people expect. You can design the process for this by encouraging POC or draft implementations or spikes, and set expectations that not all code is expected to make it into production (any creative process includes drafts, rewrites, etc that may not be part of the final form, but help explore the best final form).
I've basically seen this work pretty well at company sizes from 5 engineers all the way up to thousands.
Where I work we tend to write RFCs for fundamental design decisions. Deciding what counts as a "fundamental design decision" is sometimes self-moderated in the moment but we also account for it when making long term plans. For example when initially creating epics in Jira we might find it hard to flesh out as we don't really know how we're going to approach it, so we just start it off with a task to write an RFC.
These can be written either for just our team or for the eyes of all other software teams. In the latter case we put these forward as RFCs for discussion in a fortnightly meeting, which is announced well in advance so people can read them, leave comments beforehand, and only need to attend the meeting if there's an RFC of interest to them up for discussion.
This has gone pretty well for us! It can feel like a pain to write some of these, and at times I think we overuse them somewhat, but I much prefer our approach to any other place I've worked where we didn't have any sort of collaborative design process in place at all.
I view this process like this: code review is a communication tool. You can discuss concrete decisions vs. hand-waving and explaining in the conceptual space, which of course has its place but is limited.
But writing the whole working code just to discuss some APIs is too much and will require extra work to change if problems are surfaced on review.
So a design document is something in the middle: it should draw a line where the picture of the planned change is as clear as possible and can be communicated with stakeholders.
Other possible middle grounds include PRs that don’t pass all tests or that don’t even build at all. You just have to choose the most appropriate sequence of communication tools to come to agreements in the team and come to a point where the team is on the same page on all the decisions and how the final picture looks.
I share your feelings.
Regarding design reviews, we used to have them at my current job. However we stopped doing both formal design documents and design reviews in favor of prototyping and iterative design.
The issue with the design phase is that we often failed to account for some important details. We spent considerable time discussing things and, when implementing, realized that we omitted some important detail or insight. But since we already invested that much time in the design phase, it was tempting to take shortcuts.
What's more, design reviews were not conducted by the whole team, since it would be counter-productive to have 10-more people in the same room. So we'd still discover things during code reviews.
And then not everyone is good at, or motivated to put effort into, producing good design documents.
In the end, I believe that any development team above 5 people is bound to encounter these kinds of inefficiencies. The ideal setup is to put 5 people in the same room with the PO and close to a few key users.
> The ideal setup is to put 5 people in the same room with the PO and close to a few key users.
(I suspect you are aware, but just in case this is new to you.) This is essentially the core of Extreme Programming.
What team size does XP recommend, if any?
It seems like the standard around me is between 8 to 12 people. This is too many in my opinion.
I believe this is because management is unknowingly aiming for the biggest team that does not completely halt, instead of seeking the team that delivers the most bang for the buck.
One way to fix it: pair programming. You're getting feedback in real time as you write the code.
Unfortunately, the conditions where it works well can be difficult to set up. You need people who are into it and have similar schedules. And you don't want two people waiting for tests to run.
It's not for everyone. Some people have excellent reasons why it isn't workable for them. Others have had terrible experiences. It takes a great deal of practice to be a good pair and, if you don't start by working with an experienced pair, your memories of pairing are unlikely to be fond.
However.
I paired full-time, all day, at Pivotal, for 5 years. It was incredible. Truly amazing. The only time in my career when I really thrived. I miss it badly.
I did it at a startup for a few months. The startup failed, but I think it was more of a business failure.
Pivotal Labs was a contracting firm that did it for years. They aren’t around anymore, but they had a good run:
It's crazy that we go out of our way to design processes (code review, design review) to avoid actually just... working together? And then we design organizations that optimize for those processes instead of optimizing for collaboration.
With AI-based coding (no, I won't use "vibe coding", thank you) this workflow improves a lot. Instead of jumping straight into code, I have my engineers write a Notion doc that describes what needs to be built. Think of it like an LLD, but really it's a prompt for Claude Code. This forces them to think through the problem at a low level, and they share the doc with me before sending it to Claude, so I get to review early in the process. Once we finalize this "LLD" or "low-level prompt", they hand it to Claude. The next time I see the work is in a GitHub PR. At that point, we rarely have to throw everything away and start from scratch.
Seems like your organization is lacking structure and communication.
Where I work, the structure is such that most parts of the codebase have a team that is responsible for it and does the vast majority of changes there. If any "outsider" plans a change, they come talk to the team and coordinate.
And we also have strong intra-team communication. It's clear who is working on what and we have design reviews to agree on the "how" within the team.
It's rare that what you describe happens. 95% of the code reviews I do are without comments or only with minor suggestions for improvement. Mainly because we have developed a culture of talking to each other about major things beforehand and writing the code is really just the last step in the process. We also have developed a somewhat consistent style within the teams. Not necessarily across the teams, but that's ok.
TL;DR: It's certainly possible to do things better than what you are experiencing. It's a matter of structure, communication and culture.
This can be used in any process where the result is only judged at the end.
The solution here may be to add a midterm check. I think this is what you mean by a "design review."
In my experience, there are some rules that need to be followed for it to work.
- Keep the number of stakeholders involved in all decisions, including PR, as small as possible.
- Everyone involved should take part in this check. That way, no one will be surprised by the results.
- This check should be documented, e.g. in the ticket.
When and how to do this check and how to handle disagreements depend on the task, culture, and personalities.
We should do something similar with AI-coding.
If you don't have a documented mid-term check, a vibe-coded PR might not be what you expected.
You can get the same thing if you do user interface testing after you've built the thing. A design system can help there - at the very least, the change can feed back into the next revision of the playbook.
Even if you can't fix it this time, hopefully you've taught someone a better pattern. The direction of travel should still be positive.
On personal projects I've used architectural decision records, but I've never tried them with a team.
Just taking a step back, it is SO COOL to me to be reading about stacked pull requests on HN.
When we started graphite.dev years ago that was a workflow most developers had never heard of unless they had previously been at FB / Google.
Fun to see how fast code review can change over 3-4yrs :)
I'm a pre-mercurial arcanist refugee who tends to promote Graphite in teams that are still struggling with mega-PRs and merge commits and other own goal GitHub-isms. Big fan in general even with the somewhat rocky scaling road we've been on :)
And I very much appreciate both the ambition and the results that come from making it interop with PRs; it's a nightmare problem and it's pretty damned amazing it works at all, let alone most of the time.
I would strongly lobby for a prescriptive mode where Graphite initializes a repository with hardcore settings that would allow it to make more assumptions about the underlying repo (merge commits, you know the list better than I do).
I think that's what could let it be bulletproof.
We've talked about a "safe mode" where we initialize it similar to JJ - such that you can no longer directly run git commands without funneling them thru graphite, but which would make it bulletproof. Would that be interesting?
I think Jujutsu is interesting in its own right!
It seems non-obvious that you would have to prohibit git commands in general, they're already "buyer beware" with the current tool (and arcanist for that matter). Certainly a "strict mode" where only well-behaved trees could interact with the tool creates scope for all kinds of performance and robustness optimizations (and with reflog bisecting it could even tell you where you went off script).
I was more referring to the compromises that gt has to make to cope with arbitrary GitHub PRs seem a lot more fiddly than directly invoking git, but that's your area of expertise and my anecdote!
Broad strokes I'm excited for the inevitable decoupling of gt from GitHub per se, it was clearly existential for zero to one, but you folks are a first order surface in 2025.
Keep it up!
git-spice does everything I liked from Graphite, but it’s fully open source and easy to adopt piecemeal.
git-spice is completely open source and free: https://abhinav.github.io/git-spice/
Stacked pull requests seem to add a layer of complexity to solve a problem that should and can be avoided in the first place.
Frequent, small changes are really a good practice.
Then we have things like trunk-based development and continuous integration.
I’m confused. How do you do frequent small changes and avoid stacked PRs? Do you just do a small commit, wait for a review, merge, do another small commit? Or do you make a bunch of small commits locally and only put up the next one for review when the previous one is reviewed and merged?
Those are the only models I can think of, and it’s weird to advocate for having a variable-time asynchronous process in the middle of your code or review loops. Seems like you’re just handicapping your velocity for no reason.
Stacked PRs are precisely about factoring out small changes into individually reviewable commits that can be reviewed and landed independently, decoupling reviewer and developer while retaining good properties like small commits that the reviewer is going to do a better job on, larger single purpose commits that the reviewer knows to spend more time on without getting overwhelmed dealing with unrelated noise, and the ability to see relationships between smaller commits and the bigger picture. Meanwhile the developer gets to land unobtrusive cleanups that serve a broader goal faster to avoid merge conflicts while getting feedback quicker on work while working towards a larger goal.
The only time stacked commits aren’t as useful is for junior devs who can’t organize themselves well enough to understand how to do this well (it’s an art you have to intentionally practice) and don’t generally have a good handle on the broader scope of what they’re working towards.
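For anyone who hasn't tried it, here is a rough sketch of what a two-deep stack looks like with plain git plus the GitHub CLI (branch names are invented; dedicated tools like Graphite or git-spice automate the rebase/retarget bookkeeping for you):
```
# PR 1: a small, self-contained cleanup
git checkout -b extract-retry-helper main
# ...commit...

# PR 2: the feature that builds on PR 1
git checkout -b use-retry-helper extract-retry-helper
# ...commit...

# Open both PRs; the second one targets the first instead of main
gh pr create --base main --head extract-retry-helper
gh pr create --base extract-retry-helper --head use-retry-helper

# When PR 1 merges, rebase the rest of the stack and retarget it at main
git checkout use-retry-helper && git rebase main
```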
Trunk-based development, by itself, is a fool's errand.
But combine it with TDD & pairing and it becomes a license to deliver robust features at warp speed.
I don’t follow. Regardless of where you merge, are you not developing features on a shared branch with others? Or do you just have a single long development branch and then merge once “you’re done” and hope that there’s no merge conflicts? But regardless, I’m missing how reviews are being done.
Stacked PRs allow me to post frequent, small changes without waiting for a review between each one.
Well, you don't need stacked PRs for that...
I think stacked PRs are a symptom of the issues that the underlying workflow (feature branches with blocking reviews) has.
Given the security incident that happened to CodeRabbit I’m a bit less enthusiastic about testing out new tools that have LLMs and my codebase under the same tool.
What can be a very nice experiment to try something new can easily become a security headache to deal with.
I don’t understand. By LLMs are you referring to the optional LLM review Graphite offers as an additional purchase on top of the product? I’m not sure I understand the risk concern.
As someone who already breaks tasks into atomic (or near atomic) pieces and always has done, is this just submitting a PR for each commit as you go? How does it work for breaking changes? Requires use of feature flags?
Sort of, yeah! It lends itself well to a 1 PR = 1 commit philosophy. Every PR has to pass CI to be mergeable. If you want to make a CI-breaking change, putting that behind a feature flag is one valid strategy.
I'd recommend giving it a try to see what it's like. The `gt`/onboarding tour is pretty edifying and brief.
It's likely that you'll find that `gt` is "enabling" workflows that you've already found efficient solutions for, because it's essentially an opinionated and productive subset of git+github. But it comes with some guardrails and bells and whistles that makes it both (1) easier for devs who are new to trunk-based dev to grok and (2) easier for seasoned devs to do essentially the same work they were already doing with fewer clicks and less `git`-fu.
Dude, I love Graphite.
Best AI code review, hands down. (And I’ve tried a few.)
The biggest gripe I have with GitHub is that the app is painfully slow. And by slow, I mean browser-tab-might-freeze level slow.
Shockingly, the best code review tool I've ever used was Azure DevOps.
Stash (now BitBucket Server) had the best code review going, head and shoulders above GitHub to the point I thought GitHub would obviously adopt their approach. But I imagine Atlassian has now made it slow and useless like they do with all their products and acquisitions.
Stash was not an acquisition. Stash was built from the ground up inside Atlassian during its golden age, by a bunch of engineers who really cared about performance. Though it helped that they didn't have Jira's 'problem' of having 8 figures of revenue hanging off a terrible database schema designed a decade ago.
You might be thinking of Fisheye/Crucible, which were acquisitions, and suffered the traditional fate of being sidelined.
(You are 100% correct that Stash/Bitbucket Server has also been sidelined, but that has everything to do with their cloud SaaS model generating more revenue than selling self-hosted licenses. The last time I used it circa 2024, it was still way faster than Bitbucket Cloud though.)
Source: worked at Atlassian for a long time but left a few years ago.
Bitbucket had a git-related tool called Stash? I love Bitbucket, but I'm glad I did not know about that.
What did you like so much about DevOps?
I use it every day and don't have any issues with the review system, but to me it's very similar to github. If anything, I miss being able to suggest changes and have people click a button to integrate them as commits.
I've used this 'suggestion' workflow in azure devops. https://devblogs.microsoft.com/devops/introducing-the-new-pu...
I find the idea of using git for code reviews directly quite compelling. Working with the change locally as if you were the one who made it is very convenient, considering the clunky read-only web UI.
I didn't get why they stick with the requirement that a review is a single commit. To keep the git-review implementation simple?
I wonder if an approach where every reviewer commits their comments/fixes to the PR branch directly would work as well as I think it would. One might not even need any additional tools to make it convenient to work with. This idea seems like a hybrid of the traditional GitHub flow and the way Linux development is organized via mailing lists and patches.
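A minimal sketch of that idea with nothing but stock git (the branch name and review remark are made up); the reviewer's feedback simply arrives as ordinary commits on the PR branch:
```
git fetch origin feature/login-retry
git checkout feature/login-retry
# leave feedback either as an actual fix or as a committed FIXME comment
git commit -am "review: handle the timeout case in the retry loop"
git push origin feature/login-retry
```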
By read-only I meant that you can't fully interact with the code: run/debug it, use intellisense, etc.
It's described well in the post. This way you have to switch between the IDE and the web diff viewer, which is redundant and not convenient.
> I didn't get why stick with the requirement that review is a single commit
Yeah that is pretty weird. If 5 people review my code, do they all mangle the same review commit? We don't do that with code either, feels like it's defeating the point.
Reviews would need to be commits on top of the reviewed commit. If there are 5 reviews of the same commit, then they all branch out from that commit. And to address them, there is another commit which also lives beside them. Each commit's change process becomes a branch, with stacked commits being branches chained on top of one another. Each of the commits in those chained branches then has comment commits attached. Those comment commits could even form chains if a discussion is happening. Then, when everybody is happy, each branch gets squashed into a single commit and those get rebased onto the main branch.
You likely want to make new commits for that though to preserve the discussions for a while. And that's the crux: That data lives outside the main branch, but needs to live somewhere.
> When I review code, I like to pull the source branch locally. Then I soft-reset the code to the merge base, so that the code looks as if it was written by me.
This is eerily similar to how I review large changes that do not have a clear set of commits. The real problem is working with people that don’t realize that if you don’t break work down into small self contained units, everybody else is going to have to do it individually. Nobody can honestly say they can review tons of diffs to a ton of files and truly understand what they’ve reviewed.
The whole is more than just the sum of the parts.
For those that want an easy button. Here ya go.
```
review () {
    if [[ -n $(git status -s) ]]; then
        echo 'must start with clean tree!'
        return 1
    fi

    git checkout pristine           # a branch that I never commit to
    git rebase origin/master

    branch="$1"
    git branch -D "$branch"
    git checkout "$branch"
    git rebase origin/master
    git reset --soft origin/master
    git reset

    nvim -c ':G'                    # opens neovim with the fugitive plugin - replace with your favorite editor

    git reset --hard
    git status -s | awk '{ print $2 }' | xargs rm
    git checkout pristine
    git branch -D "$branch"
}
```
This also ties in directly with tickets and the overall workflow the team has. I find this to have a huge effect on how manageable PRs are. I feel the majority of devs are quite oblivious to the code they produce; they simply keep coding until they fulfill the acceptance criteria, no matter if the result is 200 lines in 1 file or 1,000 lines in 30 files.
I use this: https://github.com/sindrets/diffview.nvim
as a PR review tool in neovim. It's basically vscode's diff tool UI-wise but integrates with vim's inbuilt diff mode.
Also, `git log -p --function-context` is very useful for less involved reviews.
It's pretty clear to a growing number of devs what a review tool should look like. It is more a matter of what needs to happen so this becomes a usable and sustainable reality, and what shape of organisation/players can make this happen in the right way.
- git itself won't go much further than the change-id, which is already a huge win (thanks to jj, git butler, gerrit and other teams)
- graphite and github clearly showed they are not interested in solving this for anyone but their userslaves and have obviously opposing incentives.
- there are dozens of semi-abandoned cli tools trying this without any traction; a cli can be a part of a solution but is just a small part
What we need:
- usable fully local
- core team support for vscode not just a broken afterthought by someone from the broader community
- web UI for usecases where vscode does not fit (possibly via vscode web or other ways to reuse as much of the interface work that went into the vscode integration)
- the core needs to be usable from a cli or library with clear boundaries so other editor teams can build as great integrations as the reference but fitting their native ui concepts
- it needs to work for commits, branches, stacked commits and any snapshot an agent creates as well as reviewing a devs own work before pushing
- it needs to incorporate CI/CD signals natively; Meta did great UI work on this and it's crucial not to ignore all that progress but to build on top of it
- it needs to be as fine-grained as the situation requires, with editability at every step. Why can I just accept one line in Cursor, but there is nothing like that when reviewing a human's code? Why can I fix a typo without any effort when reviewing in Cursor, when I have to go through at least 5 clicks to do the same for a human's typo?
- it needs to be fully incremental: when a PR is fixed there needs to be a simple way to review just the fix and not re-review the whole PR or the full file
Here's an alternative I've wondered about: Instead of one person writing code, and another reviewing it - instead you have one person write the first pass and then have another person adjust it and merge it in. And vice-versa; the roles rotate.
Anyone tried something like this? How did it go?
I've noticed for a long time that if I have participated in writing the code under review, I'm able to provide much more insight. I think what you're suggesting starts from thinking the code as "our code" instead of my code vs. your code, which so easily happens with pull requests. And learning to work iteratively instead of trying to be too perfect from the start, which goes well with methodologies like TDD.
I would want the first person to write 90+% of the code, and really more like ~98% of it, because at some point you need to just do your job. But I like the idea of having the reviewer make the relevant changes themselves and merge it in. That's more or less what we did at the first place I worked, and the expectation was that both of you were responsible for the code. If it was more than minor changes the second person could send you notes for you to implement, but they were always the person to merge. I prefer it to the alternative of chasing someone down so that they hit "approve" so that you can go back to your desk and hit "merge."
That’s basically an async pair programming session, isn’t it?
Only in cases where a PR was submitted close to someone's holidays and I assigned it to someone else in the team to bring it over the line. But otherwise I have worked with sync pair programming only.
Great idea, if you're fine with development taking twice as long.
I use the GitHub Pull Request extension in VSCode to do the same thing (reviewing code locally in my editor). It works pretty well, and you can add/review comments directly in the editor.
It's better, but still quite deep vendor lock-in (in both GitHub and VSCode).
Well my employer chooses to use GitHub so I don’t have a choice there. And it’s vendor lock-in VSCode but that’s already my primary editor so it means there’s no need to learn another tool just for code review.
GitHub may be dominant, but it's not like it doesn't have competitors nipping at its heels (GitLab, BitBucket come to mind).
VSCode is open source, and there are plenty of IDEs...
I guess I'm just focused on different lock-in concerns than you are.
Same! It's much nicer now, especially since GitHub seems to be pretty arbitrary/rigid about when it hides files that have "too many changes". It's so much nicer to see/navigate around such changes quickly in VSCode vs trying to do the same in the web interface.
I suspect that since this is possible with VSCode/GitHub, it's probably extensible to other providers and editors.
It's so cool that Git is considering first class change IDs!! That's huge! This sounds similar to what we had at Facebook to track revisions in Phabricator diffs. Curious if anyone knows the best place to read about this?
The fundamental problem is that git doesn't track branches in any sane way. Maybe it would be better to fix that? Fossil remembers what branch a commit was committed on, so the task branch itself is a change ID. That might be tricky to solve while also allowing git commands to mess with history of course. Fossil doesn't have that problem.
Agree with your pain points. One thing I'd add is that GitHub makes you reapprove every PR after each push. As an OSS contributor it's exhausting to chase re-approvals for minor tweaks.
This is a security setting that the author has chosen to enable.
Hm that’s not the case for my repositories? Maybe you have a setting enabled for that?
Recently, I've been wondering about the point of code review as a whole.
When I started my career, no one did code review. I'm old.
At some point, my first company grew; we hired new people and started to offshore. Suddenly, you couldn't rely on developers having good judgement... or at least being responsible for fixing their own mess.
Code review was a tool I discovered and made mandatory.
A few years later, everyone converged on GitHub, PRs, and code review. What we were already doing now became the default.
Many, many years later, I work with a 100% remote team that is mostly experienced, and 75% or more of our work is writing code that looks like code we've already written. Most code review is low value. Yes, we do catch issues in review, especially with newer hires, but it's not obviously worth the delay of a review cycle.
Our current policy is to trust the author to opt-in for review. So far, this approach works, but I doubt it will scale.
My point? We have a lot of posts about code review and related tools and not enough about whether to review and how to make reviews useful.
I am very much in the same position right now. My dev team has introduced mandatory code reviews for every change and I can see their output plummeting. It also seems that most code reviews done are mostly syntax and code format related; no one actually seems to run the code or look at the actual logic to see if it makes sense.
I think it's easy to add processes under the good intention of "making the code more robust and clean", but I never hear anyone discuss what the cost of this process is to the team's efficiency.
Interesting take! Personally I'd never throw out code review, for a couple reasons.
1. It's easy to optimise for talented, motivated people in your team. You obviously want this, and it should be the standard, but you also want it to be the case that somebody who doesn't care about their work can't trash the codebase.
2. I find even people just leaving 'lgtm' style reviews for simple things, does a lot to make sure folks keep up with changes. Even if there's nothing caught, you still want to make sure there aren't changes that only one person knows about. That's how you wind up with stuff like, the same utility functions written 10 times.
My rule of thumb is that if you have an on-call rotation for a codebase, you should require reviews. Besides all the benefits you've mentioned, it's important to spread know-how of the code so that people on the rotation don't need to be pulled in, e.g. over the weekend or on vacation, because they're the only ones familiar with the code.
(There should be breakglass mechanisms to bypass code reviews, sure. Just the default should always be to require reviews)
Very nice to read.
Sourcehut is missing from the list; it's built on the classical concept of sending patches / issues / bugs / discussion threads via email, and it integrates this concept into its mailing lists and CI solution, which also sends the CI status / log back via email.
Drew DeVault published helpful resources for sending and reviewing patches via email at git-send-email.io and git-am.io
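For reference, the email round trip described on those sites boils down to a couple of standard git commands on each side (the mailing list address below is made up):
```
# contributor side: turn the new commits into patches and mail them
git format-patch -o outgoing/ origin/master
git send-email --to="~someuser/some-project-devel@lists.sr.ht" outgoing/*.patch

# reviewer/maintainer side: apply a reviewed patch from the list
git am -3 0001-fix-retry-loop.patch
```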
My team and I have been doing code reviews purely within IntelliJ for something like 6 years. We started doing it "by hand", by checking out the branch and comparing with master, then using GitHub for comments.
Now there's official support and tooling for reviews (at least in IDEA, but probably in the others too), where you also get in-line highlighting of changed lines, comments, status checks, etc...
I feel sorry for anyone still using GitHub itself (or GitLab or whatever). It's horrible for anything more than a few lines of changes here and there.
I have been working on the PR implementation for lubeno[1] and have been thinking a lot about the code review process.
A big issue is that every team has a slightly different workflow, with different rules and requirements. The way GitHub is structured is a result of how the GitHub team works. They built the best tool for themselves with their "just keep appending commits to a PR" workflow.
Either you need to have enough flexibility so that the tool can be adapted to everyone's existing workflow. Or you need to be opinionated about your workflow (GitHub) and force everyone to match it in some way. And in most cases this works very well, because people just want you to tell them the best way of doing things and not spend time figuring out what the best workflow would look like.
[1]: https://lubeno.dev
This is how I learned to do code review when I was a new junior dev. I would write my review comments on another junior's code, and then our team lead would go write their comments that we missed, and then both of us juniors would read and see what we missed. It was a good way to learn about coding and reviewing I think.
I use CodeRabbit that helps, but it does not fix the two root issues. I run their free VS code plugin to review local commits first, which catches nits, generates summaries, and keeps me in my editor. The PR bot then adds structure so humans focus on design and invariants. Review state still lives in the forge, not in Git, and interdiffs still depend on history. If Git gets a stable Change-Id, storing review metadata in Git becomes realistic. Until then this is a pragmatic upgrade that reduces friction without changing the fundamental.
Essentially, you are turning fork/branch induced changes to "precommit" review like workflow which is great.
I was on the lookout for the best "precommit" review tool and zeroed in on Magit, gitui, and Sublime Merge.
I am not an Emacs user, so I'll have to learn this.
I never did proper code review, other than when being lucky that we got a team of top devs in specific projects.
More often than not, it either doesn't exist, or turns into a kind of architecture fetishism that the lead devs/architects have picked up from conferences or spaceship enterprise architecture.
Even without this garbage it feels so much better than arguing about SOLID, clean code, hexagonal architecture, member functions being prefixed with an underscore, explicit types or not, ...
Gitpatch attempts to solve this. Supports versioned patches and patch stacks (aka stacked PRs). Also handles force-pushes in stacks correctly even without Change-IDs using heuristics based on title, author date etc. It should also be unusually fast. Disclosure: I'm the author.
I'm not convinced that review comments as commits make things easier, but I think storing them in git in some way is a good idea (i.e. git annotations, or in commit messages after merge, etc.)
> But modifying code under review turned out to be tricky.
GitLab enables this - make the suggestion in-line which the original dev can either accept or decline.
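If you haven't seen it, the reviewer types something like this into the in-line comment box (syntax as I recall it from GitLab's docs; GitHub has an equivalent suggestion block), and the author can apply it from the UI, which adds a commit to the source branch:
````markdown
```suggestion:-0+0
return user.full_name()
```
````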
Kind of. Don't you have to type the change into the browser? Which means your change might not even be syntactically correct. It would be far better if you could make the change locally and then somehow send that straight to GitLab. Also, how does it work with multiple commits? Which commit does it amend?
> Alas, when I want to actually leave feedback on the PR, I have to open the browser, navigate to the relevant line in the diff, and (after waiting for several HTTP round-trips) type my suggestion into a text area
This doesn't seem like much of a problem, does it? It's a matter of alt-tab and a click or two.
Also, what is the point of having reviews in the git history?
I was recently looking for something that at least presents a nice diff that resembles code review one in neovim.
This is a pretty cool tool for it: https://github.com/sindrets/diffview.nvim
On the branch that you are reviewing, you can do something like this:
:DiffviewOpen origin/HEAD...HEAD
Worth mentioning that Tangled has support for stacked pull requests and a unique round-based PR flow with interdiffing: https://blog.tangled.sh/stacking
This brings back memories of https://opendev.org/ttygroup/gertty when I was contributing to OpenStack
I am happy with Gerrit but I am sure I do not know even how to use 20% of its capacity.
The patchsets get stacked up and you know where you left off if there are different changes and that is very cool.
I like the idea of max 500 lines for custom solutions to problems that have "existing" solutions, might have to steal that.
Anyone know what editor the author is using in the first screenshot showing two panels side by side?
If you want to remain relevant in the AI-enabled software engineering future, you MUST get very good at reviewing code that you did not write.
AI can already write very good code. I have led teams of senior+ software engineers for many years. AI can write better code than most of them can at this point.
Educational establishments MUST prioritize teaching code review skills, and other high-level leadership skills.
> AI can already write very good code
Debatable, with same experience, depends on the language, existing patterns, code base, base prompts, and complexity of a task
Yeah, LLMs can do that very well, IMO. As an experienced reviewer, the "shape" of the code shouldn't inform correctness, but it can be easy to fall into this pattern when you review code. In my experience, LLMs tend to conflate shape and correctness.
> As an experienced reviewer, the "shape" of the code shouldn't inform correctness, but it can be easy to fall into this pattern when you review code.
For human written code, shape correlates somewhat with correctness, largely because the shape and the correctness are both driven by the human thought patterns generating the code.
LLMs are trained very well at reproducing the shape of expected outputs, but the mechanism is different than humans and not represented the same way in the shape of the outputs. So the correlation is, at best, weaker with the LLMs, if it is present at all.
This is also much the same effect that makes LLMs convincing purveyors of BS in natural language, but magnified for code because people are more used to people bluffing with shape using natural language, but churning out high-volume, well-shaped, crappy substance code is not a particularly useful skill for humans to develop, and so not a frequently encountered skill. And so, prior to AI code, reviewers weren't faced with it a lot.
> you MUST get very good at reviewing code that you did not write.
I find that interesting. That has always been the case at most places my friends and I have worked at that have proper software engineering practices, companies both very large and very small.
> AI can already write very good code. I have led teams of senior+ software engineers for many years. AI can write better code than most of them can at this point.
I echo @ZYbCRq22HbJ2y7's opinion. For well defined refactoring and expanding on existing code in limited scope they do well, but I have not seen that for any substantial features especially full-stack ones, which is what most senior engineers I know are finding.
If you are really seeing that, then I would worry either about the quality of those senior+ software engineers or about the metrics you are using to assess the efficacy of AI vs. senior+ engineers. You don't even have to show us any code: just tell us how you objectively came to that conclusion and what framework you used to compare them.
> Educational establishments MUST prioritize teaching code review skills
Perhaps more is needed but I don't know about "prioritizing"? Code review isn't something you can teach as a self-contained skill.
> and other high-level leadership skills.
Not everyone needs to be a leader and not everyone wants to be a leader. What are leadership skills anyway? If you look around the world today, it looks like many people we call "leaders" are people accelerating us towards a dystopia.
I’m considered one of the stronger code reviewers on the team, what grinds my gears is seeing large, obviously AI heavy PRs and finding a ton of dumb things wrong with them. Things like totally different patterns and even bugs. I’ve lost trust that the person putting up the PR has even self reviewed their own code and has verified it does what they intend.
If you’re going to use AI you have to be even more diligent and self-review your code; otherwise you’re being a shitty teammate.
Same. I work at a place that has gone pretty hard into AI coding, including onboarding managers into using it to get them into the dev lifecycle, and it definitely puts an inordinate amount of pressure on senior engineers to scrutinize PRs much more closely. This includes much more thorough reviews of tests as well since AI writes both the implementation and tests.
It's also caused an uptick in inbound to dev tooling and CI teams since AI can break things in strange ways since it lacks common sense.
If you are seeing that, it just means they are not using the tool properly, or are using the wrong tool.
AI-assisted commits on my team are "precise".
There is no reason to think that code review will magically be spared by the AI onslaught while code writing falls, especially as devs themselves lean more on the AI and have less and less experience coding every day.
There just haven't been as many resources poured into improving AI code reviews yet as there have been for writing code.
And in the end the whole paradigm itself may change.
Totally agree with this. Code review is quickly becoming the most important skill for engineers in the AI era. Tools can generate solid code, but judgment, context, and maintainability come from humans. That’s exactly why we built LiveReview(https://hexmos.com/livereview/) — to help teams get better at reviewing and learning from code they didn’t write.
I don't know about "shouldn't", I think it's fine if they do. But I basically agree, at some fundamental level, you have to have some trust in your coworkers. If someone says "This fixes X", and they haven't even tried running it or testing it, they shouldn't be your coworker. The purpose of code reviews shouldn't be "is this person honest?" or "is this person totally incompetent?". If they're not, it's a much bigger issue, one that shouldn't be dealt with through code reviews.
Very different situation if it's open source or an external contribution, of course.
The author mentioned that he doesn't want to make suggestions that don't actually work. That seems like a pretty valid reason to run the code.
While I like the post and agree with everything the author talked about I find that this is not my problem. Despite having a similar workflow (classic vim user). The problem I have and I think a lot of others have too is that review just doesn't actually exist. LGTMs are not reviews, yet so common.
I'm not sure there's even a tech solution to this class of problems and it is down to culture. LGTMs exist because it satisfies the "letter of the law" but not the spirit. Classic bureaucracy problem combined with classic engineer problems. It feels like there are simple solutions but LGTMs are a hack. You try to solve this by requiring reviews but LGTMs are just a hack to that. Fundamentally you just can't measure the quality of a review[0]. Us techie types and bureaucrats have a similar failure mode: we like measurements. But a measurement of any kind is meaningless without context. Part of the problem is that businesses treat reviewing as a second class citizen. It's not "actual work" so shouldn't be given preference, which excuses the LGTM style reviews. Us engineers are used to looking at metrics without context and get lulled into a false sense of security, or convince ourselves that we can find a tech solution to this stuff. I'm sure someone's going to propose a LLM reviewer and hey, it might help, but it won't address the root problems. The only way to get good code reviews is for them to be done by someone capable of writing the code in the first place. Until the LLMs can do all the coding they won't make this problem go away, even if they can improve upon the LGTM bar. But that's barely a bar, it's sitting on the floor.
The problem is cultural. The problem is that code reviews are just as essential to the process as writing the code itself. You'll notice that companies that do good code review already do this. Then it is about making this easier to do! Reducing friction is something that should happen and we should work on, but you could make it all trivial and it wouldn't make code reviews better if they aren't treated as first class citizens.
So while I like the post and think the tech here is cool, you can't engineer your way out of a social problem. I'm not saying "don't solve engineering problems that exist in the same space" but I'm making the comment because I think it is easy to ignore the social problem by focusing on the engineering problem(s). I mean the engineering problems are magnitudes easier lol. But let's be real, avoiding addressing this, and similar, problems only adds debt. I don't know what the solution is[1], but I think we need to talk about it.
[0] Then there's the dual to LGTM! Code reviews exist and are detailed but petty and overly nitpicky. This is also hacky, but in a very different way. It is a misunderstanding of what review (or quality control) is. There's always room for criticism as nothing you do, ever, will be perfect. But finding problems is the easy part. The hard part is figuring out what problems are important and how to properly triage them. It doesn't take a genius to complain, but it does take an expert to critique. That's why the dual can even be more harmful as it slows progress needlessly and encourages the classic nerdy petty bickering over inconsequential nuances or over unknowns (as opposed to important nuances and known unknowns). If QC sees their jobs as finding problems and/or their bosses measure their performance based on how many problems they find then there's a steady state solution as the devs write code with the intentional errors that QC can pick up on, so they fulfill their metric of finding issues, and can also easily be fixed. This also matches the letter but not the spirit. This is why AI won't be able to step in without having the capacity of writing the code in the first place, which solves the entire problem by making it go away (even if agents are doing this process).
[1] Nothing said here actually presents a solution. Yes, I say "treat them as first class citizens" but that's not a solution. Anyone trying to say this, or similar things, is a solution is refusing to look at all the complexities that exist. It's as obtuse as saying "creating a search engine is easy. All you need to do is index all (or most) of the sites across the web." There's so much more to the problem. It's easy to over simplify these types of issues, which is a big part of why they still exist.
> Part of the problem is that businesses treat reviewing as a second class citizen. It's not "actual work" so shouldn't be given preference, which excuses the LGTM style reviews.
I've been out of the industry for a while but I felt this way years ago. As long as everybody on the team has coding tasks, their review tasks will be deprioritized. I think the solution is to make Code Reviewer a job and hire and pay for it, and if it's that valuable the industry will catch on.
I would guess that testing/QA followed a similar trajectory where it had to be explicitly invested in and made into a job to compete for or it wouldn't happen.
I can be totally wrong, but I feel like that's a thing that sounds better on paper. I'm sure there are ways to do this correctly, but every instance I've seen has created division and paid testers/QC less. I'd say the lower pay is a strong signal of it being considered second class. Has anyone seen this work successfully?
I also think there's benefits to review being done by devs. They're already deep in the code and review does have a side benefit of broadening that scope. Helping people know what others are doing. Can even help serve as a way to learn and improve your development. I guess the question is how valuable these things are?
I don't see a lot of value in generic code reviewers. I want the reviewers to be actively engaged in writing somewhat related code themselves, otherwise the value of their opinions will decline over time.
As for prioritization... isn't it enough knowing that other people are blocked on your review? That's what incentivizes me to get to the reviews quickly.
I guess it's always going to depend a lot on your coworkers and your organization. If the culture is more about closing tickets than achieving some shared goal, I don't know what you could do to make things work.
I've used a more hard-core version of this in my own company and have been meaning to write a blog post about it for years. For now an HN comment will suffice. Here's my version, the rationale and the findings.
WORKFLOW
Every repository is personal and the reviewer merges, kernel style. Merging is taking ownership: the reviewer merges into their own tree when they are happy and not before. By implication there is always one primary code reviewer; there is never a situation where someone chooses three reviewers and they all wait for someone else to do the work. The primary reviewer is on the hook for the deliverable as much as the reviewee is.
There is no web based review tool. Git is managed by a server configured with Gitolite. Everyone gets their own git repository under their own name, into which they clone the product repository. Everyone can push into everyone else's repos, but only to branches matching /rr/{username}/something and this is how you open a pull request. Hydraulic is an IntelliJ shop and the JetBrains git UI is really good, so it's easy to browse open RRs (review requests) and check them out locally.
Reviewing means pushing changes onto the rr branch. Either the reviewer makes the change directly (much faster than nitpicky comment roundtrips), or they add a //FIXME comment that IntelliJ is configured to render in lurid yellow and purple for visibility. It's up to the reviewee to clear all the FIXMEs before a change will be merged. Because IntelliJ is very good at refactoring, what you find is that reviewers are willing to make much bigger improvements to a change than you'd normally get via web based review discussions. All the benefits the article discusses are there except 100x because IntelliJ is so good at static analysis. A lot of bugs that sneak past regular code review are caught this way because reviewers can see live static analysis results.
Sometimes during a review you want to ask questions. 90% of the time, this is because the code isn't well documented enough and the solution is to put the question in a //FIXME that's cleared by adding more comments. Sometimes that would be inappropriate because the conversation would have no value to others, and it can be resolved via chat.
Both reviewee and reviewer are expected to properly squash and rebase things. It's usually easier to let commits pile up during the review so both sides have state on the changes, and the reviewer then squashes code review commits into the work before merging. To keep this easy most review requests should turn into one or two commits at most. There should not be cases where people are submitting an RR with 25 "WIP" commits that are all tangled up. So it does require discipline, but this isn't much different to normal development.
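To make the access rules concrete, here is a hypothetical gitolite.conf fragment for that scheme (the group, user and repo names are invented, and this is my reading of gitolite's personal-branch refexes, not the author's actual config):
```
@devs = alice bob carol

repo alice/product
    RW+              = alice      # alice owns her tree; merging into it is taking ownership
    RW  rr/USER/     = @devs      # others may only push review-request branches rr/<their-username>/...
```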
RATIONALE
1. Conventional code review can be an exhausting experience, especially for junior developers who make more mistakes. Every piece of work comes back with dozens of nitpicky comments that don't seem important and which is a lot of drudge work to apply. It leads to frustration, burnout and interpersonal conflicts. Reviewees may not understand what is being asked of them, resulting in wasted time. So, latency is often much lower if the reviewer just makes the changes directly in their IDE and pushes. People can then study the commits and learn from them.
2. Conventional projects can struggle to scale up because the codebase becomes a commons. Like in a communist state things degrade and litter piles up, because nobody is fully responsible. Junior developers or devs under time pressure quickly work out who will give them the easiest code review experience and send all the reviews to them. CODEOWNERS are the next step, but it's rare that the structure of your source tree matches the hierarchy of technical management in your organization so this can be a bad fit. Instead of improving widely shared code people end up copy/pasting it to avoid bringing in more mandatory reviewers. It's also easy for important but rarely changed directories to be left out, resulting in changes to core code slowing down because it'd require the founder of the company to approve a trivial refactoring PR.
FINDINGS
Well, it worked well for me at small scale (decent sized codebase but a small team). I never scaled it up to a big team although it was inspired by problems seen managing a big team.
Because most questions are answered by improving code comments rather than replying in a web UI the answers can help LLMs. LLMs work really well in my codebase and I think it's partly due to the plentiful documentation.
Sometimes the lack of a web UI for browsing code was an issue. I experimented with using IntelliJ link format, but of course not everyone wants to use IntelliJ. I could have set up a web UI over git just for source browsing, without the full GitHub experience, but in the end never bothered.
Gitolite is a very UNIXy set of Perl scripts. You need a gray beard to use it well. I thought about SaaSifying this workflow but it never seemed worth it.
This hits home. I’ve run into the same pain with conventional web-based review tools: slow, nitpicky, and nobody really “owns” the merge. Your kernel-style approach makes a ton of sense — putting the reviewer on the hook changes the dynamic completely. And pushing FIXMEs straight into the branch instead of playing comment-ping-pong? That’s a huge quality-of-life win.
We’ve gone a slightly different route at my team. Instead of reinventing the workflow around Gitolite/IntelliJ, we layered in LiveReview(https://hexmos.com/livereview/). It’s not as hardcore, but it gives us a similar payoff: reviewers spend less time on drudge work because LiveReview auto-catches a ton of the small stuff (we’re seeing ~40% fewer prod bugs). That leaves humans free to focus on the bigger design and ownership questions — the stuff machines can’t solve.
Different tools, same philosophy: make review faster, saner, and more about code quality than bureaucracy.
I got into using Jujutsu this year. I'm liking it so far. Is there a beta access in the works?
Putting the review into git notes might have worked better. It's not attached to the lines directly, but to the commit, and it can stay as part of the repo.
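A minimal sketch of that with stock git; notes live under their own ref, so they travel with the repo but stay out of the commit graph (the "review" ref name is just a convention I made up here):
```
# attach a review remark to the commit under review
git notes --ref=review add -m "LGTM, but consider extracting the retry loop" HEAD

# show commits together with their review notes
git log --notes=review

# notes are not pushed/fetched by default
git push origin refs/notes/review
git fetch origin refs/notes/review:refs/notes/review
```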
What has bothered me for a long time with code reviews is that almost all the useful things they catch (i.e. not nit-picking about subjective minor things that don't really matter) come much too late in the process. Not rarely, the only useful outcome of a review (if any) is that everything has to be redone from scratch with a different approach (a completely new design), or that the change is abandoned since it turns out it should never have been done at all.
It always seems as if the code review is the only time when all stakeholders really get involved and start thinking about a change. There may be some discussion earlier on in a Jira ticket or a meeting, and with some luck someone even wrote a design spec, but there will still often be someone from a different team or a distant part of the organization who only hears about the change when they see the code review. This includes me. I often only notice that some other team implemented something stupid because I suddenly get a notification that someone posted a code review for some part of the code that I watch for changes.
Not that I know how to fix that. You can't have everyone in the entire company spend time looking at every possible thing that might be developed in the near future. Or can you? I don't know. That doesn't seem to ever happen anyway. At university in the 1990s, in a course about development processes, there weren't only code reviews but also design reviews, and that isn't something I ever encountered in the wild (in any formal sense), but I don't know if even a design review process would be able to catch all the things you would want to catch BEFORE starting to implement something.