Who hasn't been there: you get pinged for a peer review on a ‘simple’ story, only to find a 2000-line diff. The developer has spent weeks on the story, refactored half the codebase, and taken a turn in the implementation that makes no sense to you and matches none of the existing patterns in your codebase.
I think we have all experienced this. But if this is a pattern, the problem is a fundamental misunderstanding of the purpose of code reviews and how they should be approached. With AI tooling dissolving the bottleneck of writing code, the flaws in your review process become severe enough that we need to reconsider it entirely. Should we be relying on reviews after the fact?
Why we review
So to kick things off: I’ve always been a strong proponent of reviews. Eric S. Raymond famously wrote “Given enough eyeballs, all bugs are shallow” in his 1997 essay “The Cathedral and the Bazaar”. It was written to explain the benefits of open source, but decades later it still applies to software engineering in general, and specifically to the practice of reviewing code.
When done right, reviews improve the quality of the code, and at the same time make sure you have more than one developer who understands the code under review. It should come as no surprise what the risks are of having only a single developer who can work on a critical codebase.
Code reviews should not primarily be about “nits”, test coverage and syntax. Whenever possible, use tooling for that: linters, coverage tools, and scanners for outdated dependencies and security risks. AI review tooling takes this even a step further: it is great at flagging, for example, when documentation no longer matches the code.
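As a sketch, the mechanical part of a review can be delegated to a small gate script that runs before a review is even requested, so reviewers never see the nits. The specific tools below (ruff, pytest with a coverage floor, pip-audit) are assumptions for illustration; substitute whatever your stack uses.

```python
import subprocess

# Hypothetical pre-review gate: run the mechanical checks so human
# reviewers can focus on design. The tool choices are examples only.
CHECKS = [
    ["ruff", "check", "."],                      # lint and style nits
    ["pytest", "--cov", "--cov-fail-under=80"],  # tests plus a coverage floor
    ["pip-audit"],                               # known-vulnerable dependencies
]

def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order; return the first failing command,
    or None when everything passes."""
    for cmd in checks:
        if runner(cmd).returncode != 0:
            return cmd  # stop early: fix this before requesting a review
    return None

if __name__ == "__main__":
    failed = run_checks()
    raise SystemExit(1 if failed else 0)
```

The `runner` parameter exists so the gate itself is testable without the tools installed; in CI it simply shells out to them.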
What developers should focus on are the core decisions taken in the approach. Does the design make sense? Are we using the correct abstractions? Do we cover edge cases that tooling won’t catch? Do we strike the right balance between complexity and maintainability?
Our time and attention are our most finite resources, and we should spend them wisely.
Looks good to me
So while reviews are great in theory and practice, they are easy to get wrong. Are you using reviews as a collaboration tool, or are you (or other engineers) using them to force personal preferences on an entire team?
The first mistake I see often is a strong focus on bikeshedding unimportant details. While details matter, they should not be the focus of the review over the big picture design questions. Again, the amount of time and energy we have to spend in a day is limited. The person who is inviting a review is inviting feedback, not an endless discussion on the philosophy of object-oriented programming.
Another common mistake I see is stories that get refined only shallowly, so that fundamentally flawed approaches are first discovered during the review stage. I have been in the situation where a developer had been working for weeks on a story, only for us to see that they had gone off in completely the wrong direction. They had interpreted the story wrong, and we then had to tell them they had wasted weeks of work. No matter who is “at fault” in this situation, it is never a positive experience for anyone.
It’s tough to “force” a developer to rework their code this extensively. But letting it through because they “worked so hard” on it is worse. Teams under pressure still take this approach all too often.
Another common pattern is the “Looks good to me” approval. While CI/CD is often set up so that approvals are required (which is in general a good idea), people will often skim the review and approve it without putting in the time needed to understand the code and the context of the change. Most of the time this works out, but the times it doesn’t are often the most frustrating and impactful. Whose “fault” is it when the developer requested a review and both reviewers missed the mistake?
With the review load exploding through the use of AI tooling, these kinds of patterns are now becoming more evident. It’s easy to “blame AI”, but this is a cultural issue. Not a tooling issue. In fact, with AI tooling saving so much time, we can easily spend more time on the important things. Like quality.
The unifying explanation of why these pitfalls happen is the complexity paradox: the more complex the change, the more likely people are to rubber-stamp it. Again, our time and attention are limited, and companies incentivize developers to complete stories, not to spend time helping other developers complete theirs. AI did not create this problem. AI accelerates both the good and the bad.
Processes fail because of culture. Culture is dictated by what the company values and incentivizes. If you want good reviews, you need to create the right incentives. If you create the wrong incentives for employees (at any level), AI will accelerate them right off a cliff.
The review pass dissolves
We’ve reached a watershed point in the history of software engineering with how AI is changing the way we work. How much AI has improved in the past year is staggering. Within months I’ve seen it go from “I can do this faster and better myself” to being a peer that challenges me.
We now see the bottleneck moving away from writing code to the other parts of the process. The human in the loop is that bottleneck, but at the same time a hard requirement. AI gets it right most of the time, but when it gets things wrong, it can do so spectacularly. As a software engineer, you are still responsible for the code you ship. You can’t use “but the AI said…” as an excuse for why you just leaked PII data.
So AI isn’t a simple tool that we use to speed up a small part of our entire software development lifecycle. Instead, we need to fundamentally reconsider some of our processes. And in a lot of ways, what was old is becoming new again.
One of the most interesting developments I see with the rise of AI tooling is that we’re going back to relatively ‘old school’ practices, but updated for the modern era. Spec Driven Development is a great example of this: the term is new but it’s in essence a combination of TDD and Design By Contract, both of which borrow from formal verification methods.
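To make that combination concrete, here is a minimal sketch of the spec-first idea in Python, using a hypothetical `apply_discount` function. The contract (Design by Contract: pre- and postconditions) and the executable example (TDD: the test written before the code) together form the spec that a pair, or an AI agent, then implements against.

```python
def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount to a price.

    Contract: 0 <= percent <= 100; the result is never negative
    and never exceeds the original price.
    """
    # Precondition (Design by Contract): reject invalid input loudly.
    assert 0 <= percent <= 100, "percent must be between 0 and 100"
    result = price * (1 - percent / 100)
    # Postcondition: the promise callers can rely on.
    assert 0 <= result <= price
    return result

# TDD-style executable spec, written before the implementation existed:
def test_apply_discount():
    assert apply_discount(100.0, 25.0) == 75.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(100.0, 100.0) == 0.0
```

The point is not the trivial arithmetic but the ordering: the contract and the test pin down the intended behavior up front, so review of the design happens before implementation rather than after.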
While most of us work in agile teams, often doing some form of Scrum, many developers have never heard of Extreme Programming, developed in 1996 by Kent Beck. It predates the Agile Manifesto by five years and heavily inspired it (Kent Beck is one of the manifesto’s original authors). One of the core principles of XP is Pair Programming, and Mob Programming is an extension of it.
The main issue with mobbing in particular is that it’s rather inefficient when a lot of time is spent “writing code”. For me, this was the primary reason I was generally against mobbing as the default. My ADHD brain doesn’t help there; watching others type slowly is painful.
The current state of AI has turned this completely around, however. We barely have to type ourselves anymore, so the biggest bottleneck is the decision making. We can reap the benefits of pairing and mobbing without the drawbacks.
And this isn’t theoretical as far as I’m concerned. In the past months we’ve done this consistently on non-trivial changes and it works extremely well. Anything that we design, refine and implement in this way can be deployed almost immediately.
We still have our automated CI/CD processes in place as a verification step. But doing the design work up front and making the big decisions together means that we do not have to spend time on those after the fact. We just use the decisions to steer the AI in the right direction, and the code review happens as part of the process.
We should rebuild
While I am still not fond of forcing developers to pair program, my experience so far is that the economics have completely changed now that we use AI tooling. We produce better results faster in pairs, and can push changes through immediately, because neither of us has to spend time and energy rebuilding the context afterwards: the context is already ‘live’ in our heads.
So we need to make a choice: do we refine our processes to take AI into account, or do we rebuild them from the ground up? In this day and age, our only option is to reassess and rebuild our fundamental processes.