Code Review at Scale Without Becoming a Bottleneck

Software Delivery · code-review, engineering-practice

Code review either distributes context or stalls every PR. Three structural changes that keep review fast, useful, and fairly distributed instead of concentrated on senior engineers.

By Orzed Team
Key takeaways
  • Review SLA under 24 hours for routine PRs is the cadence that keeps teams shipping.
  • Round-robin reviewer assignment beats 'whoever sees it first' for distributing load.
  • Review checklists separate blocking from advisory comments; without them, nobody is sure what they are reviewing for.
  • Reviewer load should be measured. Senior engineers reviewing 80% of PRs is a structural problem.

A team we worked with had grown from 8 to 24 engineers over eighteen months. They had not changed their code review process. The result: PRs sat for two to four days waiting for review. Half the team’s time was spent context-switching between writing code and reviewing PRs. The most senior engineers were doing 60 to 70 percent of all reviews because everyone defaulted to tagging them.

We measured the review process for a week. Median time-to-first-review was 21 hours. Median time-to-approval was 2.4 days. The senior engineer who was the most-tagged reviewer was spending 14 hours per week on reviews.

We installed three structural changes. Six weeks later: median time-to-first-review was under 4 hours. Time-to-approval was under 1 day. The reviewer load was distributed across 14 engineers instead of concentrated on 4. No engineer was spending more than 3 hours per week reviewing.

This piece is about those three changes. Each one is small; together they reshape what code review feels like at scale.

Change 1: explicit review SLA

The team agrees on a maximum time a PR can sit waiting for first review. A reasonable starting SLA: 4 to 8 working hours for routine PRs, 24 hours for larger ones, and an immediate page for urgent fixes.

The SLA is not a target; it is a commitment. If the assigned reviewer cannot meet it, they reassign explicitly. The PR does not sit; it gets a new reviewer who can.

The mechanism: a queue dashboard or a Slack reminder bot that surfaces PRs approaching SLA. A daily standup mention of “any PRs over SLA” makes drift visible.
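The core of any such reminder bot is computing a PR's age in working hours rather than wall-clock hours, so a PR opened Friday afternoon does not trip the SLA on Saturday. A minimal sketch, assuming a 09:00-17:00 Monday-to-Friday working window (adjust to your team's hours):

```python
from datetime import datetime, timedelta

WORKDAY_START, WORKDAY_END = 9, 17  # 09:00-17:00, Monday to Friday

def working_hours_between(start: datetime, end: datetime) -> float:
    """Count elapsed working hours between two timestamps."""
    hours = 0.0
    cursor = start
    while cursor < end:
        step = min(cursor + timedelta(minutes=30), end)
        # Count this half-hour slice only if it falls inside working hours.
        if cursor.weekday() < 5 and WORKDAY_START <= cursor.hour < WORKDAY_END:
            hours += (step - cursor).total_seconds() / 3600
        cursor = step
    return hours

def over_sla(opened_at: datetime, now: datetime, sla_hours: float = 8.0) -> bool:
    """True if the PR has waited longer than the SLA for first review."""
    return working_hours_between(opened_at, now) > sla_hours
```

A bot then only needs to fetch open PRs without a first review, run `over_sla` on each, and post the offenders to a channel.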

Without an SLA, “I’ll get to it later” is the default and “later” stretches to days. With one, the team knows the contract and either honours it or surfaces the problem.

Change 2: round-robin reviewer assignment

The author of a PR does not pick the reviewer. A round-robin (or weighted round-robin) assigns reviewers automatically based on availability, expertise and recent load.

Tools: GitHub’s CODEOWNERS for area-based routing, plus a small bot or GitHub action that handles rotation within an area.

# CODEOWNERS sketch
/services/billing/   @billing-team
/services/auth/      @auth-team
/lib/                @platform-team

# Custom rotation: assign one reviewer per area, weighted by current load
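The rotation logic itself is small. A sketch of weighted round-robin, assuming a per-area reviewer list and a running count of each engineer's open reviews (names and data shapes here are hypothetical, not any particular bot's API):

```python
from collections import deque

def pick_reviewer(rotation: deque, load: dict, author: str) -> str:
    """Pick the eligible reviewer with the lowest current load;
    ties go to the least recently assigned (nearest the front)."""
    eligible = [r for r in rotation if r != author]  # assumes at least one non-author
    min_load = min(load.get(r, 0) for r in eligible)
    for r in list(rotation):
        if r != author and load.get(r, 0) == min_load:
            rotation.remove(r)
            rotation.append(r)  # move to back so ties rotate fairly next time
            load[r] = load.get(r, 0) + 1
            return r
```

The two inputs are the whole design: load keeps any one engineer from absorbing the queue, and the deque order keeps equal-load engineers alternating instead of the same name winning every tie.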

The benefits:

  • The senior engineer who used to be tagged on everything stops being the default.
  • Newer engineers learn faster because they review code they would not have chosen.
  • Knowledge spreads across the team instead of pooling.
  • Reviewer load is measurable and balanceable.

The discipline that makes this work: the assigned reviewer is the assigned reviewer. They review or they reassign. They do not “wait for the senior to weigh in”; that defeats the rotation.

Change 3: review checklist with blocking vs advisory comments

A common review failure is ambiguity about what is required. The reviewer leaves 12 comments. The author cannot tell which ones must be addressed before merge and which are nice-to-haves. Either everything gets addressed (slow) or some get dismissed (the reviewer is annoyed).

Fix: explicit comment categories.

Marker      Meaning
blocking:   Must be fixed before merge
suggest:    Reviewer's preference; author decides
question:   Reviewer wants to understand; not necessarily a change
nit:        Cosmetic or stylistic; usually ignorable

GitHub PR templates can include this taxonomy. After two weeks of use, the team converges on the convention without it being mentioned again.

The effect: reviews become parseable. The author sees “two blockings, three suggests, one nit” and knows what to do. Time to address comments drops; arguments about “do I have to change this” stop.
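"Parseable" can be taken literally: once comments carry prefixes, a few lines of code can produce the "two blockings, three suggests" summary and gate merging on it. A sketch, assuming comments arrive as plain strings:

```python
from collections import Counter

MARKERS = ("blocking", "suggest", "question", "nit")

def summarize(comments: list) -> Counter:
    """Count review comments by their taxonomy prefix, e.g. 'blocking: ...'."""
    counts = Counter()
    for text in comments:
        prefix = text.split(":", 1)[0].strip().lower()
        counts[prefix if prefix in MARKERS else "unlabelled"] += 1
    return counts

def mergeable(comments: list) -> bool:
    """A PR is ready to merge once no blocking comments remain."""
    return summarize(comments)["blocking"] == 0
```

The same summary doubles as a nudge: a bot can flag unlabelled comments back to the reviewer, which is how the convention enforces itself after the first couple of weeks.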

What to review for

Effective code review is not “look at every line and find bugs”. The reviewer’s eyes are not better than the author’s at line-level bugs (the author wrote the code; they have already read it more carefully). The review’s value is:

Intent and design. Does this change do what the issue says? Is the chosen approach reasonable? Are there alternatives the author should have considered?

Blast radius. What breaks if this is wrong? Does the change affect parts of the system the author may not be familiar with?

Test coverage and observability. Are the right things tested? If this change causes an incident, will the team be able to diagnose it from logs and metrics?

API and contract changes. Are downstream consumers handled? Is the migration path clear if this is breaking?

Security and privacy. Does this change introduce new attack surface, leak data, or violate privacy promises?

Style, naming, and minor refactors are best handled by linters and auto-formatters. A reviewer commenting on indentation is a reviewer not adding value.

Measuring reviewer load

A team that does not measure reviewer load distributes it unfairly by accident. The standard metric: PRs reviewed per engineer per week.

Engineer profile      Healthy review load
Senior IC             3 to 6 PRs per week
Mid IC                2 to 5 PRs per week
Junior IC             1 to 3 PRs per week
Tech lead             4 to 8 PRs per week
Engineering manager   1 to 3 PRs per week

Above these numbers, the engineer is doing review instead of their actual work. Below them, they are not getting enough exposure to the rest of the codebase to maintain context.

If one engineer is consistently above 10 PRs per week reviewed, the rotation is broken. Fix the rotation; do not normalise the overload.
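The dashboard check is mechanical once the bands above are written down. A sketch that classifies each engineer's weekly count against their profile's band (profile names and thresholds taken from the table; the data shapes are illustrative):

```python
HEALTHY_RANGE = {  # per-week review bands from the table above
    "senior": (3, 6), "mid": (2, 5), "junior": (1, 3),
    "tech_lead": (4, 8), "manager": (1, 3),
}

def load_report(weekly_reviews: dict, profiles: dict) -> dict:
    """Classify each engineer's weekly review count against their band."""
    report = {}
    for name, count in weekly_reviews.items():
        low, high = HEALTHY_RANGE[profiles[name]]
        if count > high:
            report[name] = "overloaded"
        elif count < low:
            report[name] = "under-exposed"
        else:
            report[name] = "healthy"
    return report
```

Anyone flagged "overloaded" two weeks running is a rotation bug, not a volunteer to thank.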

What about AI-assisted review?

AI review tools (Copilot Workspace, Cursor’s review features, dedicated tools like Greptile and Coderabbit) have improved meaningfully in 2025-2026. Useful applications:

  • Style and convention checks before human review
  • Obvious bug surfacing (null pointer risks, error handling gaps)
  • Test coverage gap identification
  • Security pattern matching (hardcoded secrets, SQL injection risks)

What AI review still does poorly:

  • Design judgement
  • Intent verification (“does this match what the issue asked for”)
  • Cross-system blast radius assessment
  • Anything that requires understanding the team’s history with the code

Treat AI review as a more powerful linter that runs before human review. Do not use it as a replacement for human judgement on the things humans are still better at.
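The "more powerful linter" framing also applies to checks you write yourself: cheap pattern scans that run on the diff before a human is ever assigned. A deliberately simplified sketch (real tools use far richer analysis; these two patterns are illustrative only):

```python
import re

# Illustrative pre-review patterns; real secret scanners and linters go much further.
CHECKS = {
    "hardcoded secret": re.compile(
        r"(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]", re.I),
    "bare except": re.compile(r"except\s*:\s*$", re.M),
}

def pre_review(diff_text: str) -> list:
    """Return the names of checks that fire, before a human ever looks."""
    return [name for name, pattern in CHECKS.items() if pattern.search(diff_text)]
```

Whatever fires here goes back to the author automatically; the human reviewer starts from a diff that has already cleared the mechanical bar.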

What we install on engagements

For a team scaling past 10 engineers:

  1. Define the SLA in a written engineering policy.
  2. Implement reviewer rotation via CODEOWNERS plus a load-balancing action.
  3. Adopt the comment taxonomy in PR template and team handbook.
  4. Set up a review-load dashboard showing per-engineer weekly load.
  5. Quarterly review of the metrics: are the SLAs being met, is the load balanced, where is the system breaking.

The work is structural, not cultural. The teams that get this right ship code faster without burning out the senior engineers. The teams that ignore it find themselves in the situation we walked into: PRs stalled, seniors burnt out, juniors disengaged.

Code review is a coordination problem with engineering solutions. Solve the structural part; the cultural part takes care of itself.

Frequently asked questions

Should every PR have two reviewers?

No. One thoughtful review beats two perfunctory ones. Two reviewers are useful for high-blast-radius changes (auth, billing, data migrations); routine changes need one. Mandating two everywhere produces approval cascades where each reviewer assumes the other will catch issues.

What about AI-assisted code review?

Useful as a first pass for style, obvious bugs and convention violations. Not a replacement for human judgement on intent, design and risk. Treat AI review as the equivalent of a linter, not a senior engineer.

How do we handle a PR that needs a major rework?

Pause it, have a 30-minute conversation, decide whether to land a smaller piece first or scrap and restart. Long PRs with 40+ comments rarely converge; the synchronous conversation is faster than another comment cycle.