What Is a Code Review Platform?
A code review platform allows developers to propose code changes (pull requests), receive line-by-line feedback from peers, run automated checks, and merge into the main branch. Examples: GitHub Pull Requests, GitLab Merge Requests, Gerrit. Core challenges: diff computation at scale, comment threading, CI integration, merge conflict detection, and real-time collaboration on reviews.
System Requirements
Functional
- Create pull requests from a feature branch to a base branch
- Display unified and split diffs of changed files
- Comment on specific lines; reply to comments (threads)
- Request reviews from specific users
- CI status checks (tests, linting) required before merge
- Merge the PR (merge commit, squash, rebase)
Non-Functional
- 100M repositories, 10M PRs created per day
- Diff computation for large PRs (10K line changes) in <2s
- Review comments searchable and persistent
Core Data Model
repositories: id, owner_id, name, default_branch, visibility
pull_requests: id, repo_id, author_id, title, body, head_sha, base_branch, status, created_at, merged_at
pr_reviews: id, pr_id, reviewer_id, state(approved/changes_requested/commented)
review_comments: id, pr_id, reviewer_id, file_path, line_number, diff_hunk, body, thread_id, created_at
ci_checks: id, pr_id, check_name, status, url, started_at, completed_at
Diff Computation
Computing a diff between two commits (head_sha vs base_sha) involves:
- Retrieve the file trees for both commits from Git object store
- Find changed files: files present in one tree but not the other, or with different blob hashes
- For each changed file: run the Myers diff algorithm to produce a unified diff (O(ND) time, where N is the combined length of the two file versions in lines and D is the size of the shortest edit script between them)
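The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each commit's file tree is represented as a hypothetical flattened {path: blob_hash} dictionary, and the function names are made up for this example. The per-file diff uses the standard library's difflib, which produces unified-diff output but uses its own matching algorithm rather than Myers.

```python
import difflib


def changed_paths(base_tree: dict[str, str], head_tree: dict[str, str]) -> dict[str, str]:
    """Classify each changed path as 'added', 'deleted', or 'modified'.

    Both trees are hypothetical flattened {path: blob_hash} mappings.
    """
    changes = {}
    for path in sorted(base_tree.keys() | head_tree.keys()):
        if path not in base_tree:
            changes[path] = "added"
        elif path not in head_tree:
            changes[path] = "deleted"
        elif base_tree[path] != head_tree[path]:
            # Same path, different blob hash: the content changed.
            changes[path] = "modified"
    return changes


def unified_diff(path: str, old_text: str, new_text: str) -> str:
    """Line-level unified diff of one file (difflib, not Myers)."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

Comparing blob hashes first means file contents are only loaded for paths that actually changed, which is what keeps the per-file diff step proportional to the size of the change rather than the size of the repository.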
For large PRs with many files: compute diffs in parallel across workers. Cache computed diffs keyed by (base_sha, head_sha, file_path). Because a commit SHA identifies immutable content, the diff between two fixed commits never changes, so cache entries never need invalidation. Serve from cache on subsequent page loads.
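A minimal sketch of that caching scheme follows. The DiffCache class and its compute callback are hypothetical names for this example, an in-memory dictionary stands in for whatever shared cache a real deployment would use, and a thread pool stands in for the worker fleet.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

DiffKey = tuple[str, str, str]  # (base_sha, head_sha, file_path)


class DiffCache:
    """In-memory stand-in for a shared diff cache (assumption for this sketch)."""

    def __init__(self, compute: Callable[[str, str, str], str]):
        self._compute = compute  # hypothetical function that produces one file's diff
        self._store: dict[DiffKey, str] = {}

    def get(self, base_sha: str, head_sha: str, path: str) -> str:
        key = (base_sha, head_sha, path)
        if key not in self._store:
            # The result for a fixed key never changes, so there is no expiry logic.
            self._store[key] = self._compute(base_sha, head_sha, path)
        return self._store[key]

    def get_many(self, base_sha: str, head_sha: str, paths: list[str],
                 workers: int = 8) -> dict[str, str]:
        """Fetch diffs for many files, computing cache misses in parallel."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            diffs = pool.map(lambda p: self.get(base_sha, head_sha, p), paths)
            return dict(zip(paths, diffs))
```

Pushing a new commit changes head_sha, so the new diffs land under new keys and the old entries simply stop being requested.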
Storing Comments with Diff Context
A review comment is anchored to a specific line in a specific file at a specific commit. The challenge: if the author pushes a new commit, lines shift. Store the diff_hunk (the surrounding context lines) with each comment. When rendering the comment on the new diff, match the diff_hunk to find the current position of the comment. If the hunk cannot be found (the code was deleted), mark the comment as outdated.
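One way to picture the re-anchoring step is the sketch below. It assumes the stored diff_hunk has been reduced to plain context lines (diff prefixes stripped) and that the commented line is the last line of the hunk; both are assumptions of this example, as is the function name. It uses exact matching and takes the first match, whereas a production system would likely need fuzzier matching and a tie-break when the same lines appear more than once.

```python
from typing import Optional


def relocate_comment(hunk_lines: list[str], new_file_lines: list[str]) -> Optional[int]:
    """Return the new 1-based line number of the commented line, or None if outdated.

    hunk_lines: stored context lines; the commented line is assumed to be the last one.
    new_file_lines: the file's content at the newly pushed commit.
    """
    n = len(hunk_lines)
    if n == 0:
        return None
    for start in range(len(new_file_lines) - n + 1):
        if new_file_lines[start:start + n] == hunk_lines:
            # start is 0-based, so the last hunk line is at 1-based position start + n.
            return start + n
    return None  # context no longer present: mark the comment as outdated
```

For example, if two lines are inserted above the commented function, the hunk is found two lines lower and the comment moves with it; if the function body is rewritten, the lookup returns None and the comment is shown as outdated.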
CI Check Integration
When a PR is created or updated (a new commit is pushed), publish a “pr_updated” event to Kafka. CI runners consume this event and start the configured checks (build, test, lint). Each check reports its status back via a webhook: POST /repos/{repo}/statuses/{sha}. The platform updates the ci_checks table and re-evaluates merge eligibility (all required checks must be passing). Required checks are configured per branch through branch protection rules.
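The eligibility re-evaluation can be sketched as follows. The status strings ("pending", "success", "failure") and the function names are assumptions for this example, and a dictionary stands in for the ci_checks table. Statuses are keyed by commit SHA so that results for an older commit do not count toward the current head.

```python
# Latest reported status per (sha, check_name); stands in for the ci_checks table.
StatusTable = dict[tuple[str, str], str]


def record_status(table: StatusTable, sha: str, check_name: str, status: str) -> None:
    """Handle one status report from a CI system (hypothetical handler)."""
    table[(sha, check_name)] = status


def merge_eligible(table: StatusTable, head_sha: str, required_checks: set[str]) -> bool:
    """True only if every required check has succeeded on the PR's current head commit."""
    return all(table.get((head_sha, name)) == "success" for name in required_checks)
```

Because the lookup uses head_sha, pushing a new commit makes the PR ineligible again until the required checks report success for that commit. Checks that are not in the required set do not block the merge.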
Merge Operations
- Merge commit: creates a merge commit preserving full history. git merge --no-ff.
- Squash merge: combines all PR commits into one. Cleaner history for feature branches with many “WIP” commits.
- Rebase merge: replays PR commits on top of base branch. Linear history, but SHA changes (commits are re-created).
On merge: acquire a distributed lock for the repository (to prevent concurrent merges from racing on the branch pointer). Check for merge conflicts with a three-way merge against the merge base, the most recent common ancestor of the two branches. If clean: perform the merge, update the branch pointer, and optionally delete the feature branch. Release the lock.
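The sequence above (lock, conflict check, merge, pointer update, release) is sketched below under heavy simplifications. An in-process threading.Lock stands in for the distributed lock, trees are hypothetical {path: blob_hash} dictionaries, and conflicts are detected per file rather than per hunk as a real three-way merge would. All names are made up for this example.

```python
import threading

Tree = dict[str, str]  # hypothetical flattened {path: blob_hash} tree

_registry_guard = threading.Lock()
_repo_locks: dict[str, threading.Lock] = {}


def _lock_for(repo: str) -> threading.Lock:
    """Per-repository lock; an in-process stand-in for a distributed lock."""
    with _registry_guard:
        return _repo_locks.setdefault(repo, threading.Lock())


def conflicting_paths(base: Tree, ours: Tree, theirs: Tree) -> list[str]:
    """File-level three-way check: both sides changed a path, in different ways."""
    conflicts = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o != b and t != b and o != t:
            conflicts.append(path)
    return conflicts


def try_merge(repo: str, branch: str, branch_tips: dict[tuple[str, str], Tree],
              merge_base: Tree, pr_tree: Tree) -> tuple[bool, list[str]]:
    """Merge pr_tree into the branch tip while holding the repository lock."""
    with _lock_for(repo):  # released automatically, even on error
        ours = branch_tips[(repo, branch)]
        conflicts = conflicting_paths(merge_base, ours, pr_tree)
        if conflicts:
            return False, conflicts
        merged: Tree = {}
        for path in merge_base.keys() | ours.keys() | pr_tree.keys():
            b, o, t = merge_base.get(path), ours.get(path), pr_tree.get(path)
            chosen = t if o == b else o  # take whichever side changed the path
            if chosen is not None:       # None means the path was deleted
                merged[path] = chosen
        branch_tips[(repo, branch)] = merged  # update the branch pointer
        return True, []
```

Reading the branch tip inside the lock is the important part: the conflict check and the pointer update then see the same state, so two merges cannot both pass the check against a tip that one of them is about to replace.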
Merge Queue
When many PRs target the same branch simultaneously, each one must be tested against the others’ changes to prevent breakage. A merge queue handles this: approved PRs enter the queue; the system batches them, runs CI on the combined batch, and merges the batch atomically if CI is green. This avoids the “works individually, breaks together” problem. GitHub’s merge queue uses this model.
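A toy version of the batching loop is sketched below. The ci_passes callback is a hypothetical stand-in for a full CI run on the base branch plus the listed PRs, and the fallback of retrying a failed batch one PR at a time is an assumption of this sketch; the text above does not specify what happens when a batch fails.

```python
from typing import Callable


def run_merge_queue(queue: list[int], batch_size: int,
                    ci_passes: Callable[[list[int]], bool]) -> tuple[list[int], list[int]]:
    """Process queued PR ids in order; return (merged, rejected).

    ci_passes(prs) is assumed to run CI on the base branch with prs applied on top.
    """
    merged: list[int] = []
    rejected: list[int] = []
    for i in range(0, len(queue), batch_size):
        batch = queue[i:i + batch_size]
        if ci_passes(merged + batch):
            merged.extend(batch)  # the whole batch lands together
            continue
        # Assumed fallback: test each PR on top of what has already merged.
        for pr in batch:
            if ci_passes(merged + [pr]):
                merged.append(pr)
            else:
                rejected.append(pr)
    return merged, rejected
```

Each candidate is always tested together with everything already merged ahead of it, which is how a PR that passes on its own branch can still be caught when it conflicts semantically with an earlier PR in the queue.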
Notifications and Review Requests
When a reviewer is requested: notify them via email and in-app notification. When a review is submitted: notify the PR author. Track unread review comments per user. Use a CODEOWNERS file to automatically request reviews from the team that owns the changed files.
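The CODEOWNERS lookup can be sketched roughly as follows. Pattern matching here uses Python's fnmatch, which is only an approximation of real CODEOWNERS glob semantics (for instance, its * also matches path separators), and the function names are hypothetical. The sketch follows the convention that the last matching rule in the file wins, and it skips the PR author, which is an assumption of this example.

```python
from fnmatch import fnmatch


def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    """Parse 'pattern owner1 owner2 ...' lines, ignoring blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def reviewers_for(changed_files: list[str], rules: list[tuple[str, list[str]]],
                  author: str) -> set[str]:
    """Collect the owners to request review from for a set of changed files."""
    requested: set[str] = set()
    for path in changed_files:
        owners: list[str] = []
        for pattern, rule_owners in rules:
            if fnmatch(path, pattern):
                owners = rule_owners  # later rules override earlier ones
        requested.update(owners)
    requested.discard(author)  # assumption: never request review from the author
    return requested
```

The resulting set feeds the same notification path as manually requested reviewers.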
Interview Tips
- Diff caching by (base_sha, head_sha, file_path) is the key performance optimization.
- Comment anchoring to diff_hunk handles the “lines shift after new commits” problem.
- Merge lock prevents concurrent merges to the same branch.
- CI as a webhook integration keeps the PR platform decoupled from CI systems.