System Design Interview: Design a Leaderboard / Top-K System (Heavy Hitters)

Top-K problems appear in many real-world systems: game leaderboards, trending hashtags, most-viewed videos, top search queries, most-active users. The challenge at scale: millions of score updates per second, millions of users querying the leaderboard, and requirements for both real-time and accurate rankings.

Requirements

Functional: Update a user’s score. Get global rank for a user. Get top-K users by score. Get users ranked around a specific user (neighbor ranks). Support millions of users.

Non-functional: Score updates in <10ms, leaderboard query in <50ms, 100K score updates/sec, 50K leaderboard reads/sec, real-time or near-real-time rankings.

Solution 1: Redis Sorted Sets

Redis Sorted Sets are the industry-standard solution for leaderboards. Each member carries a floating-point score, and members are kept ordered by score. Insert, rank, and range operations run in O(log N) (plus the number of returned elements for range queries).

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)  # decode bytes -> str
LEADERBOARD_KEY = "game:leaderboard"

def update_score(user_id: str, score: float) -> None:
    """Set user's score (ZADD replaces existing score)."""
    r.zadd(LEADERBOARD_KEY, {user_id: score})

def increment_score(user_id: str, delta: float) -> float:
    """Increment user's score by delta. Returns new score."""
    return r.zincrby(LEADERBOARD_KEY, delta, user_id)

def get_rank(user_id: str) -> int | None:
    """Get 1-indexed rank (1 = highest score). Returns None if user not in leaderboard."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)  # ZREVRANK: 0-indexed, highest score first
    return rank + 1 if rank is not None else None

def get_top_k(k: int) -> list[tuple[str, float]]:
    """Get top K users with scores. Returns [(user_id, score), ...]."""
    return r.zrevrange(LEADERBOARD_KEY, 0, k - 1, withscores=True)

def get_neighbors(user_id: str, n: int = 5) -> list[tuple[str, float]]:
    """Get n users above and below user_id in the ranking."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)
    if rank is None:
        return []
    start = max(0, rank - n)
    end = rank + n
    return r.zrevrange(LEADERBOARD_KEY, start, end, withscores=True)

Redis Sorted Set internals: implemented as a skip list plus a hash table. The skip list provides O(log N) rank and range queries; the hash table provides O(1) score lookup by member. ZADD, ZRANK, and ZINCRBY are O(log N); ZRANGE is O(log N + M) for M returned elements. Memory is on the order of 50 bytes per member (more for long member strings).

Capacity: 100M users × 50 bytes = 5 GB — fits in a single Redis instance. For 1B users: shard by user_id prefix across multiple Redis instances. Each instance handles a subset of users; global rank requires aggregating across shards.
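Sharding needs a stable routing function so that every update and rank lookup for a given user lands on the same instance. A minimal sketch using a hash of the user id (the shard count and hash choice here are illustrative assumptions, not prescribed by the design):

```python
import hashlib

NUM_SHARDS = 10  # assumption: fixed shard count for illustration

def shard_for(user_id: str) -> int:
    """Stable hash of user_id -> shard index, so every update and rank
    lookup for a given user always hits the same Redis instance."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

A fixed modulo is fine while the shard count is static; resharding later would call for consistent hashing or a lookup table.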

Solution 2: Database + Rank Computation

For moderate scale (<10M users), a database approach works:

-- Get user's rank (PostgreSQL)
SELECT COUNT(*) + 1 AS rank
FROM leaderboard
WHERE score > (SELECT score FROM leaderboard WHERE user_id = $1);

-- Top K
SELECT user_id, score, RANK() OVER (ORDER BY score DESC) AS rank
FROM leaderboard
ORDER BY score DESC
LIMIT $1;

Pros: exact ranking, full SQL expressiveness (filter by country, time range). Cons: the rank query is O(N) without an index on score. Optimize with CREATE INDEX idx_score ON leaderboard(score DESC); then "rank of user X" becomes an index scan over entries with score above X's — roughly O(rank + log N) rather than O(N).
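The rank and top-K queries above can be exercised end-to-end with an in-memory SQLite database standing in for PostgreSQL (the `$1` placeholders become `?`; the table contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaderboard (user_id TEXT PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO leaderboard VALUES (?, ?)",
                 [("alice", 300), ("bob", 200), ("carol", 100)])
conn.execute("CREATE INDEX idx_score ON leaderboard(score DESC)")

def rank_of(user_id: str) -> int:
    """Rank = 1 + number of strictly greater scores (same query as above)."""
    row = conn.execute(
        "SELECT COUNT(*) + 1 FROM leaderboard "
        "WHERE score > (SELECT score FROM leaderboard WHERE user_id = ?)",
        (user_id,)).fetchone()
    return row[0]

def top_k(k: int) -> list[tuple[str, int]]:
    return conn.execute(
        "SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT ?",
        (k,)).fetchall()
```

For example, `rank_of("bob")` returns 2 with the sample rows above.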

Solution 3: Count-Min Sketch (Approximate Heavy Hitters)

When you don’t need exact rankings and the key space is huge (all URLs ever visited, all search queries), use a Count-Min Sketch — a probabilistic data structure that estimates frequency with bounded error.

import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item: str, seed: int) -> int:
        # md5 used here for simplicity; a fast non-cryptographic hash is typical in practice
        h = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def increment(self, item: str) -> None:
        for i in range(self.depth):
            self.table[i][self._hash(item, i)] += 1

    def estimate(self, item: str) -> int:
        """Returns minimum count across all hash functions — least overcount."""
        return min(self.table[i][self._hash(item, i)] for i in range(self.depth))

Space: O(width × depth) regardless of the number of distinct items. Error bounds: an estimate never undercounts, and with probability at least 1 − δ it overcounts by at most ε·N, where N is the total stream count, ε ≈ e/width, and δ ≈ e^(−depth). With width=1000 and depth=5, the overcount is at most ~0.27% of the total stream size with ~99.3% confidence. Used for approximate heavy hitters in trending-topic pipelines, stream processing, and network traffic analysis.
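A Count-Min Sketch alone answers "how often did I see X?", not "what are the top K?" — for heavy hitters it is typically paired with a small candidate set tracking the current heaviest items. A minimal sketch combining the two (class and field names are illustrative, reusing the same width/depth assumptions as above):

```python
import hashlib

class TopKTracker:
    """Count-Min Sketch for frequency estimates plus a small dict of
    the current k heaviest items seen so far."""

    def __init__(self, k: int, width: int = 1000, depth: int = 5):
        self.k = k
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.candidates: dict[str, int] = {}  # item -> estimated count

    def _hash(self, item: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item: str) -> None:
        for i in range(self.depth):
            self.table[i][self._hash(item, i)] += 1
        est = min(self.table[i][self._hash(item, i)] for i in range(self.depth))
        if item in self.candidates or len(self.candidates) < self.k:
            self.candidates[item] = est
        else:
            # Evict the lightest current candidate if this item now beats it
            lightest = min(self.candidates, key=self.candidates.get)
            if est > self.candidates[lightest]:
                del self.candidates[lightest]
                self.candidates[item] = est

    def top_k(self) -> list[tuple[str, int]]:
        return sorted(self.candidates.items(), key=lambda kv: -kv[1])
```

The O(k) eviction scan keeps the sketch simple; a min-heap keyed by estimated count is the usual optimization when k is large.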

Sharded Leaderboard for Global Scale

import concurrent.futures
import heapq

# Assumes r_shard is a list of redis.Redis clients, one per shard
# (users are partitioned across shards, e.g. by hash of user_id)

def get_global_top_k(k: int, num_shards: int = 10) -> list[tuple[str, float]]:
    """Aggregate top-K across all shards."""
    # Fan out to every shard in parallel; each returns its local top-K
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(r_shard[i].zrevrange, LEADERBOARD_KEY, 0, k - 1, withscores=True)
            for i in range(num_shards)
        ]
        all_results = [f.result() for f in futures]

    # Merge: pool the num_shards * k candidates and keep the global top-K by score
    all_entries = [entry for shard_result in all_results for entry in shard_result]
    return heapq.nlargest(k, all_entries, key=lambda entry: entry[1])

Each shard returns its local top-K; the coordinator merges the num_shards × K candidates and keeps the global top-K. This is correct because each user lives on exactly one shard, so every global top-K member necessarily appears in its own shard’s local top-K.
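Global rank for a single user follows the same fan-out pattern: ask every shard how many scores beat the user’s score (ZCOUNT in Redis) and sum the answers. The merge logic can be sketched with shards modeled as sorted in-memory score lists rather than live Redis clients:

```python
from bisect import bisect_right

def global_rank(user_score: float, shard_scores: list[list[float]]) -> int:
    """1-indexed global rank = 1 + count of strictly greater scores across shards.
    Each inner list is one shard's scores sorted ascending (stand-in for ZCOUNT)."""
    greater = sum(len(scores) - bisect_right(scores, user_score)
                  for scores in shard_scores)
    return greater + 1
```

For example, with shards [[1, 3, 5], [2, 4, 6]], a score of 4 ranks 3rd globally (beaten only by 5 and 6).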

Interview Tips

  • Redis Sorted Sets are the canonical answer — mention ZADD, ZREVRANK, ZINCRBY, ZREVRANGE.
  • Separate score ingestion (high write throughput) from rank queries (read-heavy) — they have different scaling properties.
  • Time-windowed leaderboard: use separate sorted sets per time window (daily, weekly). On score update, update all relevant windows. Expire old windows with Redis TTL.
  • Ties: Redis ZREVRANK returns a 0-indexed rank; two users with equal scores get adjacent ranks (ordered lexicographically by member), not equal ranks. For tie-handling, encode a composite score such as score × 10^10 + (MAX_TS − join_timestamp) so earlier joiners rank higher on ties — but keep the composite below 2^53, since Redis scores are IEEE-754 doubles and lose integer precision beyond that.
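The composite-score trick in the last bullet can be sketched as follows; the multiplier and timestamp range are assumptions chosen to keep the composite value under 2^53, where doubles remain exact:

```python
TS_RANGE = 10**10  # assumed upper bound on join timestamps (Unix seconds fit easily)

def composite_score(score: int, join_ts: int) -> float:
    # Earlier join_ts -> larger tiebreaker -> higher rank on equal scores.
    # score * TS_RANGE + tiebreaker must stay below 2**53 to remain exact in a double.
    return float(score * TS_RANGE + (TS_RANGE - 1 - join_ts))

def real_score(composite: float) -> int:
    """Strip the tiebreaker to recover the score for display."""
    return int(composite) // TS_RANGE
```

With this encoding, two users scoring 100 sort by join time, and leaderboard UIs display `real_score` rather than the raw sorted-set value.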
