System Design: Top-K / Leaderboard (Heavy Hitters)
Top-K problems appear in many real-world systems: game leaderboards, trending hashtags, most-viewed videos, top search queries, most-active users. The challenge at scale: millions of score updates per second, millions of users querying the leaderboard, and requirements for both real-time and accurate rankings.
Requirements
Functional: Update a user’s score. Get global rank for a user. Get top-K users by score. Get users ranked around a specific user (neighbor ranks). Support millions of users.
Non-functional: Score updates in <10ms, leaderboard query in <50ms, 100K score updates/sec, 50K leaderboard reads/sec, real-time or near-real-time rankings.
Solution 1: Redis Sorted Sets (Recommended)
Redis Sorted Sets are the industry-standard solution for leaderboards. Each member has a floating-point score, and members are kept ordered by score. Rank and range operations are O(log N); score lookup by member is O(1).
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

LEADERBOARD_KEY = "game:leaderboard"

def update_score(user_id: str, score: float) -> None:
    """Set user's score (ZADD replaces any existing score)."""
    r.zadd(LEADERBOARD_KEY, {user_id: score})

def increment_score(user_id: str, delta: float) -> float:
    """Increment user's score by delta. Returns the new score."""
    return r.zincrby(LEADERBOARD_KEY, delta, user_id)

def get_rank(user_id: str) -> int | None:
    """Get 1-indexed rank (1 = highest score). Returns None if not in leaderboard."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)  # ZREVRANK: higher score = lower rank index
    return rank + 1 if rank is not None else None  # 1-indexed for display

def get_top_k(k: int) -> list[tuple[str, float]]:
    """Get top K users with scores. Returns [(user_id, score), ...]."""
    return r.zrevrange(LEADERBOARD_KEY, 0, k - 1, withscores=True)

def get_neighbors(user_id: str, n: int = 5) -> list[tuple[str, float]]:
    """Get up to n users above and below user_id in the ranking."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)
    if rank is None:
        return []
    start = max(0, rank - n)
    end = rank + n
    return r.zrevrange(LEADERBOARD_KEY, start, end, withscores=True)
Redis Sorted Set internals: Implemented as a skip list + hash table. Skip list provides O(log N) rank/range queries. Hash table provides O(1) score lookup by member. ZADD, ZRANK, ZINCRBY, ZRANGE are all O(log N). Stores ~50 bytes per member.
Capacity: 100M users × 50 bytes = 5 GB — fits in a single Redis instance. For 1B users: shard by user_id prefix across multiple Redis instances. Each instance handles a subset of users; global rank requires aggregating across shards.
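Shard-by-prefix routing can be sketched as a stable hash over a fixed shard count (the shard count and key format here are illustrative; a production system might prefer consistent hashing so that adding shards does not remap most users):

```python
import hashlib

NUM_SHARDS = 10  # illustrative; fixed once chosen

def shard_for(user_id: str) -> int:
    """Stable shard index for a user. All of a user's score updates
    and rank lookups go to the same shard."""
    digest = hashlib.sha1(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Because the hash is deterministic, any node can route a request without coordination; the cost is that a user's global rank is no longer a single ZREVRANK call, which motivates the fan-out merge shown later.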
Solution 2: Database + Rank Computation
For moderate scale (<10M users), a database approach works:
-- Get user's rank (PostgreSQL)
SELECT COUNT(*) + 1 AS rank
FROM leaderboard
WHERE score > (SELECT score FROM leaderboard WHERE user_id = $1);
-- Top K
SELECT user_id, score, RANK() OVER (ORDER BY score DESC) AS rank
FROM leaderboard
ORDER BY score DESC
LIMIT $1;
Pros: exact ranking and full SQL expressiveness (filter by country, time range, etc.). Cons: the rank query is O(N) without an index. Optimize with: CREATE INDEX idx_score ON leaderboard (score DESC). For “rank of user X”, a good query planner can then scan the index only until score drops below X’s score, making the count O(rank) rather than O(N).
Solution 3: Count-Min Sketch (Approximate Heavy Hitters)
When you don’t need exact rankings and the key space is huge (all URLs ever visited, all search queries), use a Count-Min Sketch — a probabilistic data structure that estimates frequency with bounded error.
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def increment(self, item: str) -> None:
        for i in range(self.depth):
            self.table[i][self._hash(item, i)] += 1

    def estimate(self, item: str) -> int:
        """Minimum count across all rows — the least-overcounted cell."""
        return min(self.table[i][self._hash(item, i)] for i in range(self.depth))
Space: O(width × depth) regardless of the number of distinct items. Error: estimates never undercount; with probability at least 1 − e^(−depth), an estimate overcounts by at most (e / width) × N, where N is the total number of increments. With width=1000 and depth=5 that is an additive error of roughly 0.27% of the stream length with ≈99.3% confidence. Count-Min Sketches are used in trending-topic detection, stream-processing frameworks, and network traffic analysis.
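A sketch alone answers only point queries ("how often did I see X?"); to answer top-K you pair it with a small heap of candidates, updated on every increment. A minimal sketch of that pairing, with the estimator passed in as a callable (in production it would be CountMinSketch.estimate; the demo below uses an exact dict counter as a stand-in so it is self-contained):

```python
import heapq
from collections import Counter

class TopKTracker:
    """Maintains the K items with the highest estimated counts."""

    def __init__(self, k: int, estimate):
        self.k = k
        self.estimate = estimate          # e.g. CountMinSketch.estimate
        self.heap: list[tuple[int, str]] = []  # min-heap of (count, item)
        self.members: set[str] = set()

    def offer(self, item: str) -> None:
        """Call after every increment of `item` in the sketch."""
        count = self.estimate(item)
        if item in self.members:
            # Item already tracked: refresh its count and restore heap order.
            self.heap = [(c, it) if it != item else (count, it)
                         for c, it in self.heap]
            heapq.heapify(self.heap)
        elif len(self.heap) < self.k:
            heapq.heappush(self.heap, (count, item))
            self.members.add(item)
        elif count > self.heap[0][0]:
            # New item beats the current minimum: evict it.
            _, evicted = heapq.heapreplace(self.heap, (count, item))
            self.members.discard(evicted)
            self.members.add(item)

    def top(self) -> list[tuple[str, int]]:
        """Tracked items, highest count first."""
        return [(it, c) for c, it in sorted(self.heap, reverse=True)]

# Demo with an exact counter standing in for the sketch:
counts = Counter()
tracker = TopKTracker(2, lambda item: counts[item])
for item in ["a", "b", "a", "c", "a", "b", "b", "b"]:
    counts[item] += 1
    tracker.offer(item)
```

Because the sketch only overcounts, an item can enter the heap spuriously, but true heavy hitters are never excluded by undercounting, which is why this pairing is the standard approximate top-K construction.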
Sharded Leaderboard for Global Scale
import concurrent.futures
import heapq

# r_shard: list of redis.Redis clients, one per shard (setup not shown)

def get_global_top_k(k: int, num_shards: int = 10) -> list[tuple]:
    """Aggregate top-K across all shards."""
    # Fan out to all shards in parallel
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(r_shard[i].zrevrange, LEADERBOARD_KEY, 0, k - 1, withscores=True)
            for i in range(num_shards)
        ]
        all_results = [f.result() for f in futures]
    # Merge: pool all candidates, keep the global top-K by score
    all_entries = [item for shard_result in all_results for item in shard_result]
    return heapq.nlargest(k, all_entries, key=lambda x: x[1])
Each shard returns its local top-K; the coordinator merges the num_shards × K candidates and keeps the global top-K. This is correct because every member of the global top-K is necessarily in its own shard's local top-K.
Interview Tips
- Redis Sorted Sets are the canonical answer — mention ZADD, ZREVRANK, ZINCRBY, ZREVRANGE.
- Separate score ingestion (high write throughput) from rank queries (read-heavy) — they have different scaling properties.
- Time-windowed leaderboard: use separate sorted sets per time window (daily, weekly). On score update, update all relevant windows. Expire old windows with Redis TTL.
- Ties: Redis ZREVRANK returns a 0-indexed rank; two users with equal scores get adjacent ranks (not equal), ordered lexicographically by member. For explicit tie-handling, pack a composite score such as score * 10^10 + (MAX_INT - join_timestamp) so earlier joiners rank higher on ties; keep the result below 2^53, since Redis scores are IEEE-754 doubles and lose integer precision beyond that.
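The composite-score trick for tie-handling can be sketched in pure Python (the constants are illustrative; this variant uses powers of two so it is easy to check that the packed value stays below Redis's 2^53 exact-integer limit for doubles):

```python
MAX_TS = 2**31 - 1   # illustrative cap on Unix join timestamps
SCALE = 2**32        # leaves room for the tiebreaker below the score

def composite_score(score: int, join_ts: int) -> int:
    """Pack score and an inverted join timestamp into one sortable number.
    Higher score always wins; on equal scores, the earlier joiner
    (smaller timestamp) gets the larger composite value."""
    return score * SCALE + (MAX_TS - join_ts)

# Equal scores: the earlier joiner outranks the later one.
early = composite_score(100, join_ts=1_000)
late = composite_score(100, join_ts=2_000)
# A strictly higher score always dominates the tiebreaker.
higher = composite_score(101, join_ts=9_000)
```

With SCALE = 2^32, scores up to about 2^21 stay exactly representable as doubles; pick the scale to match your score range.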