System Design: Distributed Counters, Leaderboards, and Real-Time Analytics

The Problem: Counting at Scale

Incrementing a counter sounds trivial, until you need to count 10 million page views per second across 100 servers without a central bottleneck. The naive approach is to have every event run UPDATE counters SET count = count + 1 WHERE id = X. At high throughput, row-level contention on that single row becomes the bottleneck. At Twitter scale (500M tweets/day, roughly 5,800/sec), issuing one counter update per like would saturate a single hot database row for any popular tweet.

Approach 1: Redis INCR

Redis INCR is an atomic O(1) operation. For write-heavy metrics (like view counts), Redis handles hundreds of thousands to millions of increments per second. Persist to the database asynchronously, every N seconds or every N events.

import redis
r = redis.Redis()

def increment_view(post_id: str, user_id: str):
    # Deduplicate: SADD returns 1 only when this user has not viewed the post before
    is_first_view = r.sadd(f"views:{post_id}:users", user_id)
    if is_first_view:
        r.incr(f"views:{post_id}:count")

def get_view_count(post_id: str) -> int:
    return int(r.get(f"views:{post_id}:count") or 0)

# Periodic flush to the database (run every 60 seconds); `db` is assumed to be an
# application-provided handle with an execute(sql, params) method
def flush_counters():
    for key in r.scan_iter("views:*:count"):
        post_id = key.decode().split(':')[1]
        # GETSET reads the count and resets it to 0 atomically, so increments
        # arriving during the flush are not lost
        count = int(r.getset(key, 0) or 0)
        if count > 0:
            db.execute("UPDATE posts SET view_count = view_count + %s WHERE id = %s",
                       [count, post_id])

Approach 2: Sharded Counters

For ultra-high-traffic counters (YouTube view count, Twitter like count), shard the counter into N shards. Each increment goes to a random shard. Read requires summing all shards.

import random

NUM_SHARDS = 16

def increment(counter_name: str):
    # Spread writes: each increment lands on a random one of NUM_SHARDS keys
    shard = random.randint(0, NUM_SHARDS - 1)
    r.incr(f"counter:{counter_name}:shard:{shard}")

def get_count(counter_name: str) -> int:
    # Read all shards in a single MGET round-trip and sum them
    keys = [f"counter:{counter_name}:shard:{i}" for i in range(NUM_SHARDS)]
    values = r.mget(keys)
    return sum(int(v or 0) for v in values)

Trade-off: reads are O(NUM_SHARDS). With 16 shards, writes spread across 16 keys, removing the single hot-key contention point; reads need 16 operations, but MGET batches them into a single network round-trip.

Approach 3: Lambda Architecture

For analytics that need both real-time and historical accuracy:

  • Speed layer: Kafka + Flink/Spark Streaming. Process events in near-real-time windows (1-minute, 5-minute). Store in Redis. Low latency, approximate accuracy.
  • Batch layer: Kafka + Spark batch jobs process complete historical data daily. Store in data warehouse (BigQuery, Snowflake, Redshift). High accuracy, high latency.
  • Serving layer: merge batch + speed layer results. Batch provides accurate historical; speed provides recent events not yet in batch.
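
A minimal sketch of the serving-layer merge, assuming the batch layer exposes a precomputed total that is complete through yesterday and the speed layer keeps per-day counts in Redis (the key names and the query_batch_total helper are illustrative, not tied to any specific framework):

import time

def query_batch_total(metric: str, up_to_day: str) -> int:
    # Placeholder for a warehouse query, e.g.
    # SELECT SUM(count) FROM daily_counts WHERE metric = %s AND day <= %s
    return 0

def get_total(metric: str) -> int:
    # Merge the accurate-but-stale batch view with the fresh-but-approximate speed view
    today = time.strftime("%Y-%m-%d", time.gmtime())
    yesterday = time.strftime("%Y-%m-%d", time.gmtime(time.time() - 86400))
    batch_total = query_batch_total(metric, up_to_day=yesterday)  # complete through yesterday
    speed_total = int(r.get(f"speed:{metric}:{today}") or 0)      # today's events from the speed layer
    return batch_total + speed_total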

Approach 4: Time-Series Databases

For metrics with a time dimension (requests/minute, error rate over time), use a time-series database:

  • InfluxDB: designed for time-series, supports downsampling (retain 1-minute resolution for 7 days, 1-hour resolution for 1 year), native aggregations.
  • Prometheus: pull-based metrics collection. Scraped from services every 15 seconds. PromQL for querying. Grafana for visualization.
  • TimescaleDB: PostgreSQL extension for time-series. Full SQL support, automatic partitioning by time.
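
If a full time-series database is overkill, the bucketing idea can be approximated with the Redis counters used above: one counter key per 1-minute bucket, expired after the retention window. A minimal sketch (key names and retention values are illustrative):

import time

BUCKET_SECONDS = 60
RETENTION_SECONDS = 7 * 24 * 3600  # keep 1-minute resolution for 7 days

def record_request(metric: str):
    # Round the current time down to the start of its 1-minute bucket
    bucket = int(time.time()) // BUCKET_SECONDS * BUCKET_SECONDS
    key = f"ts:{metric}:{bucket}"
    r.incr(key)
    r.expire(key, RETENTION_SECONDS)

def requests_last_hour(metric: str) -> int:
    # Sum the 60 most recent 1-minute buckets in one MGET
    now = int(time.time()) // BUCKET_SECONDS * BUCKET_SECONDS
    keys = [f"ts:{metric}:{now - i * BUCKET_SECONDS}" for i in range(60)]
    return sum(int(v or 0) for v in r.mget(keys))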

Real-Time Leaderboards

# Redis Sorted Set: O(log n) insert, O(log n) rank query
def add_score(leaderboard: str, user_id: str, score: float):
    r.zadd(leaderboard, {user_id: score})

def increment_score(leaderboard: str, user_id: str, delta: float):
    r.zincrby(leaderboard, delta, user_id)

def get_top_k(leaderboard: str, k: int = 10):
    # ZREVRANGE: highest scores first, returned with their scores
    return r.zrevrange(leaderboard, 0, k - 1, withscores=True)

def get_rank(leaderboard: str, user_id: str) -> int | None:
    rank = r.zrevrank(leaderboard, user_id)  # 0-indexed, None if not found
    return (rank + 1) if rank is not None else None
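
A brief usage sketch, assuming a per-day leaderboard key (the key name and values are illustrative):

add_score("leaderboard:2024-06-01", "alice", 4200)
increment_score("leaderboard:2024-06-01", "alice", 150)

top10 = get_top_k("leaderboard:2024-06-01")               # [(b"alice", 4350.0), ...]
alice_rank = get_rank("leaderboard:2024-06-01", "alice")  # 1-indexed rank, or None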

Sliding Window Rate Counter

import time
import uuid

def is_rate_limited(user_id: str, window_seconds: int = 60, max_requests: int = 100) -> bool:
    now = time.time()
    key = f"rate:{user_id}"
    # Remove events older than the window
    r.zremrangebyscore(key, '-inf', now - window_seconds)
    # Count remaining events in the window
    count = r.zcard(key)
    if count >= max_requests:
        return True  # rate limited
    # Record this event; a unique member avoids collisions when two requests
    # share the same timestamp. Note: check-then-add is not atomic; wrap it in
    # a Lua script or MULTI/EXEC if strict limits are required.
    r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    r.expire(key, window_seconds)
    return False

Exactly-Once Event Counting

At-least-once delivery (Kafka default) means events may be counted twice on reprocessing. For exact counts:

  • Use an event ID (UUID) per event. Before counting, check if event_id is already in a “processed” set (Bloom filter for memory efficiency, Redis SET for accuracy). Only count if not seen before.
  • For session-based deduplication (count unique views, not total views), use a per-user per-item set: SADD views:{post_id}:unique {user_id} returns 0 if already in set.
  • HyperLogLog: Redis PFADD/PFCOUNT for approximate unique counts with 0.81% error at O(1) memory per counter — perfect for “unique visitors” type metrics.
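
A minimal sketch of ID-based deduplication and HyperLogLog unique counts, reusing the Redis client from above (key names are illustrative):

def count_event_once(event_id: str, counter_key: str, seen_key: str = "events:processed"):
    # SADD returns 1 only the first time this event_id is seen
    if r.sadd(seen_key, event_id):
        r.incr(counter_key)

def record_unique_visitor(page_id: str, user_id: str):
    # HyperLogLog: fixed ~12 KB per key, ~0.81% standard error on the count
    r.pfadd(f"uniques:{page_id}", user_id)

def unique_visitors(page_id: str) -> int:
    return r.pfcount(f"uniques:{page_id}")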

Interview Questions

Q: How would you design YouTube’s view count system?

Views are write-heavy (each play increments the counter). Deduplicate within a session window (same user within 24 hours counts once). Architecture: (1) Player sends view event to a Kafka topic. (2) A Flink job deduplicates by {video_id, user_id, day} and increments a sharded Redis counter (16 shards per video). (3) Every 5 minutes, a flush job reads Redis shards, sums them, and applies the delta to the database with an atomic ADD. (4) View counts are read from Redis for real-time display; the database is the authoritative store for historical analytics.
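
A hedged sketch of steps (2) and (3), reusing the Redis client and sharding pattern from above; key names are illustrative, `db` is an assumed database handle, and a production pipeline would typically hold the dedup state in Flink rather than Redis:

import random
import time

NUM_VIDEO_SHARDS = 16

def process_view_event(video_id: str, user_id: str):
    day = time.strftime("%Y-%m-%d", time.gmtime())
    # Step 2: deduplicate by {video_id, user_id, day}; SADD returns 1 for a first view
    if r.sadd(f"viewed:{video_id}:{day}", user_id):
        shard = random.randint(0, NUM_VIDEO_SHARDS - 1)
        r.incr(f"views:{video_id}:shard:{shard}")

def flush_video(video_id: str):
    # Step 3: drain the shards and apply the summed delta to the authoritative store
    keys = [f"views:{video_id}:shard:{i}" for i in range(NUM_VIDEO_SHARDS)]
    delta = sum(int(r.getset(k, 0) or 0) for k in keys)
    if delta > 0:
        db.execute("UPDATE videos SET view_count = view_count + %s WHERE id = %s",
                   [delta, video_id])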

Asked at: Twitter/X, Netflix, LinkedIn, Databricks, Snap
