System Design: Distributed Cache — Eviction Policies, Consistency, and Scaling (2025)

Why Distributed Caching?

A single application server's in-memory cache is lost on restart and is not shared across instances. A distributed cache (Redis, Memcached) provides a shared, optionally persistent, low-latency key-value store accessible by all service instances. Key metrics: cache hit rate (aim for > 95% for hot data), cache latency (< 1 ms p99 within the same region), and memory efficiency (how well the eviction policy uses the available memory). When to cache: caching pays off when reads far outnumber writes, the same data is read repeatedly, the computation or database query is expensive, and the data can tolerate some staleness rather than needing to be perfectly fresh.
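A rough way to sanity-check that decision is to model expected read latency and database offload as a function of hit rate; the latencies below are illustrative assumptions, not measurements:

# Back-of-the-envelope model for the "should we cache?" decision.
# The latency numbers are illustrative assumptions.
def expected_read_latency_ms(hit_rate: float,
                             cache_ms: float = 1.0,
                             db_ms: float = 20.0) -> float:
    # A miss pays for the failed cache lookup and then the DB query.
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + db_ms)

print(expected_read_latency_ms(0.95))  # 2.0 ms vs. 20 ms going straight to the DB
print(expected_read_latency_ms(0.50))  # 11.0 ms -- far less benefit at a low hit rate
# DB read load also drops by roughly the hit rate: at 95% hits, the DB serves ~5% of reads.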

Eviction Policies

LRU (Least Recently Used): evict the item that was accessed least recently. Works well when recent access predicts future access (temporal locality). Implemented with a doubly linked list plus a hashmap, giving O(1) get and put. Redis approximates LRU rather than tracking exact order (which would be too expensive): it samples N random keys and evicts the least recently used key in the sample.

LFU (Least Frequently Used): evict the item accessed least often. Better for skewed access patterns, since popular items stay cached even if they have not been accessed recently. Harder to implement efficiently; Redis 4.0+ ships an approximate LFU mode.

TTL-based eviction: every key has a time-to-live and is evicted on expiry. Usually combined with LRU/LFU, which handle eviction when memory fills up before keys expire.

Write-through vs. write-behind: write-through updates the cache and the database synchronously on every write (strong consistency, higher write latency). Write-behind (write-back) updates the cache immediately and the database asynchronously (lower write latency, but data is lost if the cache crashes before the database write).
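A minimal LRU sketch of the hashmap-plus-doubly-linked-list idea above; Python's OrderedDict bundles both structures, so get and put stay O(1) (class and method names here are just for illustration):

from collections import OrderedDict

# Minimal LRU cache: OrderedDict is a hashmap backed by a doubly linked list,
# so move-to-end (mark as recently used) and pop-oldest (evict) are both O(1).
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # most recently used moves to the tail
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict from the head: least recently used

Redis's sampled approximation trades the exactness of this ordering for far less bookkeeping per key.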

Cache Aside (Lazy Loading) Pattern

class CacheAsideService:
    def __init__(self, redis_client, db):
        self.redis = redis_client  # e.g. a redis.Redis instance
        self.db = db               # application database access layer

    def get_user(self, user_id: int) -> User | None:
        # 1. Check cache
        cached = self.redis.get(f"user:{user_id}")
        if cached:
            return User.from_json(cached)

        # 2. Cache miss: load from DB
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)
        if not user:
            return None

        # 3. Populate cache with TTL
        self.redis.setex(
            f"user:{user_id}",
            3600,  # TTL: 1 hour
            user.to_json()
        )
        return user

    def update_user(self, user_id: int, data: dict) -> User:
        # Update DB first (source of truth)
        user = self.db.execute(
            "UPDATE users SET ... WHERE id = %s", user_id, data
        )
        # Invalidate cache (delete, not update -- avoid race conditions)
        self.redis.delete(f"user:{user_id}")
        return user

Why delete on update rather than update the cache? Race condition: if two threads simultaneously update the user, one may write a stale value into the cache after the other has already written the fresh value. Delete + lazy reload (on next read) avoids this race. This is the “cache-aside with invalidation” pattern.

Consistent Hashing for Cache Sharding

With N cache nodes, a naive modulo scheme (hash(key) % N) remaps nearly all keys when a node is added or removed. Consistent hashing instead maps both keys and nodes onto a ring of positions (0 to 2^32 - 1). Each cache node owns one or more segments of the ring: a key belongs to the first node clockwise from its hash. On node addition or removal, only the keys in the affected segments are remapped (roughly 1/N of all keys). Virtual nodes (vnodes): each physical node is represented by K points spread around the ring, which evens out the load (without vnodes, node placement can be very uneven). A common choice is on the order of 100 to 300 virtual nodes per physical node (Cassandra, for example, historically defaulted to 256 tokens per node). Implementation: a sorted array of (hash, node) pairs plus binary search to find the node responsible for a key, as sketched below.
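A sketch of that sorted-array-plus-binary-search implementation with virtual nodes; the vnode count and the MD5-based ring hash are assumptions for illustration, not any particular library's API:

import bisect
import hashlib

# Consistent hash ring: each physical node contributes `vnodes` points on the
# ring; a key belongs to the first node point clockwise from the key's hash.
class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 150):
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._position(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _position(key: str) -> int:
        # Stable 32-bit ring position derived from MD5 (any stable hash works).
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

    def get_node(self, key: str) -> str:
        h = self._position(key)
        # First vnode at or clockwise of the key; wrap past the end of the array.
        idx = bisect.bisect_left(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # stable assignment until the node set changes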

Cache Stampede and Thundering Herd Prevention

Cache stampede: a popular key expires, and hundreds of requests simultaneously see a cache miss and all query the database, overwhelming it. Solutions: Probabilistic early expiration: before a key expires, individual requests recompute it early with a probability that grows as expiry approaches. One common form recomputes when current_time - delta * beta * ln(rand()) >= expiry_time, where delta is the time the recomputation takes, beta is a tuning constant (higher means earlier refresh), and rand() is uniform on (0, 1]; this spreads the recomputation over time rather than firing it all at once. Mutex/lock on miss: the first request to detect a cache miss acquires a distributed lock (Redis SET with NX and a TTL). Only the lock holder queries the DB and repopulates the cache; other requests wait briefly or serve stale data if available. Background refresh: a background job refreshes cache entries before they expire (based on last access time), so popular items are continuously refreshed while they receive traffic and effectively never expire. Sketches of the first two approaches follow below.
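A sketch of the first two mitigations using redis-py: a SET NX EX lock on miss, plus an early-recompute check of the form above. The recompute_value callback, the lock TTL, and BETA are assumptions for illustration:

import math
import random
import time

LOCK_TTL = 10   # seconds; bounds how long a crashed lock holder blocks others
BETA = 1.0      # > 1 shifts recomputation earlier before expiry

def get_with_lock_on_miss(redis, key, ttl, recompute_value):
    # Cache-aside read where only one request rebuilds a missing value.
    cached = redis.get(key)
    if cached is not None:
        return cached

    # SET with NX and EX is an atomic "acquire lock with TTL" in Redis.
    if redis.set(f"lock:{key}", "1", nx=True, ex=LOCK_TTL):
        try:
            value = recompute_value()        # caller-supplied rebuild from the DB
            redis.setex(key, ttl, value)
            return value
        finally:
            redis.delete(f"lock:{key}")

    # Lost the race: wait briefly for the winner to repopulate, then retry once.
    time.sleep(0.05)
    return redis.get(key)  # may still be None; callers can fall back to the DB

def should_recompute_early(expiry_ts: float, recompute_seconds: float) -> bool:
    # Probabilistic early expiration: the closer to expiry (and the costlier the
    # recompute), the more likely this returns True for any given request.
    rand = random.random() or 1e-12          # avoid log(0)
    return time.time() - recompute_seconds * BETA * math.log(rand) >= expiry_ts

A production-grade lock would store a unique token and release it only if the token still matches (for example via a small Lua script), so a holder whose TTL has expired cannot delete someone else's lock.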

