System Design: Consistent Hashing Explained with Virtual Nodes

Consistent hashing is a distributed hashing scheme that minimises key remapping when nodes are added or removed. It is the backbone of distributed caches, distributed databases, and load balancers. Almost every senior-level system design interview expects you to explain it precisely.

The Problem with Naive Hashing

With a cluster of N servers, the naive approach routes key k to server hash(k) % N. When a node is added or removed, N changes and nearly every key remaps to a different server — O(K) keys move, where K is the total number of keys. For a 10-node cache cluster that loses one node, ~90% of the keyspace remaps, causing a thundering-herd cache miss storm.

# Naive hashing – BAD for dynamic clusters
class NaiveHashRing:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def get_node(self, key):
        return self.nodes[hash(key) % len(self.nodes)]

    def add_node(self, node):
        self.nodes.append(node)
        # ~K*(N/(N+1)) keys now map to different nodes – catastrophic

    def remove_node(self, node):
        self.nodes.remove(node)
        # Again, nearly all keys remap (~K*(N-2)/(N-1)); shrinking N reshuffles almost every key
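
The remapping cost is easy to measure. A small sketch (hypothetical key names; md5 is used instead of Python's salted hash() so the result is deterministic) counts how many of 10,000 keys change servers when a 10-node cluster grows to 11:

```python
import hashlib

def naive_node(key: str, n: int) -> int:
    # Deterministic stand-in for hash(key) % n
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key-{i}" for i in range(10_000)]
moved = sum(naive_node(k, 10) != naive_node(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # expect roughly N/(N+1) ≈ 91%
```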

The Ring Abstraction

Consistent hashing maps both nodes and keys onto a virtual ring of hash positions — typically 2^32 of them, or 2^128 when using a 128-bit hash such as MD5, as in the code below. Each key is assigned to the first node clockwise from its position on the ring.

  • Add a node: Only keys between the new node and its predecessor migrate — O(K/N) keys.
  • Remove a node: Only keys owned by the removed node migrate to its successor — O(K/N) keys.
  • Lookup: Binary search over sorted ring positions — O(log N) per key.

Implementation: Basic Ring

import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self):
        self._ring  = {}          # position -> node
        self._keys  = []          # sorted positions

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        pos = self._hash(node)
        self._ring[pos] = node
        bisect.insort(self._keys, pos)

    def remove_node(self, node: str) -> None:
        pos = self._hash(node)
        del self._ring[pos]
        idx = bisect.bisect_left(self._keys, pos)
        self._keys.pop(idx)

    def get_node(self, key: str) -> str | None:
        if not self._ring:
            return None
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]

The Critical Problem: Hot Spots

With a small number of real nodes, positions on the ring are uneven. One node may own 40% of the ring while another owns 5%. This causes non-uniform load distribution — exactly the problem we want to avoid.

Virtual Nodes (vnodes)

Each physical node is mapped to V positions on the ring (virtual nodes). Instead of a single hash("node-A"), we insert hash("node-A#0"), hash("node-A#1"), …, hash("node-A#V-1"). With V=150, the load imbalance empirically drops below ~5%. Cassandra and DynamoDB both use virtual nodes.

class VNodeHashRing:
    def __init__(self, virtual_nodes: int = 150):
        self.V      = virtual_nodes
        self._ring  = {}
        self._keys  = []

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.V):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.V):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            idx = bisect.bisect_left(self._keys, pos)
            self._keys.pop(idx)

    def get_node(self, key: str) -> str | None:
        if not self._ring:
            return None
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]

    def get_nodes(self, key: str, n: int) -> list[str]:
        """Return n distinct physical nodes for replication."""
        if not self._ring:
            return []
        pos   = self._hash(key)
        idx   = bisect.bisect(self._keys, pos) % len(self._keys)
        seen  = set()
        nodes = []
        for _ in range(len(self._keys)):
            node = self._ring[self._keys[idx % len(self._keys)]]
            if node not in seen:
                seen.add(node)
                nodes.append(node)
                if len(nodes) == n:
                    break
            idx += 1
        return nodes

Replication with Preference Lists

Dynamo-style systems (the Dynamo paper, and DynamoDB after it) use the get_nodes(key, N) pattern above — each key has a preference list of N distinct physical nodes. With N=3, quorum writes (W=2), and quorum reads (R=2), the system tolerates a single-node failure while guaranteeing that every read quorum overlaps every write quorum (W+R > N), so a read always sees at least one replica holding the latest acknowledged write.
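
The overlap property can be verified exhaustively for small clusters — a toy brute-force check, not how real systems reason about quorums:

```python
from itertools import combinations

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True iff every write quorum shares at least one node with every read quorum."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

print(quorums_overlap(3, 2, 2))   # True:  W + R = 4 > 3, quorums must intersect
print(quorums_overlap(3, 1, 2))   # False: W + R = 3 <= 3, disjoint quorums exist
```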

Bounded Load Consistent Hashing

Google’s 2017 paper, “Consistent Hashing with Bounded Loads” (Mirrokni et al.), adds a load cap: a node rejects a key if its current load exceeds (1+ε) × average_load, and the key is forwarded clockwise to the next node. This bounds every node’s load to a factor of (1+ε) above the average — e.g. 1.25 with ε=0.25 — preventing hot-spot overload during traffic spikes.

class BoundedLoadRing(VNodeHashRing):
    def __init__(self, virtual_nodes=150, epsilon=0.25):
        super().__init__(virtual_nodes)
        self.epsilon    = epsilon
        self.load       = {}     # node -> current key count
        self.total_keys = 0

    def get_node_bounded(self, key: str) -> str | None:
        if not self._ring:
            return None
        self.total_keys += 1
        average = self.total_keys / max(len(set(self._ring.values())), 1)
        cap     = (1 + self.epsilon) * average
        pos     = self._hash(key)
        idx     = bisect.bisect(self._keys, pos) % len(self._keys)
        for _ in range(len(self._keys)):
            node = self._ring[self._keys[idx % len(self._keys)]]
            if self.load.get(node, 0) < cap:
                self.load[node] = self.load.get(node, 0) + 1
                return node
            idx += 1
        return None   # all overloaded — shouldn't happen

Real-World Usage

System                 Usage                                                     V-nodes
Apache Cassandra       Token ring partitioning; virtual nodes for load balance   256
Amazon DynamoDB        Preference lists; sloppy quorum via consistent hashing    yes
Redis Cluster          16384 hash slots assigned to nodes (simplified scheme)    slots
Memcached (libketama)  Client-side ring for cache routing                        150
Nginx upstream hash    Sticky routing for session affinity                       configurable
Envoy proxy            Ring-hash load-balancing policy                           configurable

Complexity Summary

Operation          Basic Ring                              VNode Ring
Add node           O(log N) insert; O(K/N) keys migrate    O(V log(VN)) insert; O(K/N) migrate
Remove node        O(log N) delete; O(K/N) keys migrate    O(V log(VN)) delete; O(K/N) migrate
Get node (lookup)  O(log N)                                O(log(VN))
Load balance       Poor (high per-node variance)           Good (~5% imbalance at V=150)

Interview Questions

How would you add weighted nodes to consistent hashing?

Assign a number of virtual nodes proportional to the node’s capacity. A node with 2x CPU/RAM gets 2V vnodes. This naturally routes more traffic to powerful nodes without any extra routing logic.

What happens during a node failure — how do you avoid data loss?

With replication (N=3), each key lives on 3 consecutive nodes. When node B fails, reads/writes fall through to the next node in the preference list (hinted handoff in Dynamo-style systems). When B recovers, hints are replayed to restore full replication.

How does Redis Cluster differ from pure consistent hashing?

Redis Cluster uses 16384 fixed hash slots (slot = CRC16(key) mod 16384) assigned to master nodes. This is roughly analogous to consistent hashing with a very large, fixed set of virtual nodes, but the slot-to-node assignments are explicit and stored in cluster state, not computed on the fly. Slot migration is manual (CLUSTER SETSLOT).
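
For reference, the slot function is CRC-16/XMODEM over the key bytes. This sketch implements it bitwise for clarity; real clients use a precomputed lookup table, and this omits Redis's hash-tag ({...}) handling:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM: polynomial 0x1021, initial value 0x0000
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))   # a slot in [0, 16383], stable across all clients
```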

How do you detect key skew (hot keys) on the ring?

Track per-node request rates with exponentially weighted moving averages. If any node exceeds 2× average rate, emit a HOTSPOT alert. For hot keys specifically, add a key-level frequency counter (Count-Min Sketch) and replicate hot keys to extra nodes.
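
A minimal Count-Min Sketch for the key-level counter (a sketch of the idea; the width and depth here are illustrative, and production systems derive them from target error bounds ε and δ):

```python
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _positions(self, key: str):
        # One hash position per row, derived by salting the key with the row index
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, key: str) -> None:
        for row, col in self._positions(key):
            self.table[row][col] += 1

    def estimate(self, key: str) -> int:
        # Never underestimates; hash collisions can only inflate the count
        return min(self.table[row][col] for row, col in self._positions(key))

cms = CountMinSketch()
for _ in range(900):
    cms.add("GET /profile/42")    # the hot key
for i in range(100):
    cms.add(f"GET /item/{i}")     # background traffic

hot = cms.estimate("GET /profile/42")   # at least 900
```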


Asked at: Uber, Databricks, Cloudflare, Netflix, Twitter/X
