Sharding and Data Partitioning: System Design Deep Dive

Sharding (horizontal partitioning) splits a large dataset across multiple database nodes so no single node holds all the data. It is the primary scaling strategy for write-heavy workloads that exceed single-node capacity.

Why Shard?

  • Write throughput: a single PostgreSQL primary typically sustains on the order of 10K-50K writes/sec; sharding scales this roughly linearly with shard count.
  • Storage: a 10TB dataset across 10 shards = 1TB per node, fitting in fast SSD tiers.
  • Read parallelism: scatter-gather queries fan out to all shards and merge results.

Partitioning Strategies

Range Partitioning

Assign rows to shards based on value ranges of the shard key (e.g., user_id in [0, 1M) → shard 0, [1M, 2M) → shard 1). Simple to implement, and range scans stay on one or a few shards. Risk: hot shards if data is skewed (e.g., new users all land on the latest shard).
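
A range router is just a sorted list of split points plus a binary search. A minimal sketch (the split points below are illustrative, not from any real deployment):

```python
import bisect

# Split points between half-open ranges: shard 0 holds [0, 1M),
# shard 1 holds [1M, 2M), shard 2 holds everything >= 2M.
SPLIT_POINTS = [1_000_000, 2_000_000]

def route(user_id: int) -> int:
    # bisect_right returns how many split points are <= user_id,
    # which is exactly the index of the owning shard
    return bisect.bisect_right(SPLIT_POINTS, user_id)
```

Adding a shard is just inserting a split point; the hotspot risk is visible here too, since every id above the last split point lands on the final shard.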

Hash Partitioning

shard = hash(key) % N. Distributes load uniformly when keys are random. Range scans must hit every shard. Adding a shard forces nearly all keys to be rehashed — mitigated by consistent hashing.
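
The rehashing cost of mod-N placement is easy to measure: going from 4 to 5 shards relocates most keys. A small sketch (hashlib stands in for whatever stable hash the router uses — Python's built-in hash() is salted per process):

```python
import hashlib

def shard(key: str, n: int) -> int:
    # Stable 64-bit hash derived from MD5, then mod-N placement
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(shard(k, 4) != shard(k, 5) for k in keys)
# A key stays put only when h % 4 == h % 5, i.e. h % 20 < 4,
# so ~80% of keys relocate. Consistent hashing would move only ~1/N.
print(f"{moved / len(keys):.0%} of keys moved")
```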

Directory-Based Partitioning

A lookup service maps each key to its shard. Maximum flexibility (individual keys can be moved), but the directory is a single point of failure and a potential bottleneck, so it must be replicated and cached aggressively. DynamoDB's request routers consult a similar partition metadata layer.
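
The directory itself is just a replicated map. The toy sketch below keeps it in memory — exactly the single point of failure the text warns about; a real deployment would back it with a replicated, consensus-based store:

```python
class ShardDirectory:
    # Toy in-memory key -> shard directory (illustrative interface)
    def __init__(self, default_shard: int = 0):
        self._map: dict[str, int] = {}
        self._default = default_shard

    def shard_for(self, key: str) -> int:
        return self._map.get(key, self._default)

    def move(self, key: str, shard: int) -> None:
        # Per-key placement is the flexibility hash/range schemes lack
        self._map[key] = shard
```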

Choosing a Shard Key

The shard key determines everything: access patterns, hot spots, join complexity.

  • High cardinality: enough distinct values to distribute across shards (user_id: good; country_code: bad for 10+ shards).
  • Query alignment: most queries should include the shard key to avoid scatter-gather. For a social app, shard by user_id so “get user’s posts” hits one shard.
  • Write distribution: avoid monotonically increasing keys (timestamps, auto-increment IDs) with range partitioning — every new write lands on the trailing shard. Hash the key, add a random prefix, or use UUID v4 instead.
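
The skew is easy to demonstrate: take the newest 10% of auto-increment ids and see where each scheme places them (8 shards over a hypothetical key space of 10,000 ids):

```python
from collections import Counter
import hashlib

N_SHARDS = 8

def hash_shard(key: int) -> int:
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % N_SHARDS

def range_shard(key: int, max_key: int = 10_000) -> int:
    # Equal-width ranges over [0, max_key)
    return min(key * N_SHARDS // max_key, N_SHARDS - 1)

recent = range(9_000, 10_000)  # the newest 10% of ids
print(Counter(hash_shard(k) for k in recent))   # spread over all 8 shards
print(Counter(range_shard(k) for k in recent))  # every key on shard 7
```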

Cross-Shard Queries and Joins

Joins across shards require application-level aggregation. Strategies:

  • Denormalize: embed frequently joined fields into the primary entity (store username in every post row).
  • Scatter-gather: fan out query to all shards, merge results in the application layer. Works for aggregates (COUNT, SUM) but is expensive.
  • Secondary index shards: maintain a separate index shard that maps secondary keys (email → user_id). Two-hop reads: index shard → data shard.
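
Scatter-gather for an aggregate is a parallel fan-out plus a merge. A sketch, assuming each shard handle exposes a hypothetical count() method:

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather_count(shards, predicate):
    # Fan the COUNT out to every shard in parallel, merge by summing.
    # End-to-end latency is governed by the slowest shard, which is
    # the main cost of scatter-gather at high shard counts.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(lambda s: s.count(predicate), shards))
```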

Hotspot Mitigation


# Problem: celebrity user with 10M followers causes hot shard
# Solution 1: Write splitting — distribute writes across N virtual shards
import random

def shard_key_for_celebrity(user_id, num_splits=10):
    # Writes spread across: user_id#0, user_id#1, ..., user_id#9
    return f"{user_id}#{random.randint(0, num_splits - 1)}"

# Reads must aggregate across all splits (get_shard is a placeholder
# for the routing layer)
def get_follower_count(user_id, num_splits=10):
    counts = [get_shard(f"{user_id}#{i}").follower_count for i in range(num_splits)]
    return sum(counts)

# Solution 2: Read replicas per shard — scale reads without touching write path

Resharding

When a shard grows too large or traffic skews, you must reshard. Approaches:

  • Double-write: write to both the old and new shard layouts simultaneously, backfill historical data into the new layout, cut reads over, then stop writing to the old layout.
  • Consistent hashing with virtual nodes: add a new node; it takes a fraction of keys from all existing nodes. Only O(K/N) keys move.
  • Logical shards: provision more logical shards than physical nodes (e.g., 1024 logical shards on 8 nodes = 128 per node). Resharding becomes moving whole logical shards between nodes — the key-to-logical-shard mapping never changes, and reassignment is a metadata update.
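
The logical-shard scheme fits in a few lines (1024 logical shards on 8 nodes; the names and map below are illustrative). Keys hash to a logical shard once and forever; rebalancing only edits the logical → physical map:

```python
import hashlib

NUM_LOGICAL = 1024  # fixed for the lifetime of the system

def logical_shard(key: str) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_LOGICAL

# Rebalancing edits only this map; key -> logical assignments never change
logical_to_node = {ls: ls % 8 for ls in range(NUM_LOGICAL)}  # 128 per node

def node_for(key: str) -> int:
    return logical_to_node[logical_shard(key)]

# Moving logical shard 0 to a new 9th node is a one-entry metadata update
logical_to_node[0] = 8
```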

Sharding in Real Systems

  System          Strategy                          Notes
  MongoDB         Hash or range on shard key        Config servers store chunk metadata; mongos routes queries
  Cassandra       Consistent hashing (token ring)   Virtual nodes (256 tokens per node by default pre-4.0); RF typically 3
  MySQL (Vitess)  Hash on primary key               Vitess adds a sharding proxy layer above MySQL; online resharding
  DynamoDB        Hash partition key + sort key     Automatic partition split when a partition exceeds 10GB or 3000 RCU / 1000 WCU
  Redis Cluster   16384 fixed hash slots            slot = CRC16(key) mod 16384; online slot migration via CLUSTER SETSLOT
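
The Redis slot computation is reproducible: Redis Cluster runs CRC16/XMODEM over the key (or over the {hash tag} substring if one is present — omitted in this sketch) and takes the result mod 16384:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16/XMODEM: polynomial 0x1021, initial value 0x0000,
    # no reflection — the variant Redis Cluster uses
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def redis_slot(key: str) -> int:
    # Sketch only: real Redis first extracts a {hash tag} if present
    return crc16_xmodem(key.encode()) % 16384
```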

Interview Tips

  • Always ask about the primary query pattern before choosing a shard key.
  • State the tradeoff: hash = uniform distribution but no range scans; range = efficient scans but hotspot risk.
  • Mention consistent hashing when discussing adding/removing nodes.
  • For social platforms: shard users by user_id, posts by (user_id, created_at) to keep a user’s posts together.
