System Design Interview: Design a Multi-Region Database System

Why Multi-Region Databases?

A single-region database has two critical weaknesses: latency (users in Europe querying a US database experience 150ms+ RTT) and availability (a regional outage takes down the entire service). Multi-region databases address both by replicating data across geographically distributed data centers.

The fundamental challenge is the CAP theorem: in the presence of a network partition between regions, you must choose between consistency (all regions see the same data) and availability (all regions continue to serve requests). Most production systems choose availability, accepting eventual consistency with careful conflict resolution.

    Replication Topologies

    Single-Leader (Primary-Replica)

    One region is the primary (accepts writes); other regions are replicas (serve reads). Replication lag is the delay between a write on the primary and its appearance on replicas.

    • Pros: simple, no write conflicts, strong consistency possible with synchronous replication
    • Cons: all writes must go to the primary (adding latency for distant writers); primary failure requires failover (RPO — the window of data loss — equals replication lag; RTO — the downtime — equals failover time)
    • Best for: read-heavy workloads where writes can tolerate primary-region latency
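    As a sketch, single-leader routing reduces to sending every write to one fixed region and serving reads locally (the region names below are illustrative, not any real deployment):

```python
# Illustrative region names -- a real deployment would use its own topology.
PRIMARY = "us-east"
REPLICAS = {"us-east", "eu-west", "ap-south"}

def route(operation: str, client_region: str) -> str:
    """Single-leader routing: all writes go to the primary region;
    reads are served from the client's local replica (which may lag)."""
    if operation == "write":
        return PRIMARY  # distant writers pay the cross-region round trip
    return client_region if client_region in REPLICAS else PRIMARY
```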

    Multi-Leader (Active-Active)

    Multiple regions accept writes. Each leader replicates to others asynchronously.

    • Pros: low write latency from any region, continues operating if any region goes down
    • Cons: write conflicts are inevitable and must be resolved. Strategies include last-write-wins (LWW, by timestamp — risks losing data), application-level merging (CRDT-style), or routing all writes for the same entity to the same region (avoiding conflicts via consistent hashing on the entity ID)
    • Best for: globally distributed write-heavy workloads (Google Docs, collaborative tools)
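    The conflict-avoidance option above — route every write for a given entity to one "home" region — can be sketched with a small consistent-hash ring (region names and vnode count are illustrative):

```python
import hashlib
from bisect import bisect_right

# Illustrative region list -- not tied to any real deployment.
REGIONS = ["us-east", "eu-west", "ap-south"]

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps each entity ID to a single home region so that all writes
    for that entity land on one leader and can never conflict."""
    def __init__(self, regions, vnodes=100):
        # Each region gets `vnodes` points on the ring for even spread.
        self.ring = sorted(
            (_hash(f"{region}#{i}"), region)
            for region in regions for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def home_region(self, entity_id: str) -> str:
        idx = bisect_right(self.keys, _hash(entity_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(REGIONS)
# The same entity always routes to the same region.
assert ring.home_region("order:42") == ring.home_region("order:42")
```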

    Leaderless (Dynamo-style)

    No designated leader: any node accepts reads and writes. Uses quorum reads and writes — write to W nodes, read from R nodes, where W + R > N (the total number of replicas). Sloppy quorums and hinted handoff handle node failures; anti-entropy (comparing Merkle trees) repairs divergent replicas.

    • Pros: highest availability, no single point of failure
    • Cons: complex conflict resolution, eventual consistency by default
    • Examples: Cassandra (tunable quorums), DynamoDB, Riak
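    A minimal sketch of the quorum-overlap rule and a quorum read (the response record shape here is an assumption for illustration, not any particular driver's API):

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum overlaps every write quorum
    in at least one node, so a quorum read always sees the latest write."""
    return w + r > n

def quorum_read(responses: list, r: int) -> dict:
    """Of the first r replica responses, return the most recent version.
    Each response is assumed to be a {'version': ..., 'value': ...} record."""
    return max(responses[:r], key=lambda rec: rec["version"])

# Classic Dynamo-style configuration: N=3, W=2, R=2.
assert quorum_ok(3, 2, 2)       # read and write quorums must overlap
assert not quorum_ok(3, 1, 1)   # W=1, R=1: a read can miss the latest write
```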

    Conflict Resolution Strategies

    • Last-Write-Wins (LWW): each write carries a timestamp; higher timestamp wins. Simple but risks data loss when clocks are skewed. Used by Cassandra (with caveats).
    • Version vectors: track a vector of (node_id → version) per object. Detects causality — if version A causally precedes B, B wins. If concurrent (neither precedes the other), surface the conflict to the application.
    • CRDTs: Conflict-free Replicated Data Types. Data structures with merge operations that are commutative, associative, and idempotent. Example: G-Counter (grow-only counter), OR-Set (observed-remove set). Operations always converge regardless of order. Best for: counters, sets, shopping carts.
    • Application-level resolution: surface conflicts to the application; it decides (e.g., merge shopping cart items rather than overwrite).
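    Two of these strategies fit in a few lines — a version-vector comparison that detects concurrent writes, and a G-Counter CRDT whose merge is an element-wise max. Both are toy sketches, not production implementations:

```python
def compare(a: dict, b: dict) -> str:
    """Compare version vectors {node_id: version}: report which write
    causally precedes the other, or 'concurrent' for a true conflict."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a precedes b"   # b supersedes a
    if b_le_a:
        return "b precedes a"   # a supersedes b
    return "concurrent"         # surface the conflict to the application

class GCounter:
    """Grow-only counter CRDT: one slot per node; merge takes the per-node
    max, which is commutative, associative, and idempotent."""
    def __init__(self, node_id: str):
        self.node_id, self.counts = node_id, {}
    def increment(self, by: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + by
    def merge(self, other: "GCounter"):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
    def value(self) -> int:
        return sum(self.counts.values())
```

    Because merge is idempotent and order-independent, two G-Counter replicas that exchange state in any order converge to the same value.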

    Read-Your-Writes Consistency

    After a user submits a form, they expect to see their changes immediately — even with async replication. Strategies:

    1. Route reads to the primary for a short window (e.g., one second) after a write
    2. Track the write timestamp in a client cookie; only read from a replica that has replicated up to (or past) that timestamp
    3. Route the user to the same region (sticky sessions) — their writes and subsequent reads both hit that region’s leader, which has already applied their own writes
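    Strategy 2 can be sketched as a routing helper; the per-replica `replicated_up_to` watermark is an assumed field for illustration, not a real driver API:

```python
def choose_replica(last_write_ts: float, replicas: list) -> dict:
    """Read-your-writes routing: pick any replica whose replication
    watermark covers the client's last write timestamp (carried, e.g.,
    in a cookie); otherwise fall back to the primary."""
    for replica in replicas:
        if replica["replicated_up_to"] >= last_write_ts:
            return replica
    # No replica has caught up yet: the primary always has the write.
    return {"name": "primary", "replicated_up_to": float("inf")}
```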

    Global Databases in Practice

    CockroachDB / Google Spanner

    Distributed SQL with strong consistency (serializable isolation) across regions. Spanner uses TrueTime (GPS and atomic clocks) for global transaction ordering; CockroachDB uses HLCs (Hybrid Logical Clocks). Trade-off: cross-region transactions pay the latency of a consensus round (Paxos in Spanner, Raft in CockroachDB) across regions — typically 100–200 ms for writes spanning two regions. Best for: global financial systems where consistency is non-negotiable.
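    A toy Hybrid Logical Clock, sketched from the published algorithm (not CockroachDB's actual implementation), shows how causal ordering survives physical clock skew:

```python
class HLC:
    """Hybrid Logical Clock sketch: timestamps are (logical, counter)
    pairs that respect causality even when physical clocks are skewed."""
    def __init__(self):
        self.l, self.c = 0, 0

    def now(self, physical: int):
        """Local or send event at the given physical clock reading."""
        l_old = self.l
        self.l = max(l_old, physical)
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def update(self, physical: int, l_msg: int, c_msg: int):
        """Receive event: merge the sender's timestamp into this clock."""
        l_old = self.l
        self.l = max(l_old, l_msg, physical)
        if self.l == l_old == l_msg:
            self.c = max(self.c, c_msg) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == l_msg:
            self.c = c_msg + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

    A node whose physical clock reads 50 still timestamps a message it receives from a node at 100 as causally later, which wall-clock LWW would get wrong.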

    Cassandra (Multi-DC)

    Tunable consistency: LOCAL_QUORUM reads/writes from the local datacenter only (fast, but inter-DC replication is async). EACH_QUORUM requires quorum in every datacenter (strong consistency but slow). In practice: use LOCAL_QUORUM for most operations, accept eventual consistency between regions.

    DynamoDB Global Tables

    Multi-active replication across AWS regions. Uses last-write-wins conflict resolution at the item level, with DynamoDB Streams propagating changes. Writes to any region replicate to all others within seconds. Best for: AWS-native applications needing global low-latency reads and writes.

    Schema and Migration Strategy

    Multi-region schema changes are dangerous — a migration that runs region-by-region creates a window where different regions have different schemas. Follow the expand-contract pattern:

    1. Expand: add the new column (nullable, backward-compatible). Deploy everywhere.
    2. Migrate: backfill existing rows. Deploy code that writes to both the old and new columns.
    3. Switch reads: once all regions read from the new column, remove writes to the old column.
    4. Contract: drop the old column once all regions are fully migrated.
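    The four phases can be walked through against an in-memory SQLite database (table and column names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada', 'Lovelace')")

# 1. Expand: add a nullable column -- old readers and writers keep working.
db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# 2. Migrate: backfill existing rows; new code dual-writes old and new columns.
db.execute("UPDATE users SET full_name = first || ' ' || last "
           "WHERE full_name IS NULL")

# 3. Switch reads: every region now reads full_name (old columns still written).
row = db.execute("SELECT full_name FROM users WHERE id = 1").fetchone()
assert row[0] == "Ada Lovelace"

# 4. Contract: only after all regions read full_name, drop the old columns
#    (e.g. ALTER TABLE users DROP COLUMN first) -- never earlier.
```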

    Interview Framework

    When asked to design a multi-region database system, address these in order:

    1. Consistency requirement: strong (financial) vs. eventual (social feed). This determines your topology.
    2. Write pattern: write-heavy and global (multi-leader), or write-primary with global reads (single-leader)?
    3. Conflict resolution: LWW for simple cases, CRDTs for counters/sets, application-level for complex merges.
    4. Read-your-writes: sticky sessions or timestamp-based routing.
    5. Failure modes: region partition — what does the system do? Degrade to single-region? Accept stale reads? Reject writes?
    6. Schema migrations: expand-contract pattern, never breaking changes across region boundaries.