System Design Fundamentals: CAP Theorem, Consistency, and Availability
Understanding distributed systems fundamentals is essential for any system design interview. These concepts explain the trade-offs behind every architectural decision, and they are expected knowledge at FAANG-scale companies and in any role involving distributed systems.
CAP Theorem
In a distributed system, you can only guarantee two of three properties simultaneously:
- Consistency (C): every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A): every request receives a response (not an error), but the response might not be the most recent data.
- Partition Tolerance (P): the system continues operating even when network messages are dropped between nodes.
Since network partitions are a reality in distributed systems, the real choice is between C and A during a partition:
- CP systems: HBase, ZooKeeper, MongoDB (with majority write concern), etcd. Return an error when a partition occurs rather than stale data.
- AP systems: Cassandra, CouchDB, DynamoDB, Riak. Return potentially stale data rather than error.
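The CP-versus-AP choice during a partition can be made concrete with a toy sketch (not modeled on any real database): a replica that has lost contact with its leader can either refuse to answer or serve its possibly stale local state.

```python
# Toy illustration of the CP vs AP choice a replica faces during a partition.
# "CP" refuses to answer; "AP" serves possibly stale local state.

class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = "v1"         # last value replicated before the partition
        self.partitioned = True   # link to the leader is down

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP choice: error rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP choice: respond from local, possibly stale, state.
        return self.value

print(Replica("AP").read())       # serves the possibly stale "v1"
try:
    Replica("CP").read()
except RuntimeError as e:
    print(e)                      # unavailable: cannot confirm latest value
```

Real systems sit on a spectrum (tunable consistency levels in Cassandra, read concerns in MongoDB), but the partition-time dilemma is exactly this binary.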
PACELC Model (Extension of CAP)
CAP only considers partition scenarios. PACELC also considers the normal case:
- If Partition: choose between Availability vs Consistency (like CAP)
- Else (normal operation): choose between Latency vs Consistency
DynamoDB: PA/EL – available on partition, low latency normally (eventual consistency). Spanner: PC/EC – consistent on partition, consistent normally (higher latency, uses TrueTime).
Consistency Models
Strong Consistency
After a write completes, all subsequent reads return the new value. Like a single-node database. Expensive: requires synchronous replication or consensus (Raft/Paxos). Used in: financial systems, configuration stores (etcd/ZooKeeper).
Eventual Consistency
If no new writes occur, all replicas will eventually converge to the same value. Reads may see stale data in the meantime. High availability and low latency. Used in: DNS, email, social media counters, shopping carts (Amazon Dynamo paper).
Causal Consistency
Operations that are causally related are seen in the same order by all nodes. If you write A then write B (where B depends on A), every node sees A before B. MongoDB's causally consistent sessions (3.6+) are a production example.
Read Your Own Writes
After a write, the same client always reads the new value; other clients may still see the old value. Implemented by routing reads to the primary, using session affinity, or tracking write versions in the session. Important for user experience (post a comment, immediately see it).
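One version-tracking approach can be sketched as follows. The client's session remembers the version of its last write and a read is served from a replica only if that replica has caught up; the class and method names here are illustrative, not any real client library.

```python
# Hedged sketch of read-your-own-writes: the session tracks the version of
# its last write, and reads fall back to the primary if the replica is behind.

class Store:
    def __init__(self):
        self.primary = {}         # authoritative data: key -> (version, value)
        self.replica = {}         # asynchronous copy, may lag behind

    def write(self, key, value):
        version = self.primary.get(key, (0, None))[0] + 1
        self.primary[key] = (version, value)
        return version            # the client stores this in its session

    def read(self, key, min_version=0):
        # Serve from the replica only if it has caught up to the session's
        # last write; otherwise fall back to the primary.
        entry = self.replica.get(key)
        if entry and entry[0] >= min_version:
            return entry[1]
        return self.primary[key][1]

store = Store()
v = store.write("profile", "new bio")        # replica has not replicated yet
print(store.read("profile", min_version=v))  # falls back to primary: new bio
```

The same idea underlies "sticky" sessions: instead of comparing versions, the load balancer simply pins a user's reads to the node that took their write for a short window.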
Replication Strategies
Single Leader (Master-Replica)
All writes go to leader, replicated to followers. Reads from followers (eventual consistency) or leader (strong consistency). Failover: promote a follower. Used by: PostgreSQL, MySQL, MongoDB. Simple but leader is a bottleneck.
Multi-Leader
Multiple nodes accept writes. Useful for multi-datacenter deployments (each DC has a leader). Conflict resolution needed (last-write-wins, vector clocks, CRDTs). Used by: CouchDB; collaborative editors face the same problem (Google Docs uses operational transformation, while many newer editors use CRDTs).
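The simplest CRDT, a grow-only counter (G-Counter), shows why CRDTs sidestep conflict resolution: each leader increments only its own slot, and merging takes the element-wise maximum, so concurrent updates converge regardless of merge order. A minimal sketch:

```python
# Minimal G-Counter CRDT sketch: per-node increment slots, merge by
# element-wise max, so concurrent multi-leader updates always converge.

class GCounter:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.counts = [0] * n_nodes

    def increment(self):
        self.counts[self.node_id] += 1   # each node only touches its own slot

    def merge(self, other):
        # Merge is commutative, associative, and idempotent: order and
        # repetition of merges cannot change the converged result.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()   # two increments at node 0
b.increment()                  # concurrent increment at node 1
a.merge(b); b.merge(a)
print(a.value(), b.value())    # both converge to 3
```

Richer CRDTs (sets, maps, sequences for text editing) build on the same principle: design the merge so that any order of delivery yields the same state.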
Leaderless (Dynamo-style)
Any node accepts reads and writes. Quorum: write to W nodes, read from R nodes. If W + R > N (the replication factor), the read quorum overlaps the write quorum, so every read contacts at least one replica holding the latest acknowledged write. DynamoDB, Cassandra, and Riak use this model.
N = replication factor (e.g., 3)
W = write quorum (e.g., 2)
R = read quorum (e.g., 2)
W + R > N: quorums overlap, so reads see the latest acknowledged write (caveat: sloppy quorums and concurrent writes can still violate this)
W = 1, R = 1: maximum availability, eventual consistency
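The quorum arithmetic above can be sketched end to end in a few lines. This is an illustrative toy, not any real database's protocol: writes land on W of N replicas, a read contacts R replicas, returns the newest versioned reply, and repairs any stale replica it touched (read repair).

```python
# Toy quorum read/write over N=3 replicas with W=2, R=2, including read repair.

N, W, R = 3, 2, 2
assert W + R > N                         # read and write quorums must overlap

replicas = [dict() for _ in range(N)]    # each replica: key -> (version, value)

def write(key, value, version):
    for replica in replicas[:W]:         # acknowledged once W replicas accept
        replica[key] = (version, value)

def read(key, contact):                  # contact: indices of the R replicas
    replies = {i: replicas[i].get(key, (0, None)) for i in contact}
    version, value = max(replies.values())   # newest version wins
    for i, (v, _) in replies.items():
        if v < version:                      # read repair: fix stale replicas
            replicas[i][key] = (version, value)
    return value

write("k", "v2", version=2)              # lands on replicas 0 and 1
print(read("k", contact=[1, 2]))         # overlap at replica 1 returns "v2"
print(replicas[2]["k"])                  # replica 2 repaired to (2, "v2")
```

Note the overlap in action: the write touched replicas {0, 1} and the read touched {1, 2}, so replica 1 guarantees the latest value is seen even though replica 2 was stale.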
ACID vs BASE
ACID (traditional databases)
- Atomicity: all-or-nothing transactions
- Consistency: data always satisfies integrity constraints
- Isolation: concurrent transactions do not interfere
- Durability: committed data survives failures
BASE (NoSQL systems)
- Basically Available: system is available even under failure
- Soft state: state may change without input (replication catching up)
- Eventual consistency: will become consistent given time
Consensus Algorithms
- Paxos: theoretical foundation. Hard to implement correctly. Used in Google Chubby; Apache ZooKeeper uses ZAB, a closely related atomic broadcast protocol.
- Raft: designed for understandability. Leader election + log replication. Used in: etcd, CockroachDB, TiKV, Consul. Most common in new systems.
Consensus used for: leader election, distributed locks, configuration stores, replicated state machines.
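The safety core of Raft-style leader election is small enough to sketch: each node grants at most one vote per term, and a candidate needs a strict majority. This is a deliberately simplified illustration; real Raft also checks log up-to-dateness before granting a vote and uses randomized election timeouts to avoid split votes.

```python
# Simplified sketch of Raft-style vote counting: one vote per node per term,
# strict majority to win. (Omits log checks and election timeouts.)

class Node:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None     # candidate id voted for in current_term

    def request_vote(self, term, candidate_id):
        if term > self.current_term:
            self.current_term, self.voted_for = term, None   # newer term resets vote
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

cluster = [Node() for _ in range(5)]
votes = sum(node.request_vote(term=1, candidate_id="A") for node in cluster)
print(votes > len(cluster) // 2)   # True: A wins with a majority

# A rival candidate in the same term cannot also win: every node already voted.
rival = sum(node.request_vote(term=1, candidate_id="B") for node in cluster)
print(rival > len(cluster) // 2)   # False
```

The one-vote-per-term rule is the key safety property: two candidates can never both assemble a majority in the same term, so there is at most one leader per term.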
Interview Application
When asked about a system design, state the consistency requirement upfront:
- Financial transactions, seat booking: strong consistency required
- Social media likes, view counts, shopping carts: eventual consistency acceptable
- User settings, profile reads: read-your-own-writes minimum
- Leaderboard rankings: eventual consistency with bounded staleness (seconds)
Key Terms to Know
- Write-ahead log (WAL): durability mechanism, changes written to log before applying
- Hinted handoff: Dynamo-style stores buffer writes for temporarily unavailable nodes
- Read repair: detect and fix stale replicas during read operations
- Anti-entropy: background process (Merkle tree comparison) to sync replicas
- Vector clock / Version vector: track causality for conflict detection in multi-leader systems
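Causality tracking with version vectors comes down to a component-wise comparison, which can be sketched directly: event A happened before event B if A's vector is less than or equal to B's in every component (and strictly less in at least one); if neither dominates, the writes were concurrent and need conflict resolution.

```python
# Sketch of version-vector comparison for conflict detection.
# Vectors map node id -> count of events seen from that node.

def happens_before(a, b):
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))

def concurrent(a, b):
    # Neither vector dominates the other: the writes raced.
    return not happens_before(a, b) and not happens_before(b, a)

v1 = {"node_a": 1, "node_b": 0}
v2 = {"node_a": 2, "node_b": 0}   # extends v1 with a write at node_a
v3 = {"node_a": 1, "node_b": 1}   # extends v1 with a write at node_b

print(happens_before(v1, v2))     # True: v2 causally follows v1
print(concurrent(v2, v3))         # True: conflict, resolution needed
```

This is exactly the check Dynamo-style stores run on read: causally ordered versions are collapsed silently, while concurrent versions are surfaced as siblings (or resolved by a policy such as last-write-wins).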
Companies that ask this: Cloudflare, Netflix, Uber, Coinbase, Stripe, Databricks.