What Is a Configuration Management System?
A configuration management system stores key-value configuration data that services read at startup or at runtime, enabling feature flags, service discovery, and dynamic tuning without redeployment. Examples: etcd (Kubernetes backbone), Consul, AWS AppConfig, LaunchDarkly. Core challenges: strong consistency (every reader sees the same value), watch notifications (push updates to subscribers within milliseconds), and high availability despite distributed consensus overhead.
System Requirements
Functional
- Get/Put/Delete key-value pairs
- Watch: subscribe to changes on a key or prefix
- Transactions: compare-and-swap (CAS) for leader election and distributed locks
- Leases: keys auto-expire if the holder does not renew (ephemeral keys)
- RBAC: per-key and per-prefix access control
Non-Functional
- Strong consistency: linearizable reads (no stale reads)
- Watch latency: config changes propagated to subscribers in <100ms
- High availability: survive minority node failures (3-node or 5-node cluster)
- 10K reads/second, 100 writes/second (config is read-heavy)
Raft Consensus
etcd uses the Raft consensus algorithm to replicate writes across nodes. A cluster of N nodes tolerates floor((N-1)/2) failures: a 3-node cluster tolerates 1 failure; a 5-node cluster tolerates 2. Raft basics:
- One node is elected leader (via randomized election timeout)
- All writes go to the leader
- Leader appends the write to its log and replicates to followers
- Once a majority (quorum) acknowledge, the write is committed
- Committed writes are applied to the state machine (key-value store)
- Readers can read from leader (linearizable) or followers (stale)
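The quorum rule above is just arithmetic: any two majorities of the same cluster must share at least one node, which is why a newly elected leader is guaranteed to have seen every committed entry. A minimal sketch (the `majority` and `tolerated_failures` helpers are illustrative, not an etcd API):

```python
# Quorum arithmetic behind Raft's safety argument.
def majority(n: int) -> int:
    """Smallest number of nodes that forms a quorum in an n-node cluster."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Failures an n-node cluster survives while still forming a quorum."""
    return (n - 1) // 2

for n in (3, 5, 7):
    q = majority(n)
    # Two disjoint quorums would need 2q nodes, but 2q > n: they must overlap.
    assert 2 * q > n
    print(f"{n}-node cluster: quorum={q}, tolerates {tolerated_failures(n)} failures")
```

The overlap assertion is the whole safety argument in one line: a commit quorum and an election quorum always intersect, so the value cannot be lost.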
# etcd client operations
etcdctl put /services/auth/host "10.0.1.5"
etcdctl get /services/auth/host
etcdctl watch /services/auth/ # watch prefix, get notified on any change
etcdctl put /locks/job1 "worker-3" --lease=60 # auto-expire in 60s
Watch Mechanism
Clients register watches on keys or prefixes. The server stores active watchers. When a write is committed, the server scans watchers matching the written key and pushes a WatchEvent to each subscriber over a gRPC stream. The event includes: key, new value, old value, revision number. Clients reconnect automatically on disconnect and resume from the last seen revision (no events missed).
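The server-side bookkeeping can be sketched in a few lines. This is a toy in-memory version (the `WatchHub` name and callback interface are invented for illustration; a real server pushes over a gRPC stream instead of calling a function):

```python
from typing import Callable

class WatchHub:
    """Minimal in-memory sketch of prefix watchers: a real server pushes
    WatchEvents over gRPC streams; here a callback stands in for the stream."""
    def __init__(self):
        self.revision = 0
        self.store: dict[str, str] = {}
        self.watchers: list[tuple[str, Callable]] = []  # (prefix, callback)

    def watch(self, prefix: str, callback: Callable) -> None:
        self.watchers.append((prefix, callback))

    def put(self, key: str, value: str) -> int:
        self.revision += 1
        old = self.store.get(key)
        self.store[key] = value
        event = {"type": "PUT", "key": key, "value": value,
                 "prev_value": old, "revision": self.revision}
        for prefix, cb in self.watchers:
            if key.startswith(prefix):
                cb(event)  # push to subscriber, no polling
        return self.revision

hub = WatchHub()
events = []
hub.watch("/services/auth/", events.append)
hub.put("/services/auth/host", "10.0.1.5")
# events[0] carries key, new value, old value, and the revision number
```

The monotonically increasing revision is what makes reconnect-and-resume possible: a client that remembers its last-seen revision can ask for everything after it.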
Feature Flags with Config System
PUT /features/dark_mode {"enabled": true, "rollout_percent": 20}
# Application code (assumes the client returns the raw JSON value)
config = json.loads(etcd.get("/features/dark_mode"))
if config["enabled"] and hash(user_id) % 100 < config["rollout_percent"]:
    show_dark_mode()
Percentage rollout without redeployment. Watch for changes: when the config is updated, the app receives a WatchEvent and hot-reloads the feature flag within 100ms. No restart required.
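One subtlety in the snippet above: Python's built-in hash() is randomized per process, so the same user could flip in and out of the rollout cohort across restarts. A stable hash keeps each user's bucket fixed. A sketch using CRC32 (the function names are illustrative; the flag layout mirrors the PUT above):

```python
import json
import zlib

def rollout_bucket(user_id: str) -> int:
    """Stable 0-99 bucket for a user: crc32 is deterministic across
    processes, unlike Python's salted built-in hash()."""
    return zlib.crc32(user_id.encode()) % 100

def dark_mode_enabled(user_id: str, raw_config: str) -> bool:
    config = json.loads(raw_config)
    return config["enabled"] and rollout_bucket(user_id) < config["rollout_percent"]

flag = '{"enabled": true, "rollout_percent": 20}'
# The same user always lands in the same bucket, so the 20% cohort is stable
# even as the rollout_percent value is dialed up via config changes.
assert dark_mode_enabled("user-42", flag) == (rollout_bucket("user-42") < 20)
```

Dialing rollout_percent from 20 to 50 then only *adds* users to the cohort, which is the behavior you want for a gradual rollout.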
Service Discovery
Services register themselves on startup with a lease:
lease = etcd.grant_lease(ttl=30)
etcd.put(f"/services/auth/{instance_id}", json.dumps({"host": "10.0.1.5", "port": 8080}), lease=lease)
# Keepalive: renew lease every 10 seconds
etcd.keepalive(lease)
On crash: keepalive stops, lease expires in 30 seconds, the key is auto-deleted. Other services watching /services/auth/ receive a delete event and remove the dead instance from their load balancer. Kubernetes uses etcd exactly this way for pod registration.
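The lease lifecycle is easy to simulate with a fake clock. This is an in-memory sketch (the `LeaseTable` name, its API, and the event list are invented for illustration; they are not etcd's implementation):

```python
class LeaseTable:
    """Tracks lease-attached keys; expired leases delete their keys and
    emit DELETE events, mimicking what watchers of /services/ would see."""
    def __init__(self):
        self.now = 0.0
        self.leases: dict[int, float] = {}   # lease_id -> expiry time
        self.keys: dict[str, int] = {}       # key -> lease_id
        self.events: list[tuple[str, str]] = []

    def grant(self, lease_id: int, ttl: float) -> None:
        self.leases[lease_id] = self.now + ttl

    def put(self, key: str, lease_id: int) -> None:
        self.keys[key] = lease_id

    def keepalive(self, lease_id: int, ttl: float) -> None:
        self.leases[lease_id] = self.now + ttl  # renew the full TTL

    def tick(self, seconds: float) -> None:
        self.now += seconds
        dead = {lid for lid, exp in self.leases.items() if exp <= self.now}
        for key, lid in list(self.keys.items()):
            if lid in dead:
                del self.keys[key]
                self.events.append(("DELETE", key))
        for lid in dead:
            del self.leases[lid]

# A service registers with TTL=30, renews once, then crashes.
t = LeaseTable()
t.grant(1, ttl=30)
t.put("/services/auth/i-123", 1)
t.tick(10); t.keepalive(1, ttl=30)   # healthy: renewed at t=10
t.tick(30)                           # crash: no more keepalives
assert t.events == [("DELETE", "/services/auth/i-123")]
```

Note that deletion happens because time passed, not because anyone detected the crash: TTL expiry is the only failure detector a partition-tolerant system can rely on.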
Compare-and-Swap for Leader Election
# Only succeeds if the key does not currently exist
txn = etcd.transaction(
    compare=[etcd.transactions.version("/election/leader") == 0],
    success=[etcd.transactions.put("/election/leader", node_id, lease=my_lease)],
    failure=[]
)
if txn.succeeded:
    pass  # This node is leader
CAS atomicity: only one node succeeds in creating the key. The winning node is the leader. When the leader crashes, its lease expires and the key is deleted. Other nodes retry the CAS and a new leader is elected.
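The create-if-absent transaction reduces to one atomic check. A single-process toy version (the `ElectionStore` name and API are invented here to show the logic, not etcd's code):

```python
class ElectionStore:
    """Stand-in for etcd's version(key) == 0 transaction: the put succeeds
    only if the key has never been created (version 0)."""
    def __init__(self):
        self.data: dict[str, str] = {}

    def create_if_absent(self, key: str, value: str) -> bool:
        if key in self.data:          # version != 0 -> compare fails
            return False
        self.data[key] = value        # compare passed -> atomic put
        return True

    def delete(self, key: str) -> None:   # simulates lease expiry
        self.data.pop(key, None)

store = ElectionStore()
won_a = store.create_if_absent("/election/leader", "node-a")
won_b = store.create_if_absent("/election/leader", "node-b")
assert (won_a, won_b) == (True, False)    # exactly one winner
store.delete("/election/leader")           # leader's lease expires
assert store.create_if_absent("/election/leader", "node-b")  # re-election
```

In the real system the check-and-put is atomic because it is a single committed Raft log entry, so two candidates can never both see version 0.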
Scaling Reads
Config systems are 100:1 read-heavy. Raft requires quorum only for writes. Reads: (1) Linearizable reads: route to leader, which confirms its term with a quorum heartbeat — highest consistency, higher latency. (2) Serializable reads: read from any follower — may be slightly stale, 3-5ms faster. (3) Client-side caching: cache config values in the application, invalidate on WatchEvent. Most apps cache config in memory and update on watch — reduces etcd reads to near zero.
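The production read pattern fits in one small class. A sketch (`CachedConfig` is illustrative; the event shape follows the WatchEvent fields described earlier):

```python
class CachedConfig:
    """Client-side cache: reads hit local memory; a watch callback applies
    PUT/DELETE events so the cache tracks the server without re-reads."""
    def __init__(self, initial: dict[str, str]):
        self.cache = dict(initial)     # seeded by one initial range read
        self.reads_served_locally = 0

    def get(self, key: str):
        self.reads_served_locally += 1
        return self.cache.get(key)

    def on_watch_event(self, event: dict) -> None:
        if event["type"] == "PUT":
            self.cache[event["key"]] = event["value"]
        elif event["type"] == "DELETE":
            self.cache.pop(event["key"], None)

cfg = CachedConfig({"/features/dark_mode": '{"enabled": false}'})
cfg.on_watch_event({"type": "PUT", "key": "/features/dark_mode",
                    "value": '{"enabled": true}'})
assert cfg.get("/features/dark_mode") == '{"enabled": true}'
# every get() after the initial seed is served from memory, not from etcd
```

This is how "10K reads/second" against etcd becomes a handful of watch streams: the reads all land on local memory, and etcd only carries the write and notification traffic.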
Interview Tips
- Raft is the consensus algorithm — know the quorum rule: N nodes tolerate floor((N-1)/2) failures.
- Leases enable ephemeral keys without explicit deletion — key for service discovery and leader election.
- Watch + gRPC stream is the push mechanism — not polling.
- Client-side caching with watch invalidation is the production read pattern.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does Raft consensus guarantee that all nodes agree on the same configuration value?",
      "acceptedAnswer": { "@type": "Answer", "text": "Raft ensures agreement through a leader-based log replication protocol. All writes go to the elected leader. The leader appends the write to its own log and sends AppendEntries RPCs to all followers. Once a majority (quorum) of nodes have appended the entry to their logs and acknowledged, the leader considers it committed and applies it to the state machine. The leader then notifies followers of the commit. Only committed entries are served to clients. Why this prevents disagreement: a minority of nodes (e.g., 1 of 3) can fail without affecting consensus. A quorum (majority) is required to both commit a write and elect a new leader. Any two quorums overlap by at least one node — so any newly elected leader is guaranteed to have seen all committed entries from the previous term. This overlap property prevents two different values from being committed for the same log index, ensuring all nodes eventually apply the same writes in the same order." }
    },
    {
      "@type": "Question",
      "name": "How do watch subscriptions work without polling in an etcd-style system?",
      "acceptedAnswer": { "@type": "Answer", "text": "Watches use server-sent push over a persistent gRPC bidirectional stream. Client opens a Watch RPC: etcd.watch(\"/services/\", revision=current_revision). The server registers this watcher in memory, keyed by the watched key/prefix. When any write to /services/* is committed, the server immediately pushes a WatchEvent to all matching watchers: {type: PUT, key, value, prev_value, mod_revision}. The client receives the event within milliseconds — no polling. Revision numbers are critical for reliability: if the client disconnects and reconnects, it resumes the watch from its last-seen revision, requesting all events since that revision. The server stores a compacted event history (configurable, e.g., last 10K events). Events older than the compaction window are gone; the client must re-read the current value and start a new watch. This is why etcd clients always re-read the key after establishing a new watch — to avoid the gap between the last-seen revision and the watch start." }
    },
    {
      "@type": "Question",
      "name": "How do leases enable ephemeral keys for service discovery and leader election?",
      "acceptedAnswer": { "@type": "Answer", "text": "A lease is a time-to-live (TTL) object that any key can be attached to. When a service registers itself, it: (1) creates a lease with TTL=30s, getting back a lease_id; (2) writes its key with the lease_id attached. The key exists only while the lease is alive. The service must send KeepAlive RPCs every 10s to renew the lease TTL. If the service crashes: keepalives stop, the lease expires after 30s, and all keys attached to that lease are atomically deleted. Watchers of those keys receive DELETE events and update their view of available services. Why TTL instead of cleanup on crash: crash detection is unreliable in distributed systems (network partition looks the same as crash). A TTL-based lease provides eventual cleanup regardless of the failure mode. The TTL is the \"worst case staleness\" — after 30 seconds, the dead service's registration is gone. For leader election: the leader holds a lease; if it crashes, the lease expires and other candidates can CAS a new leader key into existence." }
    }
  ]
}