What Is a Configuration Management System?
A configuration management system stores key-value configuration data that services read at startup or at runtime, enabling feature flags, service discovery, and dynamic tuning without redeployment. Examples: etcd (Kubernetes backbone), Consul, AWS AppConfig, LaunchDarkly. Core challenges: strong consistency (every reader sees the same value), watch notifications (push updates to subscribers within milliseconds), and high availability despite distributed consensus overhead.
System Requirements
Functional
- Get/Put/Delete key-value pairs
- Watch: subscribe to changes on a key or prefix
- Transactions: compare-and-swap (CAS) for leader election and distributed locks
- Leases: keys auto-expire if the holder does not renew (ephemeral keys)
- RBAC: per-key and per-prefix access control
Non-Functional
- Strong consistency: linearizable reads (no stale reads)
- Watch latency: config changes propagated to subscribers in <100ms
- High availability: survive minority node failures (3-node or 5-node cluster)
- 10K reads/second, 100 writes/second (config is read-heavy)
Raft Consensus
etcd uses the Raft consensus algorithm to replicate writes across nodes. A cluster of N nodes tolerates floor((N-1)/2) failures: a 3-node cluster tolerates 1 failure; a 5-node cluster tolerates 2. Raft basics:
- One node is elected leader (via randomized election timeout)
- All writes go to the leader
- Leader appends the write to its log and replicates to followers
- Once a majority (quorum) acknowledge, the write is committed
- Committed writes are applied to the state machine (key-value store)
- Readers can read from leader (linearizable) or followers (stale)
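The quorum arithmetic above can be written down directly. A small illustration in plain Python (not etcd code):

```python
def quorum_size(n: int) -> int:
    """Smallest majority: acknowledgements needed to commit a write."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Failures an n-node cluster survives while a quorum remains."""
    return (n - 1) // 2

# 3 nodes: quorum of 2, tolerates 1 failure.
# 5 nodes: quorum of 3, tolerates 2 failures.
```

Note that going from 3 to 4 nodes buys nothing: quorum rises to 3 but tolerated failures stays at 1, which is why clusters use odd sizes.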
# etcd client operations
etcdctl put /services/auth/host "10.0.1.5"
etcdctl get /services/auth/host
etcdctl watch --prefix /services/auth/ # watch prefix, get notified on any change
etcdctl lease grant 60 # returns a lease ID, e.g. 694d77aa9e38260f
etcdctl put /locks/job1 "worker-3" --lease=694d77aa9e38260f # auto-expire in 60s
Watch Mechanism
Clients register watches on keys or prefixes. The server stores active watchers. When a write is committed, the server scans watchers matching the written key and pushes a WatchEvent to each subscriber over a gRPC stream. The event includes: key, new value, old value, revision number. Clients reconnect automatically on disconnect and resume from the last seen revision (no events missed).
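The server-side watcher table can be sketched as a prefix match over registered callbacks. This is a toy model (the names WatchEvent, WatchRegistry, and commit are illustrative); real etcd indexes watchers more efficiently and streams events over gRPC:

```python
from typing import Callable, NamedTuple, Optional

class WatchEvent(NamedTuple):
    key: str
    value: Optional[str]   # None for a delete event
    revision: int

class WatchRegistry:
    """Toy server-side watcher table: prefix match + push on commit."""
    def __init__(self):
        self.watchers = []   # (prefix, callback) pairs
        self.revision = 0

    def watch(self, prefix: str, callback: Callable[[WatchEvent], None]):
        self.watchers.append((prefix, callback))

    def commit(self, key: str, value: Optional[str]):
        # Called after Raft commits a write: bump the global revision
        # and push an event to every watcher whose prefix matches.
        self.revision += 1
        event = WatchEvent(key, value, self.revision)
        for prefix, callback in self.watchers:
            if key.startswith(prefix):
                callback(event)

registry = WatchRegistry()
seen = []
registry.watch("/services/auth/", seen.append)
registry.commit("/services/auth/i1", "10.0.1.5")   # matches: pushed to seen
registry.commit("/config/other", "x")              # no matching watcher
```

The monotonically increasing revision is what lets a reconnecting client resume from the last event it saw.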
Feature Flags with Config System
PUT /features/dark_mode {"enabled": true, "rollout_percent": 20}
# Application code
config = json.loads(etcd.get("/features/dark_mode"))
if config["enabled"] and hash(user_id) % 100 < config["rollout_percent"]:
    show_dark_mode()
This gives percentage rollout without redeployment. The app also watches the key: when the config is updated, it receives a WatchEvent and hot-reloads the feature flag within 100ms. No restart required.
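One caveat on the bucketing: Python's built-in hash() is salted per process, so the same user could flip buckets across app instances. A stable hash keeps assignment consistent; here md5 is an illustrative choice, and in_rollout is a hypothetical helper:

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place user_id into a stable 0-99 bucket."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

# Roughly `percent`% of a large user population lands in the rollout.
enabled = sum(in_rollout(f"user-{i}", 20) for i in range(10_000))
```

Because the bucket is derived from user_id alone, each user gets a sticky experience as rollout_percent ramps from 20 to 100.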
Service Discovery
Services register themselves on startup with a lease:
lease = etcd.grant_lease(ttl=30)
etcd.put(f"/services/auth/{instance_id}", json.dumps({"host": "10.0.1.5", "port": 8080}), lease=lease)
# Keepalive: renew lease every 10 seconds
etcd.keepalive(lease)
On crash: keepalive stops, lease expires in 30 seconds, the key is auto-deleted. Other services watching /services/auth/ receive a delete event and remove the dead instance from their load balancer. Kubernetes uses etcd exactly this way for pod registration.
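The lease lifecycle can be simulated with a logical clock. This toy model (LeaseStore is not the etcd API) shows why a crashed instance disappears while a healthy one stays registered:

```python
class LeaseStore:
    """Toy key-value store with TTL leases on a logical clock."""
    def __init__(self):
        self.now = 0
        self.keys = {}      # key -> (value, expires_at)

    def put(self, key, value, ttl):
        self.keys[key] = (value, self.now + ttl)

    def keepalive(self, key, ttl):
        value, _ = self.keys[key]
        self.keys[key] = (value, self.now + ttl)

    def tick(self, seconds):
        self.now += seconds
        # Drop keys whose lease ran out (etcd would also emit delete events).
        self.keys = {k: v for k, v in self.keys.items() if v[1] > self.now}

store = LeaseStore()
store.put("/services/auth/i1", "10.0.1.5:8080", ttl=30)
store.put("/services/auth/i2", "10.0.1.6:8080", ttl=30)
store.tick(20)
store.keepalive("/services/auth/i1", ttl=30)   # i2 crashed: never renews
store.tick(20)
# i1 survives (renewed at t=20, expires t=50); i2's lease expired at t=30.
```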
Compare-and-Swap for Leader Election
# Only succeeds if the key does not currently exist
txn = etcd.transaction(
    compare=[etcd.transactions.version("/election/leader") == 0],
    success=[etcd.transactions.put("/election/leader", node_id, lease=my_lease)],
    failure=[]
)
if txn.succeeded:
    # This node is the leader
    become_leader()
CAS atomicity: only one node succeeds in creating the key. The winning node is the leader. When the leader crashes, its lease expires and the key is deleted. Other nodes retry the CAS and a new leader is elected.
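The create-if-absent race reduces to a single atomic check. A sketch (in-memory and single-threaded for clarity; TinyStore and cas_create are illustrative names) of why exactly one candidate wins:

```python
class TinyStore:
    def __init__(self):
        self.data = {}

    def cas_create(self, key, value):
        """Atomically create key only if absent (version == 0 in etcd terms)."""
        if key in self.data:
            return False
        self.data[key] = value
        return True

store = TinyStore()
# Three candidates race; only the first CAS to reach the store succeeds.
winners = [node for node in ("node-1", "node-2", "node-3")
           if store.cas_create("/election/leader", node)]
```

In real etcd the check-and-create executes inside one Txn, so concurrency across nodes cannot produce two winners.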
Scaling Reads
Config systems are roughly 100:1 read-heavy, and Raft requires quorum only for writes. Read options:
- Linearizable reads: route to the leader, which confirms its term with a quorum heartbeat. Highest consistency, higher latency.
- Serializable reads: read from any follower. May be slightly stale, but 3-5ms faster.
- Client-side caching: cache config values in the application and invalidate on WatchEvent.
Most apps cache config in memory and update on watch, which reduces etcd reads to near zero.
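The client-side caching pattern can be sketched as a read-through cache refreshed by watch events. A minimal sketch with hypothetical names (ConfigCache, on_watch_event); a real client wires the callback to the gRPC watch stream:

```python
class ConfigCache:
    """Serve config from memory; a watch callback keeps it fresh."""
    def __init__(self, fetch):
        self.fetch = fetch      # function: key -> value (hits the config store)
        self.cache = {}
        self.store_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.fetch(key)
            self.store_reads += 1
        return self.cache[key]

    def on_watch_event(self, key, new_value):
        # Push-based refresh: overwrite rather than evict, so the next
        # get() is served from memory with the fresh value.
        self.cache[key] = new_value

backend = {"/features/dark_mode": '{"enabled": false}'}
cache = ConfigCache(backend.get)
cache.get("/features/dark_mode")            # first read hits the store
cache.get("/features/dark_mode")            # served from memory
cache.on_watch_event("/features/dark_mode", '{"enabled": true}')
cache.get("/features/dark_mode")            # fresh value, still no store read
```

Overwriting on the watch event (instead of evicting) means the store is read once per key per process lifetime, which is how the near-zero read load is achieved.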
Interview Tips
- Raft is the consensus algorithm — know the quorum rule: N nodes tolerate floor((N-1)/2) failures.
- Leases enable ephemeral keys without explicit deletion — key for service discovery and leader election.
- Watch + gRPC stream is the push mechanism — not polling.
- Client-side caching with watch invalidation is the production read pattern.