What Is a Config Management Service?
A configuration management service stores application settings that need to change without a code deployment: feature flags (enable/disable features), A/B test parameters, rate limits, timeout values, and system thresholds. Well-known examples include Netflix's Archaius, Facebook's GateKeeper, LaunchDarkly, and HashiCorp Consul.
Core Features
Feature flags: boolean toggles that enable or disable code paths. Enable a flag for 1% of users (canary), then roll out to 100% over several days. If issues arise, disable it instantly without a deploy.
Dynamic config: numeric or string values (timeout_ms=500, max_retries=3) that can be changed at runtime. Services poll or subscribe for updates.
Targeting: evaluate flags differently per user, based on attributes such as user_id, country, plan tier, or employee vs. external.
Audit log: every config change is recorded with who made it, when, and what the previous value was.
Data Model
Config (flag): flag_id, name, type (BOOLEAN, STRING, NUMBER, JSON), default_value, description, owner_team, created_at.
ConfigRule: rule_id, flag_id, priority, condition (JSON: {"country": "US", "plan": "premium"}), value.
ConfigVersion: version_id, flag_id, changed_by, changed_at, previous_value, new_value.
Rules are evaluated in priority order, and the first matching rule wins. If no rule matches, the flag's default_value is returned.
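The entities above could be written as plain Python dataclasses. This is a minimal sketch rather than a definitive schema: the field names come from the text, but the Python types, the `FlagType` enum, and the `sorted_rules` helper are assumptions made for illustration. It also assumes that a lower priority number means "evaluated first", which the text does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class FlagType(Enum):
    BOOLEAN = "BOOLEAN"
    STRING = "STRING"
    NUMBER = "NUMBER"
    JSON = "JSON"


@dataclass
class Config:
    flag_id: int
    name: str
    type: FlagType
    default_value: Any
    description: str = ""
    owner_team: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ConfigRule:
    rule_id: int
    flag_id: int
    priority: int    # assumption: lower number = evaluated first
    condition: dict  # e.g. {"country": "US", "plan": "premium"}
    value: Any


@dataclass
class ConfigVersion:
    version_id: int
    flag_id: int
    changed_by: str
    changed_at: datetime
    previous_value: Any
    new_value: Any


def sorted_rules(rules: list[ConfigRule]) -> list[ConfigRule]:
    """Return rules in evaluation order (hypothetical helper)."""
    return sorted(rules, key=lambda r: r.priority)
```

Sorting once when rules are loaded, rather than on every evaluation, keeps the hot path to a simple loop over an already ordered list.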
Rule Evaluation
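from typing import Any

class FlagEvaluator:
    def __init__(self, cache):
        # Assumption: cache is any object whose get(key) returns the flag or rule list.
        self.cache = cache

    def evaluate(self, flag_name: str, context: dict) -> Any:
        flag = self.cache.get(f"flag:{flag_name}")
        if flag is None:
            raise KeyError(f"unknown flag: {flag_name}")
        rules = self.cache.get(f"rules:{flag_name}") or []  # sorted by priority
        for rule in rules:
            if self.matches(rule.condition, context):
                return rule.value  # first matching rule wins
        return flag.default_value

    def matches(self, condition: dict, context: dict) -> bool:
        # Every key in the condition must match (logical AND).
        for key, expected in condition.items():
            actual = context.get(key)
            if isinstance(expected, list):
                # List: the attribute must be one of the listed values.
                if actual not in expected:
                    return False
            elif isinstance(expected, dict) and "gte" in expected:
                # {"gte": n}: the attribute must be present and >= n.
                if actual is None or actual < expected["gte"]:
                    return False
            elif actual != expected:
                return False
        return True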
Config Distribution Architecture
Storage: PostgreSQL is the source of truth (all flags, rules, and history).
Cache: Redis serves fast reads, with each flag serialized as a JSON string.
Services: client SDKs poll Redis every 10 seconds or subscribe to change events.
Change propagation: when a flag is updated in PostgreSQL, the config service refreshes the flag's entry in Redis and publishes an invalidation event to Redis Pub/Sub. All SDK instances subscribed to Pub/Sub receive the event and refresh the affected flag from Redis. End-to-end propagation latency on this path is under 1 second. Redis Pub/Sub does not redeliver messages to subscribers that were disconnected, so polling remains the fallback.
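The propagation path can be sketched without a real database or Redis. In the example below, plain dicts stand in for PostgreSQL and Redis, and `InMemoryPubSub` stands in for Redis Pub/Sub; the class names, method names, and the `flag-updates` channel are all hypothetical. The sketch assumes the service writes the new value to the cache before publishing, so that subscribers never refresh a stale value.

```python
import json
from typing import Callable


class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub: delivers each message to all subscribers."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self.subscribers.setdefault(channel, []).append(handler)

    def publish(self, channel: str, message: str) -> None:
        for handler in self.subscribers.get(channel, []):
            handler(message)


class ConfigService:
    """Writes the source of truth, then the cache, then announces the change."""

    CHANNEL = "flag-updates"  # hypothetical channel name

    def __init__(self, database: dict, cache: dict, pubsub: InMemoryPubSub) -> None:
        self.database = database  # stand-in for PostgreSQL
        self.cache = cache        # stand-in for Redis
        self.pubsub = pubsub

    def update_flag(self, name: str, flag: dict) -> None:
        self.database[name] = flag                     # 1. durable write
        self.cache[f"flag:{name}"] = json.dumps(flag)  # 2. refresh cached JSON
        self.pubsub.publish(self.CHANNEL, name)        # 3. invalidation event


class SdkInstance:
    """Keeps a local copy and refreshes one flag per invalidation event."""

    def __init__(self, cache: dict, pubsub: InMemoryPubSub) -> None:
        self.cache = cache
        self.local: dict[str, dict] = {}
        pubsub.subscribe(ConfigService.CHANNEL, self.on_invalidate)

    def on_invalidate(self, name: str) -> None:
        raw = self.cache.get(f"flag:{name}")
        if raw is not None:
            self.local[name] = json.loads(raw)
```

The event carries only the flag name, not the new value. That keeps messages small and means an SDK always reads the latest cached state, even if several updates arrive in quick succession.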
Safe Rollout
Percentage rollout: assign each user to a consistent bucket (0-99) using hash(flag_name + user_id) % 100, where hash is a deterministic function that returns the same value in every process. Rule: if bucket < rollout_percentage, return the new value. This is stable: the same user always gets the same bucket, so they do not flip between old and new behavior during a rollout.
Canary deployment: start at 1%, monitor error rates and latency for 30 minutes, then step through 5%, 10%, 25%, 50%, and 100%.
Automatic rollback: if the error rate exceeds a threshold (e.g., 5x baseline), automatically disable the flag and alert the team.
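A minimal sketch of the bucketing and rollback checks follows. One detail matters in Python: the built-in hash() is randomized per process for strings, so it would assign different buckets on different servers. The sketch swaps in SHA-256 from hashlib instead. The function names, the ":" separator between flag name and user ID, and the default factor of 5 are assumptions for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in 0-99."""
    # The separator avoids collisions such as ("ab", "c") vs ("a", "bc").
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, rollout_percentage: int) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percentage


def should_roll_back(error_rate: float, baseline: float, factor: float = 5.0) -> bool:
    """True when the error rate exceeds factor times the baseline."""
    return error_rate > baseline * factor
```

Because the flag name is part of the hash input, a user who lands in the first 1% for one flag is not automatically in the first 1% for every other flag. Raising rollout_percentage only ever adds users; nobody who already has the new behavior loses it.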
Consistency and Caching
Client SDKs cache flag values in process memory, which is the fastest option because no network call is needed per evaluation. A background thread refreshes the cache on a configurable interval (default 10s). On startup, the SDK loads all flags from Redis before serving traffic; whether a flag fails open or fails closed when flags cannot be loaded is configurable per flag. For flags controlling billing or security, use a shorter refresh interval (1-5s) or evaluate synchronously against Redis. Consistency trade-off: with 10s polling, a flag change takes up to 10s to propagate. For immediate kills (security incidents), use the Pub/Sub invalidation path, which propagates in under 1 second.
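The in-process cache with a background refresh can be sketched as below. This is an illustration under stated assumptions, not an actual SDK: `LocalFlagCache`, `fetch_all`, and the `fallback` argument are hypothetical names, and `fetch_all` stands in for whatever call loads all flags from Redis. The caller-supplied fallback is one simple way to express fail-open versus fail-closed per flag.

```python
import threading
from typing import Any, Callable


class LocalFlagCache:
    def __init__(self, fetch_all: Callable[[], dict], interval_s: float = 10.0) -> None:
        self.fetch_all = fetch_all  # hypothetical loader, e.g. reads all flags from Redis
        self.interval_s = interval_s
        self._flags: dict[str, Any] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def refresh(self) -> bool:
        """Reload all flags; keep the previous snapshot if the fetch fails."""
        try:
            fresh = self.fetch_all()
        except Exception:
            return False
        with self._lock:
            self._flags = dict(fresh)
        return True

    def get(self, name: str, fallback: Any) -> Any:
        """Read from memory; the fallback encodes fail-open or fail-closed."""
        with self._lock:
            return self._flags.get(name, fallback)

    def start(self) -> None:
        self.refresh()  # initial load before serving traffic
        threading.Thread(target=self._loop, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self.refresh()
```

If a refresh fails, the cache keeps serving the last good snapshot rather than dropping to defaults, so a brief Redis outage does not change flag behavior in running services.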
Operational Features
Scheduled rollouts: to “enable at 9am on launch day”, store scheduled_at on the rule and evaluate it against the current time.
Kill switch: a special always-off override rule with maximum priority.
Dependency tracking: flag A may depend on flag B, so evaluate dependencies in topological order.
Dashboard: real-time flag state, rollout percentage, and error rate correlation.
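Two of these features are small enough to sketch directly. The example below uses the standard library's graphlib.TopologicalSorter (Python 3.9+) for dependency ordering and a plain timestamp comparison for scheduled rules. The function names and the shape of the `depends_on` mapping (flag name to the set of flags it depends on) are assumptions for illustration.

```python
from datetime import datetime, timezone
from graphlib import TopologicalSorter
from typing import Optional


def evaluation_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return flags ordered so that dependencies come before dependents.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def rule_is_active(scheduled_at: Optional[datetime], now: datetime) -> bool:
    """A rule with no schedule is always active; otherwise it is active from scheduled_at on."""
    return scheduled_at is None or now >= scheduled_at


# Flag A depends on flag B, which depends on flag C.
order = evaluation_order({"A": {"B"}, "B": {"C"}, "C": set()})

launch = datetime(2030, 1, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical launch time
active_now = rule_is_active(launch, datetime.now(timezone.utc))
```

Rejecting cyclic dependencies when a flag is saved, rather than at evaluation time, keeps the failure in the dashboard instead of in production traffic. Using timezone-aware timestamps avoids ambiguity about which "9am" a schedule refers to.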