Question 1

What is a feature flag and why use it instead of a code deploy?

Accepted Answer

A feature flag (feature toggle) is a configuration value that enables or disables a code path at runtime without requiring a new deployment. Benefits: (1) Instant rollback -- if a new feature causes issues, flip the flag off in seconds vs 10-30 minutes for a rollback deploy. (2) Gradual rollout -- enable for 1% of users, monitor for errors, expand to 100% over days. (3) A/B testing -- show two variants to different user segments and measure impact. (4) Kill switch -- disable a feature under load without touching the codebase. (5) Dark launches -- deploy code that is off, test it internally, then enable for users. Flags decouple deployment (when code ships) from release (when users see it).
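As a minimal sketch of the idea, the snippet below gates two code paths behind a runtime lookup. The in-memory FLAGS dict, the flag name new_checkout, and the checkout function are all hypothetical stand-ins; a real service would read the value from the config service rather than from a module-level dict.

```python
# Hypothetical in-memory flag store standing in for the config service.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Return the current flag value, falling back to a default if unset."""
    return FLAGS.get(flag_name, default)


def checkout(cart_total: float) -> str:
    """Both code paths are deployed; the flag decides which one runs."""
    if is_enabled("new_checkout"):
        return f"new flow: {cart_total:.2f}"
    return f"old flow: {cart_total:.2f}"
```

Flipping FLAGS["new_checkout"] to True changes the behavior of checkout on the next call, with no redeploy. That is the deployment/release split described above.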

Question 2

How do you implement percentage-based feature flag rollouts?

Accepted Answer

Assign each user to a consistent bucket using a deterministic hash: bucket = hash(flag_name + user_id) % 100. This gives a value 0-99 that is stable for the same user and flag -- the user does not flip between old and new behavior during a gradual rollout. Enable the flag if bucket is less than the flag's rollout percentage (for example, bucket < 10 for a 10% rollout). [...] config service publishes -> Redis delivers to all subscribers -> SDK refreshes = under 1 second. The polling serves as a safety net if the Pub/Sub message is missed.
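The bucketing step at the start of this answer can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: it uses SHA-256 from hashlib because Python's built-in hash() is randomized per process for strings and so is not stable across restarts, and it joins the flag name and user ID with a ":" separator (an assumption; the answer only says to concatenate them). The function names are hypothetical.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0-99."""
    # Assumed key format: "<flag_name>:<user_id>".
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod 100.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, rollout_percentage: int) -> bool:
    """True if this user falls inside the enabled share of the rollout."""
    return bucket_for(flag_name, user_id) < rollout_percentage
```

Because the flag name is part of the key, a given user lands in different buckets for different flags, so the same 1% of users is not the test group for every rollout. Raising the percentage only adds users: anyone enabled at 10% stays enabled at 50%.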

Question 3

How do you implement targeting rules for feature flags (user segments, country, plan)?

Accepted Answer

Store targeting rules as a priority-ordered list of conditions and values. Each rule has: condition (JSON: {"country": ["US", "CA"], "plan": "premium"}), value (true/false or a variant), and priority. Evaluation: for a given user context (user_id, country, plan, account_age_days), evaluate rules in priority order. The first matching rule returns its value. If no rule matches, return the default value. Condition types: exact match (country == US), list membership (plan in [premium, enterprise]), numeric comparison (account_age_days >= 30), percentage rollout (hash(flag_name + user_id) % 100 < 10). Store rules in Redis as a sorted set scored by priority so they can be evaluated without a database query.
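A rough sketch of first-match evaluation over such rules is shown below. The rule layout is an assumption made for illustration: each rule is a dict with priority, condition, and value; a list means membership, a {"gte": n} dict means numeric comparison, a special rollout_percent key means percentage rollout, and anything else is an exact match. All conditions inside one rule must hold, and a lower priority number is evaluated first (also an assumption). Redis storage is left out; rules are passed in as a plain list.

```python
import hashlib
from typing import Any


def bucket_for(flag_name: str, user_id: str) -> int:
    """Stable 0-99 bucket for a (flag, user) pair."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def matches(condition: dict, flag_name: str, ctx: dict) -> bool:
    """Return True only if every clause in the condition holds for ctx."""
    for attr, expected in condition.items():
        if attr == "rollout_percent":  # percentage rollout
            uid = ctx.get("user_id")
            if uid is None or bucket_for(flag_name, str(uid)) >= expected:
                return False
        elif isinstance(expected, list):  # list membership
            if ctx.get(attr) not in expected:
                return False
        elif isinstance(expected, dict):  # numeric comparison, e.g. {"gte": 30}
            actual = ctx.get(attr)
            if actual is None or actual < expected["gte"]:
                return False
        elif ctx.get(attr) != expected:  # exact match
            return False
    return True


def evaluate(flag_name: str, rules: list, default: Any, ctx: dict) -> Any:
    """Walk rules in priority order; the first match wins, else the default."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if matches(rule["condition"], flag_name, ctx):
            return rule["value"]
    return default


# Hypothetical rules mirroring the examples in the answer above.
RULES = [
    {"priority": 1, "condition": {"country": ["US", "CA"], "plan": "premium"}, "value": "variant_a"},
    {"priority": 2, "condition": {"account_age_days": {"gte": 30}}, "value": "variant_b"},
]
```

With these rules, a premium user in the US gets variant_a even if the account is older than 30 days, because the priority-1 rule matches first. A user who matches neither rule gets the default.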

Question 4

How do you audit and recover from a bad config change?

Accepted Answer

Every config change is written to an immutable audit log: who changed it, when, the previous value, and the new value. To recover from a bad change: one-click rollback in the dashboard restores the previous version (looks up ConfigVersion, updates the flag to previous_value, logs the rollback event). For automated rollback: integrate with monitoring -- if the error rate triples within 5 minutes of a flag change, automatically disable the flag and alert the team. The audit log is append-only -- never delete entries, even for rollbacks; a rollback is recorded as a new entry. This provides a complete history for post-incident review. Store audit logs in a separate table with restricted write access (only the config service can write, not application code).
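The sketch below shows one way the append-only log, the rollback, and the automated trigger could fit together. It is an in-memory illustration only: ConfigStore, AuditEntry, and should_auto_disable are hypothetical names, the ConfigVersion lookup is approximated by reading the latest audit entry, and "error rate triples" is interpreted as the current rate being at least 3x a pre-change baseline. Table storage, write permissions, and alerting are out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass(frozen=True)  # frozen: entries cannot be mutated after creation
class AuditEntry:
    flag: str
    actor: str
    at: datetime
    previous_value: Any
    new_value: Any
    action: str  # "change" or "rollback"


class ConfigStore:
    """Hypothetical store: every write appends an audit entry, none are deleted."""

    def __init__(self) -> None:
        self._values: dict = {}
        self._audit: list = []

    def get(self, flag: str) -> Any:
        return self._values.get(flag)

    def history(self, flag: str) -> tuple:
        return tuple(e for e in self._audit if e.flag == flag)

    def set_flag(self, flag: str, value: Any, actor: str,
                 action: str = "change", now: Optional[datetime] = None) -> AuditEntry:
        entry = AuditEntry(flag, actor, now or datetime.now(timezone.utc),
                           self._values.get(flag), value, action)
        self._audit.append(entry)  # append-only: the log only grows
        self._values[flag] = value
        return entry

    def rollback(self, flag: str, actor: str) -> AuditEntry:
        """Restore the value before the latest change and log it as a new entry."""
        last = self.history(flag)[-1]  # raises IndexError if the flag has no history
        return self.set_flag(flag, last.previous_value, actor, action="rollback")


def should_auto_disable(changed_at: datetime, now: datetime,
                        baseline_error_rate: float, current_error_rate: float,
                        factor: float = 3.0,
                        window: timedelta = timedelta(minutes=5)) -> bool:
    """True if the error rate reached factor x baseline within the window.

    A zero baseline never triggers here; handling that case is left open.
    """
    within_window = now - changed_at <= window
    return (within_window and baseline_error_rate > 0
            and current_error_rate >= factor * baseline_error_rate)
```

After two changes and one rollback, history returns three entries, the last of which has action "rollback". The bad change stays visible in the log for post-incident review.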

System Design: Configuration Management Service — Feature Flags, Dynamic Config, and Safe Rollouts

What Is a Config Management Service?

Core Features

Data Model

Rule Evaluation

Config Distribution Architecture

Safe Rollout

Consistency and Caching

Operational Features