System Design Interview: Time Series Database (Prometheus / InfluxDB)

What Is a Time Series Database?

A time series database (TSDB) stores sequences of timestamped values — metrics, sensor readings, financial prices, IoT data. Unlike general-purpose databases, TSDBs are optimized for: high write throughput (millions of data points per second), time-range queries (give me CPU usage for the last hour), efficient storage with compression, and automatic data retention (delete data older than 90 days). Prometheus, InfluxDB, TimescaleDB, and Graphite are popular TSDBs. Use cases: infrastructure monitoring (CPU, memory, network), application performance monitoring (request latency, error rate), IoT sensor data, and financial tick data.

Data Model

A time series is identified by a metric name and a set of labels (key-value pairs). In Prometheus: metric_name{label1="value1", label2="value2"} timestamp value. Example: http_requests_total{method="GET", endpoint="/api/users", status="200"} 1714000000 42543. Each unique combination of metric name + labels is one time series. Labels enable filtering and aggregation: sum(http_requests_total{status="200"}) by (endpoint) gives total successful requests per endpoint. Cardinality: the number of unique time series = the number of distinct combinations of label values. High-cardinality labels (user_id, request_id) create millions of time series — a common anti-pattern that degrades performance.
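As a sketch of how label sets multiply into cardinality (the label names and counts below are illustrative, not from any real deployment):

```python
# Cardinality = product of the number of distinct values per label.
from math import prod

# Assumed label-value counts for a single metric (illustrative numbers).
label_values = {
    "method": 4,      # GET, POST, PUT, DELETE
    "endpoint": 50,   # bucketed API routes
    "status": 10,     # common HTTP status codes
}
cardinality = prod(label_values.values())
print(cardinality)  # prints 2000 -- a perfectly healthy series count

# Adding one unbounded label explodes the count (the anti-pattern):
label_values["user_id"] = 1_000_000
print(prod(label_values.values()))  # prints 2000000000
```

The multiplication is the whole story: one unbounded label dominates every other design decision.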

Write Path and Compression

TSDBs receive millions of writes per second. Efficient storage uses two tricks:

Delta encoding: instead of storing absolute timestamps (1714000000, 1714000060, 1714000120, …), store the first timestamp and then the deltas between consecutive samples (60, 60, …). Deltas are small integers, which compress well. If the scrape interval is consistent (every 60 seconds), the deltas are constant and collapse under run-length encoding (just store: first=1714000000, delta=60, count=100).
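A minimal Python sketch of delta plus run-length encoding (illustrative only; real TSDBs operate on bit-packed streams, not Python lists):

```python
def delta_encode(timestamps):
    """Store the first timestamp plus deltas between consecutive samples."""
    first = timestamps[0]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return first, deltas

def run_length(deltas):
    """Collapse runs of identical deltas into [delta, count] pairs."""
    runs = []
    for d in deltas:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return runs

# 100 samples scraped every 60 seconds:
ts = [1714000000 + 60 * i for i in range(100)]
first, deltas = delta_encode(ts)
print(first, run_length(deltas))  # prints 1714000000 [[60, 99]]
```

One hundred 8-byte timestamps reduce to a single (first, delta, count) triple when the scrape interval never jitters.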

Gorilla compression (Facebook's time series compression for floating-point values): XOR consecutive values. If CPU usage is 45.2, 45.3, 45.1, …, consecutive samples share their sign, exponent, and high mantissa bits, so their XOR has long runs of leading zeros. Store only the meaningful bits. Gorilla achieves 1.37 bytes/sample on average versus 16 bytes/sample raw (8-byte timestamp + 8-byte float64). Prometheus achieves similar compression in its block storage format.
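The XOR trick can be sketched as follows; the helper names are mine, and a real Gorilla encoder would bit-pack the leading-zero count and the meaningful bits rather than keep whole integers:

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a float64 as its raw 64-bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_stream(values):
    """XOR each float64's bit pattern with its predecessor's.
    Nearby values share sign, exponent, and high mantissa bits,
    so the XOR has long runs of leading zeros; only the few
    differing bits need to be stored."""
    prev = float_bits(values[0])
    out = [prev]  # the first value is stored verbatim
    for v in values[1:]:
        bits = float_bits(v)
        out.append(prev ^ bits)
        prev = bits
    return out

samples = [45.2, 45.3, 45.1, 45.2]
for word in xor_stream(samples):
    print(f"{word:064b}")  # XORed words start with long zero runs
```

Printing the 64-bit patterns makes the compression opportunity visible: after the first value, every word begins with a dozen or more zeros.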

In-memory write buffer: incoming samples are appended to an on-disk WAL (Write-Ahead Log) for crash recovery and buffered in an in-memory head chunk covering the most recent ~2 hours. These hot chunks are queried frequently and compressed lightly for fast access. Every 2 hours, the in-memory chunk is flushed to disk as an immutable block, recompressed more aggressively (Snappy/zstd, depending on the engine), and indexed. Old blocks are merged and compacted in the background.
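A toy version of this write path, with assumed class and method names (no real compression, indexing, or fsync — just the WAL-then-buffer-then-flush shape):

```python
class TinyTSDB:
    """Toy write path: append to a WAL for durability, buffer samples
    in memory, and seal the buffer as an immutable block when the
    2-hour window closes. Illustrative only."""
    WINDOW = 2 * 60 * 60  # 2-hour block window, in seconds

    def __init__(self):
        self.wal = []          # stand-in for an fsync'd on-disk log
        self.head = []         # hot in-memory chunk of recent samples
        self.blocks = []       # immutable flushed blocks
        self.window_start = None

    def append(self, ts, value):
        if self.window_start is None:
            self.window_start = ts
        if ts - self.window_start >= self.WINDOW:
            self.flush()                 # seal the previous 2h window
            self.window_start = ts
        self.wal.append((ts, value))     # durability before acknowledging
        self.head.append((ts, value))    # then the queryable hot chunk

    def flush(self):
        self.blocks.append(tuple(self.head))  # immutable block
        self.head = []
        self.wal = []  # everything in the WAL is now in a block
```

On restart, a real engine replays the WAL to rebuild the head chunk; here the WAL is only truncated once its contents are safely inside a flushed block.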

Query Language: PromQL

Prometheus Query Language (PromQL) enables flexible metric aggregation:

  • http_requests_total — select all time series with this name
  • rate(http_requests_total[5m]) — per-second rate over 5-minute window (for counter metrics)
  • histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) — P99 latency
  • sum(rate(http_requests_total[5m])) by (endpoint) — total RPS per endpoint
  • avg_over_time(cpu_usage[1h]) — average CPU over last hour

PromQL evaluates over time ranges by scanning relevant time series blocks, applying time-range filters (start/end timestamps), and computing aggregations over the matching data points.
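The counter-rate semantics behind rate() in the list above can be sketched as follows (simplified: real PromQL also extrapolates to the window boundaries rather than using the raw endpoint samples):

```python
def prom_rate(samples, window_seconds):
    """Per-second rate of increase of a counter over a window, with
    simple reset handling: a drop means the counter restarted at 0,
    so only the post-reset increase is counted."""
    values = [v for _, v in samples]
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / window_seconds

# Counter sampled every 60s over 5 minutes; one reset (server restart).
samples = [(0, 1000), (60, 1300), (120, 1600), (180, 50), (240, 350)]
print(prom_rate(samples, 300))  # (300 + 300 + 50 + 300) / 300 ≈ 3.17 req/s
```

The reset branch is why the restart at t=180 contributes +50 (the counter's value since restarting) instead of a spurious -1550.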

Scalability: Thanos and Cortex

Single-node Prometheus handles ~1M active time series and ~1M samples/second. For larger deployments:

  • Thanos: adds a sidecar to Prometheus that uploads blocks to S3 for long-term storage. A Thanos Query component fans out queries to multiple Prometheus instances and S3, deduplicates results, and presents a single query endpoint. Scales horizontally — add Prometheus instances for more ingestion capacity.
  • Cortex / Mimir: fully distributed, horizontally scalable Prometheus-compatible TSDB. Each component (ingestor, querier, compactor, store-gateway) scales independently. Used by Grafana Cloud to serve thousands of tenants on a single multi-tenant cluster.
  • TimescaleDB: time series extension for PostgreSQL. Automatically partitions data into time-based chunks (hypertables). Enables SQL queries on time series data, referential integrity with other relational tables, and familiar tooling. Best for: IoT data with relational dimensions, financial data requiring SQL joins.

Downsampling and Retention

Raw metric data at 15-second resolution for 1 year: 1M time series × (365 × 24 × 60 × 4) samples ≈ 2.1 trillion samples. At 1.37 bytes/sample (Gorilla): ~2.9TB. Manageable for a single cluster, but not indefinitely. Downsampling: keep raw data for 15 days. After 15 days, downsample to 5-minute resolution (retain min/max/avg over each 5-minute window). After 90 days, downsample to 1-hour resolution. This reduces 1-year storage by over 95%. Downsampling runs as a background compaction job — never delete raw data until the downsampled output is confirmed.
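The retention arithmetic can be checked directly (constants from the text: 15-second scrape interval, 1M series, 1.37 bytes/sample for Gorilla, 16 bytes/sample raw):

```python
series = 1_000_000
samples_per_year = 365 * 24 * 60 * 4       # one sample every 15 seconds
total = series * samples_per_year
print(f"{total:,}")                         # prints 2,102,400,000,000

raw_tb = total * 16 / 1e12                  # uncompressed: ts + float64
gorilla_tb = total * 1.37 / 1e12            # Gorilla-compressed
print(f"raw: {raw_tb:.1f} TB, compressed: {gorilla_tb:.2f} TB")
# prints raw: 33.6 TB, compressed: 2.88 TB
```

Compression alone buys roughly 12×; the downsampling tiers described above are what make multi-year retention cheap.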

Interview Checklist

  • Data model: metric_name{label_key=label_value} → (timestamp, float) time series
  • Write path: WAL + memory buffer → periodic flush to immutable disk blocks
  • Compression: delta encoding for timestamps, XOR/Gorilla for float values
  • Query: time-range scan over indexed blocks, PromQL aggregations
  • Scale: sharded Prometheus + Thanos (S3 long-term) or Cortex/Mimir
  • Retention: raw → 5-minute → 1-hour downsampling tiers

Frequently Asked Questions

How does Prometheus store time series data efficiently?

Prometheus uses a custom storage format optimized for time series metrics. Write path: raw samples are first written to a WAL (Write-Ahead Log) on disk for crash recovery, then buffered in memory in 2-hour chunks per time series. When a 2-hour memory chunk is complete, it is flushed to disk as an immutable block and compressed. Block format: each block covers a 2-hour window and contains separate files for chunks (the actual sample data), an index (mapping metric labels to chunk locations), and metadata. Compression: Prometheus uses Gorilla-style XOR compression for sample values (consecutive float64 values share many bits, so XOR produces small integers) and delta encoding for timestamps (store differences, not absolute values). This achieves ~1.3-1.5 bytes per sample versus 16 bytes raw (8 bytes timestamp + 8 bytes float64). Compaction: over time, small 2-hour blocks are merged by a background process into larger 4-hour, 8-hour, … blocks. Larger blocks enable more aggressive compression and faster range queries (fewer blocks to scan). Retention: Prometheus deletes blocks outside the retention window (default 15 days) during compaction. In long-term storage setups (Thanos/Cortex), a separate compactor also handles downsampling.

What is high cardinality and why is it a problem for time series databases?

Cardinality in a TSDB is the number of unique time series: the number of unique combinations of metric name + all label values. High cardinality means millions or billions of unique series. Example: http_requests_total{user_id="12345", endpoint="/api/v2/users"} — if user_id can be any of 100M user IDs and there are 100 endpoints, cardinality = 100M × 100 = 10 billion unique time series. Why this is a problem: (1) Memory: Prometheus keeps the last two hours of all active series in RAM; at 4KB per series, 10 billion series would need 40 terabytes of RAM. (2) Disk: each unique series requires its own chunk data and index entries. (3) Query performance: queries that don't specify a high-cardinality label must scan all series (e.g., summing over all user_ids). (4) Index size: the label index grows proportionally to cardinality. Solution: never use unbounded high-cardinality identifiers (user_id, request_id, session_id, IP address) as labels. Use low-cardinality labels only: environment (prod/staging), region (us-east/eu-west), service, endpoint (bucketed path, not raw URL with IDs), HTTP method, status code. For per-user metrics, use a different storage system (an OLAP database or event store) designed for high-cardinality data.

How does PromQL compute rate() and why is it needed for counters?

Prometheus counters are monotonically increasing: they only go up (e.g., http_requests_total = 1,500,000 and climbing with every request). Querying the raw counter value tells you the total since the server started, not the current request rate. rate() computes the per-second rate of increase over a time window: rate(http_requests_total[5m]) ≈ (last_value - first_value) / 300 seconds. This gives "requests per second, averaged over the last 5 minutes." PromQL handles counter resets automatically: if the counter resets to 0 (e.g., on a server restart), rate() detects the reset (a sample lower than its predecessor) and counts only the increase since the reset, rather than producing a spurious negative spike. irate() computes the instantaneous rate from only the last two data points: more responsive to sudden changes, but noisier (high variance). rate() over a longer window is smoother, which is appropriate for dashboards and alerting. increase(counter[5m]) = rate(counter[5m]) × 300 = the total increase over the 5-minute window (as a float, not necessarily an integer, due to extrapolation). Rule: always use rate() or irate() with counters; never display a raw counter value on a dashboard, because it means nothing without knowing when the counter started.
