System Design Interview: Time Series Database (Prometheus / InfluxDB)

What Is a Time Series Database?

A time series database (TSDB) stores sequences of timestamped values — metrics, sensor readings, financial prices, IoT data. Unlike general-purpose databases, TSDBs are optimized for: high write throughput (millions of data points per second), time-range queries (give me CPU usage for the last hour), efficient storage with compression, and automatic data retention (delete data older than 90 days). Prometheus, InfluxDB, TimescaleDB, and Graphite are popular TSDBs. Use cases: infrastructure monitoring (CPU, memory, network), application performance monitoring (request latency, error rate), IoT sensor data, and financial tick data.

Data Model

A time series is identified by a metric name and a set of labels (key-value pairs). In Prometheus: metric_name{label1="value1", label2="value2"} timestamp value. Example: http_requests_total{method="GET", endpoint="/api/users", status="200"} 1714000000 42543. Each unique combination of metric name + labels is one time series. Labels enable filtering and aggregation: sum(http_requests_total{status="200"}) by (endpoint) gives total successful requests per endpoint. Cardinality: the number of unique time series equals the product of the distinct values of each label. High-cardinality labels (user_id, request_id) create millions of time series, a common anti-pattern that degrades performance.
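To make the cardinality math concrete, here is a small Python sketch (the label values are illustrative, not from any real system) that enumerates label combinations and counts the resulting series:

```python
from itertools import product

# Hypothetical label values; each unique combination is one series.
methods = ["GET", "POST", "PUT", "DELETE"]   # 4 values
endpoints = ["/api/users", "/api/orders"]    # 2 values
statuses = ["200", "404", "500"]             # 3 values

series = {
    f'http_requests_total{{method="{m}",endpoint="{e}",status="{s}"}}'
    for m, e, s in product(methods, endpoints, statuses)
}
print(len(series))  # 4 * 2 * 3 = 24 distinct time series

# Add a user_id label with 1M values and the same metric explodes to
# 24 * 1_000_000 = 24M series -- the high-cardinality anti-pattern.
```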

Write Path and Compression

TSDBs receive millions of writes per second. Efficient storage uses two tricks:

Delta encoding: instead of storing absolute timestamps (1714000000, 1714000060, 1714000120, 1714000180), store the first timestamp and then the deltas (60, 60, 60). Deltas are small integers that compress well. If scrape intervals are consistent (every 60 seconds), the deltas are constant, so run-length encoding collapses them further (just store: first=1714000000, delta=60, count=100).
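A minimal Python sketch of this idea, using plain lists rather than the bit-packed streams a real TSDB would use:

```python
def delta_encode(timestamps):
    """Store the first timestamp plus the gaps between neighbors."""
    first = timestamps[0]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return first, deltas

def run_length(deltas):
    """Collapse runs of identical deltas into (delta, count) pairs."""
    runs = []
    for d in deltas:
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + 1)
        else:
            runs.append((d, 1))
    return runs

ts = [1714000000 + 60 * i for i in range(100)]   # perfectly regular scrapes
first, deltas = delta_encode(ts)
print(first, run_length(deltas))  # 1714000000 [(60, 99)]
```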

Gorilla compression (Facebook's time series compression for floating-point values): XOR consecutive values. If CPU usage is 45.2, 45.3, 45.1, …, consecutive samples share sign, exponent, and high mantissa bits, so their XOR is mostly leading zeros. Store only the meaningful bits. Gorilla averages 1.37 bytes/sample vs 16 bytes/sample raw (8-byte timestamp + float64). Prometheus achieves similar compression in its block storage format.
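A simplified Python illustration of why the XOR trick works; real Gorilla encodes leading-zero and meaningful-bit counts into a bit stream, which this sketch omits:

```python
import struct

def float_to_bits(f):
    """Reinterpret a float64 as its raw 64-bit integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", f))[0]

values = [45.2, 45.3, 45.1, 45.2]
prev = float_to_bits(values[0])
for v in values[1:]:
    cur = float_to_bits(v)
    xor = prev ^ cur
    # Similar values agree on their high bits, so the XOR is mostly
    # leading zeros; only the few meaningful bits need storing.
    print(f"{v}: xor has {64 - xor.bit_length()} leading zero bits")
    prev = cur
```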

In-memory write buffer: incoming samples are appended to a write-ahead log (WAL) for crash recovery and buffered in in-memory chunks covering roughly the most recent 2 hours. These hot chunks are queried frequently and compressed lightly for fast access. Periodically (every 2 hours), memory chunks are flushed to disk as immutable blocks, recompressed more aggressively (Snappy/zstd), and indexed. Old blocks are merged and compacted.
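A toy sketch of that write path, with hypothetical file names and a simplified flush policy (not Prometheus's actual on-disk format):

```python
import json, os, time

WAL_PATH = "wal.log"        # hypothetical paths, not a real TSDB layout
BLOCK_DIR = "blocks"
FLUSH_INTERVAL = 2 * 3600   # flush the hot chunk every 2 hours

os.makedirs(BLOCK_DIR, exist_ok=True)

class WriteBuffer:
    def __init__(self):
        self.chunk = []                 # in-memory hot chunk
        self.chunk_start = time.time()

    def append(self, series, ts, value):
        # Durability first: the sample hits the WAL before the buffer,
        # so a crash can be replayed from the log.
        with open(WAL_PATH, "a") as wal:
            wal.write(json.dumps([series, ts, value]) + "\n")
        self.chunk.append((series, ts, value))
        if time.time() - self.chunk_start >= FLUSH_INTERVAL:
            self.flush()

    def flush(self):
        # A real TSDB would compress and index this immutable block, then
        # truncate the WAL segment it covers; JSON stands in for both.
        path = os.path.join(BLOCK_DIR, f"block-{int(self.chunk_start)}.json")
        with open(path, "w") as f:
            json.dump(self.chunk, f)
        self.chunk, self.chunk_start = [], time.time()
```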

Query Language: PromQL

Prometheus Query Language (PromQL) enables flexible metric aggregation:

  • http_requests_total — select all time series with this name
  • rate(http_requests_total[5m]) — per-second rate over 5-minute window (for counter metrics)
  • histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) — P99 latency
  • sum(rate(http_requests_total[5m])) by (endpoint) — total RPS per endpoint
  • avg_over_time(cpu_usage[1h]) — average CPU over last hour

PromQL evaluates over time ranges by scanning relevant time series blocks, applying time-range filters (start/end timestamps), and computing aggregations over the matching data points.
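As a rough illustration, here is how a rate()-style calculation over a counter could be computed: divide the counter's increase by the elapsed time between the first and last samples in the window. Real PromQL additionally extrapolates to the window boundaries and handles counter resets, both omitted here:

```python
def rate(samples, window_start, window_end):
    """samples: list of (timestamp, counter_value), ascending by time."""
    in_window = [(t, v) for t, v in samples if window_start <= t <= window_end]
    if len(in_window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)  # counter increase per second

# A counter growing by 60 every 60 seconds:
samples = [(0, 100), (60, 160), (120, 220), (180, 280)]
print(rate(samples, 0, 180))  # 1.0 requests/second
```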

Scalability: Thanos and Cortex

Single-node Prometheus handles ~1M active time series and ~1M samples/second. For larger deployments:

  • Thanos: adds a sidecar to Prometheus that uploads blocks to S3 for long-term storage. A Thanos Query component fans out queries to multiple Prometheus instances and S3, deduplicates results, and presents a single query endpoint (see the fan-out sketch after this list). Scales horizontally: add Prometheus instances for more ingestion capacity.
  • Cortex / Mimir: fully distributed, horizontally scalable Prometheus-compatible TSDB. Each component (ingester, querier, compactor, store-gateway) scales independently. Used by Grafana Cloud to serve thousands of tenants on a single multi-tenant cluster.
  • TimescaleDB: time series extension for PostgreSQL. Automatically partitions data into time-based chunks (hypertables). Enables SQL queries on time series data, referential integrity with other relational tables, and familiar tooling. Best for: IoT data with relational dimensions, financial data requiring SQL joins.
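To illustrate the Thanos-style fan-out mentioned above, a heavily simplified Python sketch that merges results from multiple stores and deduplicates overlapping samples. Real Thanos deduplicates by stripping a replica label; the first-sample-wins rule here is an assumption for brevity:

```python
def fan_out_query(stores, matchers):
    """stores: callables returning iterables of (series_key, [(ts, value)])."""
    merged = {}  # series_key -> {timestamp: value}
    for store in stores:
        for series_key, samples in store(matchers):
            dedup = merged.setdefault(series_key, {})
            for ts, value in samples:
                dedup.setdefault(ts, value)  # keep the first replica's sample
    return {k: sorted(v.items()) for k, v in merged.items()}

# Two HA Prometheus replicas scraping the same target, with overlap at t=60:
replica_a = lambda m: [("up{job='api'}", [(0, 1.0), (60, 1.0)])]
replica_b = lambda m: [("up{job='api'}", [(60, 1.0), (120, 1.0)])]
print(fan_out_query([replica_a, replica_b], {}))
# {"up{job='api'}": [(0, 1.0), (60, 1.0), (120, 1.0)]}
```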

Downsampling and Retention

Raw metric data at 15-second resolution for 1 year: 1M time series × (365 × 24 × 60 × 4) samples per series ≈ 2.1 trillion samples. At 1.37 bytes/sample (Gorilla): ~2.9TB. Manageable for a single cluster but not indefinitely. Downsampling: keep raw data for 15 days. After 15 days, downsample to 5-minute resolution (retain min/max/avg over each 5-minute window). After 90 days, downsample to 1-hour resolution. This reduces 1-year storage by roughly 95%. Downsampling runs as a background compaction job; never delete raw data until the downsampled output is confirmed.
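A minimal sketch of the 5-minute downsampling step, keeping min/max/avg per window (the bucketing function and sample data are illustrative):

```python
def downsample(samples, resolution=300):
    """samples: list of (timestamp, value); returns per-bucket aggregates."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % resolution, []).append(value)
    return {
        start: {"min": min(vs), "max": max(vs), "avg": sum(vs) / len(vs)}
        for start, vs in sorted(buckets.items())
    }

raw = [(t, 40 + (t % 45) / 10) for t in range(0, 900, 15)]  # 15s samples
print(downsample(raw))  # 3 aggregate buckets instead of 60 raw samples
```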

Interview Checklist

  • Data model: metric_name{label_key=label_value} → (timestamp, float) time series
  • Write path: WAL + memory buffer → periodic flush to immutable disk blocks
  • Compression: delta encoding for timestamps, XOR/Gorilla for float values
  • Query: time-range scan over indexed blocks, PromQL aggregations
  • Scale: sharded Prometheus + Thanos (S3 long-term) or Cortex/Mimir
  • Retention: raw → 5-minute → 1-hour downsampling tiers
