System Design: IoT Data Platform — Device Ingestion, Time-Series Storage, and Real-Time Alerting

Requirements

An IoT data platform ingests telemetry from millions of connected devices — sensors, smart meters, industrial equipment, wearables — and provides storage, querying, and real-time alerting. Core challenges: massive write throughput (millions of devices sending readings every few seconds), efficient time-series storage (data is append-only and queried by time range), real-time anomaly detection (alerting within seconds of a threshold breach), and long-term data retention with cost optimization. Scale: AWS IoT Core connects 500 million+ devices. Industrial IoT platforms may ingest 1 million readings per second. Smart city deployments can have 100K+ sensors per city block.

Device Connectivity and Ingestion

Protocol selection: MQTT (originally MQ Telemetry Transport) is the standard for IoT ingestion. It uses a pub/sub model over TCP with as little as 2 bytes of fixed-header overhead, three QoS levels (0 = at-most-once "fire-and-forget", 1 = at-least-once, 2 = exactly-once), and persistent sessions (the broker queues messages for offline devices). Alternatives: CoAP (Constrained Application Protocol) for extremely resource-constrained devices (microcontrollers), and HTTP/REST for devices with sufficient resources, where simplicity and broad compatibility matter more than efficiency.

Architecture: devices connect to an MQTT broker cluster (EMQX, HiveMQ, AWS IoT Core). Each device publishes to its own topic, devices/{device_id}/telemetry, and the broker routes messages to subscribers, in this case the ingestion pipeline (see the publisher sketch below).

Ingestion pipeline: MQTT broker → Kafka (device_telemetry topic) → stream processor (Flink/Spark Streaming) → time-series DB + real-time alert engine. Kafka serves as a buffer and replay log; it handles 1M+ messages/second on a modest cluster.

Message schema: {device_id, timestamp (ISO 8601 or Unix epoch), readings: {temperature: 23.4, humidity: 60, battery: 87}, metadata: {firmware: "v2.1.3"}}.

Device registry: maintain a device registry (PostgreSQL) with device_id, owner_id, type, model, firmware_version, provisioning_date, location, and is_active. It is used to validate incoming messages and enrich them with metadata (see the consumer sketch below).
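A device-side publisher can be sketched as follows; this is a minimal example using the paho-mqtt 1.x client API, with the broker host, port, and device ID as placeholders:

```python
# Device-side publisher sketch (paho-mqtt 1.x API).
# Broker host, port, and device ID are placeholders.
import json
import time

import paho.mqtt.client as mqtt

DEVICE_ID = "meter-0042"                    # hypothetical device
TOPIC = f"devices/{DEVICE_ID}/telemetry"

client = mqtt.Client(client_id=DEVICE_ID)
client.connect("mqtt.example.com", 1883)    # placeholder broker
client.loop_start()                         # background network loop

payload = json.dumps({
    "device_id": DEVICE_ID,
    "timestamp": int(time.time()),          # Unix epoch seconds
    "readings": {"temperature": 23.4, "humidity": 60, "battery": 87},
    "metadata": {"firmware": "v2.1.3"},
})

# QoS 1 = at-least-once: the broker acknowledges, the client retries if needed.
info = client.publish(TOPIC, payload, qos=1)
info.wait_for_publish()
client.loop_stop()
client.disconnect()
```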

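On the ingestion side, the validate-and-enrich step against the device registry might look like the sketch below, assuming a kafka-python consumer and a psycopg2 connection; the topic name, DSN, and column names are illustrative:

```python
# Ingestion-side sketch: consume raw telemetry from Kafka, validate the
# device against the registry, and enrich with metadata. Topic name,
# connection strings, and registry columns are assumptions.
import json

import psycopg2
from kafka import KafkaConsumer

pg = psycopg2.connect("dbname=iot user=ingest")     # placeholder DSN
consumer = KafkaConsumer(
    "device_telemetry",
    bootstrap_servers="kafka.example.com:9092",
    value_deserializer=lambda b: json.loads(b),
)

def lookup_device(device_id):
    """Fetch the registry row; returns None for unknown/inactive devices."""
    with pg.cursor() as cur:
        cur.execute(
            "SELECT owner_id, type, location FROM devices "
            "WHERE device_id = %s AND is_active",
            (device_id,),
        )
        return cur.fetchone()

for msg in consumer:
    reading = msg.value
    device = lookup_device(reading["device_id"])
    if device is None:
        continue  # drop readings from unknown or deactivated devices
    owner_id, dev_type, location = device
    reading["metadata"].update(owner_id=owner_id, type=dev_type, location=location)
    # ... hand off to the time-series writer and the alert engine
```

A production consumer would cache registry rows (the registry changes rarely relative to telemetry volume) rather than querying PostgreSQL per message.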
Time-Series Storage

Time-series data has a distinct access pattern: high write throughput, queries that always include a time range, and recent data accessed far more often than old data. Purpose-built time-series databases outperform general-purpose RDBMSs by 10-100x for this workload.

InfluxDB: purpose-built for metrics and events. Measurements (analogous to tables) are organized by tags (indexed metadata: device_id, location, type) and fields (values: temperature, humidity). Retention policies auto-expire old data (e.g., raw 1-second data kept 30 days; 1-minute aggregates kept 1 year; 1-hour aggregates kept forever). Continuous queries run aggregations in the background, populating downsampled measurements.

TimescaleDB: a PostgreSQL extension. Hypertables auto-partition data by time (e.g., one chunk per week), older chunks compress by 90%+, and standard SQL works transparently across all chunks. Best of both worlds: time-series optimization plus full SQL (see the setup sketch below).

Compression: time-series data compresses extremely well (delta encoding for timestamps, Gorilla encoding for floating-point values). InfluxDB and TimescaleDB achieve 10-50x compression; a raw 100GB dataset shrinks to 2-10GB.

Data tiering: keep recent data (last 7 days) on fast NVMe SSDs; move older data to object storage (S3) in Parquet format for cheap long-term analytics, queried via Athena or BigQuery.
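A setup sketch for the TimescaleDB option, issued from Python via psycopg2 and assuming TimescaleDB 2.x; the table layout and policy intervals mirror the tiering described above:

```python
# TimescaleDB setup sketch: hypertable, compression, and retention.
# Table and column names are illustrative; policies assume TimescaleDB 2.x.
import psycopg2

conn = psycopg2.connect("dbname=iot user=admin")    # placeholder DSN
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS telemetry (
            time        TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            temperature DOUBLE PRECISION,
            humidity    DOUBLE PRECISION,
            battery     DOUBLE PRECISION
        );
    """)
    # Auto-partition into 1-week chunks by time.
    cur.execute("""
        SELECT create_hypertable('telemetry', 'time',
                                 chunk_time_interval => INTERVAL '7 days',
                                 if_not_exists => TRUE);
    """)
    # Compress chunks older than 7 days, segmented by device so that
    # per-device range queries stay local within compressed batches.
    cur.execute("""
        ALTER TABLE telemetry SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'device_id'
        );
    """)
    cur.execute("SELECT add_compression_policy('telemetry', INTERVAL '7 days');")
    # Drop raw data after 30 days (downsampled aggregates live elsewhere).
    cur.execute("SELECT add_retention_policy('telemetry', INTERVAL '30 days');")
```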

Real-Time Alerting

Alert rules: device owners configure threshold-based alerts, e.g., temperature > 80°C for 5 consecutive readings → CRITICAL alert. Rules are stored in an AlertRule table: rule_id, device_id (or device_group), metric, operator, threshold, sustained_periods, severity, notification_channels.

Alert evaluation: Flink stateful stream processing evaluates rules against the live data stream. A per-device state machine tracks the count of consecutive threshold breaches; when count >= sustained_periods, the alert fires. Deduplication: suppress repeated alerts for the same condition, firing once when the condition is met and firing a resolution alert when it clears (see the state-machine sketch below).

Alert routing: alerts are published to a Kafka alert topic, consumed by a notification service, and fanned out to PagerDuty, SMS, email, webhooks, and Slack.

Anomaly detection: ML-based detection (isolation forest, LSTM autoencoder) covers devices where the threshold is not known in advance. A baseline is learned from 2-4 weeks of historical data per device, and an anomaly score is published alongside the readings; users can then alert on anomaly_score > 0.8. This runs as a separate Flink job consuming the same Kafka topic (a scoring sketch follows the state machine below).
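The sustained-threshold evaluation can be sketched as a plain-Python state machine (in Flink this counter would live in keyed per-device state); the AlertRule fields mirror the table above, and only the '>' operator is implemented:

```python
# Per-device alert state machine: minimal sketch of the sustained-threshold
# logic. Pure Python stand-in for Flink keyed state; '>' operator only.
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str
    threshold: float
    sustained_periods: int
    severity: str

class DeviceAlertState:
    """Tracks consecutive breaches for one (device, rule) pair."""

    def __init__(self, rule: AlertRule):
        self.rule = rule
        self.breach_count = 0
        self.alert_active = False   # dedup: fire once per episode

    def on_reading(self, readings: dict):
        """Returns 'FIRE', 'RESOLVE', or None for each new reading."""
        value = readings.get(self.rule.metric)
        if value is not None and value > self.rule.threshold:
            self.breach_count += 1
            if self.breach_count >= self.rule.sustained_periods and not self.alert_active:
                self.alert_active = True
                return "FIRE"
        else:
            self.breach_count = 0
            if self.alert_active:
                self.alert_active = False
                return "RESOLVE"
        return None

# Example: 80°C threshold sustained for 5 readings.
state = DeviceAlertState(AlertRule("temperature", 80.0, 5, "CRITICAL"))
for t in [81, 82, 85, 90, 91, 92, 70]:
    event = state.on_reading({"temperature": t})
    if event:
        print(event)   # FIRE on the 5th breach, RESOLVE at 70
```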

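For the ML path, a minimal anomaly-scoring sketch using scikit-learn's IsolationForest; the baseline data here is synthetic, and mapping score_samples onto a 0-1 anomaly score is an assumption, not a library convention:

```python
# Anomaly-scoring sketch: train a per-device baseline on historical
# readings, then score new ones. Features and score mapping are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical baseline: ~2 weeks of (temperature, humidity) readings.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=[23.0, 60.0], scale=[1.5, 5.0], size=(20_000, 2))

model = IsolationForest(n_estimators=100, random_state=0).fit(baseline)

def anomaly_score(reading: dict) -> float:
    """Negate score_samples (higher = more normal) so higher = more anomalous."""
    x = [[reading["temperature"], reading["humidity"]]]
    return float(-model.score_samples(x)[0])

print(anomaly_score({"temperature": 23.2, "humidity": 61}))  # lower score
print(anomaly_score({"temperature": 95.0, "humidity": 5}))   # higher score
```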
Query and Visualization

Common query patterns: time-range queries (temperature of device X from 9am to 5pm), aggregation (hourly average temperature across all devices in building Y), downsampling (1-second raw data to 1-minute averages for charting), and latest value (current status of all devices in a fleet).

Query API: REST/GraphQL with time range, device filter, metric, and aggregation parameters. The backend translates requests to InfluxDB Flux queries or TimescaleDB SQL.

Dashboard: Grafana connects natively to InfluxDB and TimescaleDB, with pre-built IoT dashboards and auto-refresh; customers can build custom dashboards. Live streaming: a WebSocket connection pushes new readings to the dashboard every second without polling.

Device shadow: maintain a "shadow" (last known state) for each device in Redis: HSET device:{id}:shadow temperature 23.4 humidity 60 updated_at 1700000000. This provides instant "current state" queries without scanning the time-series DB, and is updated by the stream processor on every new reading (see the sketch below).
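A shadow read/write sketch with redis-py, matching the HSET key layout above; the Redis host is a placeholder:

```python
# Device-shadow sketch: the stream processor writes the last known state
# on every reading; the API reads it in O(1). Host is a placeholder.
import time

import redis

r = redis.Redis(host="redis.example.com", port=6379, decode_responses=True)

def update_shadow(device_id: str, readings: dict):
    """Called by the stream processor for every incoming reading."""
    key = f"device:{device_id}:shadow"
    r.hset(key, mapping={**readings, "updated_at": int(time.time())})

def get_shadow(device_id: str) -> dict:
    """Instant 'current state' lookup, no time-series scan."""
    return r.hgetall(f"device:{device_id}:shadow")

update_shadow("meter-0042", {"temperature": 23.4, "humidity": 60, "battery": 87})
print(get_shadow("meter-0042"))
```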

