System Design Interview: Design a Distributed Message Queue (Kafka)

A distributed message queue decouples producers (services that generate events) from consumers (services that process them), enabling async communication, backpressure handling, and replay. Apache Kafka is the de facto standard — it handles trillions of messages per day at LinkedIn, Netflix, and Uber. This guide covers the design of a Kafka-like system from the ground up.

Requirements

Functional: Producers publish messages to named topics. Consumers subscribe to topics and receive messages. Messages are retained for configurable duration (7 days default). Messages can be replayed from any offset. At-least-once delivery guarantee.

Non-functional: 1 million messages/sec write throughput, sub-10ms publish latency, horizontal scalability, fault tolerance (survive broker failures), ordered messages within a partition.

Core Concepts

  • Topic: Named stream of messages (e.g., “user.clicks”, “order.created”)
  • Partition: A topic is divided into N ordered, immutable logs. Messages within a partition are totally ordered. Partitions enable parallelism.
  • Offset: Sequential integer assigned to each message within a partition. Consumers track their position via offset.
  • Producer: Writes messages to a partition (by key hash or round-robin). Receives ACK after message is durably written.
  • Consumer Group: A group of consumers sharing a topic subscription. Each partition is assigned to exactly one consumer in the group. This gives parallel consumption without duplicate processing.
  • Broker: A server that stores partitions and serves producers/consumers. Kafka cluster = multiple brokers.

Data Model

Topic:     topic_name, num_partitions, retention_bytes, retention_ms
Partition: topic_name, partition_id, leader_broker_id, replica_broker_ids
Message:   offset, key, value (bytes), timestamp, headers
Offset:    consumer_group, topic, partition_id, committed_offset
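The schema can be written out as dataclasses (field names follow the table above; these are illustrative types, not Kafka's internal representations):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Topic:
    topic_name: str
    num_partitions: int
    retention_bytes: int
    retention_ms: int

@dataclass
class Partition:
    topic_name: str
    partition_id: int
    leader_broker_id: int
    replica_broker_ids: List[int]

@dataclass
class Message:
    offset: int
    key: Optional[bytes]     # None for keyless (round-robin) messages
    value: bytes
    timestamp: float
    headers: Dict[str, bytes] = field(default_factory=dict)

@dataclass
class CommittedOffset:
    consumer_group: str
    topic: str
    partition_id: int
    committed_offset: int
```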

Write Path

Producer → Partition Selection → Leader Broker → Write to Segment File
                                                       ↓
                                             Replicate to Follower Brokers
                                                       ↓
                                             ACK to Producer (after N replicas confirm)

Partition selection: If message has a key: partition = hash(key) % num_partitions — same key always goes to same partition (ordering guarantee for related messages, e.g., all events for user_id=123). If no key: round-robin across partitions (maximizes throughput).
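The selection logic above can be sketched in Python. Note the assumptions: Kafka's real client uses murmur2 hashing, and modern clients use sticky partitioning rather than strict round-robin for keyless sends; md5 and a simple counter here are illustrative stand-ins.

```python
import hashlib
from itertools import count

_rr = count()  # shared round-robin counter for keyless messages

def select_partition(key, num_partitions):
    """Keyed messages hash to a stable partition; keyless round-robin."""
    if key is not None:
        # Stable hash: same key always maps to the same partition.
        digest = hashlib.md5(key).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions
    # No key: spread load evenly across partitions.
    return next(_rr) % num_partitions
```

Because the hash is stable, all events for `user_id=123` land in one partition and are consumed in order.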

Durability: The producer’s acks setting controls the durability/latency tradeoff:

  • acks=0: fire and forget (fastest, messages can be lost)
  • acks=1: leader acknowledges (data can be lost if the leader fails before replicating)
  • acks=all: all in-sync replicas acknowledge (no data loss, highest latency)

Storage: Each partition is a sequence of segment files (each ~1 GB). New messages are appended to the active segment. Sequential disk writes are dramatically faster than random writes (hundreds of MB/s sequential vs. on the order of 1 MB/s for random I/O on spinning disks), which is what makes the append-only log fast. Retention: old segments are deleted when total size > retention_bytes or age > retention_ms.
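A minimal sketch of the retention check, assuming segments are held oldest-first as (created_at_ms, size_bytes) pairs and the last entry is the active segment. Real brokers operate on files, but the eligibility rule is the same:

```python
def segments_to_delete(segments, retention_bytes, retention_ms, now_ms):
    """Return indices of segments eligible for deletion.

    segments: list of (created_at_ms, size_bytes), oldest first.
    The last element is the active segment and is never deleted.
    """
    total = sum(size for _, size in segments)
    delete = []
    for i, (created_at, size) in enumerate(segments[:-1]):
        too_old = now_ms - created_at > retention_ms
        too_big = total > retention_bytes
        if too_old or too_big:
            delete.append(i)
            total -= size
        else:
            # Segments are time-ordered: once one is retained, stop.
            break
    return delete
```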

Read Path

Consumer → Poll(topic, partition, offset) → Broker → Return messages from offset
                ↓
        Process messages
                ↓
        Commit offset (mark progress)
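The poll/process/commit cycle can be sketched against a toy in-memory broker (`InMemoryBroker` is a stand-in for illustration, not a Kafka API). Committing only after processing is what gives at-least-once semantics:

```python
class InMemoryBroker:
    """Toy single-partition broker: an append-only log plus per-group offsets."""

    def __init__(self):
        self.log = []        # the partition's ordered message log
        self.committed = {}  # consumer_group -> next offset to read

    def append(self, msg):
        self.log.append(msg)

    def poll(self, group, max_records=100):
        start = self.committed.get(group, 0)
        return start, self.log[start:start + max_records]

    def commit(self, group, offset):
        self.committed[group] = offset


broker = InMemoryBroker()
for i in range(5):
    broker.append(f"event-{i}")

processed = []
start, batch = broker.poll("analytics")
for msg in batch:
    processed.append(msg)                       # 1. process first...
broker.commit("analytics", start + len(batch))  # 2. ...commit after (at-least-once)
```

Because the log is retained rather than deleted on read, a second group (say, "audit") can poll the same partition from offset 0 and replay everything.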

Zero-copy read: Kafka uses Linux sendfile() to transfer data from disk directly to the network socket, bypassing application memory. This is 4–6x faster than read-then-write.

Consumer group rebalancing: When a consumer joins or leaves, the group coordinator (a special broker) triggers a rebalance — reassigns partitions among the active consumers. During rebalancing, consumption pauses briefly. New consumers get partitions at the last committed offset.
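A simplified assignment function, rerun on every membership change. This sketches only the round-robin strategy; real Kafka also supports range and cooperative-sticky assignors:

```python
def assign_partitions(partitions, consumers):
    """Assign each partition to exactly one consumer, round-robin.

    Called by the group coordinator on every rebalance; consumers end up
    with floor or ceil of len(partitions) / len(consumers) partitions each.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment
```

When a consumer leaves, rerunning the function over the remaining members redistributes its partitions; each new owner resumes from the last committed offset.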

Delivery Guarantees

Guarantee      How                                            Tradeoff
At-most-once   Commit offset before processing                May miss messages on crash
At-least-once  Commit offset after successful processing      May reprocess on crash (duplicates possible)
Exactly-once   Idempotent producer + transactional consumer   Higher latency, more complex

At-least-once is the industry default. Handle idempotency in the consumer: track processed message IDs in Redis or DB, skip duplicates on reprocessing.
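A sketch of consumer-side idempotency; the in-memory set stands in for a Redis set or a database unique index:

```python
class IdempotentConsumer:
    """Deduplicates redeliveries so at-least-once behaves like exactly-once."""

    def __init__(self):
        self.seen = set()   # in production: Redis SET or a DB unique constraint
        self.results = []

    def handle(self, message_id, payload):
        if message_id in self.seen:
            return False            # duplicate redelivery: skip side effects
        self.results.append(payload)  # the side effect happens exactly once
        self.seen.add(message_id)
        return True
```

In production the ID check and the side effect should be atomic (e.g. an upsert with a unique key), so a crash between them cannot leave the two out of sync.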

Replication and Fault Tolerance

Each partition has a leader and N-1 followers (replicas). Producers and consumers talk only to the leader. Followers continuously pull from the leader. In-Sync Replicas (ISR) = followers that are caught up within a threshold. If the leader fails, a new leader is elected from the ISR (coordinated by ZooKeeper in older versions, or by KRaft, Kafka's built-in Raft-based controller, in newer ones). With replication factor 3 and min.insync.replicas=2, the cluster survives one broker failure without data loss.
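A toy model of acks=all with min.insync.replicas. Real Kafka rejects the write up front rather than after appending; this sketch only illustrates the ack-counting rule:

```python
class Follower:
    def __init__(self):
        self.alive = True
        self.log = []

class Leader:
    def __init__(self, followers, min_insync_replicas=2):
        self.log = []
        self.followers = followers
        self.min_isr = min_insync_replicas

    def append(self, msg, acks="all"):
        """Append locally, replicate to live followers, enforce min ISR."""
        self.log.append(msg)
        acked = 1                      # the leader counts as one replica
        for f in self.followers:
            if f.alive:                # dead followers drop out of the ISR
                f.log.append(msg)
                acked += 1
        if acks == "all" and acked < self.min_isr:
            raise RuntimeError("not enough in-sync replicas for acks=all")
        return acked
```

With replication factor 3 (one leader, two followers) and min ISR 2, losing one follower still satisfies acks=all; losing both makes the partition reject writes rather than risk data loss.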

Scale Numbers

  • LinkedIn runs Kafka at ~7 trillion messages/day (roughly 80M messages/sec on average)
  • A single broker can sustain hundreds of MB/s to a few GB/s with SSDs, depending on message size, batching, and network
  • 1M messages/sec with 1 KB messages = 1 GB/s of producer traffic; with replication factor 3 that is ~3 GB/s of cluster writes, so budget ~10 brokers with SSDs
  • Consumer throughput limited by processing speed, not Kafka (Kafka can replay as fast as disk allows)
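The broker estimate above follows from simple arithmetic. The ~300 MB/s sustained per-broker write rate is an assumption for the sketch; plug in your own hardware numbers:

```python
import math

msgs_per_sec = 1_000_000
msg_bytes = 1_000                  # 1 KB messages
replication_factor = 3
per_broker_bytes_s = 300_000_000   # assumed ~300 MB/s sustained write per broker

ingest = msgs_per_sec * msg_bytes                 # 1 GB/s of producer traffic
cluster_writes = ingest * replication_factor      # 3 GB/s including replica copies
brokers = math.ceil(cluster_writes / per_broker_bytes_s)
print(brokers)  # 10
```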

Interview Tips

  • Explain partitions as the unit of parallelism — more partitions = more consumer group parallelism.
  • Mention that partition ordering is guaranteed; topic-level ordering is not (messages across partitions are interleaved).
  • At-least-once vs exactly-once: at-least-once with consumer-side idempotency is the pragmatic choice.
  • Contrast with a traditional queue (RabbitMQ): Kafka retains messages and allows replay; RabbitMQ deletes after consumption. Use Kafka when you need event sourcing, audit trails, or stream processing.
  • Use cases: event streaming, log aggregation, change data capture (CDC), decoupling microservices.
