System Design Interview: Distributed Message Queue (Kafka / SQS)

Why Message Queues?

A message queue decouples producers (services that generate events) from consumers (services that process them), enabling async communication, load leveling, and fault isolation. Without a queue, if the downstream service is slow or down, the upstream service either blocks (synchronous) or loses the event (fire-and-forget). A durable message queue absorbs bursts (load leveling), allows consumers to scale independently, and retains unprocessed messages until consumers are ready. Core use cases: order processing (charge card → ship item → send receipt, each step async), event streaming (user activity logs → analytics pipeline), task queues (image resize jobs), and inter-service communication in microservices.

Kafka Architecture

Kafka is a distributed log-based message broker. Key concepts:

  • Topic: a named log of messages (like a table in a database). Producers append to a topic; consumers read from it.
  • Partition: a topic is split into N partitions, each an ordered, immutable sequence of messages. Partitions enable parallelism — consumers in a consumer group are assigned partitions (one partition = one consumer at a time). Partition count determines maximum consumer parallelism.
  • Offset: each message in a partition has a monotonically increasing offset. Consumers track their offset — this is what makes Kafka replayable. Consumer group A can be at offset 1,000; consumer group B (a different pipeline reading the same topic) can be at offset 500.
  • Broker: a Kafka server that stores partitions. A cluster has 3-100 brokers. Each partition has one leader broker (handles reads and writes) and N-1 follower brokers (replicate for durability).
  • ZooKeeper / KRaft: Kafka historically used ZooKeeper for cluster coordination (leader election, broker registration). KRaft mode (Kafka 3.0+) replaces ZooKeeper with Kafka’s own Raft consensus.
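
The topic/partition/offset model above can be sketched as a toy in-memory structure (plain Python, no Kafka client; the class and key names are illustrative, not broker code):

```python
class Partition:
    """An ordered, immutable, append-only sequence of messages."""
    def __init__(self):
        self.log = []  # index in this list == offset

    def append(self, message) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the newly appended message

class Topic:
    """A named log split into N partitions."""
    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def produce(self, key: str, message) -> tuple:
        # Same key always maps to the same partition, preserving per-key order.
        p = hash(key) % len(self.partitions)
        return p, self.partitions[p].append(message)

topic = Topic("orders", num_partitions=3)
topic.produce("user-42", "order created")
topic.produce("user-42", "order paid")
# A consumer group is then just a per-partition offset map it owns:
offsets_group_a = {0: 0, 1: 0, 2: 0}  # group A starts from the beginning
```

This is why two pipelines can read the same topic independently: each group keeps its own offset map and nothing is deleted when a message is read.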

Message Durability and Replication

Kafka guarantees durability through replication. Each partition has a replication factor (typically 3): the leader plus 2 followers. Producers write to the leader; followers fetch and replicate asynchronously. The ISR (in-sync replica set) tracks which replicas are fully caught up. With acks=all, a write is acknowledged to the producer only when every ISR replica has confirmed it; the min.insync.replicas setting (typically 2) rejects writes if the ISR shrinks too far, so an acknowledged write always exists on at least that many brokers. If the leader fails, a follower from the ISR is elected as the new leader with no data loss, since every ISR replica had the message. Trade-off: acks=all gives the strongest durability but the highest latency; acks=1 (leader only) is faster but risks losing messages if the leader fails before replication; acks=0 (fire and forget) is fastest but offers no durability guarantee.
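
The acks trade-off can be expressed as a small decision function. This is a toy model of the broker-side rule, not real Kafka code; the function name and parameters are invented for illustration:

```python
def acknowledge(acks: str, leader_ok: bool,
                isr_confirmations: int, isr_size: int) -> bool:
    """Decide whether a produce request is acknowledged, per the acks setting.

    Real Kafka additionally enforces min.insync.replicas before
    accepting an acks=all write; that check is omitted here.
    """
    if acks == "0":    # fire and forget: ack immediately, no guarantee
        return True
    if acks == "1":    # leader only: durable on exactly one machine
        return leader_ok
    if acks == "all":  # wait until every in-sync replica has the write
        return leader_ok and isr_confirmations == isr_size
    raise ValueError(f"unknown acks setting: {acks}")
```

The latency cost of acks=all falls directly out of the last branch: the producer waits for the slowest in-sync replica before its request returns.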

Kafka vs SQS vs RabbitMQ

Apache Kafka: log-based, messages are retained for a configurable period (days/weeks) regardless of consumption — multiple consumer groups can read the same messages. Extremely high throughput (millions of msgs/sec). Best for: event sourcing, stream processing, audit logs, exactly-once semantics (with Kafka Streams / transactions).

Amazon SQS: traditional queue — messages are deleted after successful consumption. Simpler ops (managed service), at-least-once delivery, up to 14-day retention, and two queue types: standard (nearly unlimited throughput, best-effort ordering, occasional duplicates) or FIFO (exactly-once processing, ordered, limited to 300 msgs/sec per queue, or 3,000 with batching; higher in high-throughput mode). Best for: task queues, work distribution, decoupling microservices when you don’t need replay.

RabbitMQ: message broker with advanced routing (exchanges, bindings), push-based delivery (consumers subscribe and messages are pushed). Lower throughput than Kafka but rich routing patterns (topic, fanout, direct, headers exchanges). Best for: complex routing logic, request-reply patterns, low-volume / low-latency use cases.

Consumer Groups and Delivery Semantics

A Kafka consumer group is a set of consumers that collectively consume a topic. Each partition is consumed by exactly one consumer in the group at a time. This enables parallel processing: a topic with 10 partitions supports up to 10 consumers in a group, each processing a distinct partition. Adding more consumers than partitions is wasteful — extra consumers are idle.
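
A rough sketch of partition assignment within one group (round-robin here for brevity; Kafka's built-in assignors are range, round-robin, and sticky variants, but the invariant is the same: each partition goes to at most one consumer in the group):

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Round-robin assignment of partitions to consumers in one group.

    Shows the two properties from the text: a partition is owned by
    exactly one consumer, and consumers beyond the partition count
    end up with nothing to do.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With 10 partitions and 4 consumers, each consumer owns 2 or 3 partitions; with 3 partitions and 5 consumers, two consumers sit idle.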

Delivery semantics describe what happens when a consumer crashes mid-processing:

  • At-most-once: commit the offset before processing. If the consumer crashes after committing but before processing, the message is lost.
  • At-least-once: commit the offset after processing. If the consumer crashes after processing but before committing, the message is re-delivered, so processing must be idempotent.
  • Exactly-once: Kafka transactions plus idempotent producers ensure each message is processed exactly once. Requires both the producer and consumer to use the Kafka transactions API and a compatible sink (another Kafka topic or an idempotent external system).
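
The standard defense against at-least-once redelivery is deduplication on a unique message ID. A minimal sketch, assuming an in-memory set stands in for what would be a durable dedup store (database table or cache) in production:

```python
processed_ids = set()  # in production: a durable store keyed by message ID

def handle(message: dict, side_effects: list) -> None:
    """At-least-once consumer made idempotent by deduplication.

    Redelivery of the same message (same 'id') is a no-op, so
    processing twice produces the same outcome as processing once.
    """
    if message["id"] in processed_ids:
        return                                  # duplicate delivery: skip
    side_effects.append(message["payload"])     # the real work (charge card, etc.)
    processed_ids.add(message["id"])
    # the offset commit happens here, AFTER processing
```

Note the ordering: record the ID before committing the offset, so a crash between the two causes a harmless duplicate rather than a lost message.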

Dead Letter Queues

Messages that repeatedly fail processing (due to malformed data or a bug in the consumer) must not block the queue indefinitely. A Dead Letter Queue (DLQ) receives messages after N failed processing attempts. SQS has native DLQ support: configure maxReceiveCount=5 and a DLQ ARN; after 5 failures, SQS moves the message to the DLQ automatically. In Kafka, DLQ is implemented by the consumer: catch processing exceptions, produce the failed message to a separate topic (e.g., “orders-dlq”), then commit the offset to continue. An ops team monitors the DLQ; after fixing the bug, they replay messages from the DLQ back to the main topic.
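
A Kafka-style DLQ loop might look like the following sketch, where process is a stand-in for the business logic and the dlq list stands in for a produce() call to a separate DLQ topic such as "orders-dlq":

```python
MAX_ATTEMPTS = 5  # analogous in spirit to SQS maxReceiveCount

def consume(message: dict, process, dlq: list) -> bool:
    """Try to process a message; after MAX_ATTEMPTS failures, park it in the DLQ.

    Returns True when the offset can be committed, which happens on
    success OR after routing to the DLQ, so one poison message never
    blocks the rest of the partition.
    """
    for _attempt in range(MAX_ATTEMPTS):
        try:
            process(message)
            return True          # success: safe to commit the offset
        except Exception:
            continue             # retry (real code would back off and log)
    dlq.append(message)          # retries exhausted: route to the DLQ topic
    return True                  # commit anyway so consumption continues
```

The key design point is the final return: committing the offset after a DLQ hand-off is what unblocks the partition.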

Ordering Guarantees

Kafka guarantees ordering within a partition, not across partitions. To ensure all events for a given entity (user_id, order_id) are processed in order: set the partition key to that entity ID. All events for the same key hash to the same partition and are processed in offset order by a single consumer. Trade-off: hot keys (one very active user) can overload one partition. SQS FIFO queues guarantee ordering within a message group — use message group ID to route related messages to the same group.
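
Key-based routing is just a deterministic hash modulo the partition count. Kafka's default partitioner uses murmur2 over the key bytes; the sketch below substitutes stdlib crc32, which demonstrates the property that matters (determinism), not Kafka's exact placement:

```python
import zlib

NUM_PARTITIONS = 12

def partition_for(key: str) -> int:
    """Map an entity key (user_id, order_id) to a partition.

    Deterministic, so every event for the same key lands in the
    same partition and is consumed in offset order.
    """
    return zlib.crc32(key.encode()) % NUM_PARTITIONS
```

The hot-key trade-off is visible here too: a single very active key maps to a single partition, so that partition's consumer absorbs all of its traffic.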

Scaling Kafka

Throughput scales by adding partitions and brokers. Storage scales by adding brokers and using Kafka’s partition reassignment tool. For very high fan-out (same message to 1000 consumers), Kafka is ideal — each consumer group reads independently from the same log at its own pace, with zero additional write cost. Network bandwidth is the typical bottleneck — a broker can handle ~1Gbps of replication traffic plus producer/consumer traffic. Kafka compression (snappy, lz4, zstd) reduces network and disk I/O by 3-5x for typical JSON payloads.
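
The compression claim can be sanity-checked with stdlib zlib as a stand-in for snappy/lz4/zstd. The payload below is invented, and exact ratios vary with the data, but repetitive JSON (field names repeated in every record) compresses well:

```python
import json
import zlib

# A typical repetitive JSON event batch: field names repeat per record.
batch = json.dumps([
    {"user_id": i, "event": "page_view", "url": "/home", "ts": 1700000000 + i}
    for i in range(1000)
]).encode()

compressed = zlib.compress(batch)  # stand-in for snappy/lz4/zstd
ratio = len(batch) / len(compressed)
```

Batching matters here: compressing a whole producer batch lets the codec exploit the redundancy across records, which is why Kafka compresses per batch rather than per message.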

Frequently Asked Questions

What are the at-most-once, at-least-once, and exactly-once delivery semantics in Kafka?

Delivery semantics describe what happens when a consumer crashes mid-processing. At-most-once: the consumer commits the offset before processing the message; if it crashes after committing but before finishing, the message is skipped and never redelivered (Kafka considers it done). Appropriate for metrics and logging where occasional loss is acceptable. At-least-once: the consumer processes the message and then commits the offset; if it crashes after processing but before committing, Kafka redelivers the message and it is processed twice. This is the default and most common choice, and it requires idempotent consumers (re-processing the same message produces the same outcome, typically via deduplication on a unique message ID). Exactly-once: requires the Kafka transactions API. The producer sends messages under a transactional ID; Kafka assigns a producer ID and sequence numbers and deduplicates retries at the broker. The consumer commits offsets atomically with its own Kafka producer output. This works only within a Kafka-to-Kafka pipeline; for Kafka-to-external-database, use idempotent writes (upsert by message_id) to approximate exactly-once.

How does Kafka achieve high throughput compared to traditional message brokers?

Kafka reaches millions of messages per second through several design decisions. (1) Sequential disk I/O: messages are appended to a log file sequentially, achieving hundreds of MB/s (versus 1-2 MB/s for random I/O), and sequential reads keep frequently accessed log segments in the OS page cache. (2) Zero-copy transfer: the sendfile() system call (Linux) moves data from the page cache directly to the network socket, bypassing user space and eliminating two memory copies and two context switches per delivery. (3) Batching: producers accumulate messages (by time or size) and send in bulk, consumers receive batches, and batch compression (snappy, lz4, zstd) reduces network bandwidth by 3-5x. (4) Partitioning: a topic with N partitions can be written to and read from in parallel by N producers and N consumer group members. (5) Pull-based consumers: consumers control their own read pace, eliminating backpressure issues. RabbitMQ, by contrast, pushes messages to consumers and stores them with random access patterns, typically yielding 5-10x lower throughput.

When should you choose Kafka over Amazon SQS for a message queue?

Choose Kafka when: (1) multiple consumer groups need to independently read the same messages (Kafka retains messages for a configurable period and allows replay; SQS deletes messages after consumption); (2) you need event replay or reprocessing (Kafka consumers can reset their offset and reprocess history; SQS cannot); (3) throughput exceeds SQS FIFO limits (Kafka scales to millions of msgs/sec); (4) you need strict ordering across the full event history, or you are building event sourcing/CQRS; (5) you need stream processing (Kafka Streams, or Flink reading from Kafka). Choose SQS when: (1) you want a fully managed service with no operational overhead (Kafka requires cluster management, or MSK/Confluent Cloud); (2) you need a simple dead letter queue with automatic routing after N failures; (3) messages should be deleted after consumption and replay is not needed; (4) throughput is modest (under 10K msgs/sec) and you don't want to size and manage Kafka partitions. In practice: use Kafka for event streaming and analytics pipelines; use SQS for task queues and microservice decoupling where replay isn't needed.
