System Design: Real-time Chat and Messaging System (WhatsApp/Slack) — WebSockets, Pub/Sub, Scale

Requirements

Functional: 1-on-1 and group messaging, real-time message delivery, message persistence and history, read receipts (sent/delivered/read), user presence (online/offline/last seen), media attachments, push notifications for offline users.

Non-functional: messages must be delivered in order, at-least-once delivery, low latency (< 100ms for online users), scale to 100M users with billions of messages/day.

Core Architecture

WebSocket vs. Long Polling vs. SSE

  • WebSocket: bidirectional, persistent TCP connection. Best for chat — low overhead per message, server can push at any time. Used by WhatsApp, Slack, Discord.
  • Long Polling: client makes HTTP request, server holds it open until a message arrives (or timeout). Simpler to implement, works through all firewalls. Higher latency, more overhead per message. Fallback for environments that block WebSockets.
  • Server-Sent Events (SSE): server push over HTTP/1.1, unidirectional. Good for notifications but not chat (can’t send from client).
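Long polling's blocking semantics can be illustrated with a small in-process sketch; the queue here is a stand-in for the server's pending-message buffer, and `long_poll` is an illustrative name, not a real framework API:

```python
import queue
import threading

def long_poll(inbox: "queue.Queue[str]", timeout: float = 30.0):
    """Server side of one long-poll request: block until a message
    arrives or the timeout fires, then respond either way."""
    try:
        return inbox.get(timeout=timeout)   # message arrived: respond now
    except queue.Empty:
        return None                         # timeout: client issues a new poll

inbox: "queue.Queue[str]" = queue.Queue()
# Simulate a message arriving 100ms after the client starts polling.
threading.Timer(0.1, lambda: inbox.put("new message")).start()
result = long_poll(inbox, timeout=2.0)
print(result)
```

The held-open request is what distinguishes long polling from plain polling: latency is one network round trip after the message arrives, at the cost of a parked connection per waiting client.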

Chat Server Architecture

User A ---- Chat Server 1            Chat Server 2 ---- User B
                 |                         |
          [Message Store]           [Message Store]
                 |                         |
                 +--[Redis Pub/Sub or Kafka]--+

User A connects to Chat Server 1. User B connects to Chat Server 2 (different node). When A sends a message to B: (1) Chat Server 1 persists the message. (2) Chat Server 1 publishes to Redis Pub/Sub channel user:{B_id}. (3) Chat Server 2 subscribes to that channel and receives the message. (4) Chat Server 2 pushes it to B’s WebSocket connection.
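The four steps above can be sketched with an in-memory stand-in for the Pub/Sub bus (a real deployment would use Redis Pub/Sub or Kafka; `InMemoryPubSub` and the simplified `ChatServer` are illustrative, not production code):

```python
from collections import defaultdict

class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub: channel -> list of subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers[channel]:
            cb(message)

class ChatServer:
    """Each node holds only its own users' connections (here, plain inboxes)."""
    def __init__(self, bus, store):
        self.bus = bus
        self.store = store          # shared message store
        self.inboxes = {}           # user_id -> messages pushed to that socket

    def connect(self, user_id):
        self.inboxes[user_id] = []
        # Step 3: subscribe to the user's channel on the shared bus.
        self.bus.subscribe(f"user:{user_id}", self.inboxes[user_id].append)

    def send(self, sender_id, recipient_id, content):
        self.store.append((sender_id, recipient_id, content))  # Step 1: persist
        self.bus.publish(f"user:{recipient_id}", content)      # Step 2: publish

bus, store = InMemoryPubSub(), []
server1, server2 = ChatServer(bus, store), ChatServer(bus, store)
server2.connect("B")                 # B sits on a different node than A
server1.send("A", "B", "hello")      # routed across nodes via the bus
print(server2.inboxes["B"])
```

The key property: Chat Server 1 never needs to know which node B is on; the channel name `user:{B_id}` does the routing.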

Message Data Model

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Message:
    message_id: str         # UUID, globally unique
    conversation_id: str    # groups messages into a conversation
    sender_id: str
    content: str
    message_type: str       # 'text' | 'image' | 'video' | 'file'
    media_url: Optional[str]
    sent_at: datetime       # client-side timestamp
    server_at: datetime     # server-received timestamp (for ordering)
    sequence_num: int       # per-conversation monotonic sequence for ordering

@dataclass
class Conversation:
    conversation_id: str
    type: str               # 'direct' | 'group'
    participant_ids: List[str]
    created_at: datetime
    last_message_id: Optional[str]
    last_activity: datetime

@dataclass
class MessageStatus:
    message_id: str
    user_id: str
    status: str             # 'delivered' | 'read'
    timestamp: datetime

Message Ordering and Sequencing

Challenge: two users sending simultaneously — which message comes first? Options:

  • Server-assigned sequence number: a sequence service (or database auto-increment) assigns a monotonically increasing sequence_num per conversation. Messages are displayed sorted by sequence_num. Race condition: two near-simultaneous messages from different servers may get sequence numbers out of order.
  • Logical clock (Lamport timestamp): each message carries a logical clock value. On send, increment the clock; on receive, set clock = max(local, received) + 1. This yields a partial order consistent with causality; break ties between concurrent messages (e.g., by sender id) to get a total order across all clients.
  • Client timestamp + sequence number: hybrid — use sequence number for ordering within a session; use server_at for cross-session ordering. Good enough for most chat apps.
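A minimal sketch of the Lamport-timestamp option above (the class name and two-client walkthrough are illustrative):

```python
class LamportClock:
    """Logical clock: increment on send, merge-and-increment on receive."""
    def __init__(self):
        self.time = 0

    def send(self) -> int:
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        # Jump past anything causally before the received message.
        self.time = max(self.time, remote_time) + 1
        return self.time

# Two clients exchange messages:
a, b = LamportClock(), LamportClock()
t1 = a.send()        # A sends:    clock 1
t2 = b.receive(t1)   # B receives: max(0, 1) + 1 = 2
t3 = b.send()        # B replies:  clock 3
t4 = a.receive(t3)   # A receives: max(1, 3) + 1 = 4
print(t1, t2, t3, t4)
```

Causally related messages always get increasing timestamps; for concurrent messages the timestamps alone don't decide, so a deterministic tie-break (such as sender id) completes the total order.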

Message Storage at Scale

Facebook Messenger uses HBase; WhatsApp uses Mnesia (Erlang); Discord uses Cassandra. Key requirements: write-heavy (every message), sequential reads (load conversation history), range queries (messages since timestamp X).

# Cassandra schema: partition by conversation, cluster by sequence_num
CREATE TABLE messages (
    conversation_id UUID,
    sequence_num    BIGINT,
    message_id      UUID,
    sender_id       UUID,
    content         TEXT,
    sent_at         TIMESTAMP,
    PRIMARY KEY (conversation_id, sequence_num)
) WITH CLUSTERING ORDER BY (sequence_num DESC);
-- Query last 50 messages: SELECT * FROM messages WHERE conversation_id = ? LIMIT 50

User Presence

import redis
from datetime import datetime
from typing import Optional

r = redis.Redis(decode_responses=True)  # shared Redis client

class PresenceService:
    ONLINE_TTL = 30  # seconds; clients heartbeat every 15s to refresh the key

    def user_online(self, user_id: str, server_id: str):
        # The key doubles as an online flag and a server-location lookup;
        # the TTL expires stale entries if a client dies without disconnecting.
        r.setex(f"presence:{user_id}", self.ONLINE_TTL, server_id)
        r.publish("presence_updates", f"{user_id}:online")

    def user_offline(self, user_id: str):
        r.delete(f"presence:{user_id}")
        r.set(f"last_seen:{user_id}", datetime.utcnow().isoformat())
        r.publish("presence_updates", f"{user_id}:offline")

    def is_online(self, user_id: str) -> bool:
        return r.exists(f"presence:{user_id}") > 0

    def get_server(self, user_id: str) -> Optional[str]:
        """Which chat server is this user connected to?"""
        return r.get(f"presence:{user_id}")

Push Notifications for Offline Users

When a user is offline (no WebSocket connection), fall back to push notifications:

  1. Message arrives at chat server. Check presence: r.exists(f"presence:{recipient_id}").
  2. If online: push via WebSocket through Pub/Sub routing.
  3. If offline: publish a “push_notification” event to Kafka. A notification worker consumes it and sends to FCM (Android) or APNs (iOS).
  4. When the user comes back online, they pull unread messages from the message store (catch-up).
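The online/offline branch in steps 1-3 can be sketched as a pure routing function; the callbacks stand in for the Pub/Sub push and the Kafka producer, and all names here are illustrative:

```python
def route_message(recipient_id, message, presence, push_ws, enqueue_push):
    """Decide between real-time WebSocket delivery and a push notification.

    presence: dict of user_id -> chat server id (stand-in for the Redis
    presence check); push_ws / enqueue_push stand in for Pub/Sub and Kafka.
    """
    server_id = presence.get(recipient_id)
    if server_id is not None:
        push_ws(server_id, recipient_id, message)   # online: real-time path
        return "websocket"
    enqueue_push((recipient_id, message))           # offline: FCM/APNs worker
    return "push"

ws_out, push_queue = [], []
presence = {"alice": "chat-server-7"}               # alice online, bob offline
send_ws = lambda srv, uid, msg: ws_out.append((srv, uid, msg))

r1 = route_message("alice", "hi", presence, send_ws, push_queue.append)
r2 = route_message("bob", "hey", presence, send_ws, push_queue.append)
print(r1, r2)
```

Note that the push path is only a notification trigger; the message itself still lands in the message store, and step 4's catch-up read is the source of truth when the user reconnects.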

Read Receipts

def mark_delivered(message_id: str, user_id: str):
    db.upsert(MessageStatus(message_id, user_id, 'delivered', datetime.utcnow()))
    # Notify sender
    sender_id = get_message(message_id).sender_id
    push_status_update(sender_id, message_id, 'delivered')

def mark_read(conversation_id: str, user_id: str, up_to_sequence: int):
    """Batch mark all messages up to sequence_num as read."""
    db.execute(
        "UPDATE message_status SET status='read', timestamp=NOW() "
        "WHERE conversation_id=%s AND user_id=%s AND sequence_num <= %s AND status != 'read'",
        [conversation_id, user_id, up_to_sequence]
    )
    push_read_receipt(conversation_id, user_id, up_to_sequence)

Scaling

  • Chat servers: stateless except for WebSocket connections. Use consistent hashing to route a user_id to a specific chat server (sticky sessions for Pub/Sub efficiency). Auto-scale based on connection count.
  • Message fan-out in groups: for a group with 1000 members, sending a message requires 1000 Pub/Sub publishes. Cap group size or use a separate “group delivery” service. For very large groups (> 10K), use server-side fan-out via a precomputed member list stored in Redis.
  • Hot conversations: a very active group chat (10K messages/minute) can overwhelm one Cassandra partition. Shard by (conversation_id, time_bucket) to spread load across multiple partitions.
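The consistent-hashing routing mentioned above can be sketched as a hash ring with virtual nodes (the class is illustrative; production systems typically use a battle-tested implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user_ids to chat servers; adding or removing a server
    remaps only ~1/N of users instead of reshuffling everyone."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to smooth the load.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, user_id: str) -> str:
        # Walk clockwise to the first server point at or after the user's hash.
        idx = bisect.bisect(self.ring, (self._hash(user_id),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["chat-1", "chat-2", "chat-3"])
assigned = ring.get_server("user-42")
# Deterministic: the same user always routes to the same server,
# which is what makes sticky sessions and per-user Pub/Sub channels cheap.
assert ring.get_server("user-42") == assigned
```

When a server is added for auto-scaling, only the users whose hash falls in the new server's arcs reconnect elsewhere; everyone else keeps their existing WebSocket.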

Asked at: Meta, Snap, LinkedIn, Twitter/X, Atlassian
