Core Requirements
A real-time chat system must: deliver messages in under 100ms end-to-end, support 1:1 and group chats, handle offline message delivery, show online presence, and support media (images, files). WhatsApp serves 100B messages/day across 2B users. Slack supports threaded conversations, channels, and workspaces. The fundamental challenge is maintaining a persistent connection per active user at massive scale.
WebSocket Architecture
HTTP request-response is unsuitable for real-time messaging — the server cannot push to clients. WebSockets provide a persistent bidirectional TCP connection established via HTTP upgrade. Connection lifecycle: client opens HTTP connection → sends Upgrade: websocket header → server responds 101 Switching Protocols → both sides can send frames at any time. Each connected user maintains one WebSocket connection to a stateful chat server. A server with 64GB RAM and 1Gbps NIC can hold ~100K concurrent WebSocket connections (each connection uses ~64KB RAM and negligible CPU when idle). For 10M concurrent users, you need 100 chat servers. Clients automatically reconnect with exponential backoff on disconnect.
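The reconnect-with-exponential-backoff behavior can be sketched as follows; `backoff_delay` and the retry loop are illustrative, not any particular client library's API. Full jitter is added on top of what the text describes (an assumption, but standard practice) so a fleet of clients does not reconnect in lockstep after a server restart.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: the window doubles per failed
    attempt, is capped, and the actual sleep is drawn uniformly from it."""
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0, window)

# Reconnect loop sketch (connect() is a hypothetical client function):
# attempt = 0
# while True:
#     try:
#         connect(); attempt = 0          # reset on success
#     except ConnectionError:
#         time.sleep(backoff_delay(attempt)); attempt += 1
```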
Message Flow: Sending a Message
- Sender’s client sends the message over its WebSocket connection to the chat server.
- Chat server persists the message to the message store (Cassandra) and assigns a time-ordered, globally unique message ID (a Snowflake ID, which embeds a timestamp and therefore sorts roughly by creation time).
- Chat server publishes the message to a Pub/Sub system (Redis Pub/Sub or Kafka) on the recipient’s channel (channel key = user_id or group_id).
- The chat server holding the recipient’s WebSocket connection is subscribed to that channel. It receives the pub/sub event and pushes the message over the WebSocket to the recipient in real time.
- If the recipient is offline (no active WebSocket), the message is queued in the message store. On reconnect, the client sends its last_seen_message_id; the server queries for messages with ID > last_seen and delivers them.
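The steps above can be condensed into a runnable sketch. `ChatServer`, its in-memory `message_store`, and `sessions` are stand-ins for Cassandra, the pub/sub layer, and live WebSocket connections; for simplicity the store is keyed by recipient rather than by conversation.

```python
from collections import defaultdict

class ChatServer:
    """Send-path sketch: persist first, then push in real time if possible,
    and replay the backlog (ID > last_seen) on reconnect."""

    def __init__(self):
        self.message_store = defaultdict(list)  # user_id -> [(msg_id, text)]
        self.sessions = {}                      # user_id -> pushed messages (stand-in for a WebSocket)
        self._seq = 0                           # stand-in for a Snowflake generator

    def send(self, recipient_id, text):
        # 1. Persist before any delivery attempt (enables at-least-once).
        self._seq += 1
        msg_id = self._seq
        self.message_store[recipient_id].append((msg_id, text))
        # 2. Real-time push only if the recipient holds a live connection.
        if recipient_id in self.sessions:
            self.sessions[recipient_id].append((msg_id, text))
        return msg_id

    def connect(self, user_id, last_seen_id):
        # On reconnect, deliver every stored message with ID > last_seen_id.
        backlog = [(m, t) for m, t in self.message_store[user_id] if m > last_seen_id]
        self.sessions[user_id] = list(backlog)
        return backlog
```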
Message Storage: Cassandra Data Model
Messages have high write volume (100B/day = 1.15M writes/sec), require fast retrieval by conversation, and older messages are rarely read. Cassandra is ideal: writes are cheap (append to log), and reads by partition key are fast.
Schema: PRIMARY KEY ((conversation_id), message_id) WITH CLUSTERING ORDER BY (message_id DESC). Partition = one conversation. Messages within a partition are sorted by message_id descending (newest first). Fetching the last 50 messages = reading the first 50 rows of the partition — an efficient single-partition query. Message IDs are Snowflake IDs (time-sortable), so descending order gives reverse chronological order. For WhatsApp’s scale: 100B messages/day × 1KB/message = 100TB/day — use TTL (messages expire after 1 year) and tiered storage (hot messages in Cassandra, archived to S3 via compaction).
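A minimal Snowflake-style generator shows why the IDs sort by time. The bit layout (41 bits of milliseconds, 10 bits of worker ID, 12 bits of sequence) follows Twitter's published scheme, but the class and its names are illustrative.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed past epoch works

class SnowflakeGenerator:
    """64-bit time-sortable IDs: ms-timestamp | worker_id | per-ms sequence.
    IDs generated later compare greater, which the DESC clustering relies on."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence
                if self.seq == 0:                   # exhausted: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```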
Presence System
Online/offline presence is a fan-out problem. When user A connects/disconnects, all of A’s contacts need to know. Naive approach: on connect, publish presence to each contact — O(|contacts|) writes per connection event. At scale with users having 500 contacts and 1M connect events/hour: 500M writes/hour.
Optimized approach: heartbeat-based with server-side inference. Each connected client sends a heartbeat every 30 seconds. The presence service stores last_heartbeat_at in Redis (key = user_id, TTL = 45 seconds). Presence is “online” if key exists, “offline” if expired. Clients poll or subscribe to presence updates only for contacts they are actively viewing. Group chats show presence only for the currently visible members. WhatsApp uses a gossip protocol among servers to propagate presence — each server knows which users are connected to it and shares that information with other servers that have subscribed contacts.
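The heartbeat/TTL logic can be sketched with an in-memory stand-in for Redis; the injected `now` parameter replaces wall-clock time so expiry is explicit and testable.

```python
class PresenceStore:
    """Heartbeat presence sketch: a user is online iff their last heartbeat
    is younger than the TTL (mimicking a Redis key with TTL = 45s)."""

    def __init__(self, ttl_seconds: float = 45.0):
        self.ttl = ttl_seconds
        self.last_heartbeat = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id: str, now: float) -> None:
        # Equivalent to SETEX last_heartbeat:{user_id} with the TTL.
        self.last_heartbeat[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        ts = self.last_heartbeat.get(user_id)
        return ts is not None and (now - ts) < self.ttl
```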
Group Chat Fan-Out
For a group with N members, sending one message requires delivering to N-1 recipients. A group with 500 members sends a message → the chat server must fan out to 500 connections, potentially spread across 50 different chat servers. Implementation: the group membership service stores group_id → [member_ids]. On message receipt, the chat server looks up all member IDs, maps each to its chat server (via a service registry like ZooKeeper or a Redis hash), and sends a delivery request to each relevant chat server via an internal RPC or pub/sub. Each server delivers to its locally connected members.
For very large groups (Slack channels with 10K members), store the group roster in a database (not in-memory), batch the fan-out, and tolerate slight delivery delays (eventually deliver via the message store if the real-time path fails).
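The per-server batching described above can be sketched as a pure function; `connection_registry` stands in for the ZooKeeper/Redis service registry, and the function names are illustrative.

```python
from collections import defaultdict

def plan_fanout(member_ids, sender_id, connection_registry):
    """Group a message's recipients by the chat server holding their WebSocket,
    so fan-out costs one internal RPC per server rather than one per member.
    connection_registry maps user_id -> server_id; offline users are absent."""
    per_server = defaultdict(list)
    offline = []
    for uid in member_ids:
        if uid == sender_id:
            continue                       # don't echo back to the sender
        server = connection_registry.get(uid)
        if server is None:
            offline.append(uid)            # deliver later via the message store
        else:
            per_server[server].append(uid)
    return dict(per_server), offline
```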
Message Delivery Guarantees
At-least-once delivery: messages are persisted before delivery acknowledgment. If delivery fails, retry from the persistent store. Recipients deduplicate by message_id.
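Recipient-side deduplication is a few lines; `Inbox` is an illustrative client-side structure, not a real client API. A production client would bound the seen-ID set (e.g. by keeping only recent IDs), which is omitted here.

```python
class Inbox:
    """At-least-once delivery means retries can resend a message, so the
    client tracks seen message IDs and silently drops repeats."""

    def __init__(self):
        self.seen = set()
        self.messages = []

    def receive(self, msg_id, text) -> bool:
        if msg_id in self.seen:
            return False          # duplicate from a retry: ACK but don't re-render
        self.seen.add(msg_id)
        self.messages.append(text)
        return True
```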
Read receipts: when the recipient’s client receives and renders a message, it sends an ACK back over the WebSocket. The server updates the message status (delivered → read) in the database and notifies the sender. WhatsApp shows a single checkmark (sent), a double checkmark (delivered), and a blue double checkmark (read).
Message ordering: within a conversation, messages are ordered by Snowflake ID. Because Snowflake IDs embed a timestamp, concurrent messages from different senders in a group chat may arrive in slightly different orders on different clients — this is acceptable for chat. For strict ordering requirements, use a single sequence number per conversation assigned by the server.
Media Messages
Images and files are not sent over WebSocket (too large). Client uploads the file directly to object storage (S3) via a pre-signed URL, gets back a media URL, and sends a message containing the media URL (not the bytes). Recipients download media from the CDN-backed URL on demand. WhatsApp encrypts media client-side before upload and includes the decryption key in the message — the server never sees plaintext media.
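The upload-then-reference flow can be sketched with a stubbed object store. `ObjectStorageStub` and its content-addressed URL scheme are assumptions for illustration; a real client would upload via an S3 pre-signed URL and the message would carry that URL.

```python
import hashlib

class ObjectStorageStub:
    """In-memory stand-in for S3: upload returns a CDN-style URL keyed by
    a hash of the content."""

    def __init__(self):
        self.blobs = {}

    def upload(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()[:16]
        self.blobs[key] = data
        return f"https://cdn.example.com/media/{key}"

    def download(self, url: str) -> bytes:
        return self.blobs[url.rsplit("/", 1)[1]]

def send_media_message(data: bytes, storage, outbox: list) -> str:
    """Upload the media out of band, then send only a small message carrying
    the URL over the chat path (outbox stands in for the WebSocket send)."""
    url = storage.upload(data)
    outbox.append({"type": "media", "url": url})
    return url
```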
Scaling Considerations
- Connection routing: a consistent hash ring maps user_id to chat server. When a message targets user X, any server can look up which server holds X’s connection and route the delivery.
- Horizontal scaling: add chat servers; reassign connection ranges via consistent hashing with minimal disruption.
- Mobile push notifications: when a user has no WebSocket connection (app in background), deliver via APNs (iOS) or FCM (Android). The notification service subscribes to the message event stream and forwards to push provider for offline users.
- Message encryption: WhatsApp uses Signal Protocol (X3DH key exchange + Double Ratchet). Keys are exchanged client-to-client; the server stores only encrypted ciphertext. End-to-end encryption means group key management becomes complex — each sender encrypts the message separately for each recipient’s public key.
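The consistent-hash routing from the first two bullets can be sketched as follows; the vnode count, hash choice (MD5), and class names are illustrative. The key property: adding a server moves only the keys on the arcs its virtual nodes take over, so an existing user either keeps their server or lands on the new one.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user_id -> chat server via consistent hashing with virtual nodes
    (vnodes smooth out the load distribution across servers)."""

    def __init__(self, servers, vnodes: int = 100):
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add_server(s, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_server(self, server: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def server_for(self, user_id: str) -> str:
        # First ring entry clockwise from the user's hash, wrapping at the end.
        h = self._hash(user_id)
        idx = bisect.bisect_right(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]
```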
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Why do chat systems use WebSockets instead of HTTP polling?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "HTTP polling (the client repeatedly asks 'any new messages?') wastes bandwidth and adds latency: at 1-second intervals for 1M users, the server handles 1M mostly empty HTTP responses per second — even when there are no new messages. Long polling reduces wasted requests (the server holds the connection open until a message arrives) but still has overhead: each message delivery requires a new HTTP connection setup. WebSockets solve this: one TCP connection per user, established via an HTTP Upgrade handshake, then fully bidirectional with no HTTP overhead per message. A WebSocket frame for a 100-byte message adds only 6-14 bytes of overhead (versus ~500 bytes for HTTP headers). At 100K concurrent users, a single server handles all connections with ~6GB RAM and minimal CPU when idle. Server-Sent Events (SSE) are an alternative for server-to-client push only — simpler than WebSockets, delivered over a regular HTTP response, good for notification streams but not for bidirectional chat."
      }
    },
    {
      "@type": "Question",
      "name": "How does WhatsApp deliver messages to offline users?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "WhatsApp persists every message before attempting delivery. The message store (Cassandra, partitioned by conversation_id) retains undelivered messages until they are delivered or expire. When the sender's server cannot deliver to the recipient (no active WebSocket connection), it stores the message and marks its status as 'pending'. When the recipient reconnects via WebSocket, the client sends its last_received_message_id in the connection handshake. The server queries Cassandra for messages with ID > last_received_message_id, ordered chronologically, and delivers the backlog in order. The client ACKs each batch and updates last_received_message_id. For mobile devices with the app in the background, WhatsApp sends a push notification via APNs (iOS) or FCM (Android) — the payload contains only a notification trigger (not the message content, because of end-to-end encryption), prompting the device to open a WebSocket connection and pull the pending messages. Message retention: WhatsApp deletes messages from its servers after delivery (7-day fallback); iCloud and Google Drive backups are separate."
      }
    },
    {
      "@type": "Question",
      "name": "How do you scale WebSocket connections to millions of concurrent users?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "WebSocket connections are stateful — the server must remember which users are connected to it. Scaling strategy: (1) Horizontal scaling with a connection registry: deploy N chat servers; a Redis hash maps user_id to server_id (which server holds their WebSocket). When server A receives a message for user B, it looks up B's server_id in Redis and forwards the message via internal HTTP/gRPC to server B, which delivers it over B's WebSocket. (2) Consistent hashing for connection routing: assign user IDs to servers via consistent hashing. The load balancer (Layer 4, preserving TCP connections) routes new WebSocket connections to the correct server. When a user reconnects after a server crash, the registry is stale — the new connection updates the registry. (3) Capacity: each chat server handles 50K-100K concurrent WebSocket connections (limited by file descriptors and RAM at ~64KB per connection). For 10M concurrent users, deploy 100-200 chat servers. Use epoll (Linux) for efficient I/O multiplexing — a single thread can handle 100K connections with O(1) event notification."
      }
    }
  ]
}