Core Requirements
A real-time chat system must: deliver messages in under 100ms end-to-end, support 1:1 and group chats, handle offline message delivery, show online presence, and support media (images, files). WhatsApp serves 100B messages/day across 2B users. Slack supports threaded conversations, channels, and workspaces. The fundamental challenge is maintaining a persistent connection per active user at massive scale.
WebSocket Architecture
HTTP request-response is unsuitable for real-time messaging — the server cannot push to clients. WebSockets provide a persistent bidirectional TCP connection established via HTTP upgrade. Connection lifecycle: client opens HTTP connection → sends Upgrade: websocket header → server responds 101 Switching Protocols → both sides can send frames at any time. Each connected user maintains one WebSocket connection to a stateful chat server. A server with 64GB RAM and 1Gbps NIC can hold ~100K concurrent WebSocket connections (each connection uses ~64KB RAM and negligible CPU when idle). For 10M concurrent users, you need 100 chat servers. Clients automatically reconnect with exponential backoff on disconnect.
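The reconnect-with-exponential-backoff behavior can be sketched as follows; `backoff_delay` and the retry loop are illustrative, not any particular client library's API. Full jitter is added on top of what the text describes (an assumption, but standard practice) so a fleet of clients does not reconnect in lockstep after a server restart.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: the window doubles per failed
    attempt, is capped, and the actual sleep is drawn uniformly from it."""
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0, window)

# Reconnect loop sketch (connect() is a hypothetical client function):
# attempt = 0
# while True:
#     try:
#         connect(); attempt = 0          # reset on success
#     except ConnectionError:
#         time.sleep(backoff_delay(attempt)); attempt += 1
```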
Message Flow: Sending a Message
- Sender’s client sends the message over its WebSocket connection to the chat server.
- Chat server persists the message to the message store (Cassandra) and assigns a time-ordered, globally unique message ID (a Snowflake ID, which embeds a timestamp and therefore sorts roughly by creation time).
- Chat server publishes the message to a Pub/Sub system (Redis Pub/Sub or Kafka) on the recipient’s channel (channel key = user_id or group_id).
- The chat server holding the recipient’s WebSocket connection is subscribed to that channel. It receives the pub/sub event and pushes the message over the WebSocket to the recipient in real time.
- If the recipient is offline (no active WebSocket), the message is queued in the message store. On reconnect, the client sends its last_seen_message_id; the server queries for messages with ID > last_seen and delivers them.
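The steps above can be condensed into a runnable sketch. `ChatServer`, its in-memory `message_store`, and `sessions` are stand-ins for Cassandra, the pub/sub layer, and live WebSocket connections; for simplicity the store is keyed by recipient rather than by conversation.

```python
from collections import defaultdict

class ChatServer:
    """Send-path sketch: persist first, then push in real time if possible,
    and replay the backlog (ID > last_seen) on reconnect."""

    def __init__(self):
        self.message_store = defaultdict(list)  # user_id -> [(msg_id, text)]
        self.sessions = {}                      # user_id -> pushed messages (stand-in for a WebSocket)
        self._seq = 0                           # stand-in for a Snowflake generator

    def send(self, recipient_id, text):
        # 1. Persist before any delivery attempt (enables at-least-once).
        self._seq += 1
        msg_id = self._seq
        self.message_store[recipient_id].append((msg_id, text))
        # 2. Real-time push only if the recipient holds a live connection.
        if recipient_id in self.sessions:
            self.sessions[recipient_id].append((msg_id, text))
        return msg_id

    def connect(self, user_id, last_seen_id):
        # On reconnect, deliver every stored message with ID > last_seen_id.
        backlog = [(m, t) for m, t in self.message_store[user_id] if m > last_seen_id]
        self.sessions[user_id] = list(backlog)
        return backlog
```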
Message Storage: Cassandra Data Model
Messages have high write volume (100B/day = 1.15M writes/sec), require fast retrieval by conversation, and older messages are rarely read. Cassandra is ideal: writes are cheap (append to log), and reads by partition key are fast.
Schema: PRIMARY KEY ((conversation_id), message_id) WITH CLUSTERING ORDER BY (message_id DESC). Partition = one conversation. Messages within a partition are sorted by message_id descending (newest first). Fetching the last 50 messages = reading the first 50 rows of the partition — an efficient single-partition query. Message IDs are Snowflake IDs (time-sortable), so descending order gives reverse chronological order. For WhatsApp’s scale: 100B messages/day × 1KB/message = 100TB/day — use TTL (messages expire after 1 year) and tiered storage (hot messages in Cassandra, archived to S3 via compaction).
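A minimal Snowflake-style generator shows why the IDs sort by time. The bit layout (41 bits of milliseconds, 10 bits of worker ID, 12 bits of sequence) follows Twitter's published scheme, but the class and its names are illustrative.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed past epoch works

class SnowflakeGenerator:
    """64-bit time-sortable IDs: ms-timestamp | worker_id | per-ms sequence.
    IDs generated later compare greater, which the DESC clustering relies on."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence
                if self.seq == 0:                   # exhausted: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```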
Presence System
Online/offline presence is a fan-out problem. When user A connects/disconnects, all of A’s contacts need to know. Naive approach: on connect, publish presence to each contact — O(|contacts|) writes per connection event. At scale with users having 500 contacts and 1M connect events/hour: 500M writes/hour.
Optimized approach: heartbeat-based with server-side inference. Each connected client sends a heartbeat every 30 seconds. The presence service stores last_heartbeat_at in Redis (key = user_id, TTL = 45 seconds). Presence is “online” if key exists, “offline” if expired. Clients poll or subscribe to presence updates only for contacts they are actively viewing. Group chats show presence only for the currently visible members. WhatsApp uses a gossip protocol among servers to propagate presence — each server knows which users are connected to it and shares that information with other servers that have subscribed contacts.
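The heartbeat/TTL logic can be sketched with an in-memory stand-in for Redis; the injected `now` parameter replaces wall-clock time so expiry is explicit and testable.

```python
class PresenceStore:
    """Heartbeat presence sketch: a user is online iff their last heartbeat
    is younger than the TTL (mimicking a Redis key with TTL = 45s)."""

    def __init__(self, ttl_seconds: float = 45.0):
        self.ttl = ttl_seconds
        self.last_heartbeat = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id: str, now: float) -> None:
        # Equivalent to SETEX last_heartbeat:{user_id} with the TTL.
        self.last_heartbeat[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        ts = self.last_heartbeat.get(user_id)
        return ts is not None and (now - ts) < self.ttl
```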
Group Chat Fan-Out
For a group with N members, sending one message requires delivering to N-1 recipients. A group with 500 members sends a message → the chat server must fan out to 500 connections, potentially spread across 50 different chat servers. Implementation: the group membership service stores group_id → [member_ids]. On message receipt, the chat server looks up all member IDs, maps each to its chat server (via a service registry like ZooKeeper or a Redis hash), and sends a delivery request to each relevant chat server via an internal RPC or pub/sub. Each server delivers to its locally connected members.
For very large groups (Slack channels with 10K members), store the group roster in a database (not in-memory), batch the fan-out, and tolerate slight delivery delays (eventually deliver via the message store if the real-time path fails).
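The per-server batching described above can be sketched as a pure function; `connection_registry` stands in for the ZooKeeper/Redis service registry, and the function names are illustrative.

```python
from collections import defaultdict

def plan_fanout(member_ids, sender_id, connection_registry):
    """Group a message's recipients by the chat server holding their WebSocket,
    so fan-out costs one internal RPC per server rather than one per member.
    connection_registry maps user_id -> server_id; offline users are absent."""
    per_server = defaultdict(list)
    offline = []
    for uid in member_ids:
        if uid == sender_id:
            continue                       # don't echo back to the sender
        server = connection_registry.get(uid)
        if server is None:
            offline.append(uid)            # deliver later via the message store
        else:
            per_server[server].append(uid)
    return dict(per_server), offline
```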
Message Delivery Guarantees
At-least-once delivery: messages are persisted before delivery acknowledgment. If delivery fails, retry from the persistent store. Recipients deduplicate by message_id.
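Recipient-side deduplication is a few lines; `Inbox` is an illustrative client-side structure, not a real client API. A production client would bound the seen-ID set (e.g. by keeping only recent IDs), which is omitted here.

```python
class Inbox:
    """At-least-once delivery means retries can resend a message, so the
    client tracks seen message IDs and silently drops repeats."""

    def __init__(self):
        self.seen = set()
        self.messages = []

    def receive(self, msg_id, text) -> bool:
        if msg_id in self.seen:
            return False          # duplicate from a retry: ACK but don't re-render
        self.seen.add(msg_id)
        self.messages.append(text)
        return True
```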
Read receipts: when the recipient’s client receives and renders a message, it sends an ACK back over the WebSocket. The server updates the message status (delivered → read) in the database and notifies the sender. WhatsApp shows a single checkmark (sent), a double checkmark (delivered), and a blue double checkmark (read).
Message ordering: within a conversation, messages are ordered by Snowflake ID. Because Snowflake IDs embed a timestamp, concurrent messages from different senders in a group chat may arrive in slightly different orders on different clients — this is acceptable for chat. For strict ordering requirements, use a single sequence number per conversation assigned by the server.
Media Messages
Images and files are not sent over WebSocket (too large). Client uploads the file directly to object storage (S3) via a pre-signed URL, gets back a media URL, and sends a message containing the media URL (not the bytes). Recipients download media from the CDN-backed URL on demand. WhatsApp encrypts media client-side before upload and includes the decryption key in the message — the server never sees plaintext media.
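The upload-then-reference flow can be sketched with a stubbed object store. `ObjectStorageStub` and its content-addressed URL scheme are assumptions for illustration; a real client would upload via an S3 pre-signed URL and the message would carry that URL.

```python
import hashlib

class ObjectStorageStub:
    """In-memory stand-in for S3: upload returns a CDN-style URL keyed by
    a hash of the content."""

    def __init__(self):
        self.blobs = {}

    def upload(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()[:16]
        self.blobs[key] = data
        return f"https://cdn.example.com/media/{key}"

    def download(self, url: str) -> bytes:
        return self.blobs[url.rsplit("/", 1)[1]]

def send_media_message(data: bytes, storage, outbox: list) -> str:
    """Upload the media out of band, then send only a small message carrying
    the URL over the chat path (outbox stands in for the WebSocket send)."""
    url = storage.upload(data)
    outbox.append({"type": "media", "url": url})
    return url
```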
Scaling Considerations
- Connection routing: a consistent hash ring maps user_id to chat server. When a message targets user X, any server can look up which server holds X’s connection and route the delivery.
- Horizontal scaling: add chat servers; reassign connection ranges via consistent hashing with minimal disruption.
- Mobile push notifications: when a user has no WebSocket connection (app in background), deliver via APNs (iOS) or FCM (Android). The notification service subscribes to the message event stream and forwards to push provider for offline users.
- Message encryption: WhatsApp uses Signal Protocol (X3DH key exchange + Double Ratchet). Keys are exchanged client-to-client; the server stores only encrypted ciphertext. End-to-end encryption means group key management becomes complex — each sender encrypts the message separately for each recipient’s public key.
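The consistent-hash routing from the first two bullets can be sketched as follows; the vnode count, hash choice (MD5), and class names are illustrative. The key property: adding a server moves only the keys on the arcs its virtual nodes take over, so an existing user either keeps their server or lands on the new one.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user_id -> chat server via consistent hashing with virtual nodes
    (vnodes smooth out the load distribution across servers)."""

    def __init__(self, servers, vnodes: int = 100):
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add_server(s, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_server(self, server: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def server_for(self, user_id: str) -> str:
        # First ring entry clockwise from the user's hash, wrapping at the end.
        h = self._hash(user_id)
        idx = bisect.bisect_right(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]
```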
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Why do chat systems use WebSockets instead of HTTP polling?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "HTTP polling (the client repeatedly asks 'any new messages?') wastes bandwidth and adds latency: at 1-second intervals for 1M users, the server handles 1M mostly empty HTTP responses per second — even when there are no new messages. Long polling reduces wasted requests (the server holds the connection open until a message arrives) but still has overhead: each message delivery requires a new HTTP connection setup. WebSockets solve this: one TCP connection per user, established via an HTTP Upgrade handshake, then fully bidirectional with no HTTP overhead per message. A WebSocket frame for a 100-byte message adds only 6-14 bytes of overhead (versus ~500 bytes for HTTP headers). At 100K concurrent users, a single server handles all connections with ~6GB RAM and minimal CPU when idle. Server-Sent Events (SSE) are an alternative for server-to-client push only — simpler than WebSockets, delivered over a regular HTTP response, good for notification streams but not for bidirectional chat."
      }
    },
    {
      "@type": "Question",
      "name": "How does WhatsApp deliver messages to offline users?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "WhatsApp persists every message before attempting delivery. The message store (Cassandra, partitioned by conversation_id) retains undelivered messages until they are delivered or expire. When the sender's server cannot deliver to the recipient (no active WebSocket connection), it stores the message and marks its status as 'pending'. When the recipient reconnects via WebSocket, the client sends its last_received_message_id in the connection handshake. The server queries Cassandra for messages with ID > last_received_message_id, ordered chronologically, and delivers the backlog in order. The client ACKs each batch and updates last_received_message_id. For mobile devices with the app in the background, WhatsApp sends a push notification via APNs (iOS) or FCM (Android) — the payload contains only a notification trigger (not the message content, because of end-to-end encryption), prompting the device to open a WebSocket connection and pull the pending messages. Message retention: WhatsApp deletes messages from its servers after delivery (7-day fallback); iCloud and Google Drive backups are separate."
      }
    },
    {
      "@type": "Question",
      "name": "How do you scale WebSocket connections to millions of concurrent users?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "WebSocket connections are stateful — the server must remember which users are connected to it. Scaling strategy: (1) Horizontal scaling with a connection registry: deploy N chat servers; a Redis hash maps user_id to server_id (which server holds their WebSocket). When server A receives a message for user B, it looks up B's server_id in Redis and forwards the message via internal HTTP/gRPC to server B, which delivers it over B's WebSocket. (2) Consistent hashing for connection routing: assign user IDs to servers via consistent hashing. The load balancer (Layer 4, preserving TCP connections) routes new WebSocket connections to the correct server. When a user reconnects after a server crash, the registry is stale — the new connection updates the registry. (3) Capacity: each chat server handles 50K-100K concurrent WebSocket connections (limited by file descriptors and RAM at ~64KB per connection). For 10M concurrent users, deploy 100-200 chat servers. Use epoll (Linux) for efficient I/O multiplexing — a single thread can handle 100K connections with O(1) event notification."
      }
    }
  ]
}