System Design Interview: Real-Time Chat System (WhatsApp / Slack)

Core Requirements

A real-time chat system must: deliver messages in under 100ms end-to-end, support 1:1 and group chats, handle offline message delivery, show online presence, and support media (images, files). WhatsApp serves 100B messages/day across 2B users. Slack supports threaded conversations, channels, and workspaces. The fundamental challenge is maintaining a persistent connection per active user at massive scale.

WebSocket Architecture

HTTP request-response is unsuitable for real-time messaging because the server cannot push to clients on its own. WebSockets provide a persistent, bidirectional channel over a single TCP connection, established via an HTTP upgrade. Connection lifecycle: client opens HTTP connection → sends a request with the Upgrade: websocket header → server responds 101 Switching Protocols → both sides can send frames at any time. Each connected user maintains one WebSocket connection to a stateful chat server. A server with 64GB RAM and a 1Gbps NIC can comfortably hold ~100K concurrent WebSocket connections: at ~64KB RAM per connection that is only ~6.4GB, and idle connections use negligible CPU, so 100K is a conservative planning figure rather than a hard memory limit. For 10M concurrent users, that works out to about 100 chat servers (10M / 100K), before adding headroom for failover. Clients automatically reconnect with exponential backoff on disconnect, ideally with random jitter so that a server restart does not trigger a synchronized reconnect storm.
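The sizing arithmetic and the reconnect policy above can be sketched in a few lines. This is a minimal illustration, not a production client: the function names, the 1-second base delay, and the 60-second cap are assumptions chosen for the example, and the "full jitter" variant shown is one common backoff strategy among several.

```python
import math
import random


def servers_needed(concurrent_users: int, conns_per_server: int = 100_000) -> int:
    """Chat servers required if each holds at most conns_per_server sockets."""
    return math.ceil(concurrent_users / conns_per_server)


def connection_memory_gb(connections: int, kb_per_conn: int = 64) -> float:
    """Approximate RAM held by idle connections, in decimal GB (1 KB = 1000 bytes)."""
    return connections * kb_per_conn * 1000 / 1e9


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Full-jitter exponential backoff.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)).
    `rng` is injectable so the function can be tested deterministically.
    """
    return rng() * min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    print(servers_needed(10_000_000))        # servers for 10M concurrent users
    print(connection_memory_gb(100_000))     # GB of RAM for 100K connections
    for attempt in range(5):
        print(attempt, round(backoff_delay(attempt), 2))
```

The cap keeps a long outage from pushing retry delays to unreasonable values, and the jitter spreads reconnects from many clients over the whole window instead of aligning them on the same instant.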

Message Flow: Sending a Message

  1. Sender’s client sends the message over its WebSocket connection to the chat server.
  2. Chat server assigns the message a time-ordered ID (a Snowflake ID, which is globally unique and roughly sortable by creation time) and persists the message to the message store (Cassandra).
  3. Chat server publishes the message to a Pub/Sub system (Redis Pub/Sub or Kafka) on the recipient’s channel (channel key = user_id or group_id).
  4. The chat server holding the recipient’s WebSocket connection is subscribed to that channel. It receives the pub/sub event and pushes the message over the WebSocket to the recipient in real time.
  5. If the recipient is offline (no active WebSocket), the message is queued in the message store. On reconnect, the client sends its last_seen_message_id; the server queries for messages with ID > last_seen and delivers them.
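The five steps above can be sketched as a small in-memory simulation. Everything here is a stand-in under stated assumptions: `ChatBackend` is a hypothetical class that collapses the chat server, message store, and pub/sub layer into one object, a plain counter replaces the Snowflake generator, and a Python list plays the role of the recipient's WebSocket.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Message:
    message_id: int
    sender: str
    recipient: str
    body: str


class ChatBackend:
    """In-memory stand-in for chat server + message store + pub/sub (hypothetical)."""

    def __init__(self):
        self._ids = count(1)             # stand-in for a Snowflake ID generator
        self.store = defaultdict(list)   # recipient -> persisted messages, in ID order
        self.online = {}                 # recipient -> list acting as the open socket

    def connect(self, user, last_seen_message_id=0):
        """Step 5: open a 'socket' and replay everything newer than last_seen."""
        inbox = [m for m in self.store[user] if m.message_id > last_seen_message_id]
        self.online[user] = inbox
        return inbox

    def disconnect(self, user):
        self.online.pop(user, None)

    def send(self, sender, recipient, body):
        msg = Message(next(self._ids), sender, recipient, body)
        self.store[recipient].append(msg)    # step 2: persist before delivering
        inbox = self.online.get(recipient)   # steps 3-4: push only if connected
        if inbox is not None:
            inbox.append(msg)
        return msg


if __name__ == "__main__":
    backend = ChatBackend()
    backend.connect("bob")
    first = backend.send("alice", "bob", "hi")
    backend.disconnect("bob")
    backend.send("alice", "bob", "are you there?")
    # Bob reconnects and reports the last ID he saw; only the missed message replays.
    print(backend.connect("bob", last_seen_message_id=first.message_id))
```

Persisting before pushing is the important ordering: if the push fails or the recipient is offline, the message is still recoverable through the replay path.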

Message Storage: Cassandra Data Model

Messages have high write volume (100B/day ≈ 1.16M writes/sec on average, with higher peaks), must be retrieved quickly by conversation, and are rarely read once they are old. Cassandra is a good fit: writes are cheap (an append to the commit log plus an in-memory memtable update), and reads by partition key are fast.

Schema: PRIMARY KEY ((conversation_id), message_id) WITH CLUSTERING ORDER BY (message_id DESC). Partition = one conversation. Messages within a partition are sorted by message_id DESC (newest first). Fetching the last 50 messages = read the first 50 rows of the partition, an efficient single-partition query. Message IDs are Snowflake IDs (time-sortable), so DESC ordering gives reverse chronological order. For WhatsApp’s scale: 100B messages/day × 1KB/message = 100TB/day before replication, so use TTL (for example, messages expire after 1 year) and tiered storage (hot messages in Cassandra, older messages exported to S3 by a separate archival job; Cassandra compaction only merges SSTables and does not move data to S3).
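To make "time-sortable" concrete, here is a minimal sketch of a Snowflake-style generator. It follows the commonly cited layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the custom epoch `EPOCH_MS`, the class name, and the choice to borrow the next millisecond when the sequence overflows (rather than spin-waiting) are assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (assumption for this sketch)
WORKER_BITS, SEQ_BITS = 10, 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """64-bit IDs: [41-bit ms timestamp][10-bit worker][12-bit sequence]."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # injectable for deterministic tests
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = max(self._clock(), self._last_ms)   # never go backwards in time
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:                    # 4096 IDs used in this ms
                    now += 1                          # borrow the next millisecond
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)


def timestamp_ms(snowflake_id):
    """Recover the creation time (ms since Unix epoch) embedded in an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS


if __name__ == "__main__":
    gen = SnowflakeGenerator(worker_id=7)
    ids = [gen.next_id() for _ in range(3)]
    print(ids, ids == sorted(ids))
```

IDs from one worker are strictly increasing. IDs from different workers are only ordered up to clock skew between those workers, which is why the design treats Snowflake order as "roughly" chronological.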

Presence System

Online/offline presence is a fan-out problem. When user A connects/disconnects, all of A’s contacts need to know. Naive approach: on connect, publish presence to each contact — O(|contacts|) writes per connection event. At scale with users having 500 contacts and 1M connect events/hour: 500M writes/hour.

Optimized approach: heartbeat-based with server-side inference. Each connected client sends a heartbeat every 30 seconds. The presence service stores last_heartbeat_at in Redis (key = user_id, TTL = 45 seconds), so a single missed or delayed heartbeat does not immediately flip the user to offline. Presence is “online” if the key exists, “offline” if it has expired. Clients poll or subscribe to presence updates only for contacts they are actively viewing. Group chats show presence only for the currently visible members. At larger scale, one option is a gossip protocol among servers to propagate presence: each server knows which users are connected to it and shares that information with other servers that have subscribed contacts.
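The heartbeat-plus-TTL idea can be sketched without Redis. In a real deployment the write would be something like a Redis SET with an EX expiry; here `PresenceStore` is a hypothetical in-memory stand-in with an injectable clock, so the expiry behavior is easy to see and test.

```python
import time


class PresenceStore:
    """In-memory stand-in for Redis keys written with a TTL (hypothetical)."""

    def __init__(self, ttl_seconds=45.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}   # user_id -> time at which the key expires

    def heartbeat(self, user_id):
        """Called every ~30 s per connected client; refreshes the TTL."""
        self._expires_at[user_id] = self._clock() + self.ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[user_id]   # lazily drop the expired key
            return False
        return True

    def online_among(self, user_ids):
        """Presence for just the contacts a client is currently viewing."""
        return {u for u in user_ids if self.is_online(u)}


if __name__ == "__main__":
    fake_now = [0.0]
    store = PresenceStore(clock=lambda: fake_now[0])
    store.heartbeat("alice")
    fake_now[0] = 30.0
    print(store.is_online("alice"))   # within the 45 s TTL
    fake_now[0] = 50.0
    print(store.is_online("alice"))   # no heartbeat since t=0, key expired
```

No explicit "offline" write is ever needed: a crashed client or dropped connection simply stops refreshing the key, and the TTL does the rest.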

Group Chat Fan-Out

For a group with N members, sending one message requires delivering to N-1 recipients. When a member of a 500-person group sends a message, the chat server must fan out to the other 499 members, whose connections may be spread across 50 different chat servers. Implementation: the group membership service stores group_id → [member_ids]. On message receipt, the chat server looks up all member IDs, maps each to its chat server (via a service registry like ZooKeeper or a Redis hash), and sends a delivery request to each relevant chat server via an internal RPC or pub/sub. Each server delivers to its locally connected members.

For very large groups (Slack channels with 10K members), store the group roster in a database (not in-memory), batch the fan-out, and tolerate slight delivery delays (eventually deliver via the message store if the real-time path fails).
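The routing step of the fan-out can be sketched as a grouping operation: one delivery request per chat server rather than one per member. The function names and the plain dict standing in for the service registry are assumptions for this example.

```python
from collections import defaultdict


def plan_fanout(sender_id, member_ids, user_to_server):
    """Group recipients by the chat server that holds their connection.

    `user_to_server` stands in for the registry lookup (ZooKeeper or a
    Redis hash in the text). Members with no entry are treated as offline
    and left to the store-and-replay path.
    """
    by_server = defaultdict(list)
    offline = []
    for member in member_ids:
        if member == sender_id:
            continue                      # N-1 recipients: skip the sender
        server = user_to_server.get(member)
        if server is None:
            offline.append(member)
        else:
            by_server[server].append(member)
    return dict(by_server), offline


def chunks(items, size):
    """Split a large recipient list into batches for very large groups."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    registry = {"bob": "chat-1", "carol": "chat-2", "dave": "chat-1"}
    plan, offline = plan_fanout("alice", ["alice", "bob", "carol", "dave", "erin"], registry)
    print(plan)      # one RPC per server, each carrying its local recipients
    print(offline)   # delivered later from the message store
```

For a 10K-member channel, each per-server recipient list could be passed through `chunks` so that no single RPC or pub/sub event carries an unbounded payload.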

Message Delivery Guarantees

At-least-once delivery: messages are persisted before delivery acknowledgment. If delivery fails, retry from the persistent store. Recipients deduplicate by message_id.
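Because retries can deliver the same message twice, the recipient side needs an idempotent receive step. A minimal sketch, assuming a hypothetical `DedupingInbox` class on the client; a real client would bound the set of remembered IDs (for example, by keeping only recent ones) rather than let it grow forever.

```python
class DedupingInbox:
    """Client-side inbox that ignores redelivered messages (hypothetical)."""

    def __init__(self):
        self._seen = set()     # message IDs already accepted
        self.messages = []     # (message_id, body) pairs in arrival order

    def receive(self, message_id, body):
        """Return True if the message is new, False if it is a redelivery."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.messages.append((message_id, body))
        return True


if __name__ == "__main__":
    inbox = DedupingInbox()
    print(inbox.receive(101, "hello"))   # first delivery is accepted
    print(inbox.receive(101, "hello"))   # server retry is dropped
    print(inbox.messages)
```

The client should still ACK a duplicate, since the retry usually means the earlier ACK was lost.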

Delivery and read receipts: when the recipient’s client receives a message, it sends a delivery ACK back over the WebSocket; when the user actually views the message, it sends a read ACK. On each ACK, the server updates the message status (sent → delivered → read) in the database and notifies the sender. WhatsApp shows a single checkmark (sent), double checkmark (delivered), and blue double checkmark (read).

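Message ordering: within a conversation, messages are ordered by Snowflake ID, so every client that sorts by ID converges on the same order. Because Snowflake IDs embed a timestamp taken from the generating server’s clock, clock skew between servers means that near-simultaneous messages from different senders in a group chat may receive IDs that do not match their true send order, and they may also arrive at different clients in slightly different orders before being sorted. This is acceptable for chat. For strict ordering requirements, use a single sequence number per conversation assigned by the server.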
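The strict-ordering alternative, a server-assigned sequence number per conversation, can be sketched as follows. `ConversationSequencer` is a hypothetical single-process stand-in; in a distributed system the counter would have to live in one place per conversation (for example, an atomic increment in a shared store), which is an assumption beyond what the text specifies.

```python
import threading
from collections import defaultdict


class ConversationSequencer:
    """Assigns gap-free, strictly increasing sequence numbers per conversation."""

    def __init__(self):
        self._last = defaultdict(int)    # conversation_id -> last assigned number
        self._lock = threading.Lock()    # serializes concurrent senders

    def assign(self, conversation_id):
        with self._lock:
            self._last[conversation_id] += 1
            return self._last[conversation_id]


if __name__ == "__main__":
    seq = ConversationSequencer()
    # Messages may reach clients out of order; sorting by seq restores one order.
    arrived = [(seq.assign("room-1"), text) for text in ("first", "second", "third")]
    arrived.reverse()
    print(sorted(arrived))
```

The trade-off is that every message in a conversation passes through one serialization point, which is simple for 1:1 chats but can become a bottleneck for very busy channels.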

Media Messages

Images and files are not sent over WebSocket (too large). The client requests a pre-signed upload URL from the server, uploads the file directly to object storage (S3) using that URL, and then sends a message containing the resulting media URL (not the bytes). Recipients download media from the CDN-backed URL on demand. WhatsApp encrypts media client-side before upload and includes the decryption key in the message, so the server never sees plaintext media.
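The idea behind a pre-signed URL is that the server signs a narrowly scoped permission (this object, this operation, until this time) that storage can verify without contacting the chat server. The sketch below illustrates that idea with a plain HMAC. It is a simplified, hypothetical scheme and is not AWS Signature Version 4 or any real S3 API; the secret, host name, and parameter names are invented for the example, and `base_url` is assumed to have no path component.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"   # hypothetical key shared by the signer and the storage tier


def _signature(object_key, expires_at, secret):
    payload = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_upload_url(base_url, object_key, expires_at, secret=SECRET):
    """Return a URL granting PUT access to one object until expires_at (Unix s)."""
    query = urlencode({"expires": expires_at, "sig": _signature(object_key, expires_at, secret)})
    return f"{base_url}/{object_key}?{query}"


def verify_upload_url(url, now, secret=SECRET):
    """Storage-side check: signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(parsed.path.lstrip("/"), expires_at, secret)
    return hmac.compare_digest(sig, expected)


if __name__ == "__main__":
    url = sign_upload_url("https://media.example.com", "uploads/abc.jpg", expires_at=1_700_000_600)
    print(verify_upload_url(url, now=1_700_000_000))   # still valid
    print(verify_upload_url(url, now=1_700_001_000))   # expired
```

Because the object key is part of the signed payload, a client cannot reuse the URL to overwrite a different object.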

Scaling Considerations

  • Connection routing: a consistent hash ring maps user_id to chat server. When a message targets user X, any server can look up which server holds X’s connection and route the delivery.
  • Horizontal scaling: add chat servers; reassign connection ranges via consistent hashing with minimal disruption.
  • Mobile push notifications: when a user has no WebSocket connection (app in background), deliver via APNs (iOS) or FCM (Android). The notification service subscribes to the message event stream and forwards to push provider for offline users.
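  • Message encryption: WhatsApp uses the Signal Protocol (X3DH key exchange + Double Ratchet). Clients publish public prekeys through the server, but private keys never leave the device, so the server only ever relays and stores ciphertext. End-to-end encryption makes group key management complex: in the simplest (pairwise) scheme, each sender encrypts the message separately for every recipient, whereas WhatsApp’s Sender Keys scheme has each sender distribute a sender key to the other members over the pairwise sessions and then encrypt each group message once.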
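The connection-routing and horizontal-scaling points above both rest on consistent hashing, which can be sketched as a sorted ring of hashed virtual nodes. The class name, the choice of SHA-256, and the 100 virtual nodes per server are assumptions for this example rather than anything the design prescribes.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping user_id -> chat server (illustrative sketch)."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes     # virtual nodes per server smooth out the load
        self._ring = []          # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, user_id):
        """Return the server owning the first ring point at or after hash(user_id)."""
        if not self._ring:
            raise LookupError("ring is empty")
        # (h,) sorts before any (h, server), so this finds the first point >= h.
        idx = bisect.bisect(self._ring, (self._hash(user_id),)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["chat-1", "chat-2", "chat-3"])
    users = [f"user-{i}" for i in range(1000)]
    before = {u: ring.lookup(u) for u in users}
    ring.add("chat-4")
    moved = [u for u in users if ring.lookup(u) != before[u]]
    print(len(moved), "of", len(users), "users moved")
```

Adding a server only takes users away from existing servers and hands them to the new one; users never shuffle between the old servers, which is what keeps the disruption during scale-out small.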
