Core Requirements
1:1 messaging and group chats. Real-time delivery (< 100ms latency). Message persistence: users can scroll back through history. Delivery and read receipts. Online/offline presence. Media attachments (images, files). At scale: WhatsApp handles 100 billion messages per day, Slack handles millions of simultaneous WebSocket connections.
Connection Layer
WebSocket connections are the foundation: persistent bidirectional TCP connections for real-time message push. Each client connects to a chat server and maintains the connection for the session duration. Connection servers are stateful (hold the WebSocket). Scale: one server can handle ~50,000 concurrent WebSocket connections. For 100M concurrent users: ~2000 servers. Use consistent hashing to route a user to the same server (or use a connection registry in Redis: user_id → server_id). When a message is sent to user B: look up B’s server in Redis, forward the message to that server, which pushes to B’s WebSocket. If B is offline: skip and store the message for later delivery.
Message Flow
Sender sends message via WebSocket to their connection server. Connection server assigns a message_id (snowflake ID: timestamp + server_id + sequence) and publishes the message to a Kafka topic partitioned by conversation_id. Two consumers: (1) Message storage service: writes to Cassandra (optimized for time-range reads by conversation). (2) Delivery service: reads from Kafka, looks up recipients, pushes to their connection servers (which push to WebSocket). For offline recipients: delivery service queues the message in a push notification queue (APNs/FCM). Message ordering: within a conversation, Cassandra stores messages ordered by (conversation_id, message_id). The snowflake message_id is monotonically increasing per server, guaranteeing order within a server.
Message Storage
Cassandra is the canonical choice for chat storage. Why: (1) high write throughput (millions of messages/sec), (2) time-ordered reads within a partition (conversation history), (3) linear scalability. Schema: primary key (conversation_id, message_id DESC) — fetches most recent messages first. Include: sender_id, content, type (TEXT, IMAGE, FILE, AUDIO), timestamp, status (SENT, DELIVERED, READ). For search: index messages in Elasticsearch asynchronously. For large media: store in S3, reference the URL in the message content. Retention: delete messages older than 7 years (compliance). Archive old data to S3 Glacier for cost savings.
Group Chats
Group chats add fan-out complexity. When a message is sent to a group with N members: deliver to N recipients. At N=10: trivial. At N=10,000 (large Slack channels): fan-out becomes expensive. Two approaches: push model (fan-out on write): for each message, enqueue N delivery tasks. Simple but expensive for large groups. Pull model (fan-out on read): don’t fan-out writes. Each member’s client polls for new messages in the group (cursor-based: “give me messages after message_id X”). Used by Slack for large channels. Hybrid: push for small groups (< 100 members), pull for large. Store a per-member last-read cursor to support read receipts and "unread count".
Presence and Delivery Receipts
Online presence: on WebSocket connect, SET user:{id}:online 1 EX 30 (Redis with 30s TTL). Client sends heartbeat every 10 seconds: EXPIRE user:{id}:online 30. On disconnect: DEL the key. Other users query presence: GET user:{id}:online. For scale: presence is eventually consistent — a user’s online status may be stale by up to 30 seconds (the TTL). Delivery receipts: on message delivery to the recipient’s device: send a delivery ACK message back through the WebSocket. Update message status to DELIVERED in Cassandra. On recipient opens/reads: send a read receipt. Update to READ. Broadcast the status update to the sender’s WebSocket so their UI updates the checkmarks (WhatsApp double-blue-check pattern).
Interview Tips
- WebSocket vs polling: WebSocket for real-time push (chat). Long-polling acceptable for low-frequency updates. SSE (Server-Sent Events) for server→client only (notifications).
- Message ID design: use Snowflake IDs (monotonic, sortable by time) rather than UUIDs for Cassandra partition ordering.
- At-least-once delivery: client retries on timeout with the same message_id. Server deduplicates by message_id on insert.
- End-to-end encryption: keys never leave devices. Server stores ciphertext only. Signal Protocol (used by WhatsApp) handles key exchange.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How do you route a message to the correct WebSocket server?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Each chat server holds WebSocket connections for a subset of users. When server A needs to deliver a message to a user connected to server B, it must locate server B. Use a Redis hash: HSET user_connections user:{id} server:{id}. On connect: write useru2192server mapping. On disconnect: delete. To deliver: (1) message arrives at any server via Kafka, (2) lookup the recipient’s server_id from Redis, (3) forward the message via internal HTTP or a pub/sub channel to that server, (4) that server pushes to the user’s WebSocket. Alternative: use Redis pub/sub — when server B publishes a message for user X, all servers subscribe to their own channel and forward to the matching connection. This avoids the need for direct server-to-server HTTP calls.”
}
},
{
“@type”: “Question”,
“name”: “How does message ordering work in a distributed chat system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Message ordering is tricky in distributed systems. Two approaches: (1) Server-assigned sequence numbers: the chat server assigns a monotonically increasing sequence number per conversation before writing to Cassandra. All clients see messages in server-assigned order. Downside: requires a distributed counter (Redis INCR, or a sequence table with a lock). (2) Snowflake IDs: use time-based IDs (snowflake: timestamp + server_id + sequence). These are roughly time-ordered without coordination. Downside: clock skew between servers can cause slight reordering. In practice: most chat systems (WhatsApp, iMessage) use server timestamps and clients display messages in server-received order. Client-side timestamps are unreliable (clocks drift, time zones differ).”
}
},
{
“@type”: “Question”,
“name”: “How do you implement message search across chat history?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Full-text search on Cassandra is not supported natively. Index messages in Elasticsearch asynchronously: a Kafka consumer reads all messages and writes to Elasticsearch with fields: conversation_id, sender_id, content (analyzed text), timestamp. For search: query Elasticsearch with the search term filtered by the user’s accessible conversations (only conversations they are a member of). Elasticsearch returns matching message IDs and snippets. Fetch full context from Cassandra (surrounding messages) for display. Privacy: each user can only search within their own conversations — include user_id in the Elasticsearch filter to prevent cross-user access. For compliance (message archiving): retain messages in a separate cold store (S3 + Glue for SQL queries) beyond the Cassandra retention period.”
}
},
{
“@type”: “Question”,
“name”: “How do you scale a chat system to handle 100M concurrent users?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Connection servers: each handles 50,000 WebSocket connections = 2,000 servers for 100M users. Auto-scale based on connection count. Use a load balancer that supports WebSocket (sticky sessions via consistent hashing on user_id — routes the same user to the same server). Message bus: Kafka handles message routing at scale. Partition by conversation_id for ordering. Fan-out service: for group chats with many members, a dedicated fan-out service reads from Kafka and distributes delivery tasks. Cassandra: scale horizontally by adding nodes (linear scalability). Shard by conversation_id. Presence: Redis cluster sharded by user_id. Each shard handles presence for a subset of users. Monitoring: track per-server connection counts, message throughput per Kafka partition, Cassandra write latency.”
}
},
{
“@type”: “Question”,
“name”: “How does end-to-end encryption work in a chat system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “End-to-end encryption (E2EE): the server cannot read message content — only the sender and recipient can decrypt it. Signal Protocol (used by WhatsApp, Signal): each user has a long-term identity key pair and a set of one-time prekeys. On first message: sender fetches recipient’s public keys from the server (key bundle). Sender derives a shared secret using Diffie-Hellman key exchange (X3DH protocol). Both sides independently derive the same encryption key without ever sending it over the network. Messages are encrypted locally with this key before sending. The server stores and forwards ciphertext only. Ratcheting (Double Ratchet algorithm): keys change with each message, so compromising one message’s key does not reveal past or future messages (forward secrecy and break-in recovery).”
}
}
]
}
Asked at: Meta Interview Guide
Asked at: Twitter/X Interview Guide
Asked at: Snap Interview Guide
Asked at: LinkedIn Interview Guide