Question 1

How do you route a message to the correct WebSocket server?

Accepted Answer

Each chat server holds WebSocket connections for a subset of users. When server A needs to deliver a message to a user connected to server B, it must first locate server B. Keep the mapping in a Redis hash: HSET user_connections user:{id} server:{id}. On connect, write the user→server mapping; on disconnect, delete it. To deliver: (1) the message arrives at any server via Kafka, (2) that server looks up the recipient's server_id in Redis, (3) it forwards the message to that server over internal HTTP or a pub/sub channel, (4) the target server pushes the message to the user's WebSocket. Alternative: use Redis pub/sub. Each server subscribes to its own channel; a server holding a message for user X publishes it to the channel of the server X is connected to, and that server forwards it to the matching connection. This avoids direct server-to-server HTTP calls.
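The lookup-then-forward steps above can be sketched in a few lines. This is a minimal illustration, not a production design: a plain dictionary stands in for the Redis hash user_connections, and the names ConnectionRegistry, route_message, and the forward callback are hypothetical. The guarded delete on disconnect is an added assumption, meant to avoid erasing a fresh mapping when a user reconnects to another server before the old socket closes.

```python
from typing import Callable, Dict, Optional


class ConnectionRegistry:
    """In-memory stand-in for the Redis hash `user_connections` (assumption)."""

    def __init__(self) -> None:
        self._user_to_server: Dict[str, str] = {}

    def on_connect(self, user_id: str, server_id: str) -> None:
        # Mirrors: HSET user_connections user:{id} server:{id}
        self._user_to_server[user_id] = server_id

    def on_disconnect(self, user_id: str, server_id: str) -> None:
        # Delete only if the mapping still points at this server, so a
        # stale disconnect cannot remove a newer connection's entry.
        if self._user_to_server.get(user_id) == server_id:
            del self._user_to_server[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._user_to_server.get(user_id)


def route_message(
    registry: ConnectionRegistry,
    recipient_id: str,
    payload: str,
    forward: Callable[[str, str, str], None],
) -> bool:
    """Look up the recipient's server and hand the message to `forward`.

    `forward(server_id, recipient_id, payload)` represents the internal HTTP
    call or pub/sub publish. Returns False when the user is offline.
    """
    server_id = registry.lookup(recipient_id)
    if server_id is None:
        return False  # caller would store the message for later delivery
    forward(server_id, recipient_id, payload)
    return True
```

With a real Redis deployment the forward callback would typically publish to a per-server channel, which matches the pub/sub alternative described in the answer.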

Question 2

How does message ordering work in a distributed chat system?

Accepted Answer

Message ordering is tricky in distributed systems because no two servers share a perfectly synchronized clock. Two approaches: (1) Server-assigned sequence numbers: the chat server assigns a monotonically increasing sequence number per conversation before writing to Cassandra, so all clients see messages in the same server-assigned order. Downside: it requires a distributed counter (Redis INCR, or a sequence table with a lock). (2) Snowflake IDs: time-based IDs built from timestamp + server_id + sequence. These are roughly time-ordered without coordination. Downside: clock skew between servers can cause slight reordering. In practice, many chat systems (WhatsApp and iMessage are commonly cited examples) rely on server timestamps, and clients display messages in server-received order. Client-side timestamps are unreliable (clocks drift, time zones differ).
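A Snowflake-style generator can be sketched as follows. The bit layout (10 bits of server_id, 12 bits of sequence, timestamp in the remaining high bits), the custom epoch EPOCH_MS, and the class name SnowflakeGenerator are assumptions chosen for illustration; the answer above only specifies the three components. The clock is injectable so the behavior can be checked deterministically.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in milliseconds
SERVER_BITS = 10              # assumed layout: up to 1024 servers
SEQUENCE_BITS = 12            # assumed layout: 4096 IDs per ms per server
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _wall_clock_ms() -> int:
    return int(time.time() * 1000)


class SnowflakeGenerator:
    """Generates roughly time-ordered 64-bit IDs without coordination."""

    def __init__(self, server_id: int, clock_ms: Callable[[], int] = _wall_clock_ms) -> None:
        if not 0 <= server_id < (1 << SERVER_BITS):
            raise ValueError("server_id out of range")
        self._server_id = server_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Clock went backwards: keep the last timestamp so IDs from
                # this server never decrease.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: borrow the next one.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (SERVER_BITS + SEQUENCE_BITS))
                | (self._server_id << SEQUENCE_BITS)
                | self._sequence
            )
```

IDs from a single server are strictly increasing. Across servers, ordering is only as good as clock synchronization, which is the skew downside noted above.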

Question 3

How do you implement message search across chat history?

Accepted Answer

Cassandra does not support full-text search natively. Index messages in Elasticsearch asynchronously: a Kafka consumer reads all messages and writes them to Elasticsearch with the fields conversation_id, sender_id, content (analyzed text), and timestamp. To search, query Elasticsearch with the search term, filtered to the conversations the user is a member of. Elasticsearch returns matching message IDs and snippets; fetch the surrounding messages from Cassandra to display full context. Privacy: each user may only search within their own conversations, so resolve the user's conversation memberships on the server and apply them as a filter on conversation_id rather than trusting a client-supplied list. For compliance (message archiving): retain messages in a separate cold store (S3 + Glue for SQL queries) beyond the Cassandra retention period.
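As a rough sketch of the search request described above, the function below builds an Elasticsearch Query DSL body as a plain dictionary: a match clause on content, with a non-scoring terms filter on conversation_id. The function name build_search_query and the size default are hypothetical; the field names follow the answer. The conversation_ids argument is assumed to come from a server-side membership lookup.

```python
from typing import Any, Dict, List


def build_search_query(term: str, conversation_ids: List[str], size: int = 20) -> Dict[str, Any]:
    """Build a search body restricted to the caller's own conversations."""
    if not conversation_ids:
        # A user with no conversations must get no hits, never an
        # unfiltered search across everyone's messages.
        return {"query": {"match_none": {}}}
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the analyzed message body.
                "must": [{"match": {"content": term}}],
                # Access control: only conversations the user belongs to.
                "filter": [{"terms": {"conversation_id": conversation_ids}}],
            }
        },
        # Ask for highlighted snippets of the matching text.
        "highlight": {"fields": {"content": {}}},
    }
```

Placing the membership restriction in the filter clause keeps it out of relevance scoring and allows Elasticsearch to cache it.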

Question 4

How do you scale a chat system to handle 100M concurrent users?

Accepted Answer

Connection servers: at roughly 50,000 WebSocket connections per server, 100M concurrent users need about 2,000 servers (100,000,000 / 50,000). Auto-scale based on connection count. Use a load balancer that supports WebSocket, with sticky routing via consistent hashing on user_id so the same user lands on the same server. Message bus: Kafka handles message routing at scale; partition by conversation_id to preserve per-conversation ordering. Fan-out service: for group chats with many members, a dedicated fan-out service reads from Kafka and distributes delivery tasks. Cassandra: scale horizontally by adding nodes (near-linear scalability), using conversation_id as the partition key. Presence: a Redis cluster sharded by user_id, where each shard handles presence for a subset of users. Monitoring: track per-server connection counts, message throughput per Kafka partition, and Cassandra write latency.
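The server-count arithmetic and the consistent-hashing idea above can be sketched together. This is an illustrative ring with virtual nodes, not how any particular load balancer implements it; the names servers_needed, ConsistentHashRing, and the vnodes default are assumptions.

```python
import bisect
import hashlib
import math
from typing import List


def servers_needed(concurrent_users: int, connections_per_server: int) -> int:
    """Round up so the last partially filled server is counted."""
    return math.ceil(concurrent_users / connections_per_server)


class ConsistentHashRing:
    """Maps user IDs to servers; adding a server moves only a fraction of users."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # Each server appears `vnodes` times on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, user_id: str) -> str:
        # First ring point clockwise from the user's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._ring)
        return self._ring[idx][1]
```

For example, servers_needed(100_000_000, 50_000) gives 2000, matching the estimate above. When a server is added to the ring, each user either stays where they were or moves to the new server, which limits reconnect storms during scale-out.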

Question 5

How does end-to-end encryption work in a chat system?

Accepted Answer

End-to-end encryption (E2EE) means the server cannot read message content; only the sender and recipient can decrypt it. In the Signal Protocol (used by WhatsApp and Signal), each user has a long-term identity key pair, a signed prekey, and a set of one-time prekeys. On the first message, the sender fetches the recipient's public keys from the server (the key bundle) and derives a shared secret using Diffie-Hellman key agreement (the X3DH protocol). Both sides independently derive the same encryption key without ever sending it over the network. Messages are encrypted locally with this key before sending, and the server stores and forwards ciphertext only. Ratcheting (the Double Ratchet algorithm): keys change with each message, so compromising one message's key does not reveal past or future messages (forward secrecy and break-in recovery).
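The "keys change with each message" idea can be illustrated with a simplified symmetric-key ratchet. This sketch covers only the hash-chain half of the Double Ratchet: each step derives a one-time message key and a new chain key with HMAC-SHA256, and the old chain key is discarded. It omits the Diffie-Hellman ratchet that provides break-in recovery, and the starting secret is a placeholder for what X3DH would produce. The function names are hypothetical, and this is not a substitute for a vetted cryptographic library.

```python
import hashlib
import hmac
from typing import List, Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance the chain once; returns (next_chain_key, message_key).

    HMAC is one-way, so holding a later chain key does not reveal earlier
    message keys. That one-way property is what gives forward secrecy.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(shared_secret: bytes, count: int) -> List[bytes]:
    """Derive `count` successive message keys from an agreed shared secret."""
    keys = []
    chain_key = shared_secret
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys
```

Because both parties start from the same shared secret and apply the same steps, they arrive at the same sequence of message keys without transmitting any of them.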

System Design: Chat Application — Real-Time Messaging, Message Storage, and Presence (WhatsApp/Slack)

Core Requirements

Connection Layer

Message Flow

Message Storage

Group Chats

Presence and Delivery Receipts

Interview Tips