Core Requirements
1:1 messaging and group chats. Real-time delivery (target < 100 ms latency). Message persistence, so users can scroll back through history. Delivery and read receipts. Online/offline presence. Media attachments (images, files). At scale: WhatsApp has reported handling about 100 billion messages per day, and Slack handles millions of simultaneous WebSocket connections.
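The scale figures above translate into a rough write rate. The back-of-envelope sketch below uses the 100 billion messages/day figure. The 100-byte average message size and the 3x peak-to-average factor are illustrative assumptions, not numbers from the text:

```python
# Back-of-envelope sizing for a chat backend.
# ASSUMPTIONS (illustrative only): 100 bytes stored per text message,
# peak traffic 3x the daily average.
SECONDS_PER_DAY = 86_400


def avg_messages_per_sec(messages_per_day: float) -> float:
    """Average write rate implied by a daily message volume."""
    return messages_per_day / SECONDS_PER_DAY


def daily_storage_bytes(messages_per_day: float, avg_bytes: int = 100) -> float:
    """Raw bytes written per day, before replication or compression."""
    return messages_per_day * avg_bytes


if __name__ == "__main__":
    daily = 100e9  # WhatsApp-scale volume quoted in the requirements
    avg = avg_messages_per_sec(daily)
    peak = avg * 3  # hypothetical peak factor
    print(f"average ~{avg:,.0f} msg/s, assumed peak ~{peak:,.0f} msg/s")
    print(f"~{daily_storage_bytes(daily) / 1e12:.0f} TB/day before replication")
```

Even the average, over a million writes per second, rules out a single database node. That is why the later sections lean on partitioned Kafka topics and Cassandra.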
Connection Layer
WebSocket connections are the foundation: persistent, full-duplex connections over TCP that let the server push messages in real time. Each client connects to a chat server and keeps that connection open for the session. Connection servers are therefore stateful (each holds its clients' WebSockets). Scale: one server can handle roughly 50,000 concurrent WebSocket connections, so 100M concurrent users need about 100,000,000 / 50,000 = 2,000 servers. To find a user's server, either use consistent hashing on user_id so a user always lands on the same server, or keep a connection registry in Redis (user_id → server_id) that is written on connect. When a message is sent to user B: look up B's server, forward the message to that server, and that server pushes it to B's WebSocket. If B is offline, skip the push and store the message for delivery when B reconnects.
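A minimal sketch of both routing options in this section. The first is a consistent-hash ring with virtual nodes, where removing a server only remaps the users that were on it. The second is a registry lookup, with a plain dict standing in for Redis. The names `HashRing`, `route`, `registry`, and `pending`, and the choice of MD5 as the ring hash, are hypothetical, not a specific library's API:

```python
import bisect
import hashlib
from typing import Optional


def stable_hash(key: str) -> int:
    """64-bit hash that is identical across processes (built-in hash() is salted)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping user_ids to connection servers via virtual nodes."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        # Each server appears `vnodes` times on the ring to even out the load.
        points = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    def server_for(self, user_id: str) -> str:
        # First virtual node clockwise from the user's hash, wrapping past the end.
        idx = bisect.bisect(self._hashes, stable_hash(user_id)) % len(self._hashes)
        return self._servers[idx]


# Registry alternative: this dict stands in for Redis (user_id -> server_id).
# A real deployment would write the key on connect and delete it on disconnect.
registry: dict[str, str] = {}
pending: dict[str, list[str]] = {}  # hypothetical store for offline delivery


def route(recipient: str, message: str) -> Optional[str]:
    """Return the server to forward to, or None after storing for later delivery."""
    server = registry.get(recipient)
    if server is None:
        pending.setdefault(recipient, []).append(message)
        return None
    return server
```

The registry costs one lookup per message but lets any server accept any user. The ring needs no lookup, but clients must reconnect to the "right" server whenever the server set changes.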
Message Flow
The sender sends a message via WebSocket to their connection server. The connection server assigns a message_id (Snowflake ID: timestamp + server_id + sequence) and publishes the message to a Kafka topic partitioned by conversation_id, so all messages of one conversation land in the same partition and are consumed in publish order. Two consumer groups read the topic: (1) Message storage service: writes to Cassandra (optimized for time-range reads by conversation). (2) Delivery service: looks up the recipients and forwards the message to their connection servers, which push it over the WebSocket. For offline recipients, the delivery service enqueues a push notification, delivered via APNs (iOS) or FCM (Android). Message ordering: within a conversation, Cassandra stores messages ordered by (conversation_id, message_id). A Snowflake message_id is strictly increasing per server, which guarantees order only among messages issued by the same server. IDs from different servers are ordered by their timestamps, so ordering across servers is approximate and subject to clock skew between machines.
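The Snowflake layout named above can be sketched as follows. The bit widths (41-bit millisecond timestamp, 10-bit server_id, 12-bit sequence) follow Twitter's original design. The 2020-01-01 epoch, the class name, and the choice to fail on a backwards clock are assumptions for illustration:

```python
import threading
import time

# ASSUMED layout: 41-bit ms timestamp | 10-bit server_id | 12-bit sequence (63 bits total).
EPOCH_MS = 1_577_836_800_000  # arbitrary custom epoch: 2020-01-01T00:00:00Z
SERVER_BITS, SEQ_BITS = 10, 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, server_id: int, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= server_id < (1 << SERVER_BITS):
            raise ValueError("server_id out of range")
        self.server_id = server_id
        self._clock = clock  # injectable for testing; returns Unix milliseconds
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than emit out-of-order IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (SERVER_BITS + SEQ_BITS))
                | (self.server_id << SEQ_BITS)
                | self._seq
            )


def decode(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (unix_ms, server_id, sequence)."""
    seq = snowflake_id & MAX_SEQ
    server = (snowflake_id >> SEQ_BITS) & ((1 << SERVER_BITS) - 1)
    unix_ms = (snowflake_id >> (SERVER_BITS + SEQ_BITS)) + EPOCH_MS
    return unix_ms, server, seq
```

Because the timestamp occupies the high bits, IDs from different servers sort by time first. That is why cross-server order is only as good as clock synchronization.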
Message Storage
Cassandra is a common choice for chat storage, for three reasons: (1) high write throughput (millions of messages/sec across a cluster), (2) time-ordered reads within a partition (conversation history), (3) linear scalability as nodes are added. Schema: partition key conversation_id with message_id as a descending clustering column (in CQL, PRIMARY KEY ((conversation_id), message_id) plus the table option WITH CLUSTERING ORDER BY (message_id DESC)), so reads return the most recent messages first. Columns: sender_id, content, type (TEXT, IMAGE, FILE, AUDIO), timestamp, status (SENT, DELIVERED, READ). For search: index messages in Elasticsearch asynchronously. For large media: store the file in S3 and reference its URL in the message content. Retention: delete messages older than 7 years (compliance), and archive old data to S3 Glacier for cost savings.
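To show why clustering by message_id within a conversation partition makes scroll-back cheap, here is an in-memory stand-in for that table. It is a hypothetical model, not the Cassandra driver API. Rows are kept sorted per partition, reads come back newest-first, and "load older messages" is a cursor on the last message_id the client saw:

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    message_id: int
    sender_id: str
    content: str
    msg_type: str = "TEXT"


class ConversationStore:
    """In-memory model of a table keyed by (conversation_id, message_id)."""

    def __init__(self):
        self._ids = defaultdict(list)  # conversation_id -> message_ids, ascending
        self._rows = {}                # (conversation_id, message_id) -> Message

    def insert(self, conversation_id: str, msg: Message) -> None:
        key = (conversation_id, msg.message_id)
        if key not in self._rows:  # same primary key overwrites, like a Cassandra upsert
            bisect.insort(self._ids[conversation_id], msg.message_id)
        self._rows[key] = msg

    def page(self, conversation_id: str, limit: int = 50,
             before: Optional[int] = None) -> list[Message]:
        """Newest-first page; pass the oldest message_id already shown as `before`."""
        ids = self._ids[conversation_id]
        end = len(ids) if before is None else bisect.bisect_left(ids, before)
        start = max(0, end - limit)
        return [self._rows[(conversation_id, mid)] for mid in reversed(ids[start:end])]
```

In Cassandra the equivalent read is a single-partition slice (message_id < X, LIMIT N), which is why history queries stay fast regardless of total table size.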
Group Chats
Group chats add fan-out complexity: a message sent to a group with N members must reach N recipients. At N=10 this is trivial; at N=10,000 (large Slack channels) fan-out becomes expensive. Two approaches: Push model (fan-out on write): for each message, enqueue N delivery tasks. Simple, but the cost grows linearly with group size. Pull model (fan-out on read): skip the per-member writes; each member's client fetches new messages for the group with a cursor (“give me messages after message_id X”). Used by Slack for large channels. Hybrid: push for small groups (< 100 members), pull for large ones. Store a per-member last-read cursor to support read receipts and the unread count.
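The hybrid policy and per-member cursors described above fit in a few functions. This is a sketch under two assumptions: message_ids arrive in increasing order (for example, from a single Kafka partition), and `push` is any callback that delivers to a member's connection server. `Group`, `post`, `fetch_after`, `mark_read`, and `unread_count` are hypothetical names. The 100-member threshold is the one from the text:

```python
import bisect
from dataclasses import dataclass, field
from typing import Callable

PUSH_THRESHOLD = 100  # groups smaller than this use fan-out on write


@dataclass
class Group:
    members: list[str]
    log: list[int] = field(default_factory=list)             # message_ids, ascending
    last_read: dict[str, int] = field(default_factory=dict)  # member -> read cursor


def post(group: Group, message_id: int, sender: str,
         push: Callable[[str, int], None]) -> None:
    group.log.append(message_id)
    mark_read(group, sender, message_id)  # senders have seen their own message
    if len(group.members) < PUSH_THRESHOLD:
        for member in group.members:  # fan-out on write
            if member != sender:
                push(member, message_id)
    # Large groups: no per-member work at write time; clients call fetch_after().


def fetch_after(group: Group, cursor: int) -> list[int]:
    """Pull model: 'give me messages after message_id X'."""
    return group.log[bisect.bisect_right(group.log, cursor):]


def mark_read(group: Group, member: str, message_id: int) -> None:
    # Cursors only move forward, so a late or duplicate receipt cannot rewind them.
    group.last_read[member] = max(group.last_read.get(member, 0), message_id)


def unread_count(group: Group, member: str) -> int:
    return len(fetch_after(group, group.last_read.get(member, 0)))
```

Note that the read cursor does double duty: it is both the pull cursor for fetching and the basis for the unread badge.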
Presence and Delivery Receipts
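Online presence: on WebSocket connect, SET user:{id}:online 1 EX 30 (a Redis key with a 30 s TTL). The client sends a heartbeat every 10 seconds, and the server refreshes the key with EXPIRE user:{id}:online 30, so a missed heartbeat or two does not flip the user offline. On a clean disconnect: DEL the key. Other users query presence with GET user:{id}:online. Presence is therefore eventually consistent: a client that drops without a clean disconnect still appears online until its key expires, so its status may be stale by up to 30 seconds (the TTL). Delivery receipts: when the message reaches the recipient's device, the recipient's client sends a delivery ACK back over its WebSocket, and the server updates the message status to DELIVERED in Cassandra. When the recipient opens and reads the message, the client sends a read receipt and the status is updated to READ. The server forwards each status update to the sender's WebSocket so their UI can update the checkmarks (WhatsApp's pattern: one gray check for sent, two gray checks for delivered, two blue checks for read).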
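The presence keys and receipt updates above can be modeled without a Redis server. Below, `PresenceStore` imitates the TTL semantics of SET ... EX, EXPIRE (which does nothing on a missing or expired key), DEL, and GET, using an injectable clock. `advance` reflects a design assumption not stated in the text: status only moves forward, so a late DELIVERED cannot overwrite READ. All names are hypothetical:

```python
from enum import IntEnum

HEARTBEAT_TTL_S = 30  # matches SET ... EX 30 in the text


class PresenceStore:
    """In-memory imitation of Redis keys user:{id}:online with a TTL."""

    def __init__(self, clock):
        self._clock = clock    # returns the current time in seconds
        self._expires_at = {}  # user_id -> absolute expiry time

    def connect(self, user_id: str) -> None:      # SET user:{id}:online 1 EX 30
        self._expires_at[user_id] = self._clock() + HEARTBEAT_TTL_S

    def heartbeat(self, user_id: str) -> None:    # EXPIRE user:{id}:online 30
        if self.is_online(user_id):  # EXPIRE cannot revive an expired key
            self._expires_at[user_id] = self._clock() + HEARTBEAT_TTL_S

    def disconnect(self, user_id: str) -> None:   # DEL user:{id}:online
        self._expires_at.pop(user_id, None)

    def is_online(self, user_id: str) -> bool:    # GET user:{id}:online
        expires = self._expires_at.get(user_id)
        return expires is not None and self._clock() < expires


class Status(IntEnum):
    SENT = 0
    DELIVERED = 1
    READ = 2


def advance(current: Status, incoming: Status) -> Status:
    """Receipts may arrive late or twice; keep whichever status is further along."""
    return max(current, incoming)
```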
Interview Tips
- WebSocket vs polling: WebSocket for real-time push (chat). Long-polling acceptable for low-frequency updates. SSE (Server-Sent Events) for server→client only (notifications).
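- Message ID design: use Snowflake IDs (roughly time-ordered, and strictly increasing per generating server) rather than random UUIDs, so message_id works as a time-sorted clustering column within a Cassandra partition.
- At-least-once delivery: the client tags each outgoing message with its own unique ID and retries on timeout with that same ID. The server deduplicates on it, so a retry returns the message_id assigned the first time instead of creating a duplicate.
- End-to-end encryption: private keys never leave the users' devices, and the server stores and relays only ciphertext. The Signal Protocol (used by WhatsApp) handles key agreement and per-message key ratcheting.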
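Server-side deduplication for retries can be sketched as below. In the message flow above, message_id is assigned by the server, so a retry can only be recognized by an ID the client chose itself. It is called `client_msg_id` here, a hypothetical name, as are `MessageIngest` and `accept`:

```python
from typing import Callable


class MessageIngest:
    """Accepts messages idempotently, keyed on (sender_id, client_msg_id)."""

    def __init__(self, id_source: Callable[[], int]):
        self._next_id = id_source  # e.g. a Snowflake generator's next_id
        self._assigned: dict[tuple[str, str], int] = {}
        self.stored: list[tuple[int, str, str]] = []  # stand-in for the message table

    def accept(self, sender_id: str, client_msg_id: str, content: str) -> int:
        key = (sender_id, client_msg_id)
        if key in self._assigned:
            # Retry of a message already stored: return the original ID, store nothing.
            return self._assigned[key]
        message_id = self._next_id()
        self._assigned[key] = message_id
        self.stored.append((message_id, sender_id, content))
        return message_id
```

In production the dedup map would need to be shared across connection servers and bounded in time. One option, stated here as an assumption, is Redis SET with the NX and EX options, so only the first attempt claims the key and old keys expire.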