System Design Interview: Design WhatsApp / Real-Time Messaging
Designing a real-time messaging system like WhatsApp or iMessage is a frequently asked system design question. It tests knowledge of WebSocket connection management, message delivery guarantees, end-to-end encryption, and distributed storage for chat history at massive scale.
Functional Requirements
- One-on-one and group messaging
- Message delivery status: sent, delivered, read (double blue ticks)
- Media sharing: images, videos, documents
- Online/last-seen status
- Push notifications for offline users
- Message history with pagination
Non-Functional Requirements
- WhatsApp serves 2B+ users; 100B+ messages per day
- Messages per second: ~1.2M msgs/sec average, peaks 3-5×
- End-to-end encrypted: server cannot read message content
- Message ordering guaranteed within a conversation
- Offline delivery: messages delivered when recipient comes online
- Low latency: <100ms for online-to-online message delivery
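The throughput figures above follow from simple arithmetic. Here is a quick back-of-envelope check; the 100-byte average ciphertext size is an assumption made purely for illustration, not a published number:

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

messages_per_day = 100e9                 # 100B messages/day (from the requirements)
avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY

# Peak range using the 3-5x multiplier from the requirements
peak_low = avg_msgs_per_sec * 3
peak_high = avg_msgs_per_sec * 5

# Assumed average ciphertext size; a placeholder, not a published figure
ASSUMED_AVG_MESSAGE_BYTES = 100
daily_storage_tb = messages_per_day * ASSUMED_AVG_MESSAGE_BYTES / 1e12

print(f"average: {avg_msgs_per_sec / 1e6:.2f}M msgs/sec")
print(f"peak:    {peak_low / 1e6:.1f}M to {peak_high / 1e6:.1f}M msgs/sec")
print(f"storage: {daily_storage_tb:.0f} TB/day before replication")
```

The average works out to about 1.16M msgs/sec, which the list rounds to ~1.2M. Capacity should be provisioned for the peak, not the average.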
High-Level Architecture
[Client A]                            [Client B]
     │                                     │
     │ WebSocket                           │ WebSocket
     ▼                                     ▼
[Chat Server A] ◄───── Pub/Sub ─────► [Chat Server B]
     │              (Redis/Kafka)          │
     │                                     │
     ▼                                     ▼
[Message Store]                 [Notification Service]
  (Cassandra)                       (APNs/FCM/SMS)
     │
[Media Storage]
  (S3 + CDN)
Core Design: WebSocket Connection Management
import asyncio
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Message:
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    sender_id: str = ""
    recipient_id: str = ""        # user_id or group_id
    conversation_id: str = ""
    content: str = ""             # Encrypted ciphertext in real system
    media_url: Optional[str] = None
    # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    message_type: str = "text"    # text, image, video, document
    status: str = "sent"          # sent, delivered, read
class ConnectionManager:
    """Tracks active WebSocket connections per user on this server."""

    def __init__(self):
        self._connections: dict[str, set] = {}  # user_id -> {websocket}
        self._lock = asyncio.Lock()

    async def connect(self, user_id: str, websocket):
        async with self._lock:
            if user_id not in self._connections:
                self._connections[user_id] = set()
            self._connections[user_id].add(websocket)
        # presence_service is assumed to be a shared PresenceService instance
        await presence_service.set_online(user_id)

    async def disconnect(self, user_id: str, websocket):
        async with self._lock:
            if user_id in self._connections:
                self._connections[user_id].discard(websocket)
                # Only mark the user offline when their last connection closes
                if not self._connections[user_id]:
                    del self._connections[user_id]
                    await presence_service.set_offline(user_id)

    def is_connected(self, user_id: str) -> bool:
        return user_id in self._connections and bool(self._connections[user_id])

    async def send_to_user(self, user_id: str, message: dict) -> bool:
        """Send message to all active connections for user.

        Returns True if at least one connection accepted the message.
        """
        if not self.is_connected(user_id):
            return False
        # default=str serializes values json cannot handle natively, e.g. datetime
        payload = json.dumps(message, default=str)
        delivered = False
        # Iterate over a copy: disconnect() mutates the underlying set
        for ws in list(self._connections.get(user_id, ())):
            try:
                await ws.send(payload)
                delivered = True
            except Exception:
                await self.disconnect(user_id, ws)
        return delivered
Message Flow: Online-to-Online
class ChatServer:
    def __init__(self, server_id: str):
        self.server_id = server_id
        self.conn_mgr = ConnectionManager()
        self.pubsub = RedisPubSub()  # Cross-server message routing

    async def handle_send_message(self, sender_id: str, payload: dict):
        """
        Message flow:
        1. Validate and persist message
        2. Try to deliver directly (same server)
        3. If recipient on different server: publish to Pub/Sub
        4. If recipient offline: enqueue for push notification
        """
        msg = Message(
            sender_id=sender_id,
            recipient_id=payload['recipient_id'],
            conversation_id=payload['conversation_id'],
            content=payload['content'],  # Already E2E encrypted by client
            message_type=payload.get('type', 'text'),
        )

        # 1. Persist first, then ack: a "sent" ack means the message is durable
        await message_store.save(msg)
        await self.conn_mgr.send_to_user(sender_id, {
            'type': 'ack', 'message_id': msg.message_id, 'status': 'sent'
        })

        # 2. Try direct delivery (recipient on this server)
        delivered = await self.conn_mgr.send_to_user(
            msg.recipient_id,
            {'type': 'message', 'message': msg.__dict__}
        )
        if delivered:
            await self._update_status(msg.message_id, 'delivered')
            return

        # 3. Check routing table: which server hosts recipient?
        recipient_server = await routing_table.get_server(msg.recipient_id)
        if recipient_server:
            await self.pubsub.publish(
                channel=f"server:{recipient_server}",
                data={'type': 'route_message', 'message': msg.__dict__}
            )
        else:
            # 4. Recipient offline: push notification
            await notification_service.send_push(
                user_id=msg.recipient_id,
                title="New message",
                body="You have a new message",  # Don't leak content
                data={'conversation_id': msg.conversation_id}
            )
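Step 3 depends on a `routing_table` that the snippet above leaves undefined. Below is a minimal in-memory sketch of what it might look like, assuming entries expire unless a heartbeat refreshes them. The class name, the `register`/`unregister` methods, the 90-second TTL, and the injectable clock are all hypothetical. A production version would live in a shared store such as Redis, not in process memory.

```python
import asyncio
import time
from typing import Callable, Optional


class InMemoryRoutingTable:
    """Maps user_id -> server_id, expiring entries that are not refreshed."""

    def __init__(self, ttl_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (server_id, expires_at)
        self._entries: dict[str, tuple[str, float]] = {}

    async def register(self, user_id: str, server_id: str) -> None:
        """Called on connect and on every heartbeat to refresh the entry."""
        self._entries[user_id] = (server_id, self._clock() + self._ttl)

    async def unregister(self, user_id: str, server_id: str) -> None:
        """Remove the entry only if it still points at the given server."""
        entry = self._entries.get(user_id)
        if entry and entry[0] == server_id:
            del self._entries[user_id]

    async def get_server(self, user_id: str) -> Optional[str]:
        """Return the hosting server, or None if the user is offline or stale."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]
            return None
        return server_id


async def _demo() -> None:
    table = InMemoryRoutingTable()
    await table.register("alice", "chat-server-a")
    print(await table.get_server("alice"))   # chat-server-a
    print(await table.get_server("bob"))     # None


if __name__ == "__main__":
    asyncio.run(_demo())
```

The TTL keeps the table from routing to a server that crashed without cleaning up. The server check in `unregister` stops a late disconnect on an old server from erasing a fresh registration made after the user reconnected elsewhere.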
Message Storage: Cassandra Schema
-- Messages partitioned by conversation_id
-- Within partition, ordered by timestamp DESC for efficient recent-first reads
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,  -- Time-ordered UUID (acts as timestamp + unique ID)
    sender_id       UUID,
    content         BLOB,      -- Encrypted ciphertext
    message_type    TEXT,
    media_url       TEXT,
    status          TEXT,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND default_time_to_live = 0;  -- No auto-expiry; app-level retention policies

-- Conversations table
CREATE TABLE conversations (
    user_id         UUID,
    conversation_id UUID,
    other_user_id   UUID,      -- For 1:1 chats
    group_id        UUID,      -- For group chats
    last_message_id TIMEUUID,
    last_read_id    TIMEUUID,
    PRIMARY KEY (user_id, conversation_id)
) WITH CLUSTERING ORDER BY (conversation_id DESC);
Why Cassandra?
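- Write-heavy: 100B msgs/day ≈ 1.2M writes/sec on average; Cassandra’s append-only LSM-tree write path is well suited to this load
- Time-series access pattern: “get last 50 messages for conversation X” maps perfectly to Cassandra’s clustering key ordering
- Horizontal scaling: consistent hashing on conversation_id spreads conversations across nodes; an unusually active conversation can still produce a large or hot partition, which is commonly handled by adding a time bucket to the partition key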
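- Real-world note: WhatsApp's publicly described stack is built on Erlang and Mnesia; Cassandra here is a choice for this design exercise, not a confirmed description of WhatsApp's production storage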
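The "last 50 messages" access pattern can be paged by using the clustering key as a cursor. The helper below is a hypothetical sketch that only builds the CQL text and bind parameters for the `messages` table above; it does not talk to a cluster. The function name is invented, and the `%s` placeholder style is the one the DataStax Python driver accepts for non-prepared statements.

```python
from typing import Optional
import uuid


def build_history_query(conversation_id: uuid.UUID,
                        before_message_id: Optional[uuid.UUID] = None,
                        page_size: int = 50) -> tuple[str, tuple]:
    """Return (cql, params) for one page of history, newest first.

    The table clusters by message_id DESC, so no ORDER BY is needed:
    rows come back newest-first from a single partition.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    cql = ("SELECT message_id, sender_id, content, message_type, media_url "
           "FROM messages WHERE conversation_id = %s")
    params: list = [conversation_id]

    if before_message_id is not None:
        # Cursor: continue strictly after the oldest message already shown
        cql += " AND message_id < %s"
        params.append(before_message_id)

    cql += " LIMIT %s"
    params.append(page_size)
    return cql, tuple(params)


conversation = uuid.uuid4()
first_page_cql, first_page_params = build_history_query(conversation)

# uuid1() is time-based, standing in for the TIMEUUID of the last row shown
cursor = uuid.uuid1()
next_page_cql, next_page_params = build_history_query(conversation, cursor)
```

Cursor-based paging avoids the offset scans that make deep pagination slow, because every page is a bounded range read within one partition.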
Message Delivery Guarantees
class MessageDeliveryService:
    """
    At-least-once delivery with client-side deduplication.
    """
    def __init__(self, conn_mgr):
        self.conn_mgr = conn_mgr

    async def deliver_with_retry(self, message: Message, max_attempts: int = 5):
        for attempt in range(max_attempts):
            try:
                if await self._try_deliver(message):
                    return
            except Exception:
                pass  # Treat a transport error like any other failed attempt
            if attempt < max_attempts - 1:
                # Exponential backoff: 1s, 2s, 4s, ...
                await asyncio.sleep(2 ** attempt)
        # Every attempt failed: store for delivery when user reconnects
        await offline_queue.enqueue(message)

    async def on_user_connect(self, user_id: str):
        """Deliver queued messages when user comes online."""
        pending = await offline_queue.get_all(user_id)
        for msg in pending:
            delivered = await self.conn_mgr.send_to_user(
                user_id, {'type': 'message', 'message': msg.__dict__}
            )
            if not delivered:
                break  # Connection dropped again; leave the rest queued
            await offline_queue.remove(msg.message_id)
            await self._update_status(msg.message_id, 'delivered')
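At-least-once delivery means the client may occasionally receive the same message twice, so the deduplication half of the contract lives on the client. Here is a minimal sketch of that side, assuming a bounded in-memory set of recent IDs is enough; the class name and the capacity are hypothetical:

```python
from collections import OrderedDict


class MessageDeduplicator:
    """Remembers recently seen message IDs so redelivered copies are dropped."""

    def __init__(self, capacity: int = 10_000):
        self._capacity = capacity
        self._seen = OrderedDict()   # message_id -> None, oldest first

    def accept(self, message_id: str) -> bool:
        """Return True the first time an ID is seen, False for duplicates."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)   # Keep hot IDs from being evicted
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)       # Evict the oldest ID
        return True


dedup = MessageDeduplicator(capacity=2)
results = [dedup.accept(mid) for mid in ["m1", "m2", "m1", "m3", "m1"]]
print(results)   # [True, True, False, True, False]
```

A real client would more likely check the ID against its local message database, which makes the check durable across app restarts.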
Read Receipts and Presence
from datetime import datetime, timezone


class PresenceService:
    def __init__(self, redis_client):
        # Assumes an async Redis client created with decode_responses=True
        self.redis = redis_client

    async def set_online(self, user_id: str):
        await self.redis.setex(f"online:{user_id}", 60, "1")  # TTL 60s; refresh on heartbeat
        await self.redis.set(f"last_seen:{user_id}",
                             datetime.now(timezone.utc).isoformat())
        # Broadcast online status to active conversations
        await self._broadcast_presence(user_id, online=True)

    async def set_offline(self, user_id: str):
        await self.redis.delete(f"online:{user_id}")
        await self._broadcast_presence(user_id, online=False)

    async def is_online(self, user_id: str) -> bool:
        return bool(await self.redis.get(f"online:{user_id}"))

    async def get_last_seen(self, user_id: str) -> Optional[str]:
        return await self.redis.get(f"last_seen:{user_id}")
class ReadReceiptService:
    def __init__(self, conn_mgr):
        self.conn_mgr = conn_mgr

    async def mark_read(self, user_id: str, conversation_id: str,
                        up_to_message_id: str):
        # Update last_read_id for this user in this conversation
        await conversations_db.update_last_read(user_id, conversation_id, up_to_message_id)

        # Notify message senders (for double blue tick)
        unread_messages = await message_store.get_unread(
            conversation_id, user_id, up_to_message_id
        )
        for msg in unread_messages:
            await self.conn_mgr.send_to_user(msg.sender_id, {
                'type': 'read_receipt',
                'conversation_id': conversation_id,
                'message_id': msg.message_id,
                'read_by': user_id
            })
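The loop above sends one receipt frame per message. When a user opens a chat with many unread messages, a possible refinement is to batch the receipts so each sender gets a single frame. The sketch below shows that variation and is not part of the design above; the `UnreadMessage` type, the function name, and the `message_ids` field are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UnreadMessage:
    """Hypothetical minimal view of a stored message."""
    message_id: str
    sender_id: str


def batch_read_receipts(unread: list[UnreadMessage], conversation_id: str,
                        read_by: str) -> dict[str, dict]:
    """Group message IDs by sender and build one receipt frame per sender."""
    ids_by_sender: dict[str, list[str]] = defaultdict(list)
    for msg in unread:
        if msg.sender_id != read_by:          # Never send a receipt to the reader
            ids_by_sender[msg.sender_id].append(msg.message_id)

    return {
        sender_id: {
            'type': 'read_receipt',
            'conversation_id': conversation_id,
            'message_ids': message_ids,
            'read_by': read_by,
        }
        for sender_id, message_ids in ids_by_sender.items()
    }


frames = batch_read_receipts(
    [UnreadMessage("m1", "alice"), UnreadMessage("m2", "alice"),
     UnreadMessage("m3", "carol")],
    conversation_id="c1",
    read_by="bob",
)
```

Here `frames` holds two payloads, one for `alice` covering `m1` and `m2` and one for `carol` covering `m3`, where the per-message loop would send three.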
Group Messaging
class GroupMessageService:
    """
    Group messages require fan-out to all members.
    Strategy: server-side fan-out for small groups (<500 members)
    For large groups (channels): fan-out on read.
    """
    def __init__(self, conn_mgr):
        self.conn_mgr = conn_mgr

    async def send_group_message(self, sender_id: str, group_id: str,
                                 message: Message):
        # Persist once for the group
        await message_store.save(message)

        # Get all group members
        members = await group_service.get_members(group_id)
        members = [m for m in members if m != sender_id]

        # Fan-out delivery to all members
        tasks = [
            self.deliver_to_member(message, member_id)
            for member_id in members
        ]
        # return_exceptions=True: one failed member must not cancel the rest
        await asyncio.gather(*tasks, return_exceptions=True)

    async def deliver_to_member(self, message: Message, member_id: str):
        delivered = await self.conn_mgr.send_to_user(member_id, {
            'type': 'group_message', 'message': message.__dict__
        })
        if not delivered:
            # Same generic push as the 1:1 path, so no content leaks
            await notification_service.send_push(
                user_id=member_id,
                title="New message",
                body="You have a new message",
                data={'conversation_id': message.conversation_id}
            )
End-to-End Encryption (Signal Protocol)
WhatsApp uses the Signal Protocol for E2E encryption. Key concepts:
- Key exchange: X3DH (Extended Triple Diffie-Hellman) — establishes a shared secret between devices without the server ever seeing it
- Message encryption: Double Ratchet Algorithm — generates a new encryption key for each message; past messages cannot be decrypted even if current key is compromised (forward secrecy)
- Server role: Stores and relays ciphertext only. Cannot decrypt content. Stores public keys (identity keys, one-time prekeys) to enable key exchange for offline recipients.
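The "new key for each message" idea can be illustrated with the symmetric half of the Double Ratchet, a one-way key derivation chain. The sketch below is heavily simplified and is not the Signal Protocol itself: it leaves out the Diffie-Hellman ratchet, message headers, and authenticated encryption, and the seed value is a placeholder. Deriving both outputs with HMAC-SHA256 over two distinct constants follows the style recommended in the Double Ratchet specification.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key.

    Both outputs are HMAC-SHA256 values keyed by the chain key over distinct
    constants, so neither output reveals the other or the input.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Placeholder secret; in the real protocol this comes from X3DH and the DH ratchet
initial_chain_key = hashlib.sha256(b"demo shared secret (illustrative only)").digest()

chain_key = initial_chain_key
message_keys = []
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)   # Old chain key is discarded
    message_keys.append(message_key)
```

Because HMAC is one-way and each old chain key is discarded after use, an attacker who steals the current `chain_key` cannot work backwards to earlier message keys. That is the forward secrecy property described above.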
Interview Discussion Points
- Message ordering: Use vector clocks or logical timestamps (Lamport timestamps) for causal ordering across distributed servers
- Effectively-once delivery: True exactly-once delivery is not achievable over an unreliable network, so the server delivers at least once and the client deduplicates by message_id to avoid displaying a retried message twice
- Scaling WebSockets: Each chat server handles ~50K-100K concurrent connections; server routing table (Redis) maps user_id to server_id
- Media messages: Client uploads directly to S3; sends only the media URL in the message; recipient downloads independently from CDN
- Message search: Client-side search over locally stored messages (E2E encryption prevents server-side indexing)
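For the message ordering point, a Lamport timestamp is the simplest of the logical clocks mentioned. The class below is a minimal sketch; its name and method names are illustrative. Each server keeps a counter, increments it on local events, and on receive jumps past the largest timestamp it has seen.

```python
class LamportClock:
    """Logical clock: counters only move forward and merge on receive."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event such as sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge a timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


server_a, server_b = LamportClock(), LamportClock()

t1 = server_a.tick()          # A sends message 1
t2 = server_a.tick()          # A sends message 2
t3 = server_b.receive(t2)     # B receives message 2
t4 = server_b.tick()          # B sends a reply

print(t1, t2, t3, t4)         # 1 2 3 4
```

Lamport timestamps guarantee that a reply is always ordered after the message it responds to, but they cannot detect that two messages were concurrent. That gap is what vector clocks fill. A common tie-breaker for equal timestamps is the sender or server ID.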