System Design Interview: Design WhatsApp / Real-Time Messaging

Designing a real-time messaging system like WhatsApp or iMessage is a top-tier system design question. It requires deep knowledge of WebSocket connections, message delivery guarantees, end-to-end encryption, and distributed storage for chat history at massive scale.

Functional Requirements

  • One-on-one and group messaging
  • Message delivery status: sent, delivered, read (double blue ticks)
  • Media sharing: images, videos, documents
  • Online/last-seen status
  • Push notifications for offline users
  • Message history with pagination

Non-Functional Requirements

  • WhatsApp serves 2B+ users; 100B+ messages per day
  • Messages per second: ~1.2M msgs/sec average, peaks 3-5×
  • End-to-end encrypted: server cannot read message content
  • Message ordering guaranteed within a conversation
  • Offline delivery: messages delivered when recipient comes online
  • Low latency: <100ms for online-to-online message delivery
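
The throughput figures above follow from simple arithmetic. A quick check, assuming traffic is spread evenly over the day for the average and using the stated 3-5× multiplier for peaks:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 100e9  # 100B+ messages/day, from the requirements above

avg_per_sec = messages_per_day / SECONDS_PER_DAY
peak_low, peak_high = 3 * avg_per_sec, 5 * avg_per_sec

print(f"average: {avg_per_sec / 1e6:.2f}M msgs/sec")
print(f"peak:    {peak_low / 1e6:.1f}M to {peak_high / 1e6:.1f}M msgs/sec")
```

The average works out to roughly 1.16M msgs/sec, which the requirement rounds to ~1.2M. Capacity planning should target the peak range, not the average.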

High-Level Architecture

[Client A]                    [Client B]
    │                              │
    │ WebSocket                    │ WebSocket
    ▼                              ▼
[Chat Server A] ◄─── Pub/Sub ──► [Chat Server B]
    │              (Redis/Kafka)       │
    │                                 │
    ▼                                 ▼
[Message Store]              [Notification Service]
 (Cassandra)                  (APNs/FCM/SMS)
    │
[Media Storage]
  (S3 + CDN)

Core Design: WebSocket Connection Management

import asyncio
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import uuid

@dataclass
class Message:
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    sender_id: str = ""
    recipient_id: str = ""     # user_id or group_id
    conversation_id: str = ""
    content: str = ""          # Encrypted ciphertext in real system
    media_url: Optional[str] = None
    timestamp: datetime = field(default_factory=datetime.utcnow)
    message_type: str = "text"  # text, image, video, document
    status: str = "sent"        # sent, delivered, read

class ConnectionManager:
    """Tracks active WebSocket connections per user on this server."""
    def __init__(self):
        self._connections: dict[str, set] = {}  # user_id -> {websocket}
        self._lock = asyncio.Lock()

    async def connect(self, user_id: str, websocket):
        async with self._lock:
            if user_id not in self._connections:
                self._connections[user_id] = set()
            self._connections[user_id].add(websocket)
            await presence_service.set_online(user_id)

    async def disconnect(self, user_id: str, websocket):
        async with self._lock:
            if user_id in self._connections:
                self._connections[user_id].discard(websocket)
                if not self._connections[user_id]:
                    del self._connections[user_id]
                    await presence_service.set_offline(user_id)

    def is_connected(self, user_id: str) -> bool:
        return user_id in self._connections and bool(self._connections[user_id])

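    async def send_to_user(self, user_id: str, message: dict) -> bool:
        """Send message to all active connections for user.

        Returns True only if at least one connection accepted the payload.
        """
        if not self.is_connected(user_id):
            return False
        # default=str serializes non-JSON types such as datetime timestamps
        payload = json.dumps(message, default=str)
        # Copy the set: disconnect() mutates it while we iterate
        connections = list(self._connections.get(user_id, []))
        delivered = False
        for ws in connections:
            try:
                await ws.send(payload)
                delivered = True
            except Exception:
                # Treat a failed send as a dead connection and drop it
                await self.disconnect(user_id, ws)
        return delivered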

Message Flow: Online-to-Online

class ChatServer:
    def __init__(self, server_id: str):
        self.server_id = server_id
        self.conn_mgr = ConnectionManager()
        self.pubsub = RedisPubSub()  # Cross-server message routing

    async def handle_send_message(self, sender_id: str, payload: dict):
        """
        Message flow:
        1. Validate and persist message
        2. Try to deliver directly (same server)
        3. If recipient on different server: publish to Pub/Sub
        4. If recipient offline: enqueue for push notification
        """
        msg = Message(
            sender_id=sender_id,
            recipient_id=payload['recipient_id'],
            conversation_id=payload['conversation_id'],
            content=payload['content'],  # Already E2E encrypted by client
            message_type=payload.get('type', 'text'),
        )

        # 1. Persist first, then acknowledge "sent" to the sender
        await message_store.save(msg)
        await self.conn_mgr.send_to_user(sender_id, {
            'type': 'ack', 'message_id': msg.message_id, 'status': 'sent'
        })

        # 2. Try direct delivery (recipient on this server)
        delivered = await self.conn_mgr.send_to_user(
            msg.recipient_id,
            {'type': 'message', 'message': msg.__dict__}
        )

        if delivered:
            await self._update_status(msg.message_id, 'delivered')
            return

        # 3. Check routing table: which server hosts recipient?
        recipient_server = await routing_table.get_server(msg.recipient_id)
        if recipient_server:
            await self.pubsub.publish(
                channel=f"server:{recipient_server}",
                data={'type': 'route_message', 'message': msg.__dict__}
            )
        else:
            # 4. Recipient offline: push notification
            await notification_service.send_push(
                user_id=msg.recipient_id,
                title="New message",
                body="You have a new message",  # Don't leak content
                data={'conversation_id': msg.conversation_id}
            )
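
The code above covers only the publishing side of step 3. A minimal sketch of the receiving side follows, assuming each server subscribes to its own `server:<id>` channel. `InMemoryPubSub` and `RoutedServer` are hypothetical stand-ins for illustration, not a real Redis client:

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]

class InMemoryPubSub:
    """Hypothetical stand-in for Redis Pub/Sub: channel name -> handlers."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    async def publish(self, channel: str, data: dict) -> int:
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            await handler(data)
        return len(handlers)  # number of subscribers that received the message

class RoutedServer:
    """Minimal chat server that only knows its locally connected users."""
    def __init__(self, server_id: str, pubsub: InMemoryPubSub) -> None:
        self.server_id = server_id
        self.inboxes: dict[str, list[dict]] = {}  # user_id -> delivered messages
        pubsub.subscribe(f"server:{server_id}", self.on_routed)

    def connect(self, user_id: str) -> None:
        self.inboxes[user_id] = []

    async def on_routed(self, data: dict) -> None:
        message = data["message"]
        inbox = self.inboxes.get(message["recipient_id"])
        if inbox is not None:
            inbox.append(message)
        # If the user disconnected after the routing lookup, a real server
        # would fall back to the offline queue and push notification path.

async def demo() -> list[dict]:
    pubsub = InMemoryPubSub()
    RoutedServer("A", pubsub)
    server_b = RoutedServer("B", pubsub)
    server_b.connect("bob")
    routing_table = {"bob": "B"}  # user_id -> server_id

    msg = {"recipient_id": "bob", "content": "<ciphertext>"}
    target = routing_table[msg["recipient_id"]]
    await pubsub.publish(f"server:{target}", {"type": "route_message", "message": msg})
    return server_b.inboxes["bob"]

delivered = asyncio.run(demo())
```

One channel per server keeps the subscription count flat as users grow. The cost is that the routing table can go stale, so the receiving server needs a fallback for users who have already disconnected.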

Message Storage: Cassandra Schema

-- Messages partitioned by conversation_id
-- Within partition, ordered by timestamp DESC for efficient recent-first reads
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,   -- Time-ordered UUID (acts as timestamp + unique ID)
    sender_id       UUID,
    content         BLOB,       -- Encrypted ciphertext
    message_type    TEXT,
    media_url       TEXT,
    status          TEXT,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND default_time_to_live = 0;  -- No auto-expiry; app-level retention policies

-- Conversations table
CREATE TABLE conversations (
    user_id         UUID,
    conversation_id UUID,
    other_user_id   UUID,       -- For 1:1 chats
    group_id        UUID,       -- For group chats
    last_message_id TIMEUUID,
    last_read_id    TIMEUUID,
    PRIMARY KEY (user_id, conversation_id)
) WITH CLUSTERING ORDER BY (conversation_id DESC);

Why Cassandra?

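  • Write-heavy: 100B msgs/day ≈ 1.2M writes/sec on average; Cassandra’s append-only LSM-tree storage is well suited to sustained write load
  • Time-series access pattern: “get last 50 messages for conversation X” maps directly to Cassandra’s clustering key ordering
  • Horizontal scaling: consistent hashing distributes conversations across nodes; very active conversations (large groups) can still create hot partitions, commonly mitigated by adding a time bucket to the partition key
  • WhatsApp’s own stack is known for Erlang and Mnesia; Cassandra is this design’s choice for persistence rather than a confirmed part of WhatsApp’s deployment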
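
The “last 50 messages” access pattern is usually paired with cursor pagination: the client sends the oldest message_id it already has, and the server returns the next page older than that. A small in-memory sketch of that logic, with integers standing in for TIMEUUIDs and a hypothetical `fetch_page` helper:

```python
from typing import Optional

def fetch_page(partition: list[int], limit: int,
               before: Optional[int] = None) -> list[int]:
    """Return up to `limit` message IDs older than `before`, newest first.

    `partition` models one conversation's rows, already sorted newest-first,
    the way CLUSTERING ORDER BY (message_id DESC) keeps them on disk.
    """
    if before is None:
        return partition[:limit]
    return [m for m in partition if m < before][:limit]

# Hypothetical conversation with message IDs 1..7
partition = sorted(range(1, 8), reverse=True)

page1 = fetch_page(partition, limit=3)                     # newest messages
page2 = fetch_page(partition, limit=3, before=page1[-1])   # scroll back
page3 = fetch_page(partition, limit=3, before=page2[-1])   # last partial page
```

Against the schema above, the same request would be a single-partition query that filters on conversation_id, bounds message_id by the cursor, and applies a LIMIT.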

Message Delivery Guarantees

class MessageDeliveryService:
    """
    At-least-once delivery with client-side deduplication.
    """
    def __init__(self, conn_mgr: ConnectionManager):
        self.conn_mgr = conn_mgr

    async def deliver_with_retry(self, message: Message, max_attempts: int = 5):
        for attempt in range(max_attempts):
            try:
                if await self._try_deliver(message):
                    return
            except Exception:
                pass  # Treat an error like a failed attempt and retry
            if attempt < max_attempts - 1:
                # Exponential backoff: 1s, 2s, 4s, 8s
                await asyncio.sleep(2 ** attempt)
        # All attempts failed: store for delivery when the user reconnects
        await offline_queue.enqueue(message)

    async def on_user_connect(self, user_id: str):
        """Deliver queued messages when user comes online."""
        pending = await offline_queue.get_all(user_id)
        for msg in pending:
            delivered = await self.conn_mgr.send_to_user(
                user_id, {'type': 'message', 'message': msg.__dict__}
            )
            if not delivered:
                break  # Connection dropped; keep the rest queued, in order
            # Dequeue only after a successful send so nothing is lost
            await offline_queue.remove(msg.message_id)
            await self._update_status(msg.message_id, 'delivered')
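
At-least-once delivery means the client can receive the same message twice, for example when a retry races with a slow acknowledgement. A minimal sketch of the client-side deduplication the docstring refers to, assuming a bounded set of recently seen IDs is enough (the `Deduplicator` class is illustrative only):

```python
from collections import OrderedDict

class Deduplicator:
    """Remembers recently seen message IDs; bounded so memory stays flat."""
    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._seen = OrderedDict()  # message_id -> None, oldest first

    def first_time(self, message_id: str) -> bool:
        """Return True the first time an ID is seen, False on redelivery."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True

dedup = Deduplicator(capacity=3)
incoming = ["m1", "m2", "m1", "m3", "m2"]  # "m1" and "m2" are redelivered
displayed = [m for m in incoming if dedup.first_time(m)]
```

The capacity bound is a trade-off: an ID evicted from the window could be shown again if it is redelivered much later, so a real client would more likely check against its local message database.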

Read Receipts and Presence

class PresenceService:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def set_online(self, user_id: str):
        await self.redis.setex(f"online:{user_id}", 60, "1")  # TTL 60s; refresh on heartbeat
        await self.redis.set(f"last_seen:{user_id}", datetime.utcnow().isoformat())
        # Broadcast online status to active conversations
        await self._broadcast_presence(user_id, online=True)

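    async def set_offline(self, user_id: str):
        await self.redis.delete(f"online:{user_id}")
        # Record the disconnect time so "last seen" reflects when the user left
        await self.redis.set(f"last_seen:{user_id}", datetime.utcnow().isoformat())
        await self._broadcast_presence(user_id, online=False)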

    async def is_online(self, user_id: str) -> bool:
        return bool(await self.redis.get(f"online:{user_id}"))

    async def get_last_seen(self, user_id: str) -> Optional[str]:
        return await self.redis.get(f"last_seen:{user_id}")

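class ReadReceiptService:
    def __init__(self, conn_mgr: ConnectionManager):
        self.conn_mgr = conn_mgr  # Used to push receipts to senders on this server
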
    async def mark_read(self, user_id: str, conversation_id: str,
                        up_to_message_id: str):
        # Update last_read_id for this user in this conversation
        await conversations_db.update_last_read(user_id, conversation_id, up_to_message_id)

        # Notify message senders (for double blue tick)
        unread_messages = await message_store.get_unread(
            conversation_id, user_id, up_to_message_id
        )
        for msg in unread_messages:
            await self.conn_mgr.send_to_user(msg.sender_id, {
                'type': 'read_receipt',
                'conversation_id': conversation_id,
                'message_id': msg.message_id,
                'read_by': user_id
            })
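
The TTL-plus-heartbeat pattern used by PresenceService can be modelled without Redis. In this sketch the clock is passed in explicitly so the expiry behaviour is easy to follow; `PresenceTracker` is a hypothetical in-memory stand-in for the `online:{user_id}` keys:

```python
class PresenceTracker:
    """In-memory model of key expiry: a user is online until their TTL lapses."""
    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl = ttl_seconds
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        """Each heartbeat pushes the expiry forward, like refreshing a key's TTL."""
        self._expires_at[user_id] = now + self.ttl

    def is_online(self, user_id: str, now: float) -> bool:
        return self._expires_at.get(user_id, float("-inf")) > now

tracker = PresenceTracker(ttl_seconds=60)
tracker.heartbeat("alice", now=0)
online_at_30 = tracker.is_online("alice", now=30)    # within the first TTL
tracker.heartbeat("alice", now=45)                   # expiry moves to t=105
online_at_100 = tracker.is_online("alice", now=100)
online_at_106 = tracker.is_online("alice", now=106)  # no heartbeat since t=45
```

The point of the TTL is that a client that crashes or loses its network never sends a disconnect, yet still shows as offline once its heartbeats stop.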

Group Messaging

class GroupMessageService:
    """
    Group messages require fan-out to all members.
    Strategy: server-side fan-out for small groups (<500 members)
    For large groups (channels): fan-out on read.
    """
    async def send_group_message(self, sender_id: str, group_id: str,
                                  message: Message):
        # Persist once for the group
        await message_store.save(message)

        # Get all group members
        members = await group_service.get_members(group_id)
        members = [m for m in members if m != sender_id]

        # Fan-out delivery to all members
        tasks = [
            self.deliver_to_member(message, member_id)
            for member_id in members
        ]
        await asyncio.gather(*tasks, return_exceptions=True)

    async def deliver_to_member(self, message: Message, member_id: str):
        delivered = await self.conn_mgr.send_to_user(member_id, {
            'type': 'group_message', 'message': message.__dict__
        })
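        if not delivered:
            # Generic push text: the server cannot read E2E-encrypted content
            await notification_service.send_push(
                user_id=member_id,
                title="New message",
                body="You have a new group message",
                data={'conversation_id': message.conversation_id}
            )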
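
Launching one task per member with `asyncio.gather` is fine for small groups, but an unbounded burst can overwhelm downstream services. One possible refinement is to cap concurrency with a semaphore and collect the members whose delivery failed. `fan_out` and its `deliver` callback are hypothetical names for this sketch:

```python
import asyncio

async def fan_out(member_ids: list[str], deliver, max_concurrency: int = 100) -> list[str]:
    """Run deliver(member_id) for every member, at most max_concurrency at once.

    Returns the member IDs whose delivery raised or returned False.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(member_id: str):
        async with semaphore:
            return await deliver(member_id)

    results = await asyncio.gather(
        *(guarded(m) for m in member_ids), return_exceptions=True
    )
    # Pair each member with its outcome so failures can be retried or queued
    return [m for m, r in zip(member_ids, results)
            if isinstance(r, Exception) or r is False]

async def demo():
    in_flight = 0
    peak = 0

    async def deliver(member_id: str) -> bool:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0)  # yield so other deliveries can start
        in_flight -= 1
        if member_id == "u3":
            raise ConnectionError("socket closed")
        return True

    members = [f"u{i}" for i in range(10)]
    failed = await fan_out(members, deliver, max_concurrency=4)
    return failed, peak

failed, peak = asyncio.run(demo())
```

The returned list gives the caller a single place to decide between retrying, queueing for offline delivery, or sending a push notification.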

End-to-End Encryption (Signal Protocol)

WhatsApp uses the Signal Protocol for E2E encryption. Key concepts:

  • Key exchange: X3DH (Extended Triple Diffie-Hellman) — establishes a shared secret between devices without the server ever seeing it
  • Message encryption: Double Ratchet Algorithm — generates a new encryption key for each message; past messages cannot be decrypted even if current key is compromised (forward secrecy)
  • Server role: Stores and relays ciphertext only. Cannot decrypt content. Stores public keys (identity keys, one-time prekeys) to enable key exchange for offline recipients.
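
The “new key for each message” property comes from a one-way key derivation chain. The sketch below shows only that symmetric-key step of the Double Ratchet, using HMAC-SHA256 with the single-byte constants recommended in the Double Ratchet specification. It omits the Diffie-Hellman ratchet, headers, and message encryption, and the seed is a made-up value standing in for the X3DH output. It illustrates the idea and is not usable cryptography:

```python
import hashlib
import hmac

def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (next_chain_key, message_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key

# Hypothetical shared secret; in the real protocol this comes out of X3DH
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys = []
for _ in range(3):
    # The old chain key is overwritten, so earlier message keys
    # cannot be recomputed from the current state.
    chain_key, mk = kdf_chain_step(chain_key)
    message_keys.append(mk)
```

Because HMAC cannot be inverted, someone who steals the current chain key can derive future keys in this chain but not the keys of messages already sent. The Diffie-Hellman ratchet, not shown here, is what later locks that attacker out again.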

Interview Discussion Points

  • Message ordering: Within a conversation, order by a time-ordered message ID (the TIMEUUID clustering key above); use logical timestamps (Lamport timestamps) or vector clocks when causal ordering across distributed servers is required
  • Exactly-once delivery: True exactly-once delivery is not achievable over an unreliable network; at-least-once delivery plus client-side deduplication on message_id gives exactly-once display even when retries deliver a message twice
  • Scaling WebSockets: Each chat server handles ~50K-100K concurrent connections; server routing table (Redis) maps user_id to server_id
  • Media messages: Client uploads directly to S3; sends only the media URL in the message; recipient downloads independently from CDN
  • Message search: Client-side search over locally stored messages (E2E encryption prevents server-side indexing)
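
For the ordering discussion, a Lamport clock is small enough to write out in an interview. This sketch assumes each server keeps one counter and attaches its value to every message it sends:

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or send: advance and return the timestamp to attach."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """On receive: move past both the local and the sender's timestamp."""
        self.time = max(self.time, remote_time) + 1
        return self.time

server_a, server_b = LamportClock(), LamportClock()
t1 = server_a.tick()        # A sends message 1
t2 = server_a.tick()        # A sends message 2
t3 = server_b.receive(t2)   # B receives message 2
t4 = server_b.tick()        # B sends a reply
```

The reply's timestamp is guaranteed to be larger than that of the message it answers, which is the causal property a chat timeline needs. Lamport timestamps do not detect concurrent messages; vector clocks do, at the cost of one counter per participant.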

