System Design: Real-time Chat and Messaging System (WhatsApp/Slack) — WebSockets, Pub/Sub, Scale

Requirements

Functional: 1-on-1 and group messaging, real-time message delivery, message persistence and history, read receipts (sent/delivered/read), user presence (online/offline/last seen), media attachments, push notifications for offline users.

Non-functional: messages must be delivered in order, at-least-once delivery, low latency (< 100ms for online users), scale to 100M users with billions of messages/day.

Core Architecture

WebSocket vs. Long Polling vs. SSE

  • WebSocket: bidirectional, persistent TCP connection. Best for chat — low overhead per message, server can push at any time. Used by WhatsApp, Slack, Discord.
  • Long Polling: client makes HTTP request, server holds it open until a message arrives (or timeout). Simpler to implement, works through all firewalls. Higher latency, more overhead per message. Fallback for environments that block WebSockets.
  • Server-Sent Events (SSE): server push over HTTP/1.1, unidirectional. Good for notifications but not chat (can’t send from client).
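Long polling's blocking semantics can be illustrated with a small in-process sketch; the queue here is a stand-in for the server's pending-message buffer, and `long_poll` is an illustrative name, not a real framework API:

```python
import queue
import threading

def long_poll(inbox: "queue.Queue[str]", timeout: float = 30.0):
    """Server side of one long-poll request: block until a message
    arrives or the timeout fires, then respond either way."""
    try:
        return inbox.get(timeout=timeout)   # message arrived: respond now
    except queue.Empty:
        return None                         # timeout: client issues a new poll

inbox: "queue.Queue[str]" = queue.Queue()
# Simulate a message arriving 100ms after the client starts polling.
threading.Timer(0.1, lambda: inbox.put("new message")).start()
result = long_poll(inbox, timeout=2.0)
print(result)
```

The held-open request is what distinguishes long polling from plain polling: latency is one network round trip after the message arrives, at the cost of a parked connection per waiting client.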

Chat Server Architecture

User A ---- Chat Server 1            Chat Server 2 ---- User B
                 |                         |
          [Message Store]           [Message Store]
                 |                         |
                 +--[Redis Pub/Sub or Kafka]--+

User A connects to Chat Server 1. User B connects to Chat Server 2 (different node). When A sends a message to B: (1) Chat Server 1 persists the message. (2) Chat Server 1 publishes to Redis Pub/Sub channel user:{B_id}. (3) Chat Server 2 subscribes to that channel and receives the message. (4) Chat Server 2 pushes it to B’s WebSocket connection.
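The four steps above can be sketched with an in-memory stand-in for the Pub/Sub bus (a real deployment would use Redis Pub/Sub or Kafka; `InMemoryPubSub` and the simplified `ChatServer` are illustrative, not production code):

```python
from collections import defaultdict

class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub: channel -> list of subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers[channel]:
            cb(message)

class ChatServer:
    """Each node holds only its own users' connections (here, plain inboxes)."""
    def __init__(self, bus, store):
        self.bus = bus
        self.store = store          # shared message store
        self.inboxes = {}           # user_id -> messages pushed to that socket

    def connect(self, user_id):
        self.inboxes[user_id] = []
        # Step 3: subscribe to the user's channel on the shared bus.
        self.bus.subscribe(f"user:{user_id}", self.inboxes[user_id].append)

    def send(self, sender_id, recipient_id, content):
        self.store.append((sender_id, recipient_id, content))  # Step 1: persist
        self.bus.publish(f"user:{recipient_id}", content)      # Step 2: publish

bus, store = InMemoryPubSub(), []
server1, server2 = ChatServer(bus, store), ChatServer(bus, store)
server2.connect("B")                 # B sits on a different node than A
server1.send("A", "B", "hello")      # routed across nodes via the bus
print(server2.inboxes["B"])
```

The key property: Chat Server 1 never needs to know which node B is on; the channel name `user:{B_id}` does the routing.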

Message Data Model

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Message:
    message_id: str         # UUID, globally unique
    conversation_id: str    # groups messages into a conversation
    sender_id: str
    content: str
    message_type: str       # 'text' | 'image' | 'video' | 'file'
    media_url: Optional[str]
    sent_at: datetime       # client-side timestamp
    server_at: datetime     # server-received timestamp (for ordering)
    sequence_num: int       # per-conversation monotonic sequence for ordering

@dataclass
class Conversation:
    conversation_id: str
    type: str               # 'direct' | 'group'
    participant_ids: List[str]
    created_at: datetime
    last_message_id: Optional[str]
    last_activity: datetime

@dataclass
class MessageStatus:
    message_id: str
    user_id: str
    status: str             # 'delivered' | 'read'
    timestamp: datetime

Message Ordering and Sequencing

Challenge: two users sending simultaneously — which message comes first? Options:

  • Server-assigned sequence number: a sequence service (or database auto-increment) assigns a monotonically increasing sequence_num per conversation. Messages are displayed sorted by sequence_num. Race condition: two near-simultaneous messages from different servers may get sequence numbers out of order.
  • Logical clock (Lamport timestamp): each message carries a logical clock value. On send, increment the clock; on receive, set clock = max(local, received) + 1. This yields a partial order consistent with causality; break ties between concurrent messages (e.g., by sender id) to get a total order across all clients.
  • Client timestamp + sequence number: hybrid — use sequence number for ordering within a session; use server_at for cross-session ordering. Good enough for most chat apps.
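A minimal sketch of the Lamport-timestamp option above (the class name and two-client walkthrough are illustrative):

```python
class LamportClock:
    """Logical clock: increment on send, merge-and-increment on receive."""
    def __init__(self):
        self.time = 0

    def send(self) -> int:
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        # Jump past anything causally before the received message.
        self.time = max(self.time, remote_time) + 1
        return self.time

# Two clients exchange messages:
a, b = LamportClock(), LamportClock()
t1 = a.send()        # A sends:    clock 1
t2 = b.receive(t1)   # B receives: max(0, 1) + 1 = 2
t3 = b.send()        # B replies:  clock 3
t4 = a.receive(t3)   # A receives: max(1, 3) + 1 = 4
print(t1, t2, t3, t4)
```

Causally related messages always get increasing timestamps; for concurrent messages the timestamps alone don't decide, so a deterministic tie-break (such as sender id) completes the total order.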

Message Storage at Scale

Facebook Messenger uses HBase; WhatsApp uses Mnesia (Erlang); Discord uses Cassandra. Key requirements: write-heavy (every message), sequential reads (load conversation history), range queries (messages since timestamp X).

# Cassandra schema: partition by conversation, cluster by sequence_num
CREATE TABLE messages (
    conversation_id UUID,
    sequence_num    BIGINT,
    message_id      UUID,
    sender_id       UUID,
    content         TEXT,
    sent_at         TIMESTAMP,
    PRIMARY KEY (conversation_id, sequence_num)
) WITH CLUSTERING ORDER BY (sequence_num DESC);
-- Query last 50 messages: SELECT * FROM messages WHERE conversation_id = ? LIMIT 50

User Presence

import redis
from datetime import datetime
from typing import Optional

r = redis.Redis(decode_responses=True)  # shared Redis client

class PresenceService:
    ONLINE_TTL = 30  # seconds; clients heartbeat every 15s to refresh the key

    def user_online(self, user_id: str, server_id: str):
        # The key doubles as an online flag and a server-location lookup;
        # the TTL expires stale entries if a client dies without disconnecting.
        r.setex(f"presence:{user_id}", self.ONLINE_TTL, server_id)
        r.publish("presence_updates", f"{user_id}:online")

    def user_offline(self, user_id: str):
        r.delete(f"presence:{user_id}")
        r.set(f"last_seen:{user_id}", datetime.utcnow().isoformat())
        r.publish("presence_updates", f"{user_id}:offline")

    def is_online(self, user_id: str) -> bool:
        return r.exists(f"presence:{user_id}") > 0

    def get_server(self, user_id: str) -> Optional[str]:
        """Which chat server is this user connected to?"""
        return r.get(f"presence:{user_id}")

Push Notifications for Offline Users

When a user is offline (no WebSocket connection), fall back to push notifications:

  1. Message arrives at chat server. Check presence: r.exists(f"presence:{recipient_id}").
  2. If online: push via WebSocket through Pub/Sub routing.
  3. If offline: publish a “push_notification” event to Kafka. A notification worker consumes it and sends to FCM (Android) or APNs (iOS).
  4. When the user comes back online, they pull unread messages from the message store (catch-up).
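The online/offline branch in steps 1-3 can be sketched as a pure routing function; the callbacks stand in for the Pub/Sub push and the Kafka producer, and all names here are illustrative:

```python
def route_message(recipient_id, message, presence, push_ws, enqueue_push):
    """Decide between real-time WebSocket delivery and a push notification.

    presence: dict of user_id -> chat server id (stand-in for the Redis
    presence check); push_ws / enqueue_push stand in for Pub/Sub and Kafka.
    """
    server_id = presence.get(recipient_id)
    if server_id is not None:
        push_ws(server_id, recipient_id, message)   # online: real-time path
        return "websocket"
    enqueue_push((recipient_id, message))           # offline: FCM/APNs worker
    return "push"

ws_out, push_queue = [], []
presence = {"alice": "chat-server-7"}               # alice online, bob offline
send_ws = lambda srv, uid, msg: ws_out.append((srv, uid, msg))

r1 = route_message("alice", "hi", presence, send_ws, push_queue.append)
r2 = route_message("bob", "hey", presence, send_ws, push_queue.append)
print(r1, r2)
```

Note that the push path is only a notification trigger; the message itself still lands in the message store, and step 4's catch-up read is the source of truth when the user reconnects.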

Read Receipts

def mark_delivered(message_id: str, user_id: str):
    db.upsert(MessageStatus(message_id, user_id, 'delivered', datetime.utcnow()))
    # Notify sender
    sender_id = get_message(message_id).sender_id
    push_status_update(sender_id, message_id, 'delivered')

def mark_read(conversation_id: str, user_id: str, up_to_sequence: int):
    """Batch mark all messages up to sequence_num as read."""
    db.execute(
        "UPDATE message_status SET status='read', timestamp=NOW() "
        "WHERE conversation_id=%s AND user_id=%s AND sequence_num <= %s AND status != 'read'",
        [conversation_id, user_id, up_to_sequence]
    )
    push_read_receipt(conversation_id, user_id, up_to_sequence)

Scaling

  • Chat servers: stateless except for WebSocket connections. Use consistent hashing to route a user_id to a specific chat server (sticky sessions for Pub/Sub efficiency). Auto-scale based on connection count.
  • Message fan-out in groups: for a group with 1000 members, sending a message requires 1000 Pub/Sub publishes. Cap group size or use a separate “group delivery” service. For very large groups (> 10K), use server-side fan-out via a precomputed member list stored in Redis.
  • Hot conversations: a very active group chat (10K messages/minute) can overwhelm one Cassandra partition. Shard by (conversation_id, time_bucket) to spread load across multiple partitions.
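The consistent-hashing routing mentioned above can be sketched as a hash ring with virtual nodes (the class is illustrative; production systems typically use a battle-tested implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user_ids to chat servers; adding or removing a server
    remaps only ~1/N of users instead of reshuffling everyone."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to smooth the load.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, user_id: str) -> str:
        # Walk clockwise to the first server point at or after the user's hash.
        idx = bisect.bisect(self.ring, (self._hash(user_id),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["chat-1", "chat-2", "chat-3"])
assigned = ring.get_server("user-42")
# Deterministic: the same user always routes to the same server,
# which is what makes sticky sessions and per-user Pub/Sub channels cheap.
assert ring.get_server("user-42") == assigned
```

When a server is added for auto-scaling, only the users whose hash falls in the new server's arcs reconnect elsewhere; everyone else keeps their existing WebSocket.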

Asked at: Meta, Snap, LinkedIn, Twitter/X, Atlassian
