System Design Interview: Design a Real-Time Chat Application (WhatsApp/Slack)

What Is a Real-Time Chat System?

A real-time chat system enables persistent messaging between users with delivery guarantees and online presence. Examples: WhatsApp (2B users), Slack (20M DAU), Discord. Core challenges: message ordering and deduplication, online presence at scale, end-to-end encryption, and efficient message storage for billions of conversations.

System Requirements

    Functional

    • One-on-one and group messages (up to 256 members)
    • Message delivery receipts: sent, delivered, read
    • Online/offline presence indicators
    • Message history: searchable, persistent
    • Media: images, videos, voice messages

    Non-Functional

    • 2B users, 100B messages/day
    • Message delivery latency: <500ms
    • Message ordering guaranteed within a conversation
    • 7-year message retention
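
    The volume targets above translate into an average ingest rate and a steady-state message count. The following is a back-of-the-envelope sketch; the function names are illustrative, and it assumes 365-day years, no deletions, and uniform traffic (real peaks would be higher).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day: int) -> float:
    """Average messages per second implied by a daily volume."""
    return messages_per_day / SECONDS_PER_DAY


def retained_messages(messages_per_day: int, retention_years: int) -> int:
    """Total messages held at steady state (365-day years, no deletions)."""
    return messages_per_day * 365 * retention_years


MESSAGES_PER_DAY = 100_000_000_000  # 100B/day, from the requirements
avg = average_rate(MESSAGES_PER_DAY)
total = retained_messages(MESSAGES_PER_DAY, 7)

print(f"average rate: {avg:,.0f} messages/second")
print(f"retained after 7 years: {total:.3e} messages")
```

    The average works out to roughly 1.16M messages/second, and 7-year retention implies on the order of 2.6 × 10^14 stored messages, which is why the storage layer has to scale writes horizontally.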

    Connection Architecture

    Client ──WebSocket──► Gateway Server (connection layer)
                                  │
                           Message Service ──► Kafka ──► Delivery Workers
                                  │
                             Cassandra (message store)
                                  │
                           Presence Service ──► Redis (online status)
    

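    Each client maintains a persistent WebSocket connection to a gateway server. A gateway holds open connections but keeps no durable message state: to route a message, it looks up which gateway holds the recipient’s connection (a Redis mapping of user_id → gateway_server_id) and forwards the message there. At 50K connections per gateway server, 2B simultaneous connections would need 40,000 gateway servers. That is an upper bound, since only a fraction of users are connected at any moment; 500M concurrent users would need 10,000.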
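
    A minimal sketch of the gateway sizing and the routing lookup follows. It uses a plain dictionary as a stand-in for the Redis mapping; `ConnectionRegistry`, `gateways_needed`, and the gateway names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def gateways_needed(concurrent_connections: int, per_gateway: int) -> int:
    """Ceiling division: a partially filled gateway still counts as one."""
    return -(-concurrent_connections // per_gateway)


class ConnectionRegistry:
    """In-memory stand-in for the Redis mapping user_id -> gateway_server_id."""

    def __init__(self) -> None:
        self._gateway_by_user: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        # Called when a WebSocket connects; a reconnect overwrites the old entry.
        self._gateway_by_user[user_id] = gateway_id

    def unregister(self, user_id: str, gateway_id: str) -> None:
        # Only clear the entry if it still points at the disconnecting gateway,
        # so a stale disconnect cannot erase a newer connection.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user has no live connection (treat as offline).
        return self._gateway_by_user.get(user_id)


registry = ConnectionRegistry()
registry.register("alice", "gw-17")
print(gateways_needed(2_000_000_000, 50_000))  # every user connected at once
print(registry.lookup("alice"), registry.lookup("bob"))
```

    The guarded `unregister` is a design choice worth mentioning in an interview: when a client reconnects to a different gateway, the old gateway's disconnect event may arrive after the new registration.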

    Message Flow

    Sender sends message
      → Gateway accepts, assigns message_id (time-ordered ID) and timestamp
      → Writes to Kafka (topic: messages, partitioned by conversation_id)
      → Returns acknowledgment to sender (status: sent)
      → Delivery worker reads from Kafka
      → Writes message to Cassandra (durable storage)
      → Looks up recipient's gateway server in Redis
      → Gateway pushes message to recipient's WebSocket
      → Recipient's client sends delivery receipt (status: delivered)
      → Delivery receipt propagated back to sender
      → When recipient opens the conversation, read receipt sent to sender (status: read)
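
    Receipts in this flow can arrive late, twice, or out of order, so the stored status should only ever move forward through sent, delivered, read. A small sketch of that rule, with hypothetical names (`ReceiptStatus`, `apply_receipt`):

```python
from enum import IntEnum


class ReceiptStatus(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


def apply_receipt(current: ReceiptStatus, incoming: ReceiptStatus) -> ReceiptStatus:
    """Receipts can arrive late or twice; never move a status backwards."""
    return max(current, incoming)


status = ReceiptStatus.SENT
for receipt in (ReceiptStatus.READ, ReceiptStatus.DELIVERED):  # out of order
    status = apply_receipt(status, receipt)
print(status.name)
```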
    

    Message Storage: Cassandra

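    -- Column types are assumptions; use bigint for Snowflake-style message IDs.
    CREATE TABLE messages (
        conversation_id uuid, message_id timeuuid, sender_id uuid,
        content text, type text, status text, created_at timestamp,
        PRIMARY KEY (conversation_id, message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC);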
    

    Cassandra fits chat workloads well: its wide-row model keeps all messages for a conversation in one partition, writes scale linearly as nodes are added, and range reads within a partition make fetching message history fast. Message IDs are time-ordered (UUID v1, or Snowflake-style 64-bit IDs), so sorting by message_id within a conversation yields chronological order.
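
    To make the time-ordered ID idea concrete, here is a sketch of a Snowflake-style generator. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake scheme; the epoch constant and the class name are assumptions made for this example, not something the design above prescribes.

```python
import threading
import time
from typing import Callable

CUSTOM_EPOCH_MS = 1_600_000_000_000  # arbitrary epoch chosen for this sketch


class SnowflakeStyleIds:
    """64-bit IDs: 41-bit millisecond timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id: int,
                 clock_ms: Callable[[], int] = lambda: int(time.time() * 1000)) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never go backwards, even if the wall clock does.
            now = max(self._clock_ms(), self._last_ms)
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    now += 1  # 4096 IDs used this millisecond: borrow the next one
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - CUSTOM_EPOCH_MS) << 22) | (self._worker_id << 12) | self._sequence


ids = SnowflakeStyleIds(worker_id=7)
first, second = ids.next_id(), ids.next_id()
print(first < second)
```

    IDs from one generator are strictly increasing. Across generators the order is only as good as clock synchronization, so ordering within a conversation is strongest when a single component assigns that conversation's IDs.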

    Message Ordering and Deduplication

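    Ordering: Kafka partitioning by conversation_id sends every message of a conversation to the same partition, and within a consumer group each partition is read by exactly one consumer, so a conversation’s messages are processed in order. Deduplication: each message carries a unique ID that is fixed before it enters Kafka, so a redelivered message (Kafka consumers are at-least-once by default) arrives with the same ID. Cassandra has no unique constraints, but writing to an existing primary key (conversation_id, message_id) overwrites the same row, so replaying an identical message is idempotent. INSERT ... IF NOT EXISTS makes the check explicit at the cost of a lightweight transaction (a Paxos round) on every write. Client retries are a separate case: they are deduplicated only if the client generates the ID (or an idempotency key) and reuses it on retry.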
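
    The two mechanisms can be sketched together: a stable hash picks the partition for a conversation, and a write keyed by (conversation_id, message_id) ignores replays. This is an illustration only; CRC32 stands in for the partitioner's hash (Kafka's default partitioner uses a different hash function), and `MessageStore` is a hypothetical in-memory substitute for the Cassandra table.

```python
import zlib
from typing import Dict, Tuple


def partition_for(conversation_id: str, num_partitions: int) -> int:
    """Stable hash so every message of a conversation lands on one partition."""
    return zlib.crc32(conversation_id.encode("utf-8")) % num_partitions


class MessageStore:
    """In-memory stand-in for a table keyed by (conversation_id, message_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], str] = {}

    def write_once(self, conversation_id: str, message_id: str, content: str) -> bool:
        """Store the message; return False if this ID was already written."""
        key = (conversation_id, message_id)
        if key in self._rows:
            return False  # duplicate delivery: skip the write and the re-push
        self._rows[key] = content
        return True

    def count(self) -> int:
        return len(self._rows)


store = MessageStore()
print(store.write_once("conv-1", "msg-1", "hello"))  # first delivery
print(store.write_once("conv-1", "msg-1", "hello"))  # redelivery of the same ID
print(partition_for("conv-1", 12) == partition_for("conv-1", 12))
```

    Returning a boolean lets the delivery worker skip the push to the recipient when the write was a duplicate, which avoids showing the same message twice.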

    Presence Service

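    Online status is stored in Redis with a TTL: SETEX presence:{user_id} 30 "online". Clients send a heartbeat every 15 seconds to renew the TTL, so one missed heartbeat is tolerated; if the key expires, the user is considered offline. Presence queries: when a conversation is opened, the client requests presence for all of its members. With 500M online users each sending a heartbeat every 15 seconds, that is about 33M writes/second, far more than a single Redis instance can absorb, so the keyspace is sharded by user_id.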
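
    The heartbeat-with-TTL pattern and the write-rate estimate can be sketched without a Redis server. `PresenceTracker` below is a hypothetical in-memory stand-in that mimics SETEX expiry with an injectable clock; it is not a Redis client.

```python
import time
from typing import Callable, Dict


class PresenceTracker:
    """In-memory stand-in for SETEX presence:{user_id} <ttl> "online"."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each heartbeat resets the expiry, like re-issuing SETEX.
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id: str) -> bool:
        expires_at = self._expires_at.get(user_id)
        return expires_at is not None and self._clock() < expires_at


def heartbeat_writes_per_second(online_users: int, interval_seconds: float) -> float:
    """Aggregate write rate if every online user heartbeats once per interval."""
    return online_users / interval_seconds


tracker = PresenceTracker()
tracker.heartbeat("alice")
print(tracker.is_online("alice"), tracker.is_online("bob"))
print(f"{heartbeat_writes_per_second(500_000_000, 15):,.0f} writes/second")
```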

    Offline Message Delivery

    If the recipient is offline, the message is already stored in Cassandra by the delivery worker. The system then sends a push notification (APNs/FCM) to the recipient’s device. When the recipient comes back online, the client reconnects via WebSocket and fetches unread messages from Cassandra, using last_seen_message_id as a cursor.
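
    A cursor-based catch-up read can be sketched as follows. It assumes message IDs are integers that sort chronologically and that a conversation's IDs are available in ascending order; `fetch_unread` and the sample IDs are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def fetch_unread(message_ids: Sequence[int], last_seen_message_id: int,
                 limit: int = 100) -> List[int]:
    """Return up to `limit` IDs strictly newer than the cursor.

    `message_ids` must be sorted ascending, as time-ordered IDs are.
    """
    start = bisect_right(message_ids, last_seen_message_id)
    return list(message_ids[start:start + limit])


conversation = [101, 102, 105, 109, 110]
page = fetch_unread(conversation, last_seen_message_id=102, limit=2)
print(page)
next_page = fetch_unread(conversation, last_seen_message_id=page[-1], limit=2)
print(next_page)
```

    Paging by "IDs greater than the cursor" rather than by offset keeps the read stable while new messages keep arriving, and the client simply advances the cursor to the last ID it received.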

    Group Messaging

    A group message must be delivered to N members. Options:

    • Fan-out on write: create N delivery tasks in Kafka. Simple, but expensive for large groups.
    • Fan-out on read: store one message, each member’s client fetches on connect. Cheaper writes, but requires per-member read cursors.

    Fan-out on write suits small, bounded groups such as WhatsApp-style groups of up to 256 members, where the fan-out per message has a fixed ceiling. Fan-out on read suits large Slack-style channels: with fan-out on write, a channel with 10K members would require 10K delivery tasks per message.
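
    The cost difference between the two options shows up in how many writes a single send performs. The sketch below models both with in-memory structures; the class names are hypothetical, and per-member inboxes and integer cursors stand in for Kafka delivery tasks and stored read positions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence


class FanOutOnWrite:
    """One copy of the message per member at send time."""

    def __init__(self) -> None:
        self.inboxes: DefaultDict[str, List[str]] = defaultdict(list)

    def send(self, members: Sequence[str], message: str) -> int:
        for member in members:
            self.inboxes[member].append(message)
        return len(members)  # number of writes performed


class FanOutOnRead:
    """One shared log; each member tracks how far it has read."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.cursors: DefaultDict[str, int] = defaultdict(int)

    def send(self, message: str) -> int:
        self.log.append(message)
        return 1  # a single write regardless of channel size

    def fetch(self, member: str) -> List[str]:
        start = self.cursors[member]
        self.cursors[member] = len(self.log)  # advance the read cursor
        return self.log[start:]


group = FanOutOnWrite()
channel = FanOutOnRead()
print(group.send(["ann", "bob", "cy"], "hi"), channel.send("hi"))
print(channel.fetch("ann"), channel.fetch("ann"))
```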

    Media Storage

    Images and videos are uploaded to object storage (S3), and the message stores only the media URL. On delivery, a CDN serves the media directly to recipients. Deduplication: hash the media content (SHA-256) and, if that hash is already stored, reuse the existing URL, which saves storage for viral media forwarded millions of times.
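
    Content-addressed deduplication can be sketched with the standard library's hashlib. `MediaStore` is a hypothetical in-memory stand-in for the object store, and the base URL is a placeholder, not a real endpoint.

```python
import hashlib
from typing import Dict


class MediaStore:
    """Stores each distinct payload once, keyed by its SHA-256 digest."""

    def __init__(self, base_url: str = "https://cdn.example.com/media/") -> None:
        self._base_url = base_url
        self._url_by_digest: Dict[str, str] = {}
        self.uploads = 0  # how many payloads were actually stored

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        url = self._url_by_digest.get(digest)
        if url is None:
            # First time this content is seen: store it and remember the URL.
            url = f"{self._base_url}{digest}"
            self._url_by_digest[digest] = url
            self.uploads += 1
        return url


media = MediaStore()
first_url = media.put(b"funny-cat-bytes")
forwarded_url = media.put(b"funny-cat-bytes")  # same bytes, forwarded later
print(first_url == forwarded_url, media.uploads)
```

    In practice the client can send the hash first so that an already-known file never has to be uploaded at all; that variation is a design option, not something the section above specifies.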

    Interview Tips

    • WebSocket + gateway servers for persistent connections is the foundation.
    • Cassandra (conversation_id, message_id) is the canonical chat storage schema.
    • Presence via Redis SETEX with heartbeat renewal is the standard pattern.
    • Distinguish group fan-out on write (small groups) vs read (large channels).