System Design Interview: Design a Real-Time Chat Application (WhatsApp/Slack)

What Is a Real-Time Chat System?

A real-time chat system enables persistent messaging between users with delivery guarantees and online presence. Examples: WhatsApp (2B users), Slack (20M DAU), Discord. Core challenges: message ordering and deduplication, online presence at scale, end-to-end encryption, and efficient message storage for billions of conversations.

    System Requirements

    Functional

    • One-on-one and group messages (up to 256 members)
    • Message delivery receipts: sent, delivered, read
    • Online/offline presence indicators
    • Message history: searchable, persistent
    • Media: images, videos, voice messages

    Non-Functional

    • 2B users, 100B messages/day
    • Message delivery latency: <500ms
    • Message ordering guaranteed within a conversation
    • 7-year message retention
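    A quick back-of-envelope check of these targets, as a Python sketch. The 100-byte average message size is an assumption (the requirements do not state one), and the storage figure covers raw text only, before replication, indexes, or media.

```python
# Back-of-envelope sizing from the stated requirements.
SECONDS_PER_DAY = 86_400
MESSAGES_PER_DAY = 100e9      # 100B messages/day (requirement)
RETENTION_DAYS = 7 * 365      # 7-year retention (requirement)
AVG_MESSAGE_BYTES = 100       # ASSUMPTION: average text message size, not from the source

def average_writes_per_second(messages_per_day: float) -> float:
    """Average message rate; peak traffic will be a multiple of this."""
    return messages_per_day / SECONDS_PER_DAY

def raw_storage_bytes(messages_per_day: float, avg_bytes: int, days: int) -> float:
    """Raw message bytes over the retention window, before replication."""
    return messages_per_day * avg_bytes * days

rate = average_writes_per_second(MESSAGES_PER_DAY)
storage_pb = raw_storage_bytes(MESSAGES_PER_DAY, AVG_MESSAGE_BYTES, RETENTION_DAYS) / 1e15
print(f"~{rate / 1e6:.2f}M messages/s on average")     # ~1.16M messages/s on average
print(f"~{storage_pb:.0f} PB of raw text over 7 years")  # ~26 PB of raw text over 7 years
```

    Under that assumed message size, the write path must absorb over a million messages per second on average, which is why the design below leans on Kafka and Cassandra's linear write scaling.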

    Connection Architecture

    Client ──WebSocket──► Gateway Server (connection layer)
                                  │
                           Message Service ──► Kafka ──► Delivery Workers
                                  │
                             Cassandra (message store)
                                  │
                           Presence Service ──► Redis (online status)
    

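    Each client maintains a persistent WebSocket connection to a gateway server. Gateway servers hold connections but keep no routing state of their own: to deliver a message, the system looks up which gateway holds the recipient's connection in Redis (a hash mapping user_id → gateway_server_id) and forwards the message to that gateway. At 50K connections per gateway server, connecting all 2B users at once would take 2B / 50K = 40,000 gateway servers; with about 500M users online concurrently (the figure used for presence below), roughly 10,000 servers suffice, plus headroom for peaks and failover.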
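    A minimal in-memory sketch of the routing lookup, assuming a plain Python dict stands in for the Redis hash (user_id → gateway_server_id). The `RoutingTable` class, the `route` function, and the `"push_notification"` return value are hypothetical illustrations, not a real Redis or gateway API.

```python
from typing import Dict, Optional

class RoutingTable:
    """Hypothetical stand-in for the Redis hash user_id -> gateway_server_id."""

    def __init__(self) -> None:
        self._conn: Dict[str, str] = {}

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        # The gateway registers the user when the WebSocket opens.
        self._conn[user_id] = gateway_id

    def on_disconnect(self, user_id: str, gateway_id: str) -> None:
        # Only remove the entry if it still points at this gateway, so a stale
        # disconnect does not erase a newer connection on another gateway.
        if self._conn.get(user_id) == gateway_id:
            del self._conn[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._conn.get(user_id)

def route(table: RoutingTable, recipient_id: str) -> str:
    """Hypothetical helper: decide where a message for recipient_id goes."""
    gateway = table.lookup(recipient_id)
    if gateway is None:
        return "push_notification"  # recipient offline: fall back to APNs/FCM
    return gateway                  # forward to the gateway holding the socket

table = RoutingTable()
table.on_connect("bob", "gateway-42")
print(route(table, "bob"))    # gateway-42
print(route(table, "alice"))  # push_notification
```

    The conditional delete matters in practice: a mobile client often reconnects to a new gateway before the old gateway notices the dropped socket.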

    Message Flow

    Sender sends message (carrying a client-generated ID for deduplication)
      → Gateway accepts, assigns a time-ordered message_id (e.g. Snowflake), timestamps it
      → Writes to Kafka (topic: messages, partitioned by conversation_id)
      → Returns acknowledgment to sender (status: sent)
      → Delivery worker reads from Kafka
      → Writes message to Cassandra (durable storage)
      → Looks up recipient's gateway server in Redis
      → Pushes message to recipient's WebSocket
      → Recipient's client sends delivery receipt → propagated to sender (status: delivered)
      → Recipient opens the conversation → read receipt propagated to sender (status: read)
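    The sent → delivered → read receipts form a small state machine. A sketch, assuming receipts can arrive late or out of order (for example, a read receipt overtaking a delivery receipt), so a message's status only ever moves forward; `ReceiptStatus` and `apply_receipt` are hypothetical names.

```python
from enum import IntEnum

class ReceiptStatus(IntEnum):
    """Hypothetical receipt states, ordered so a larger value means further along."""
    SENT = 1       # persisted to Kafka, sender acknowledged
    DELIVERED = 2  # recipient's client confirmed receipt
    READ = 3       # recipient opened the conversation

def apply_receipt(current: ReceiptStatus, incoming: ReceiptStatus) -> ReceiptStatus:
    """Hypothetical helper: status only moves forward; late or duplicate receipts are ignored."""
    return max(current, incoming)

status = ReceiptStatus.SENT
for receipt in (ReceiptStatus.READ, ReceiptStatus.DELIVERED):  # read arrives first
    status = apply_receipt(status, receipt)
print(status.name)  # READ
```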
    

    Message Storage: Cassandra

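    -- column types are illustrative; a Snowflake message_id would be bigint
    CREATE TABLE messages (
        conversation_id uuid, message_id timeuuid, sender_id uuid,
        content text, type text, status text, created_at timestamp,
        PRIMARY KEY (conversation_id, message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC);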
    

    Cassandra suits chat well: the wide-row model keeps all messages for a conversation in one partition, write throughput scales linearly as nodes are added, and range reads for message history are fast. Message IDs are time-ordered (a UUIDv1/timeuuid or a 64-bit Snowflake ID), so sorting by message_id gives chronological order within a conversation.
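    A sketch of a Snowflake-style ID generator, assuming Twitter's original bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence) and a custom epoch of 2020-01-01 chosen here for illustration. `SnowflakeGenerator` and `worker_of` are hypothetical names. IDs are strictly increasing per generator and only roughly time-ordered across generators, since each server's clock differs slightly.

```python
import threading
import time

# ASSUMPTION: custom epoch 2020-01-01T00:00:00Z; any fixed past instant works.
EPOCH_MS = 1_577_836_800_000
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

class SnowflakeGenerator:
    """Hypothetical generator: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                now = self._last_ms  # clock stepped back: never reuse an older timestamp
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self._sequence)

def worker_of(snowflake_id: int) -> int:
    """Hypothetical helper: extract the worker ID bits from a generated ID."""
    return (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER

gen = SnowflakeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), worker_of(ids[0]))  # True 7
```

    Because the timestamp occupies the high bits, comparing two IDs as integers compares their creation times first, which is what lets a clustering order on message_id double as chronological order.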

    Message Ordering and Deduplication

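    Within a conversation: the Kafka topic is partitioned by conversation_id, and each partition is read by a single consumer in the group, so all messages for a conversation are processed in order. Kafka delivers at least once, so the delivery worker can see the same message twice. Each message carries a client-generated UUID for deduplication. Cassandra has no unique constraints (a write to an existing primary key simply overwrites it), so the worker detects duplicates with a lightweight transaction, INSERT ... IF NOT EXISTS, on a key that includes that UUID; lightweight transactions use Paxos and are noticeably slower than plain writes.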
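    A sketch of an idempotent delivery worker for one partition, assuming each message carries a client-generated `client_msg_id`. The in-memory `seen` set stands in for the durable duplicate check (in Cassandra, the INSERT ... IF NOT EXISTS described above); `Message` and `process_partition` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class Message:
    """Hypothetical record shape on the messages topic."""
    client_msg_id: str  # generated by the sender's device, reused on retry
    conversation_id: str
    body: str

def process_partition(records: Iterable[Message], seen: Set[str]) -> List[Message]:
    """Hypothetical worker: consume in partition order, dropping redeliveries."""
    stored: List[Message] = []
    for msg in records:
        if msg.client_msg_id in seen:
            continue                   # duplicate from at-least-once delivery: skip
        seen.add(msg.client_msg_id)    # a real system makes this check durable
        stored.append(msg)             # then write to the store and deliver
    return stored

log = [Message("a1", "c1", "hi"), Message("a2", "c1", "how are you?"),
       Message("a1", "c1", "hi")]     # "a1" redelivered after a consumer restart
print([m.body for m in process_partition(log, set())])  # ['hi', 'how are you?']
```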

    Presence Service

    Online status stored in Redis: SETEX presence:{user_id} 30 "online". Clients send a heartbeat every 15 seconds to renew the TTL. If the key expires (roughly two missed heartbeats), the user is shown as offline. Presence queries: when opening a conversation, the client requests presence for all members. With 500M online users each sending a heartbeat every 15 seconds, that is 500M / 15 ≈ 33M writes/second, so shard Redis by user_id.
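    A sketch of TTL-based presence, with an in-memory dict and an explicit clock standing in for Redis SETEX and key expiry so the behavior is easy to test. `PresenceStore` and its methods are hypothetical; the 30-second TTL and 15-second heartbeat match the numbers above, and the last lines reproduce the write-rate arithmetic.

```python
from typing import Dict

PRESENCE_TTL_S = 30        # matches SETEX presence:{user_id} 30 "online"
HEARTBEAT_INTERVAL_S = 15

class PresenceStore:
    """Hypothetical in-memory stand-in for Redis keys with a TTL."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Like SETEX: (re)write the key and reset its TTL.
        self._expires_at[user_id] = now + PRESENCE_TTL_S

    def is_online(self, user_id: str, now: float) -> bool:
        expires = self._expires_at.get(user_id)
        return expires is not None and now < expires

store = PresenceStore()
store.heartbeat("bob", now=0)
store.heartbeat("bob", now=15)         # renewed: now expires at t=45
print(store.is_online("bob", now=44))  # True
print(store.is_online("bob", now=46))  # False: heartbeats stopped, key expired

online_users = 500_000_000
print(f"{online_users / HEARTBEAT_INTERVAL_S / 1e6:.1f}M writes/second")  # 33.3M writes/second
```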

    Offline Message Delivery

    Recipient is offline: message stored in Cassandra (already done). Send a push notification (APNs/FCM) to the recipient’s device. When recipient comes online: their client connects via WebSocket, fetches unread messages from Cassandra using last_seen_message_id as a cursor.
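    A sketch of the catch-up fetch for one conversation, run against an in-memory list of message IDs sorted ascending; `fetch_since` is hypothetical. In Cassandra the equivalent is a single-partition range read on the clustering key (message_id > last_seen_message_id), and the client advances its cursor to the last ID it received.

```python
import bisect
from typing import List

def fetch_since(message_ids: List[int], last_seen_id: int, limit: int = 50) -> List[int]:
    """Hypothetical helper: up to `limit` IDs newer than last_seen_id, oldest first.

    message_ids must be sorted ascending (time-ordered IDs make this chronological).
    """
    start = bisect.bisect_right(message_ids, last_seen_id)
    return message_ids[start:start + limit]

history = [101, 105, 110, 120, 130]
print(fetch_since(history, last_seen_id=110))           # [120, 130]
print(fetch_since(history, last_seen_id=100, limit=2))  # [101, 105]
```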

    Group Messaging

    A group message must be delivered to N members. Options:

    • Fan-out on write: create N delivery tasks in Kafka. Simple, but expensive for large groups.
    • Fan-out on read: store one message, each member’s client fetches on connect. Cheaper writes, but requires per-member read cursors.

    WhatsApp uses fan-out on write for groups of up to 256 members (bounded fan-out). Slack uses fan-out on read for large channels: a channel with 10K members would otherwise need 10K delivery tasks per message.
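    A sketch contrasting the two strategies, with hypothetical function names: fan-out on write produces one delivery task per member other than the sender on every send, while fan-out on read stores the message once and each member reads past a per-member cursor.

```python
from typing import Dict, List, Tuple

def fan_out_on_write(message_id: int, sender: str, members: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical: one delivery task per recipient; cost grows with group size."""
    return [(member, message_id) for member in members if member != sender]

def unread_on_read(channel_log: List[int], cursors: Dict[str, int], member: str) -> List[int]:
    """Hypothetical: message stored once; each member tracks its own read cursor."""
    last_read = cursors.get(member, 0)  # assumes message IDs are positive
    return [mid for mid in channel_log if mid > last_read]

group = ["alice", "bob", "carol"]
print(len(fan_out_on_write(42, "alice", group)))  # 2

channel_log = [40, 41, 42]  # stored once for all members
cursors = {"bob": 41, "carol": 0}
print(unread_on_read(channel_log, cursors, "bob"))    # [42]
print(unread_on_read(channel_log, cursors, "carol"))  # [40, 41, 42]
```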

    Media Storage

    Images/videos uploaded to S3. Message stores only the S3 URL. On delivery: CDN serves media directly to recipients. Deduplication: hash media content (SHA-256); if already stored, reuse the URL (saves storage for viral memes forwarded millions of times).
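    A sketch of content-addressed media deduplication. A dict stands in for the object store, `https://cdn.example.com/media/` is a placeholder URL prefix, and `MediaStore` is hypothetical; only `hashlib.sha256` is a real (standard-library) API here.

```python
import hashlib
from typing import Dict

CDN_PREFIX = "https://cdn.example.com/media/"  # placeholder, not a real endpoint

class MediaStore:
    """Hypothetical content-addressed store standing in for S3 plus a CDN."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}  # key (SHA-256 hex) -> bytes

    def upload(self, data: bytes) -> str:
        """Store data once per distinct content and return its CDN URL."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._objects:  # new content: store it
            self._objects[key] = data
        return CDN_PREFIX + key       # identical bytes always map to the same URL

    def object_count(self) -> int:
        return len(self._objects)

store = MediaStore()
url1 = store.upload(b"meme-bytes")
url2 = store.upload(b"meme-bytes")  # forwarded copy of the same file
print(url1 == url2, store.object_count())  # True 1
```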

    Interview Tips

    • WebSocket + gateway servers for persistent connections is the foundation.
    • Cassandra (conversation_id, message_id) is the canonical chat storage schema.
    • Presence via Redis SETEX with heartbeat renewal is the standard pattern.
    • Distinguish group fan-out on write (small groups) vs read (large channels).

    Frequently Asked Questions

    How do you route a message to the correct gateway server holding the recipient's WebSocket connection?

    With thousands of gateway servers each holding different client connections, the message router needs to know which server has the recipient connected. Solution: maintain a routing table in Redis. When a client connects: SETEX conn:{user_id} 300 "gateway-server-42" (TTL refreshed on each heartbeat). When a message arrives for user B, the message service runs GET conn:{B} and gets "gateway-server-42". It then makes an internal gRPC call to gateway-server-42: DeliverMessage(user_id=B, message=…). Gateway-server-42 looks up user B's WebSocket connection in its local connection map and sends the message. If GET conn:{B} returns nil (user offline), the message is already durably stored in Cassandra, so send a push notification (APNs/FCM) instead. All routing state lives in Redis rather than in individual gateway servers, so adding gateway servers simply adds connection capacity; existing connections do not need to be re-routed.

    How does Cassandra's data model support efficient message history retrieval for chat?

    Chat message retrieval has two access patterns: (1) "load the most recent 50 messages for conversation X", which happens every time a conversation is opened, and (2) "load the next 50 older messages" for infinite scroll backward. Cassandra data model: PRIMARY KEY (conversation_id, message_id) WITH CLUSTERING ORDER BY (message_id DESC). This stores all messages for a conversation in a single partition, sorted by message_id in descending order. Query (1): SELECT * FROM messages WHERE conversation_id = X LIMIT 50 returns the 50 most recent messages from a single partition, with no scatter-gather across nodes. Query (2): SELECT * FROM messages WHERE conversation_id = X AND message_id < last_cursor LIMIT 50 gives cursor-based pagination. Both queries are efficient because rows within a partition are stored sorted by the clustering key, so each is a sequential read. The message_id should be time-ordered (a Snowflake ID or UUIDv1), so descending order is reverse-chronological order. Partitions can grow large (a popular group chat with millions of messages), and very large partitions degrade Cassandra's performance, so for extremely active conversations consider time-bucketing with a composite partition key: PRIMARY KEY ((conversation_id, year_month), message_id).

    How does message ordering work across devices in a distributed chat system?

    Message ordering has two components: sender-to-server ordering (messages from one sender arrive in the order they were sent) and conversation-wide ordering (all participants see messages in the same order). Sender-to-server: each message includes a client_sequence_number incremented per conversation per device. The server detects gaps: if sequence 5 arrives before 4, it holds 5 until 4 arrives or a timeout expires. TCP already delivers bytes in order within one connection, so gaps come from reconnects, client retries, or a device sending over more than one connection. Server-to-database: partitioning Kafka by conversation_id ensures all messages for a conversation are processed by a single consumer, preserving order. Message IDs use Snowflake (timestamp + worker ID + sequence): strictly increasing per generator and roughly time-ordered across generators, so Cassandra's clustering order tracks chronological order. Conversation-wide ordering: all clients fetch messages from Cassandra sorted by message_id, so every participant sees the same order. The last-seen message_id cursor lets a client fetch exactly the messages it missed, in order, with no duplicates or gaps.
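    The hold-back step in the ordering answer can be sketched as a per-device reorder buffer. This is a minimal sketch assuming sequence numbers start at 1 per conversation per device; `ReorderBuffer` is hypothetical, and the caller is assumed to invoke `on_timeout` when its hold-back timer fires.

```python
from typing import Dict, List

class ReorderBuffer:
    """Hypothetical buffer releasing one device's messages in client_sequence_number order."""

    def __init__(self) -> None:
        self.next_expected = 1
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, body: str) -> List[str]:
        """Buffer the message and return any messages now deliverable in order."""
        if seq < self.next_expected or seq in self._pending:
            return []  # duplicate or already released
        self._pending[seq] = body
        return self._drain()

    def on_timeout(self) -> List[str]:
        """Stop waiting for the missing number and release what is buffered.

        In this sketch a late arrival of the skipped number is then dropped;
        a real system might deliver it out of order instead.
        """
        if not self._pending:
            return []
        self.next_expected = min(self._pending)
        return self._drain()

    def _drain(self) -> List[str]:
        released: List[str] = []
        while self.next_expected in self._pending:
            released.append(self._pending.pop(self.next_expected))
            self.next_expected += 1
        return released

buf = ReorderBuffer()
print(buf.receive(1, "m1"))  # ['m1']
print(buf.receive(3, "m3"))  # [] (held: 2 is missing)
print(buf.receive(2, "m2"))  # ['m2', 'm3']
```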
