What Is a Real-Time Chat System?
A real-time chat system enables persistent messaging between users with delivery guarantees and online presence. Examples: WhatsApp (2B users), Slack (20M DAU), Discord. Core challenges: message ordering and deduplication, online presence at scale, end-to-end encryption, and efficient message storage for billions of conversations.
System Requirements
Functional
- One-on-one and group messages (up to 256 members)
- Message delivery receipts: sent, delivered, read
- Online/offline presence indicators
- Message history: searchable, persistent
- Media: images, videos, voice messages
Non-Functional
- 2B users, 100B messages/day
- Message delivery latency: <500ms
- Message ordering guaranteed within a conversation
- 7-year message retention
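The non-functional targets above translate into concrete write rates and storage volumes. The following is a back-of-envelope sketch only: the 100-byte average message size and the 3x peak factor are illustrative assumptions that do not come from the requirements, and the storage figure excludes replication, indexes, and media.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def capacity_estimate(messages_per_day, avg_message_bytes, retention_years, peak_factor):
    """Return average/peak write rates and raw text storage for the retention window."""
    avg_per_sec = messages_per_day / SECONDS_PER_DAY
    raw_bytes = messages_per_day * 365 * retention_years * avg_message_bytes
    return {
        "avg_msgs_per_sec": avg_per_sec,
        "peak_msgs_per_sec": avg_per_sec * peak_factor,
        "raw_storage_pb": raw_bytes / 1e15,
    }

# 100B messages/day and 7-year retention come from the requirements;
# avg_message_bytes=100 and peak_factor=3 are assumed values.
estimate = capacity_estimate(100e9, avg_message_bytes=100, retention_years=7, peak_factor=3)
print(f"{estimate['avg_msgs_per_sec']:,.0f} msgs/s average")
print(f"{estimate['raw_storage_pb']:.2f} PB raw over the retention window")
```

Under these assumptions the average write rate is a little over 1.1M messages/second, which is why the design leans on a partitioned log and a horizontally scalable store.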
Connection Architecture
Client ──WebSocket──► Gateway Server (connection layer)
                            │
                      Message Service ──► Kafka ──► Delivery Workers
                            │
                      Cassandra (message store)
                            │
                      Presence Service ──► Redis (online status)
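Each client maintains a persistent WebSocket connection to a gateway server. Gateway servers keep no durable state beyond their open connections: to route a message, a gateway looks up which gateway holds the recipient's connection (via a Redis mapping of user_id → gateway_server_id) and forwards the message there. At 50K connections per gateway server, 2B simultaneously connected users would need 2B / 50K = 40,000 gateway servers. That is an upper bound; with the 500M concurrent users assumed in the Presence Service section, the figure is 10,000.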
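The routing lookup and the gateway sizing can be sketched as follows. This is a minimal illustration, not the system's actual code: `ConnectionRegistry` is a hypothetical in-memory stand-in for the Redis mapping, and the gateway IDs are made up.

```python
import math

class ConnectionRegistry:
    """In-memory stand-in for the Redis mapping user_id -> gateway_server_id."""

    def __init__(self):
        self._gateway_by_user = {}

    def register(self, user_id, gateway_id):
        # Called when a client's WebSocket is accepted by a gateway.
        self._gateway_by_user[user_id] = gateway_id

    def unregister(self, user_id, gateway_id):
        # Clear the entry only if it still points at this gateway;
        # the user may already have reconnected to a different one.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def locate(self, user_id):
        # Returns None when the user has no live connection.
        return self._gateway_by_user.get(user_id)

def gateways_needed(concurrent_connections, connections_per_gateway):
    return math.ceil(concurrent_connections / connections_per_gateway)

registry = ConnectionRegistry()
registry.register("alice", "gw-17")
print(registry.locate("alice"))
print(gateways_needed(2_000_000_000, 50_000))
```

The conditional delete in `unregister` is a design choice: without it, a slow disconnect from an old gateway could erase the entry written by a newer connection.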
Message Flow
Sender sends message
→ Gateway accepts the message with its client-generated message_id (UUID) and adds a server timestamp
→ Writes to Kafka (topic: messages, partitioned by conversation_id)
→ Returns acknowledgment to sender (message sent)
→ Delivery worker reads from Kafka
→ Writes message to Cassandra (durable storage)
→ Looks up recipient's gateway server in Redis
→ Pushes message to recipient's WebSocket
→ Recipient's client sends delivery receipt, which is propagated back to sender
→ Read receipt follows the same path once the recipient opens the message
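Receipts travel back asynchronously, so they can arrive late, duplicated, or out of order. The article does not say how they are reconciled; one simple approach, sketched here under that assumption, is to treat status as a value that only moves forward. The names `ReceiptStatus` and `apply_receipt` are hypothetical.

```python
from enum import IntEnum

class ReceiptStatus(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3

def apply_receipt(current, incoming):
    """Status only moves forward; stale or duplicate receipts are ignored."""
    return max(current, incoming)

status = ReceiptStatus.SENT
# The READ receipt overtakes the DELIVERED receipt in transit.
for receipt in (ReceiptStatus.READ, ReceiptStatus.DELIVERED):
    status = apply_receipt(status, receipt)
print(status.name)  # READ
```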
Message Storage: Cassandra
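-- Column types are assumed; use bigint for message_id with Snowflake-style IDs.
CREATE TABLE messages (
    conversation_id uuid, message_id timeuuid, sender_id uuid,
    content text, type text, status text, created_at timestamp,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);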
Cassandra fits chat workloads well: its wide-partition model keeps all messages for a conversation in one partition, write throughput scales linearly with nodes, and range reads over the clustering key make history fetches fast. Message IDs are time-ordered (UUID v1, which Cassandra calls timeuuid, or a Snowflake-style 64-bit ID), so sorting by message_id yields chronological order within a conversation.
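To make the idea of a time-ordered ID concrete, here is a simplified Snowflake-style generator. It is a sketch under stated assumptions: the 41/10/12 bit split follows the commonly described Snowflake layout, the epoch is an arbitrary constant chosen for this example, and when the per-millisecond sequence is exhausted the generator advances its logical clock rather than waiting for real time to catch up.

```python
import threading
import time

class SnowflakeGenerator:
    """64-bit IDs: milliseconds since epoch | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary fixed epoch (assumed for this sketch)

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms  # never move backwards, even if the clock does
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    now = self._last_ms + 1  # sequence exhausted: borrow the next ms
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self._worker_id << 12) | self._sequence

generator = SnowflakeGenerator(worker_id=7)
first, second = generator.next_id(), generator.next_id()
print(first < second)  # True
```

Because the timestamp occupies the high bits, comparing two IDs from the same worker as integers compares their creation order, which is what the clustering key relies on.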
Message Ordering and Deduplication
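Within a conversation: Kafka partitioning by conversation_id routes every message for a conversation to the same partition, and within a consumer group each partition is read by a single consumer, so messages are processed in order. Each message carries a client-generated UUID. Kafka delivers at least once, so the delivery worker may see the same UUID twice. Cassandra has no unique constraints, but because message_id is part of the primary key, writing the same message again is an idempotent upsert of the same row. INSERT ... IF NOT EXISTS (a lightweight transaction) additionally reports whether the row already existed, at the cost of extra latency per write.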
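The two mechanisms in this section, key-based partitioning and duplicate-safe writes, can be sketched in a few lines. Everything here is illustrative: `partition_for` uses SHA-256 to stay within the standard library (Kafka's default partitioner hashes keys with murmur2), and `MessageStore` is a hypothetical dictionary standing in for a table keyed by (conversation_id, message_id).

```python
import hashlib

def partition_for(conversation_id, num_partitions):
    """Stable hash partitioner: the same conversation always maps to the same partition."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

class MessageStore:
    """Dict standing in for a table keyed by (conversation_id, message_id)."""

    def __init__(self):
        self._rows = {}

    def insert_if_absent(self, conversation_id, message_id, body):
        key = (conversation_id, message_id)
        if key in self._rows:
            return False  # redelivered by the at-least-once consumer: skip
        self._rows[key] = body
        return True

    def row_count(self):
        return len(self._rows)

store = MessageStore()
print(store.insert_if_absent("conv-1", "msg-a", "hello"))  # True
print(store.insert_if_absent("conv-1", "msg-a", "hello"))  # False
```

The boolean return lets the worker skip the push to the recipient when the message was already processed, which a plain upsert cannot tell it.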
Presence Service
Online status is stored in Redis with a TTL: SETEX presence:{user_id} 30 "online". Clients send a heartbeat every 15 seconds to renew the TTL; if the key expires, the user is considered offline. Presence queries: when opening a conversation, the client requests presence for all members. With 500M online users each sending a heartbeat every 15 seconds, that is 500M / 15 ≈ 33M writes/second, so Redis must be sharded by user_id.
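The heartbeat-and-expiry behaviour can be modelled without a Redis server. The sketch below is an in-memory approximation of the SETEX pattern, not a Redis client: `PresenceStore` and its methods are hypothetical names, and the injectable clock exists only so the expiry logic can be exercised deterministically.

```python
import time

class PresenceStore:
    """In-memory stand-in for Redis keys written with a TTL (SETEX semantics)."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def heartbeat(self, user_id):
        # Equivalent in spirit to: SETEX presence:{user_id} 30 "online"
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        return expires is not None and self._clock() < expires

    def presence_for(self, user_ids):
        # Bulk lookup used when a client opens a conversation.
        return {user_id: self.is_online(user_id) for user_id in user_ids}

def heartbeat_writes_per_second(online_users, heartbeat_interval_s):
    return online_users / heartbeat_interval_s

fake_now = [0.0]
presence = PresenceStore(ttl_seconds=30, clock=lambda: fake_now[0])
presence.heartbeat("alice")
fake_now[0] = 31.0  # two heartbeats missed
print(presence.presence_for(["alice", "bob"]))
```

A 30-second TTL with a 15-second heartbeat tolerates one lost heartbeat before the user flips to offline, which trades a short delay in detecting disconnects for fewer false offline flickers.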
Offline Message Delivery
If the recipient is offline, the message is already durable in Cassandra, so no extra storage step is needed. The system sends a push notification (APNs/FCM) to the recipient's device. When the recipient comes online, their client connects via WebSocket and fetches unread messages from Cassandra, using last_seen_message_id as a cursor.
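The cursor-based catch-up can be illustrated with a sorted list in place of a Cassandra partition. This is a sketch under assumptions: `fetch_unread` is a hypothetical helper, message IDs are plain integers that sort chronologically, and the page size of 50 is arbitrary.

```python
from bisect import bisect_right

def fetch_unread(conversation_messages, last_seen_message_id, limit=50):
    """Return up to `limit` messages with an ID greater than the cursor.

    conversation_messages is a list of (message_id, body) sorted ascending
    by message_id, mimicking a range read over the clustering key.
    """
    ids = [message_id for message_id, _ in conversation_messages]
    start = bisect_right(ids, last_seen_message_id)
    return conversation_messages[start:start + limit]

history = [(101, "hi"), (102, "are you there?"), (105, "call me")]
print(fetch_unread(history, last_seen_message_id=102))  # [(105, 'call me')]
```

A client with a long backlog would call this repeatedly, advancing the cursor to the last ID of each page until an empty page comes back.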
Group Messaging
A group message must be delivered to N members. Options:
- Fan-out on write: create N delivery tasks in Kafka. Simple, but expensive for large groups.
- Fan-out on read: store one message, each member’s client fetches on connect. Cheaper writes, but requires per-member read cursors.
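Fan-out on write suits groups with a bounded size: a WhatsApp-style group capped at 256 members costs at most 256 delivery tasks per message. Fan-out on read suits large channels: in a Slack-style workspace channel with 10K members, fan-out on write would create 10K delivery tasks for every message.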
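The two strategies differ in where the per-member work happens, which a small model makes visible. The code below is a simplified sketch: `fan_out_on_write` and `ChannelLog` are hypothetical names, delivery tasks are plain dictionaries rather than Kafka records, and read cursors are list offsets rather than message IDs.

```python
def fan_out_on_write(message, member_ids, sender_id):
    """Create one delivery task per recipient at send time."""
    return [
        {"recipient": member_id, "message_id": message["message_id"]}
        for member_id in member_ids
        if member_id != sender_id
    ]

class ChannelLog:
    """Fan-out on read: one stored copy of each message, one cursor per member."""

    def __init__(self):
        self._messages = []  # appended in send order
        self._cursor = {}    # member_id -> count of messages already read

    def append(self, message):
        self._messages.append(message)  # a single write, regardless of member count

    def read_new(self, member_id):
        start = self._cursor.get(member_id, 0)
        unread = self._messages[start:]
        self._cursor[member_id] = len(self._messages)
        return unread

tasks = fan_out_on_write({"message_id": "m1"}, ["alice", "bob", "carol"], sender_id="alice")
print(len(tasks))  # 2

channel = ChannelLog()
channel.append("m1")
channel.append("m2")
print(channel.read_new("bob"))  # ['m1', 'm2']
```

Write cost grows with group size in the first function and stays constant in the second; the second pays instead with a cursor to maintain for every member.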
Media Storage
Images/videos uploaded to S3. Message stores only the S3 URL. On delivery: CDN serves media directly to recipients. Deduplication: hash media content (SHA-256); if already stored, reuse the URL (saves storage for viral memes forwarded millions of times).
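Content-addressed storage is what makes the deduplication step cheap: the hash of the bytes doubles as the object key. The sketch below uses a dictionary in place of S3; `MediaStore` and the `cdn.example.com` base URL are placeholders invented for this example.

```python
import hashlib

class MediaStore:
    """Dict standing in for an object store, keyed by the SHA-256 of the content."""

    def __init__(self, base_url="https://cdn.example.com/media/"):
        self._base_url = base_url  # placeholder URL, not a real endpoint
        self._blobs = {}

    def upload(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first upload of this content
        # Identical bytes always resolve to the same URL.
        return self._base_url + digest

    def blob_count(self):
        return len(self._blobs)

media = MediaStore()
first_url = media.upload(b"viral meme bytes")
second_url = media.upload(b"viral meme bytes")
print(first_url == second_url, media.blob_count())  # True 1
```

Note that this only works when the server sees the same bytes for the same media; how that interacts with the end-to-end encryption mentioned in the introduction is outside what this sketch covers.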
Interview Tips
- WebSocket + gateway servers for persistent connections is the foundation.
- A Cassandra table keyed by (conversation_id, message_id) is a widely used chat storage schema.
- Presence via Redis SETEX with heartbeat renewal is the standard pattern.
- Distinguish group fan-out on write (small groups) vs read (large channels).