Requirements
Functional: 1-on-1 and group messaging, real-time message delivery, message persistence and history, read receipts (sent/delivered/read), user presence (online/offline/last seen), media attachments, push notifications for offline users.
Non-functional: messages must be delivered in order, at-least-once delivery, low latency (< 100ms for online users), scale to 100M users with billions of messages/day.
Core Architecture
WebSocket vs. Long Polling vs. SSE
- WebSocket: bidirectional, persistent TCP connection. Best for chat — low overhead per message, server can push at any time. Used by WhatsApp, Slack, Discord.
- Long Polling: client makes HTTP request, server holds it open until a message arrives (or timeout). Simpler to implement, works through all firewalls. Higher latency, more overhead per message. Fallback for environments that block WebSockets.
- Server-Sent Events (SSE): server push over HTTP/1.1, unidirectional. Good for notifications but not chat (can’t send from client).
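The "server holds the request open until a message arrives (or timeout)" behavior of long polling can be sketched with a condition variable. `Mailbox` below is a hypothetical per-user queue, not part of any framework; a real handler would sit behind an HTTP endpoint:

```python
import threading
from collections import deque

class Mailbox:
    """Per-user message queue that a long-poll handler can block on."""
    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def push(self, message: str):
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()

    def poll(self, timeout: float = 30.0) -> list:
        """Block until a message arrives or the timeout elapses --
        the server-side half of one long-poll request."""
        with self._cond:
            if not self._messages:
                self._cond.wait(timeout)
            drained = list(self._messages)
            self._messages.clear()
            return drained
```

An empty response after the timeout is normal: the client simply re-issues the request, which is exactly the extra per-message overhead the comparison above refers to.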
Chat Server Architecture
User A ---ws--- [Chat Server 1]     [Chat Server 2] ---ws--- User B
                      |                    |
               [Message Store]      [Message Store]
                      |                    |
                      +--[Redis Pub/Sub or Kafka]--+
User A connects to Chat Server 1. User B connects to Chat Server 2 (different node). When A sends a message to B: (1) Chat Server 1 persists the message. (2) Chat Server 1 publishes to Redis Pub/Sub channel user:{B_id}. (3) Chat Server 2 subscribes to that channel and receives the message. (4) Chat Server 2 pushes it to B’s WebSocket connection.
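The four routing steps above can be sketched with an in-memory bus standing in for Redis Pub/Sub. The `user:{id}` channel naming follows the text; the classes themselves are simplified assumptions (the "WebSocket" is just a list):

```python
from collections import defaultdict

class Bus:
    """In-memory stand-in for Redis Pub/Sub."""
    def __init__(self):
        self._subs = defaultdict(list)
    def subscribe(self, channel, callback):
        self._subs[channel].append(callback)
    def publish(self, channel, payload):
        for cb in self._subs[channel]:
            cb(payload)

class ChatServer:
    def __init__(self, bus, store):
        self.bus = bus
        self.store = store     # shared message store
        self.sockets = {}      # user_id -> fake WebSocket (a list here)

    def connect(self, user_id):
        self.sockets[user_id] = []
        # Step 3: this server receives messages addressed to its users.
        self.bus.subscribe(f"user:{user_id}", lambda m: self.deliver(user_id, m))

    def send(self, sender_id, recipient_id, text):
        self.store.append((sender_id, recipient_id, text))  # step 1: persist
        self.bus.publish(f"user:{recipient_id}", text)      # step 2: publish

    def deliver(self, user_id, message):
        self.sockets[user_id].append(message)               # step 4: push

bus, store = Bus(), []
server1, server2 = ChatServer(bus, store), ChatServer(bus, store)
server1.connect("A")
server2.connect("B")
server1.send("A", "B", "hello")   # lands in B's socket on server2
```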
Message Data Model
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Message:
    message_id: str           # UUID, globally unique
    conversation_id: str      # groups messages into a conversation
    sender_id: str
    content: str
    message_type: str         # 'text' | 'image' | 'video' | 'file'
    media_url: Optional[str]
    sent_at: datetime         # client-side timestamp
    server_at: datetime       # server-received timestamp (for ordering)
    sequence_num: int         # per-conversation monotonic sequence for ordering

@dataclass
class Conversation:
    conversation_id: str
    type: str                 # 'direct' | 'group'
    participant_ids: List[str]
    created_at: datetime
    last_message_id: Optional[str]
    last_activity: datetime

@dataclass
class MessageStatus:
    message_id: str
    user_id: str
    status: str               # 'delivered' | 'read'
    timestamp: datetime
Message Ordering and Sequencing
Challenge: two users sending simultaneously — which message comes first? Options:
- Server-assigned sequence number: a sequence service (or database auto-increment) assigns a monotonically increasing sequence_num per conversation. Messages are displayed sorted by sequence_num. Race condition: two near-simultaneous messages from different servers may get sequence numbers out of order.
- Logical clock (Lamport timestamp): each message carries a logical clock value. On send, increment the clock; on receive, set clock = max(local, received) + 1. This gives a partial order consistent with causality; breaking ties by sender ID turns it into a total order.
- Client timestamp + sequence number: hybrid — use sequence number for ordering within a session; use server_at for cross-session ordering. Good enough for most chat apps.
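The Lamport-clock option can be sketched in a few lines. The `(clock, node_id)` tiebreak used for a total order is the standard textbook addition, not something the list above specifies:

```python
class LamportClock:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.time = 0

    def stamp(self):
        """On send: increment and attach (clock, node_id) to the message."""
        self.time += 1
        return (self.time, self.node_id)

    def observe(self, received_time: int):
        """On receive: clock = max(local, received) + 1."""
        self.time = max(self.time, received_time) + 1

a, b = LamportClock("A"), LamportClock("B")
t1 = a.stamp()      # A sends: (1, 'A')
b.observe(t1[0])    # B receives; B's clock jumps to 2
t2 = b.stamp()      # B replies: (3, 'B')
# Sorting messages by (clock, node_id) yields a total order
# consistent with causality: the reply always sorts after the original.
```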
Message Storage at Scale
Facebook Messenger used HBase (later migrated to MyRocks); WhatsApp uses Mnesia (Erlang) for in-flight messages; Discord used Cassandra (later migrated to ScyllaDB). Key requirements: write-heavy (every message is an insert), sequential reads (load conversation history), range queries (messages since timestamp X).
-- Cassandra schema: partition by conversation, cluster by sequence_num
CREATE TABLE messages (
    conversation_id UUID,
    sequence_num    BIGINT,
    message_id      UUID,
    sender_id       UUID,
    content         TEXT,
    sent_at         TIMESTAMP,
    PRIMARY KEY (conversation_id, sequence_num)
) WITH CLUSTERING ORDER BY (sequence_num DESC);

-- Query the last 50 messages (DESC clustering returns newest first):
-- SELECT * FROM messages WHERE conversation_id = ? LIMIT 50;
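Paging further back through history follows the same clustering key. A sketch in Python against an in-memory list standing in for the table; `before_seq` is a hypothetical cursor parameter (in CQL it would be a `sequence_num < ?` clause):

```python
def load_history(messages, conversation_id, before_seq=None, limit=50):
    """Newest-first page of one conversation, mirroring the DESC
    clustering order; the smallest sequence_num seen becomes the
    cursor for the next page."""
    rows = [m for m in messages
            if m["conversation_id"] == conversation_id
            and (before_seq is None or m["sequence_num"] < before_seq)]
    rows.sort(key=lambda m: m["sequence_num"], reverse=True)
    return rows[:limit]

table = [{"conversation_id": "c1", "sequence_num": n, "content": f"msg {n}"}
         for n in range(1, 8)]
page1 = load_history(table, "c1", limit=3)   # seq 7, 6, 5
page2 = load_history(table, "c1",
                     before_seq=page1[-1]["sequence_num"], limit=3)  # seq 4, 3, 2
```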
User Presence
import redis
from datetime import datetime, timezone
from typing import Optional

r = redis.Redis(decode_responses=True)

class PresenceService:
    ONLINE_TTL = 30  # seconds; clients heartbeat every 15s

    def user_online(self, user_id: str, server_id: str):
        r.setex(f"presence:{user_id}", self.ONLINE_TTL, server_id)
        r.publish("presence_updates", f"{user_id}:online")

    def user_offline(self, user_id: str):
        r.delete(f"presence:{user_id}")
        r.set(f"last_seen:{user_id}", datetime.now(timezone.utc).isoformat())
        r.publish("presence_updates", f"{user_id}:offline")

    def is_online(self, user_id: str) -> bool:
        return r.exists(f"presence:{user_id}") > 0

    def get_server(self, user_id: str) -> Optional[str]:
        """Which chat server is this user connected to?"""
        return r.get(f"presence:{user_id}")
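The heartbeat/TTL mechanics can be exercised without a Redis server. `FakeRedis` below is an in-memory stand-in for the SETEX/EXISTS calls, with an explicit `now` argument so expiry is deterministic rather than wall-clock-driven:

```python
class FakeRedis:
    """In-memory stand-in for the SETEX/EXISTS calls used above."""
    def __init__(self):
        self._keys = {}  # key -> (value, expires_at)

    def setex(self, key, ttl, value, now=0.0):
        self._keys[key] = (value, now + ttl)

    def exists(self, key, now=0.0):
        entry = self._keys.get(key)
        return 1 if entry is not None and entry[1] > now else 0

r = FakeRedis()
r.setex("presence:u1", 30, "chat-server-7", now=0)
online_at_10 = r.exists("presence:u1", now=10)  # heartbeat still fresh
online_at_45 = r.exists("presence:u1", now=45)  # two heartbeats missed: key expired
```

The TTL is the crash-safety mechanism: if a client dies without calling user_offline, its presence key simply expires after one missed heartbeat window.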
Push Notifications for Offline Users
When a user is offline (no WebSocket connection), fall back to push notifications:
- Message arrives at a chat server. Check presence: r.exists(f"presence:{recipient_id}").
- If online: push via WebSocket through Pub/Sub routing.
- If offline: publish a "push_notification" event to Kafka. A notification worker consumes it and sends to FCM (Android) or APNs (iOS).
- When the user comes back online, they pull unread messages from the message store (catch-up).
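The catch-up pull in the last step is a range query over sequence numbers. A sketch against an in-memory store (`catch_up` is a hypothetical helper; in production this is the Cassandra query with `sequence_num > ?`):

```python
def catch_up(store, conversation_id, last_seen_seq):
    """Return every message the reconnecting client has not yet seen,
    in order, so the client can replay the gap."""
    missed = [m for m in store
              if m["conversation_id"] == conversation_id
              and m["sequence_num"] > last_seen_seq]
    return sorted(missed, key=lambda m: m["sequence_num"])

store = [{"conversation_id": "c1", "sequence_num": n} for n in (1, 2, 3, 4, 5)]
gap = catch_up(store, "c1", last_seen_seq=2)   # client replays seq 3, 4, 5
```

Because delivery is at-least-once, the client should also deduplicate by message_id when merging the replayed gap with messages that arrived via push.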
Read Receipts
from datetime import datetime, timezone

# db, get_message, push_status_update, push_read_receipt: service-layer
# helpers (elided). Note: batch reads require the message_status table to
# be denormalized with conversation_id and sequence_num columns.

def mark_delivered(message_id: str, user_id: str):
    db.upsert(MessageStatus(message_id, user_id, 'delivered',
                            datetime.now(timezone.utc)))
    # Notify the sender that their message was delivered
    sender_id = get_message(message_id).sender_id
    push_status_update(sender_id, message_id, 'delivered')

def mark_read(conversation_id: str, user_id: str, up_to_sequence: int):
    """Batch-mark all messages up to sequence_num as read."""
    db.execute(
        "UPDATE message_status SET status='read', timestamp=NOW() "
        "WHERE conversation_id=%s AND user_id=%s "
        "AND sequence_num <= %s AND status != 'read'",
        [conversation_id, user_id, up_to_sequence]
    )
    push_read_receipt(conversation_id, user_id, up_to_sequence)
Scaling
- Chat servers: stateless except for WebSocket connections. Use consistent hashing to route a user_id to a specific chat server (sticky sessions for Pub/Sub efficiency). Auto-scale based on connection count.
- Message fan-out in groups: for a group with 1,000 members, one message naively requires 1,000 Pub/Sub publishes. Cap group size or use a separate "group delivery" service. For very large groups (> 10K members), publish once to a per-group channel that members' chat servers subscribe to, using a precomputed member list stored in Redis, instead of one publish per member.
- Hot conversations: a very active group chat (10K messages/minute) can overwhelm one Cassandra partition. Shard by (conversation_id, time_bucket) to spread load across multiple partitions.
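The (conversation_id, time_bucket) split can be sketched as a partition-key function. Daily buckets are an assumption here; the bucket width would be tuned to the conversation's write rate:

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, sent_at: datetime,
                  bucket: str = "daily") -> tuple:
    """Spread one hot conversation across partitions by time bucket."""
    fmt = "%Y%m%d" if bucket == "daily" else "%Y%m%d%H"  # daily or hourly
    return (conversation_id, sent_at.strftime(fmt))

k1 = partition_key("c1", datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc))
k2 = partition_key("c1", datetime(2024, 5, 2, 9, 30, tzinfo=timezone.utc))
# Same conversation, different days -> different Cassandra partitions.
```

The cost is that loading history across a bucket boundary now touches multiple partitions, so reads must iterate buckets backwards until the page is full.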
FAQ

Q: Why use WebSockets instead of HTTP polling for real-time chat?
A: With HTTP polling, the client sends a request every N seconds whether or not there are new messages; at 1-second polling with 1M users, that is 1M requests/second of load, mostly returning empty responses. A WebSocket is a persistent TCP connection: the server pushes messages only when they exist, and the client can send without a new HTTP handshake. Round-trip latency is 10-50ms vs. 500-1000ms for polling. The trade-off: WebSocket connections are stateful (each server holds connection state), which complicates horizontal scaling. Long polling is a middle ground: it works everywhere but has higher latency and overhead than WebSockets.

Q: How do you route a message to a user connected to a different chat server?
A: Use Redis Pub/Sub as a message bus between chat servers. When User A sends a message to User B: (1) Chat Server 1 (A's server) stores the message. (2) It looks up which server B is connected to (Redis: GET presence:{B_id} returns a server_id). (3) It publishes the message to a Redis channel (e.g., user:{B_id}). (4) Chat Server 2 (B's server), which subscribes to user:{B_id}, receives the message. (5) It pushes the message down B's WebSocket. For very high scale (millions of active users), use Kafka instead of Redis Pub/Sub: Kafka adds durability and replay, at the cost of somewhat higher delivery latency.

Q: How does a chat app handle messages sent while the recipient is offline?
A: When a message is sent, the system checks presence (a Redis TTL key): if the key is missing, the user is offline. The message is always stored in the message database. A Kafka event is published for the notification service, which sends a push notification via FCM (Android) or APNs (iOS) with the message preview. When the user comes back online, their client sends the last seen message ID, and the server returns all messages with sequence_num > last_seen from the message store. This "catch-up" pull ensures no messages are missed, even if push notifications were dropped.

Q: How do you implement message ordering in a distributed chat system?
A: Assign a sequence number per conversation. Option 1: database auto-increment, where the message INSERT returns the auto-incremented ID that serves as the sequence number; atomic and consistent, but a write bottleneck for very high-volume group chats. Option 2: a Redis atomic counter, INCR conv:{id}:seq before inserting; fast and distributed. Option 3: Snowflake IDs (timestamp + machine ID + sequence); globally unique, roughly time-ordered, and no central coordinator needed. For display, sort by sequence_num. Clients show messages in sequence order; if a message arrives out of order (a race condition), buffer briefly and re-sort.

Q: How does WhatsApp scale to billions of users with persistent message storage?
A: WhatsApp runs on Erlang: a heavily modified ejabberd handles connections, Mnesia (Erlang's distributed database) holds presence, session, and in-flight message data, and YAWS (an Erlang web server) serves multimedia. Undelivered messages are queued server-side and deleted once delivered; long-term history lives on the device, not the server (the Cassandra design described earlier in this article is closer to Discord's approach). Erlang's actor model, one lightweight process per connection, lets a single server hold millions of concurrent connections; WhatsApp famously demonstrated over 2 million TCP connections on one box. The company is known for extreme efficiency: at the 2014 Facebook acquisition it served roughly 450M users with about 32 engineers and a comparatively small server fleet.
Asked at: Meta, Snap, LinkedIn, Twitter/X, Atlassian.