Requirements
A live comments system allows users to post comments on content (articles, videos, livestreams) and see other users’ comments in real time. Scale: YouTube live chat reaches 10,000+ messages/minute for major events. Key challenges: real-time delivery to all viewers, chronological ordering, spam/abuse filtering, and pagination of existing comments without re-fetching the entire history. Functional requirements: post a comment, receive new comments in real time, load previous comments (paginated), upvote/downvote, reply threading, and moderation (delete, mute user). Non-functional: < 500ms latency from post to display for other viewers, 99.9% availability, no message loss.
Real-Time Delivery Architecture
Use WebSocket connections for real-time comment delivery. Each viewer maintains a persistent WebSocket connection to a comment server. Comments are partitioned by content_id: every server with at least one viewer of a given piece of content subscribes to that content's channel. Write path: comment submission → API server → validate and store in the database → publish to the Redis Pub/Sub channel "comments:{content_id}". Each comment server node subscribes only to the channels of its active connections. When a new comment is published, every subscribed node receives it via Pub/Sub and pushes it to its connected viewers. Scaling WebSocket connections: 1M concurrent viewers = 1M persistent connections. At 10,000 connections per server, that is 100 servers. Distribute connections with a WebSocket-capable load balancer such as HAProxy or Nginx, with sticky sessions where the setup requires them. Connection state (which content_id each connection is watching) lives in each server's process memory, not in shared state, so the tier scales out by adding servers.
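The fan-out path above can be sketched in a few lines. This is a minimal in-process model, not a production design: `InMemoryPubSub` is a hypothetical stand-in for Redis Pub/Sub, `CommentServerNode.outbox` stands in for real WebSocket sends, and `servers_needed` just reproduces the capacity arithmetic.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class InMemoryPubSub:
    """Hypothetical stand-in for Redis Pub/Sub: channel -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str, dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str, dict], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        # Like Redis PUBLISH, return how many subscribers received the message.
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(channel, message)
        return len(callbacks)


class CommentServerNode:
    """One WebSocket server. Connection state lives in process memory only."""

    def __init__(self, broker: InMemoryPubSub) -> None:
        self._broker = broker
        # content_id -> ids of local connections watching that content
        self._viewers: Dict[str, Set[str]] = defaultdict(set)
        # connection_id -> messages pushed to it (stands in for socket.send)
        self.outbox: Dict[str, List[dict]] = defaultdict(list)

    def connect(self, connection_id: str, content_id: str) -> None:
        first_local_viewer = not self._viewers[content_id]
        self._viewers[content_id].add(connection_id)
        if first_local_viewer:
            # Subscribe once per content_id, not once per connection.
            self._broker.subscribe(f"comments:{content_id}", self._on_message)

    def _on_message(self, channel: str, message: dict) -> None:
        content_id = channel.split(":", 1)[1]
        for connection_id in self._viewers[content_id]:
            self.outbox[connection_id].append(message)


def servers_needed(concurrent_viewers: int, connections_per_server: int) -> int:
    # Ceiling division: a partially filled server still counts as a server.
    return -(-concurrent_viewers // connections_per_server)
```

Note that each node subscribes once per content_id, so a publish reaches one callback per node rather than one per viewer; the per-viewer fan-out happens inside the node.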
Comment Storage and Pagination
Schema: Comment: comment_id (UUID), content_id, user_id, parent_id (nullable, for threading), body, created_at, status (ACTIVE, DELETED, FLAGGED), upvotes, downvotes. Index: (content_id, created_at DESC, comment_id DESC) for chronological feeds. Keyset pagination for live feeds: the client sends cursor = last_seen_created_at + last_seen_comment_id. Query: SELECT * FROM comments WHERE content_id=:id AND status='ACTIVE' AND (created_at < :cursor_time OR (created_at = :cursor_time AND comment_id < :cursor_id)) ORDER BY created_at DESC, comment_id DESC LIMIT 50. The comment_id tiebreaker matters because several comments can share a created_at value in a busy chat; filtering on created_at alone would skip them at page boundaries. For live streams: the initial load fetches the last N comments, and the WebSocket stream delivers new comments as they arrive. The client displays a "Load more" button to fetch older comments on demand, which avoids infinite scroll in fast-moving live chats. Top-level + replies: top-level comments are fetched separately from replies, and replies are loaded on demand when a user expands a thread. Reply count caching: store reply_count on the parent comment (incremented via trigger or async counter) and display it in the collapsed thread view.
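A minimal sketch of keyset pagination over this schema, using SQLite from the Python standard library so it runs without a server. The cursor carries both created_at and comment_id, and the filter uses both so that comments sharing a timestamp are neither skipped nor repeated. The trimmed-down table, the `idx_feed` index name, and the `fetch_page` helper are assumptions for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

Cursor = Tuple[str, str]  # (created_at, comment_id) of the last row already seen


def create_schema(conn: sqlite3.Connection) -> None:
    # Trimmed-down version of the Comment table: only the columns the feed needs.
    conn.execute(
        """CREATE TABLE comments (
               comment_id TEXT PRIMARY KEY,
               content_id TEXT NOT NULL,
               body       TEXT NOT NULL,
               created_at TEXT NOT NULL,
               status     TEXT NOT NULL DEFAULT 'ACTIVE')"""
    )
    conn.execute(
        "CREATE INDEX idx_feed ON comments (content_id, created_at DESC, comment_id DESC)"
    )


def fetch_page(
    conn: sqlite3.Connection,
    content_id: str,
    cursor: Optional[Cursor] = None,
    limit: int = 50,
) -> Tuple[List[tuple], Optional[Cursor]]:
    """Return one page, newest first, plus the cursor for the next page (or None)."""
    sql = (
        "SELECT comment_id, body, created_at FROM comments "
        "WHERE content_id = ? AND status = 'ACTIVE'"
    )
    params: list = [content_id]
    if cursor is not None:
        cursor_time, cursor_id = cursor
        # Strictly "before" the cursor in (created_at, comment_id) order.
        sql += " AND (created_at < ? OR (created_at = ? AND comment_id < ?))"
        params += [cursor_time, cursor_time, cursor_id]
    sql += " ORDER BY created_at DESC, comment_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # A short page means the history is exhausted.
    next_cursor = (rows[-1][2], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

A client would call `fetch_page(conn, content_id)` for the initial load and pass the returned cursor back on each "Load more" until it receives `None`.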
Spam and Abuse Prevention
Rate limiting per user: max 5 comments per 10 seconds per user_id. Store in Redis: INCR rate:{user_id}, setting the 10s TTL only when the key is first created (count = 1) so later comments do not extend the window. If count > 5: return 429. For unauthenticated users: rate limit by IP. Automated spam detection: (1) Duplicate comment detection: compute the SHA-256 hash of the comment body. If the same hash appears from the same user within 60 seconds: block as duplicate. Store recent (user_id, body_hash, timestamp) in Redis. (2) Link spam: detect URLs in comments. New accounts ( threshold → auto-remove. Score in gray zone → send to moderation queue. Community reporting: users flag comments. If a comment receives 5+ flags: auto-hide pending review. Moderator dashboard: a queue of flagged comments, auto-hidden comments, and comments from new accounts for human review. Moderator actions: approve (unhide), delete, mute user (30 days, 90 days, permanent).
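The rate limit and duplicate check can be modeled without Redis to make the window behavior concrete. This is an in-memory sketch under assumptions: `FixedWindowRateLimiter` mimics an INCR counter whose expiry is set only when the key is created, `DuplicateDetector` mimics the recent (user_id, body_hash, timestamp) store, and the injectable `clock` parameter exists only to make the example testable.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """In-memory analogue of INCR on a key that expires after window_seconds."""

    def __init__(
        self,
        max_events: int = 5,
        window_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self._clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:
            # Key is missing or expired: start a new window. The expiry is set
            # only here, so later increments do not extend the window.
            count, expires_at = 0, now + self.window_seconds
        count += 1
        self._counters[key] = (count, expires_at)
        return count <= self.max_events  # False maps to an HTTP 429


class DuplicateDetector:
    """Flags a body repeated by the same user within window_seconds."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def is_duplicate(self, user_id: str, body: str) -> bool:
        body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
        key = (user_id, body_hash)
        now = self._clock()
        previous = self._last_seen.get(key)
        # Design choice in this sketch: blocked attempts also refresh the timestamp.
        self._last_seen[key] = now
        return previous is not None and now - previous < self.window_seconds
```

For unauthenticated users the same limiter can be keyed by IP address instead of user_id. A real deployment would also need to evict stale entries, which Redis key expiry handles automatically.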
Ordered Delivery and Message Loss
Redis Pub/Sub does not persist messages. If a comment server restarts, or a viewer's connection drops and reconnects, comments published during the gap are missed. Solution: assign each comment a monotonic sequence number per content_id via Redis INCR "seq:{content_id}" → seq_number, and store it on the Comment row. On WebSocket reconnect, the client sends last_received_seq. The server queries: SELECT * FROM comments WHERE content_id=:id AND seq_number > :last_seq ORDER BY seq_number ASC, and delivers any missed comments. For high-volume live streams, batch the catch-up: return all missed comments in one response rather than one message per comment. Client-side ordering: display comments in seq_number order, not arrival order, because comments can arrive slightly out of order over the WebSocket due to network jitter. Buffer for 100ms before rendering to absorb it.
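The reconnect and ordering logic can be sketched as follows. `missed_comments` is a hypothetical in-memory equivalent of the catch-up query, and `SeqReorderBuffer` is an assumed client-side helper: it collects comments as they arrive, drops duplicates (for example when live delivery overlaps with a catch-up batch), and releases them in seq_number order whenever the jitter timer fires. The timer itself is left to the caller.

```python
from typing import Dict, List, Set


def missed_comments(stored: List[dict], last_received_seq: int) -> List[dict]:
    """Server side: mirrors WHERE seq_number > :last_seq ORDER BY seq_number ASC."""
    return sorted(
        (c for c in stored if c["seq_number"] > last_received_seq),
        key=lambda c: c["seq_number"],
    )


class SeqReorderBuffer:
    """Client side: buffer arrivals, then render in seq_number order."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()       # every seq_number accepted so far
        self._pending: Dict[int, dict] = {}  # accepted but not yet rendered
        self.last_received_seq = 0         # sent to the server on reconnect

    def receive(self, comment: dict) -> None:
        seq = comment["seq_number"]
        if seq in self._seen:
            return  # duplicate delivery, ignore
        self._seen.add(seq)
        self._pending[seq] = comment
        self.last_received_seq = max(self.last_received_seq, seq)

    def flush(self) -> List[dict]:
        """Call when the jitter timer (e.g. 100 ms) fires; returns comments to render."""
        ready = [self._pending[seq] for seq in sorted(self._pending)]
        self._pending.clear()
        return ready
```

This sketch reports the highest sequence number seen, as the text describes. A stricter client could instead report the highest contiguous sequence number, so that a comment lost in the middle of a burst is also recovered on reconnect.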