Requirements
A live comments system allows users to post comments on content (articles, videos, livestreams) and see other users’ comments in real time. Scale: YouTube live chat reaches 10,000+ messages/minute for major events. Key challenges: real-time delivery to all viewers, chronological ordering, spam/abuse filtering, and pagination of existing comments without re-fetching the entire history. Functional requirements: post a comment, receive new comments in real time, load previous comments (paginated), upvote/downvote, reply threading, and moderation (delete, mute user). Non-functional: < 500ms latency from post to display for other viewers, 99.9% availability, no message loss.
Real-Time Delivery Architecture
WebSocket connections for real-time comment delivery. Each viewer maintains a persistent WebSocket connection to a comment server. Comments are partitioned by content_id — all viewers of the same content connect to servers subscribed to the same channel. Architecture: comment submission → API server → validates + stores in database → publishes to Redis Pub/Sub channel "comments:{content_id}". Comment server nodes subscribe to the Redis Pub/Sub channels for their active connections. When a new comment is published, all subscribed server nodes receive it via Pub/Sub and push it to their connected viewers. Scaling WebSocket connections: 1M concurrent viewers = 1M persistent connections. At 10,000 connections per server: 100 servers needed. Use a WebSocket-capable load balancer (e.g., HAProxy or Nginx) with sticky sessions to distribute connections. Connection state (which content_id each connection is watching) is stored in the server process memory — no shared state, so you scale out by adding more servers.
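The fan-out path above can be sketched end to end. This is a minimal in-memory stand-in: a plain callback registry plays the role of Redis Pub/Sub, and Python lists play the role of WebSocket connections. The class and function names (`Broker`, `CommentServer`, `post_comment`) are illustrative, not part of any real API.

```python
from collections import defaultdict

class Broker:
    """Stands in for Redis Pub/Sub: channel name -> subscriber callbacks."""
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subs[channel].append(callback)

    def publish(self, channel, message):
        # Redis Pub/Sub semantics: deliver to every current subscriber,
        # with no persistence for absent subscribers.
        for cb in self.subs[channel]:
            cb(message)

class CommentServer:
    """One WebSocket server node; viewer connections live in process memory."""
    def __init__(self, broker):
        self.broker = broker
        self.viewers = defaultdict(list)  # content_id -> list of viewer inboxes

    def connect(self, content_id, inbox):
        if content_id not in self.viewers:
            # First viewer for this content on this node: subscribe once.
            self.broker.subscribe(f"comments:{content_id}",
                                  lambda msg, cid=content_id: self._push(cid, msg))
        self.viewers[content_id].append(inbox)

    def _push(self, content_id, message):
        # Push the new comment down every open connection for this content.
        for inbox in self.viewers[content_id]:
            inbox.append(message)

def post_comment(broker, content_id, body):
    # API-server path: validation + database write omitted, then publish.
    broker.publish(f"comments:{content_id}", body)
```

Two server nodes subscribed to the same channel both receive the published comment, mirroring how viewers spread across many WebSocket servers all see the same stream.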
Comment Storage and Pagination
Schema: Comment: comment_id (UUID), content_id, user_id, parent_id (nullable — for threading), body, created_at, status (ACTIVE, DELETED, FLAGGED), upvotes, downvotes. Index: (content_id, created_at DESC) for chronological feeds. Keyset pagination for live feeds: the client sends cursor = last_seen_created_at + last_seen_comment_id. Query: SELECT * FROM comments WHERE content_id=:id AND status='ACTIVE' AND (created_at, comment_id) < (:cursor_time, :cursor_id) ORDER BY created_at DESC, comment_id DESC LIMIT 50. The comment_id tie-breaker prevents skipped or duplicated rows when two comments share a timestamp. For live streams: the initial load fetches the last N comments. The WebSocket stream delivers new comments as they arrive. The client displays a "Load more" button to fetch older comments on demand — avoiding infinite scroll for fast-moving live chats. Top-level + replies: top-level comments are fetched separately from replies; replies are loaded on demand when a user expands a thread. Parent comment count caching: store reply_count on the parent comment (incremented via trigger or async counter), displayed in the collapsed thread view.
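The keyset-pagination logic can be sketched against an in-memory list standing in for the comments table. The (created_at, comment_id) cursor tuple follows the cursor fields the client sends; `load_older` is a hypothetical helper name, not from the original text.

```python
def load_older(comments, cursor, limit=50):
    """Return one page of older comments plus the cursor for the next page.

    comments: list of dicts sorted newest-first by (created_at, comment_id).
    cursor:   (created_at, comment_id) of the oldest comment already seen,
              or None for the initial load.
    """
    page = [
        c for c in comments
        if c["status"] == "ACTIVE"
        # Tuple comparison mirrors the SQL row comparison
        # (created_at, comment_id) < (:cursor_time, :cursor_id).
        and (cursor is None
             or (c["created_at"], c["comment_id"]) < cursor)
    ]
    page = page[:limit]
    next_cursor = ((page[-1]["created_at"], page[-1]["comment_id"])
                   if page else None)
    return page, next_cursor
```

Because the cursor is a value from the last row rather than an offset, new comments arriving at the head of the feed never shift the pages a client has already fetched.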
Spam and Abuse Prevention
Rate limiting per user: max 5 comments per 10 seconds per user_id. Store in Redis: INCR rate:{user_id} with TTL 10s. If count > 5: return 429. For unauthenticated users: rate limit by IP. Automated spam detection: (1) Duplicate comment detection: SHA256 hash of comment body. If the same hash appears from the same user within 60 seconds: block as duplicate. Store recent (user_id, body_hash, timestamp) in Redis. (2) Link spam: detect URLs in comments. New accounts (below an account-age threshold) posting links are flagged automatically. (3) Spam scoring: assign each comment a spam score; score above the removal threshold → auto-remove. Score in gray zone → send to moderation queue. Community reporting: users flag comments. If a comment receives 5+ flags: auto-hide pending review. Moderator dashboard: queue of flagged comments, auto-hidden comments, and comments from new accounts for human review. Moderator actions: approve (unhide), delete, mute user (30 days, 90 days, permanent).
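The rate-limit and duplicate checks can be sketched together. A dict with expiry timestamps stands in for Redis keys with TTLs; `SpamGate` is an illustrative name, and the injected clock exists only so the window behavior can be tested deterministically. The limits (5 comments per 10 s, 60 s duplicate window) come from the text above.

```python
import hashlib
import time

class SpamGate:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.counts = {}   # user_id -> (count, window_expiry)
        self.recent = {}   # (user_id, body_hash) -> expiry

    def check(self, user_id, body):
        now = self.clock()
        # Rate limit: equivalent of INCR rate:{user_id} with a 10 s TTL.
        count, expiry = self.counts.get(user_id, (0, 0.0))
        if now >= expiry:
            count, expiry = 0, now + 10
        count += 1
        self.counts[user_id] = (count, expiry)
        if count > 5:
            return "429"
        # Duplicate detection: same body hash from same user within 60 s.
        body_hash = hashlib.sha256(body.encode()).hexdigest()
        key = (user_id, body_hash)
        if self.recent.get(key, 0.0) > now:
            return "duplicate"
        self.recent[key] = now + 60
        return "ok"
```

In production the two checks would be separate Redis operations (INCR with EXPIRE, and SET with a 60 s TTL on the hash key) so every API server shares the same counters.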
Ordered Delivery and Message Loss
Redis Pub/Sub does not persist messages. If a comment server restarts or a viewer’s connection drops and reconnects, they may miss comments published during the gap. Solution: each comment gets a monotonic sequence number (per content_id): Redis INCR "seq:{content_id}" → seq_number, stored on the Comment row. On WebSocket reconnect: the client sends last_received_seq. Server queries: SELECT * FROM comments WHERE content_id=:id AND seq_number > :last_seq ORDER BY seq_number ASC, and delivers any missed comments. For high-volume live streams: batch the catch-up query — return all missed comments in one response, not one by one. Client-side ordering: display comments in seq_number order, not arrival order. Comments can arrive slightly out of order over WebSocket (network jitter). Buffer for 100ms before rendering to absorb minor jitter.
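A minimal sketch of per-content sequencing and reconnect catch-up: a plain integer counter stands in for Redis INCR "seq:{content_id}", and a list stands in for the comments table. `SequencedLog` and its method names are hypothetical.

```python
class SequencedLog:
    def __init__(self):
        self.seq = {}       # content_id -> last issued sequence number
        self.store = {}     # content_id -> list of (seq_number, body)

    def append(self, content_id, body):
        # Redis INCR equivalent: a monotonic per-content sequence number,
        # assigned at write time and stored on the comment row.
        n = self.seq.get(content_id, 0) + 1
        self.seq[content_id] = n
        self.store.setdefault(content_id, []).append((n, body))
        return n

    def catch_up(self, content_id, last_seq):
        """On reconnect: everything after last_received_seq, in one batch."""
        return [(n, body) for n, body in self.store.get(content_id, [])
                if n > last_seq]
```

The sequence number serves double duty: it drives the gap-filling query on reconnect, and it gives the client a total order per content_id to render by, independent of WebSocket arrival order.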
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do you handle comment ordering in a live stream where millions of messages arrive per minute?",
"acceptedAnswer": {
"@type": "Answer",
"text": "At extreme scale (10M+ messages/min for major live events), strict global ordering is impractical — the latency of coordinating order across all writers exceeds user expectations. Practical approaches: (1) Per-shard ordering: partition the comment stream into shards (by content_id or sub-shards). Within each shard, comments are sequenced by a monotonic counter (Redis INCR). Clients display comments in shard-sequence order. Comments across shards may display slightly out of global order, but users rarely notice. (2) Approximate ordering: assign a server-side timestamp with millisecond precision. Sort by (timestamp_ms, comment_id) for tie-breaking. Small clock skew (under ~100ms between servers) is imperceptible in a fast-moving chat."
}
},
{
"@type": "Question",
"name": "How do you detect and mitigate brigading (coordinated comment attacks)?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Detection signals: (1) Comment velocity: a sudden spike of negative comments targeting one piece of content or one user. (2) Account-creation correlation: >100 accounts created in the same 1-hour window post on the same content. High correlation = likely coordinated. (3) Content similarity: multiple accounts posting identical or near-identical negative comments (Levenshtein distance < 5 between messages). Flag for review. (4) IP correlation: multiple accounts from the same /24 IP subnet posting on the same content. Network: if the comment section is embeddable, block embedding on sites known for brigading. Auto-hide: when a target user receives 50+ negative comments in 1 minute: auto-hide the section for that user (not for other viewers). Give the target a \"shelter\" while moderators respond. Shadowban: reduce the visibility of flagged accounts (their comments are visible only to themselves) rather than hard-banning — harder to detect and circumvent. Real-time response: have an on-call moderator channel (Slack/PagerDuty) that fires when the comment velocity/brigade detectors trigger."
}
},
{
"@type": "Question",
"name": "How do you implement comment threading (nested replies) efficiently?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Two approaches: adjacency list and closure table. Adjacency list: store parent_id on each comment. To fetch a thread: recursive query or application-side tree reconstruction. PostgreSQL recursive CTEs handle arbitrary depth efficiently. Shallow threading (max 2-3 levels like Reddit): adjacency list with application-side reconstruction works fine. Closure table: store all ancestor-descendant relationships: (ancestor_id, descendant_id, depth). To find all descendants of comment X: SELECT descendant_id FROM comment_closure WHERE ancestor_id=X. Efficiently supports any-depth queries without recursion. More writes on insert, but faster reads for deep trees. For most comment systems: limit threading depth to 2 levels (top-level + one reply level). Deep nesting confuses users and complicates the UI. Implementation: top-level comments fetched separately, replies loaded per parent comment on-demand. Display count of replies on each top-level comment. On expand: fetch all direct children (parent_id=comment_id). Ordering within a thread: chronological (oldest first for threaded discussion), or by upvotes (most popular replies first)."
}
},
{
"@type": "Question",
"name": "How do you design comment moderation to comply with content regulations like the DSA?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The EU Digital Services Act (DSA) requires large platforms to: (1) Provide transparent content moderation policies and appeals. (2) Remove illegal content within defined timeframes (CSAM: 1 hour; terrorist content: 1 hour; other illegal: within 24h after notice). (3) Maintain records of content moderation decisions with reasoning. (4) Provide meaningful appeals mechanisms for users whose content is removed. Implementation: moderation decision logging: every moderation action (hide, delete, warn, ban) is logged with: moderator_id, reason_code (standardized taxonomy: SPAM, HATE_SPEECH, HARASSMENT, ILLEGAL_CONTENT, etc.), timestamp, evidence (text snippet, screenshot), and the specific policy violated. User notification: when a comment is removed, notify the user with the reason and the relevant policy. Include a link to the appeals process. Appeals: users can submit an appeal. A different moderator reviews it. Decisions can be reversed. All appeals and outcomes are logged. Audit trail: generate compliance reports on request from regulators: \"all moderation actions in country X in the past 30 days.\" This requires the moderation decisions table to be indexed by (country, timestamp, action_type)."
}
}
]
}