Core Entities
Content: content_id, type (TEXT, IMAGE, VIDEO), author_id, platform, raw_content / storage_url, created_at.
ModerationDecision: decision_id, content_id, verdict (APPROVED, REMOVED, ESCALATED), confidence_score, reason_codes[], decided_by (model_id or reviewer_id), decided_at.
Appeal: appeal_id, content_id, user_id, reason_text, status (PENDING, UPHELD, OVERTURNED), reviewer_id.
ReviewQueue: a priority-ordered queue of content awaiting human review.
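Sketched as Python dataclasses (field types and enum classes such as `Verdict` and `AppealStatus` are assumptions inferred from the descriptions above):

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class Verdict(Enum):
    APPROVED = "APPROVED"
    REMOVED = "REMOVED"
    ESCALATED = "ESCALATED"

class AppealStatus(Enum):
    PENDING = "PENDING"
    UPHELD = "UPHELD"
    OVERTURNED = "OVERTURNED"

@dataclass
class Content:
    content_id: str
    type: str                          # TEXT, IMAGE, or VIDEO
    author_id: str
    platform: str
    storage_url: str                   # or raw_content inline for short text
    created_at: Optional[datetime] = None

@dataclass
class ModerationDecision:
    decision_id: str
    content_id: str
    verdict: Verdict
    confidence_score: float
    reason_codes: list = field(default_factory=list)
    decided_by: str = ""               # model_id or reviewer_id
    decided_at: Optional[datetime] = None
```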
Moderation Pipeline
Content ingestion → Automated ML classifiers → Threshold decision → Human review queue (if escalated) → Final decision → Action enforcement → Appeal handling.
Automated classifiers: text classifiers (toxicity, spam, hate speech), image classifiers (NSFW, violence, graphic content), and video handling via frame sampling plus audio transcription. Each classifier returns a confidence score in [0, 1] per violation category.
Decision Thresholds
Define per-category thresholds: if confidence >= HIGH_THRESHOLD (e.g., 0.95): auto-remove. If confidence >= LOW_THRESHOLD (e.g., 0.6): escalate to human review. If confidence < LOW_THRESHOLD: auto-approve. Thresholds are tunable without code changes (stored in config). Calibrate thresholds using precision/recall trade-off: high precision (few false positives) for auto-removal; lower precision acceptable for escalation (humans catch false positives).
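The routing rule above can be sketched as follows (threshold values and category names are illustrative; in production the dict would be backed by a config store so thresholds change without code deploys):

```python
# Per-category thresholds; a plain dict stands in for the config store here.
THRESHOLDS = {
    "toxicity": {"high": 0.95, "low": 0.6},
    "nsfw":     {"high": 0.98, "low": 0.7},
}

def route(category: str, confidence: float) -> str:
    """Map one classifier confidence score to a routing action."""
    t = THRESHOLDS[category]
    if confidence >= t["high"]:
        return "AUTO_REMOVE"    # high precision required: false positives are costly
    if confidence >= t["low"]:
        return "ESCALATE"       # humans catch false positives downstream
    return "AUTO_APPROVE"
```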
Human Review Queue
Priority queue ordered by: escalation reason severity (CSAM > violence > hate speech > spam), content virality (high-impression content reviewed first), and time in queue (to prevent starvation). Reviewers are assigned content only from their approved categories (CSAM reviewers receive specialized training). Track reviewer decisions and inter-rater agreement; flag reviewers with low agreement for calibration. Time-box each review (e.g., 60 seconds): content not reviewed within the limit returns to the queue at escalated priority.
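A sketch of a priority score combining the three ordering factors (the weights and severity ranks are assumptions, not tuned values):

```python
import time

# Higher rank is reviewed first: CSAM > violence > hate speech > spam.
SEVERITY_RANK = {"csam": 4, "violence": 3, "hate_speech": 2, "spam": 1}

def review_priority(reason, impressions, enqueued_at, now=None):
    """Combine severity, virality, and wait time into one sortable score.

    Weights are illustrative; a real system would tune them empirically.
    """
    now = now if now is not None else time.time()
    wait_hours = (now - enqueued_at) / 3600
    return (SEVERITY_RANK[reason] * 1000      # severity dominates
            + min(impressions / 1000, 500)    # virality term, capped
            + wait_hours * 10)                # starvation prevention

# Pop the highest score first, e.g. via a Redis sorted set or heapq on -score.
```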
Action Enforcement
On a REMOVED decision: hide the content from feeds immediately (soft delete: set is_visible=false), notify the author with the reason code, and apply a strike to the author's account. After N strikes within 30 days: temporary suspension. After M total strikes: permanent ban. Strike records are retained and surfaced during appeals. Action enforcement runs as a separate service that subscribes to ModerationDecision events, keeping it decoupled from the moderation pipeline.
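A minimal sketch of the strike escalation, with placeholder values standing in for N, M, and the window:

```python
from datetime import datetime, timedelta

STRIKE_WINDOW = timedelta(days=30)
N_TEMP_SUSPEND = 3   # strikes within the window -> temporary suspension ("N")
M_PERM_BAN = 7       # lifetime strikes -> permanent ban ("M")

def enforcement_action(strike_times, now):
    """Decide the account-level action from a user's strike history."""
    if len(strike_times) >= M_PERM_BAN:
        return "PERMANENT_BAN"
    recent = [t for t in strike_times if now - t <= STRIKE_WINDOW]
    if len(recent) >= N_TEMP_SUSPEND:
        return "TEMP_SUSPENSION"
    return "NONE"
```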
Appeals System
class AppealService:
    def submit_appeal(self, content_id, user_id, reason):
        content = self.db.get_content(content_id)
        if content.author_id != user_id:
            raise PermissionError("Can only appeal own content")
        # Allow at most one pending appeal per piece of content.
        existing = self.db.get_appeal_by_content(content_id)
        if existing and existing.status == AppealStatus.PENDING:
            raise ValueError("Appeal already pending")
        appeal = Appeal(content_id=content_id, user_id=user_id,
                        reason=reason, status=AppealStatus.PENDING)
        self.db.insert(appeal)
        self.queue.enqueue(appeal, priority=Priority.NORMAL)
        return appeal

    def resolve_appeal(self, appeal_id, reviewer_id, decision):
        appeal = self.db.get_appeal(appeal_id)
        if appeal.status != AppealStatus.PENDING:
            raise ValueError("Appeal already resolved")
        appeal.status = decision  # UPHELD or OVERTURNED
        appeal.reviewer_id = reviewer_id
        if decision == AppealDecision.OVERTURNED:
            # Restore visibility and reverse the strike applied at removal.
            self.enforcement.restore_content(appeal.content_id)
            self.enforcement.reverse_strike(appeal.user_id)
        self.db.update(appeal)
        self.notify(appeal)
Scaling Considerations
At 100M posts/day, the automated pipeline must sustain roughly 1,200 items/second on average (100M / 86,400 s), with headroom for peaks. Use a Kafka topic per content type; classifier workers auto-scale on consumer lag. ML inference is the bottleneck: GPU workers with batch inference (32-64 items per forward pass) improve throughput significantly. Cache model predictions for duplicate content (hash-based deduplication: the same image or text seen before gets the cached verdict). The human review queue is far smaller; typically only 2-5% of content escalates to human review.
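The hash-based deduplication might look like the following sketch, using SHA-256 for exact duplicates (a production system would add a perceptual hash such as PDQ so near-duplicate images also hit the cache, and would back the dict with Redis):

```python
import hashlib
from typing import Optional

class VerdictCache:
    """Cache classifier verdicts for exact-duplicate content, keyed on a
    SHA-256 digest of the raw bytes."""

    def __init__(self):
        self._cache: dict = {}

    @staticmethod
    def key(raw: bytes) -> str:
        return hashlib.sha256(raw).hexdigest()

    def get(self, raw: bytes) -> Optional[str]:
        """Return the cached verdict for identical content, or None."""
        return self._cache.get(self.key(raw))

    def put(self, raw: bytes, verdict: str) -> None:
        self._cache[self.key(raw)] = verdict
```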
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you design automated content moderation at scale?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Multi-stage pipeline: (1) Fast pre-filters: hash-based lookup for known bad content (PhotoDNA for CSAM, hash lists for known spam URLs). O(1) lookup, zero ML inference cost. (2) ML classifiers: text toxicity model, image NSFW classifier, video frame sampling + audio transcription. Each returns a confidence score. (3) Threshold routing: high confidence -> auto-action; medium confidence -> human review queue; low confidence -> auto-approve. (4) Human review: priority queue ordered by severity and virality. At 100M posts/day and a 3% escalation rate: 3M items/day for human review. With a 30-second average review time, that is roughly 25,000 reviewer-hours/day, or about 3,000 full-time moderators on 8-hour shifts."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent over-removal and under-removal in content moderation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Calibrate classifier thresholds using precision-recall curves. For auto-removal: maximize precision (minimize false positives — incorrectly removed legitimate content causes user trust damage). For human review escalation: maximize recall (catch as many true violations as possible; humans handle false positives). Track precision, recall, false positive rate, and false negative rate per category. Implement a shadow mode: run a new classifier in parallel with the existing one, log its decisions, measure agreement. Promote the new classifier only after measuring acceptable precision/recall. Regular audit sampling: randomly sample auto-approved content for human review to estimate the false negative rate."
      }
    },
    {
      "@type": "Question",
      "name": "How does the appeals process work for content moderation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "User submits an appeal with a reason. System checks: is the appeal within the appeal window (e.g., 30 days)? Is there already a pending appeal? Does the user own the content? Valid appeals enter a human review queue with a fresh reviewer (different from the original moderator to reduce anchoring bias). Reviewer sees: original content, original violation reason, user appeal text, author history, and the original moderator decision. Reviewer decides: UPHOLD (removal stands) or OVERTURN (restore content, reverse strike). On overturn: restore content visibility, decrement strike count, notify user. Track overturn rates per moderator — high overturn rates indicate calibration issues."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle repeat offenders in a content moderation system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Implement a strike system with escalating consequences. Track strikes per user with a rolling window (e.g., 90 days). Thresholds: 3 strikes -> 24-hour suspension. 5 strikes -> 7-day suspension. 7 strikes -> permanent ban. Strikes carry different weights by severity (CSAM = immediate permanent ban regardless of history; hate speech = 1 strike; spam = 0.5 strike). When a strike expires (rolling window), decrement the count. Store all strikes with their content_id for transparency in appeals. For ban evasion detection: device fingerprinting, IP clustering, behavioral patterns. New accounts from banned users get elevated scrutiny."
      }
    },
    {
      "@type": "Question",
      "name": "How would you scale the human review queue to handle traffic spikes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The review queue is a priority queue stored in a database or Redis sorted set. Priority = f(severity, virality, wait_time). Virality: content seen by >10K users in the last hour gets highest priority — prevent widespread harm. Wait time: increase priority for items waiting > 2 hours (starvation prevention). During traffic spikes (viral events, coordinated attacks): auto-scale the reviewer pool by pulling from adjacent categories after training. Use a contractor workforce on-demand for surge capacity. For extremely high-severity content (credible threats, self-harm): alert an on-call team directly via PagerDuty, bypassing the queue. Batch low-priority items (old spam) into off-peak processing windows."
      }
    }
  ]
}