System Design Interview: Design a Content Moderation System

What Is a Content Moderation System?

A content moderation system detects and removes harmful content (hate speech, spam, CSAM, misinformation, violence) from user-generated content platforms. Examples: Facebook's moderation pipeline at 100B+ posts, YouTube's Content ID system, Twitter's spam filters. Core challenges: scale (millions of posts per hour), latency (pre-publish blocking vs. post-publish removal), accuracy (minimizing false positives that silence legitimate users), and adversarial content (content designed to evade detection).

    System Requirements

    Functional

    • Classify text/images/video as: safe, borderline, violating
    • Auto-remove high-confidence violations immediately
    • Queue borderline content for human review
    • Appeals: users can contest removal decisions
    • Hash-based detection for known violating content (CSAM hashes)

    Non-Functional

    • 1M posts/hour, <500ms for pre-publish text classification
    • Human reviewers handle 100K borderline items/hour (see Scaling Human Review)
    • False positive rate <0.1% (do not remove legitimate content)

    Multi-Layer Moderation Pipeline

    Content submission
           │
           ▼
    [Layer 1] Hash matching (PhotoDNA, MD5)
       → exact match: block immediately (O(1))
           │
           ▼
    [Layer 2] ML classifier (text: BERT-based, image: CNN)
       → score: high confidence bad → block
       → score: medium confidence → human review queue
       → score: low confidence → allow
           │
           ▼
    [Layer 3] Human review workers (for borderline content)
           │
           ▼
    [Layer 4] Appeals (for removed content)
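
    A minimal sketch of this routing logic in Python. The three helpers (hash_match, ml_score, enqueue_for_review) are illustrative stubs standing in for the real hash service, model server, and review queue; the thresholds mirror the 0.3-0.8 borderline band used in the Human Review Queue section below:

    BLOCK_THRESHOLD = 0.8   # above this: high-confidence violation, auto-remove
    REVIEW_THRESHOLD = 0.3  # below this: allow without review

    def hash_match(content: str) -> bool:
        return False  # stub: real system checks PhotoDNA/MD5 hash sets

    def ml_score(content: str) -> float:
        return 0.0    # stub: real system calls a GPU inference server

    def enqueue_for_review(content: str, score: float) -> None:
        pass          # stub: real system writes to the review queue

    def moderate(content: str) -> str:
        if hash_match(content):        # Layer 1: O(1) known-bad lookup
            return "blocked"
        score = ml_score(content)      # Layer 2: classifier confidence in [0, 1]
        if score >= BLOCK_THRESHOLD:
            return "blocked"
        if score >= REVIEW_THRESHOLD:  # Layer 3: borderline goes to humans
            enqueue_for_review(content, score)
            return "pending_review"
        return "allowed"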
    

    Hash-Based Detection

    For known illegal content (CSAM), use perceptual hashing (PhotoDNA). Unlike cryptographic hashes (MD5), perceptual hashes are similar for visually similar images — resizing, cropping, or color-adjusting a photo produces a nearly identical perceptual hash. Store known violating hashes in a Bloom filter for O(1) lookup at submission time. The Bloom filter can hold 1B hashes in ~1.2 GB with a 1% false positive rate. Any Bloom filter hit triggers exact hash verification before blocking.
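
    A minimal Bloom filter sketch in Python, using double hashing from a single SHA-256 digest; the sizing in the usage example is scaled down 1000x from the figures above (same ~10 bits per item for ~1% false positives). A hit is only a candidate and still goes to exact verification:

    import hashlib

    class BloomFilter:
        """Minimal Bloom filter: ~10 bits/item gives roughly 1% false positives."""

        def __init__(self, num_bits: int, num_hashes: int = 7):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8 + 1)

        def _positions(self, item: bytes):
            # Double hashing: derive k bit positions from one SHA-256 digest.
            digest = hashlib.sha256(item).digest()
            h1 = int.from_bytes(digest[:8], "big")
            h2 = int.from_bytes(digest[8:16], "big")
            return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

        def add(self, item: bytes) -> None:
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item: bytes) -> bool:
            # False positives are possible; false negatives are not.
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    # Scaled-down sizing: 10M bits (~1.25 MB) holds ~1M hashes at ~1% FP,
    # the same ratio as 1B hashes in ~1.2 GB.
    known_bad = BloomFilter(num_bits=10_000_000)
    known_bad.add(b"perceptual-hash-of-known-image")
    if known_bad.might_contain(b"perceptual-hash-of-known-image"):
        pass  # candidate only: verify against the exact hash store before blocking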

    ML Classification

    Text: fine-tuned BERT or RoBERTa model, running on GPU inference servers. Pre-publish path: synchronous call with 200ms timeout. If timeout: fall back to allow (accept false negatives rather than blocking legitimate content). Post-publish: re-run classification asynchronously with a more expensive model. Image: ResNet/EfficientNet CNN. Video: sample frames at 1fps, classify each, aggregate frame scores. Return confidence score 0-1 with violation categories.
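
    A sketch of the fail-open pre-publish call; remote_classify is a hypothetical stub in place of the real inference-server call, and the 200ms budget comes from the timeout above:

    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    PRE_PUBLISH_TIMEOUT_S = 0.2  # 200ms pre-publish budget

    def remote_classify(text: str) -> float:
        return 0.0  # stub: real call hits the BERT/RoBERTa inference server

    executor = ThreadPoolExecutor(max_workers=32)

    def pre_publish_score(text: str) -> float:
        """Fail open: on timeout, allow now; the async post-publish pass re-scores."""
        future = executor.submit(remote_classify, text)
        try:
            return future.result(timeout=PRE_PUBLISH_TIMEOUT_S)
        except TimeoutError:
            future.cancel()
            return 0.0  # accept a possible false negative rather than block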

    Human Review Queue

    Borderline content (confidence 0.3-0.8) goes to the review queue. Prioritization order: (1) content visibility (viral content reviewed first), (2) severity of the potential violation, (3) submission time. Each item is shown to a reviewer with context: user history, report count, policy reference. Reviewer actions: remove, allow, mark for policy update. Quality control: 5% of reviewed items are sent to a senior reviewer to measure inter-rater agreement; reviewers with low agreement undergo retraining.
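
    The three criteria collapse into a single sort key; a heapq sketch (field names such as views are assumed proxies for visibility):

    import heapq
    import itertools
    import time

    _counter = itertools.count()  # tie-breaker keeps heap entries comparable
    review_heap = []

    def enqueue(item_id: str, views: int, severity: int, submitted_at: float):
        # Min-heap: negate visibility and severity so larger values pop first;
        # earlier submission time wins remaining ties.
        key = (-views, -severity, submitted_at, next(_counter))
        heapq.heappush(review_heap, (key, item_id))

    def next_item() -> str:
        _, item_id = heapq.heappop(review_heap)
        return item_id

    enqueue("post-123", views=50_000, severity=3, submitted_at=time.time())
    enqueue("post-456", views=200, severity=5, submitted_at=time.time())
    print(next_item())  # "post-123": viral content is reviewed first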

    Signals for Classification

    • Content features: text toxicity, image nudity score, audio transcription
    • User signals: account age, prior violations, follower/following ratio
    • Graph signals: how many of this user’s posts were reported, by whom
    • Velocity signals: posting 100 identical messages in an hour = spam (see the sliding-window sketch below)
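
    A sliding-window sketch of the velocity signal from the last bullet; the window and threshold come from the "100 identical messages in an hour" example, and the in-process dict stands in for shared storage such as Redis:

    import time
    from collections import defaultdict, deque

    WINDOW_S = 3600        # one hour, from the example above
    DUPLICATE_LIMIT = 100  # identical messages tolerated per window

    # (user_id, message_hash) -> timestamps of recent identical posts
    recent = defaultdict(deque)

    def is_spam_velocity(user_id: str, message: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        timestamps = recent[(user_id, hash(message))]
        # Evict events that fell out of the one-hour window.
        while timestamps and now - timestamps[0] > WINDOW_S:
            timestamps.popleft()
        timestamps.append(now)
        return len(timestamps) >= DUPLICATE_LIMIT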

    Appeals and Feedback Loop

    Users appeal removed content via a form. Appeals go to a senior review queue. If overturned, the content is restored and the model prediction is logged as a false positive. These false-positive examples are added to the training set with the correct label. A continuous retraining pipeline ingests reviewer decisions weekly and updates the classifier. This closes the feedback loop: the model improves from human decisions.
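
    A sketch of the overturn path; the file-based store (fp_examples.jsonl) is a stand-in for the real training-data pipeline. The key point is that every overturned removal becomes a labeled false-positive example:

    import json
    import time

    def handle_appeal_overturned(content_id: str, text: str, model_score: float,
                                 path: str = "fp_examples.jsonl") -> None:
        # Restoration happens elsewhere; here we log the correction so the
        # weekly retraining job picks it up as a labeled training example.
        record = {
            "content_id": content_id,
            "text": text,
            "model_score": model_score,   # what the model predicted
            "label": "safe",              # ground truth after senior review
            "source": "appeal_overturned",
            "logged_at": time.time(),
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")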

    Scaling Human Review

    At 1M posts/hour with a 10% borderline rate, that is 100K items/hour. At 200 items/reviewer/hour, need 500 concurrent reviewers. Geographic distribution: native speakers for non-English content. Outsource to moderation vendors (Accenture, Teleperformance) for scale. Protect reviewer mental health: mandatory breaks, psychological support, session content diversity limits.
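
    The headcount falls out of two divisions; as a worked calculation:

    posts_per_hour = 1_000_000
    borderline_rate = 0.10            # 10% of posts need human review
    items_per_reviewer_hour = 200

    review_load = posts_per_hour * borderline_rate    # 100,000 items/hour
    reviewers = review_load / items_per_reviewer_hour
    print(int(review_load), int(reviewers))           # 100000 500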

    Interview Tips

    • Multi-layer pipeline is the key insight: cheap checks first (hash), expensive last (ML).
    • Perceptual hashing + Bloom filter for known-bad content = O(1) rejection.
    • Pre-publish vs. post-publish trade-off: latency vs. accuracy.
    • Human review and the feedback loop complete the system — don’t design without them.