Core Entities
Content: content_id, type (TEXT, IMAGE, VIDEO), author_id, platform, raw_content / storage_url, created_at.
ModerationDecision: decision_id, content_id, verdict (APPROVED, REMOVED, ESCALATED), confidence_score, reason_codes[], decided_by (model_id or reviewer_id), decided_at.
Appeal: appeal_id, content_id, user_id, reason_text, status (PENDING, UPHELD, OVERTURNED), reviewer_id.
ReviewQueue: a priority-ordered queue of content awaiting human review.
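Sketched as Python dataclasses (field types and enum classes such as `Verdict` and `AppealStatus` are assumptions inferred from the descriptions above):

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class Verdict(Enum):
    APPROVED = "APPROVED"
    REMOVED = "REMOVED"
    ESCALATED = "ESCALATED"

class AppealStatus(Enum):
    PENDING = "PENDING"
    UPHELD = "UPHELD"
    OVERTURNED = "OVERTURNED"

@dataclass
class Content:
    content_id: str
    type: str                          # TEXT, IMAGE, or VIDEO
    author_id: str
    platform: str
    storage_url: str                   # or raw_content inline for short text
    created_at: Optional[datetime] = None

@dataclass
class ModerationDecision:
    decision_id: str
    content_id: str
    verdict: Verdict
    confidence_score: float
    reason_codes: list = field(default_factory=list)
    decided_by: str = ""               # model_id or reviewer_id
    decided_at: Optional[datetime] = None
```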
Moderation Pipeline
Content ingestion → Automated ML classifiers → Threshold decision → Human review queue (if escalated) → Final decision → Action enforcement → Appeal handling.
Automated classifiers: text classifiers (toxicity, spam, hate speech), image classifiers (NSFW, violence, graphic content), and video handling via frame sampling plus audio transcription. Each classifier returns a confidence score in [0, 1] per violation category.
Decision Thresholds
Define per-category thresholds: if confidence >= HIGH_THRESHOLD (e.g., 0.95): auto-remove. If confidence >= LOW_THRESHOLD (e.g., 0.6): escalate to human review. If confidence < LOW_THRESHOLD: auto-approve. Thresholds are tunable without code changes (stored in config). Calibrate thresholds using precision/recall trade-off: high precision (few false positives) for auto-removal; lower precision acceptable for escalation (humans catch false positives).
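The routing rule above can be sketched as follows (threshold values and category names are illustrative; in production the dict would be backed by a config store so thresholds change without code deploys):

```python
# Per-category thresholds; a plain dict stands in for the config store here.
THRESHOLDS = {
    "toxicity": {"high": 0.95, "low": 0.6},
    "nsfw":     {"high": 0.98, "low": 0.7},
}

def route(category: str, confidence: float) -> str:
    """Map one classifier confidence score to a routing action."""
    t = THRESHOLDS[category]
    if confidence >= t["high"]:
        return "AUTO_REMOVE"    # high precision required: false positives are costly
    if confidence >= t["low"]:
        return "ESCALATE"       # humans catch false positives downstream
    return "AUTO_APPROVE"
```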
Human Review Queue
Priority queue ordered by: escalation reason severity (CSAM > violence > hate speech > spam), content virality (high-impression content reviewed first), and time in queue (to prevent starvation). Reviewers are assigned content only from their approved categories (CSAM reviewers receive specialized training). Track reviewer decisions and inter-rater agreement; flag reviewers with low agreement for calibration. Time-box each review (e.g., 60 seconds): content not reviewed within the limit returns to the queue at escalated priority.
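A sketch of a priority score combining the three ordering factors (the weights and severity ranks are assumptions, not tuned values):

```python
import time

# Higher rank is reviewed first: CSAM > violence > hate speech > spam.
SEVERITY_RANK = {"csam": 4, "violence": 3, "hate_speech": 2, "spam": 1}

def review_priority(reason, impressions, enqueued_at, now=None):
    """Combine severity, virality, and wait time into one sortable score.

    Weights are illustrative; a real system would tune them empirically.
    """
    now = now if now is not None else time.time()
    wait_hours = (now - enqueued_at) / 3600
    return (SEVERITY_RANK[reason] * 1000      # severity dominates
            + min(impressions / 1000, 500)    # virality term, capped
            + wait_hours * 10)                # starvation prevention

# Pop the highest score first, e.g. via a Redis sorted set or heapq on -score.
```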
Action Enforcement
On a REMOVED decision: hide the content from feeds immediately (soft delete: set is_visible=false), notify the author with the reason code, and apply a strike to the author's account. After N strikes within 30 days: temporary suspension. After M total strikes: permanent ban. Strike records are retained and surfaced during appeals. Action enforcement runs as a separate service that subscribes to ModerationDecision events, keeping it decoupled from the moderation pipeline.
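A minimal sketch of the strike escalation, with placeholder values standing in for N, M, and the window:

```python
from datetime import datetime, timedelta

STRIKE_WINDOW = timedelta(days=30)
N_TEMP_SUSPEND = 3   # strikes within the window -> temporary suspension ("N")
M_PERM_BAN = 7       # lifetime strikes -> permanent ban ("M")

def enforcement_action(strike_times, now):
    """Decide the account-level action from a user's strike history."""
    if len(strike_times) >= M_PERM_BAN:
        return "PERMANENT_BAN"
    recent = [t for t in strike_times if now - t <= STRIKE_WINDOW]
    if len(recent) >= N_TEMP_SUSPEND:
        return "TEMP_SUSPENSION"
    return "NONE"
```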
Appeals System
class AppealService:
    def submit_appeal(self, content_id, user_id, reason):
        content = self.db.get_content(content_id)
        if content.author_id != user_id:
            raise PermissionError("Can only appeal own content")
        # Allow at most one pending appeal per piece of content.
        existing = self.db.get_appeal_by_content(content_id)
        if existing and existing.status == AppealStatus.PENDING:
            raise ValueError("Appeal already pending")
        appeal = Appeal(content_id=content_id, user_id=user_id,
                        reason=reason, status=AppealStatus.PENDING)
        self.db.insert(appeal)
        self.queue.enqueue(appeal, priority=Priority.NORMAL)
        return appeal

    def resolve_appeal(self, appeal_id, reviewer_id, decision):
        appeal = self.db.get_appeal(appeal_id)
        if appeal.status != AppealStatus.PENDING:
            raise ValueError("Appeal already resolved")
        appeal.status = decision  # UPHELD or OVERTURNED
        appeal.reviewer_id = reviewer_id
        if decision == AppealDecision.OVERTURNED:
            # Restore visibility and reverse the strike applied at removal.
            self.enforcement.restore_content(appeal.content_id)
            self.enforcement.reverse_strike(appeal.user_id)
        self.db.update(appeal)
        self.notify(appeal)
Scaling Considerations
At 100M posts/day, the automated pipeline must sustain roughly 1,200 items/second on average (100M / 86,400 s), with headroom for peaks. Use a Kafka topic per content type; classifier workers auto-scale on consumer lag. ML inference is the bottleneck: GPU workers with batch inference (32-64 items per forward pass) improve throughput significantly. Cache model predictions for duplicate content (hash-based deduplication: the same image or text seen before gets the cached verdict). The human review queue is far smaller; typically only 2-5% of content escalates to human review.
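The hash-based deduplication might look like the following sketch, using SHA-256 for exact duplicates (a production system would add a perceptual hash such as PDQ so near-duplicate images also hit the cache, and would back the dict with Redis):

```python
import hashlib
from typing import Optional

class VerdictCache:
    """Cache classifier verdicts for exact-duplicate content, keyed on a
    SHA-256 digest of the raw bytes."""

    def __init__(self):
        self._cache: dict = {}

    @staticmethod
    def key(raw: bytes) -> str:
        return hashlib.sha256(raw).hexdigest()

    def get(self, raw: bytes) -> Optional[str]:
        """Return the cached verdict for identical content, or None."""
        return self._cache.get(self.key(raw))

    def put(self, raw: bytes, verdict: str) -> None:
        self._cache[self.key(raw)] = verdict
```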
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you design automated content moderation at scale?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Multi-stage pipeline: (1) Fast pre-filters: hash-based lookup for known bad content (PhotoDNA for CSAM, hash lists for known spam URLs). O(1) lookup, zero ML inference cost. (2) ML classifiers: text toxicity model, image NSFW classifier, video frame sampling + audio transcription. Each returns a confidence score. (3) Threshold routing: high confidence -> auto-action; medium confidence -> human review queue; low confidence -> auto-approve. (4) Human review: priority queue ordered by severity and virality. At 100M posts/day and a 3% escalation rate: 3M items/day for human review. With a 30-second average review time, that is roughly 25,000 reviewer-hours/day, or about 3,000 full-time moderators on 8-hour shifts."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent over-removal and under-removal in content moderation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Calibrate classifier thresholds using precision-recall curves. For auto-removal: maximize precision (minimize false positives — incorrectly removed legitimate content causes user trust damage). For human review escalation: maximize recall (catch as many true violations as possible; humans handle false positives). Track precision, recall, false positive rate, and false negative rate per category. Implement a shadow mode: run a new classifier in parallel with the existing one, log its decisions, measure agreement. Promote the new classifier only after measuring acceptable precision/recall. Regular audit sampling: randomly sample auto-approved content for human review to estimate the false negative rate."
      }
    },
    {
      "@type": "Question",
      "name": "How does the appeals process work for content moderation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "User submits an appeal with a reason. System checks: is the appeal within the appeal window (e.g., 30 days)? Is there already a pending appeal? Does the user own the content? Valid appeals enter a human review queue with a fresh reviewer (different from the original moderator to reduce anchoring bias). Reviewer sees: original content, original violation reason, user appeal text, author history, and the original moderator decision. Reviewer decides: UPHOLD (removal stands) or OVERTURN (restore content, reverse strike). On overturn: restore content visibility, decrement strike count, notify user. Track overturn rates per moderator — high overturn rates indicate calibration issues."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle repeat offenders in a content moderation system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Implement a strike system with escalating consequences. Track strikes per user with a rolling window (e.g., 90 days). Thresholds: 3 strikes -> 24-hour suspension. 5 strikes -> 7-day suspension. 7 strikes -> permanent ban. Strikes carry different weights by severity (CSAM = immediate permanent ban regardless of history; hate speech = 1 strike; spam = 0.5 strike). When a strike expires (rolling window), decrement the count. Store all strikes with their content_id for transparency in appeals. For ban evasion detection: device fingerprinting, IP clustering, behavioral patterns. New accounts from banned users get elevated scrutiny."
      }
    },
    {
      "@type": "Question",
      "name": "How would you scale the human review queue to handle traffic spikes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The review queue is a priority queue stored in a database or Redis sorted set. Priority = f(severity, virality, wait_time). Virality: content seen by >10K users in the last hour gets highest priority — prevent widespread harm. Wait time: increase priority for items waiting > 2 hours (starvation prevention). During traffic spikes (viral events, coordinated attacks): auto-scale the reviewer pool by pulling from adjacent categories after training. Use a contractor workforce on-demand for surge capacity. For extremely high-severity content (credible threats, self-harm): alert an on-call team directly via PagerDuty, bypassing the queue. Batch low-priority items (old spam) into off-peak processing windows."
      }
    }
  ]
}