System Design Interview: Recommendation Engine at Scale

Recommendation systems power product discovery at Netflix, Spotify, Amazon, and LinkedIn. Designing one at scale means building a multi-stage architecture that balances relevance quality against latency constraints while serving millions of users.

Requirements

Functional: Return N personalized item recommendations per user, support real-time signals (recent clicks), handle cold start for new users/items, support multiple contexts (homepage, item detail, email).

Non-functional: < 100ms p99 for recommendation serving, handle 10M DAU, update models daily (batch) and recommendations in near-real-time (stream), support A/B testing of recommendation algorithms.

Multi-Stage Recommendation Architecture

All items (millions)
    │
    ▼
[Candidate Generation]  ← retrieval: narrow to ~1000 candidates
    │  (ANN search, collaborative filtering, content rules)
    │
    ▼
[Pre-Ranking / Filtering]  ← remove seen, blocked, low-quality
    │
    ▼
[Ranking Model]  ← ML model scores each candidate (~50ms of the latency budget)
    │  (gradient boosted trees, deep learning ranker)
    │
    ▼
[Post-Processing]  ← diversity injection, business rules, A/B
    │
    ▼
Final N recommendations

Candidate Generation: Collaborative Filtering

Matrix Factorization

User-Item interaction matrix R (sparse):
    Items →
U  [5, ?, 3, ?, 1]
s  [?, 4, ?, 5, ?]
e  [1, ?, ?, 3, 5]
r  ...
s
↓
Factorize: R ≈ U × Vᵀ
  U: user embeddings (n_users × k)
  V: item embeddings (n_items × k)
  k = 64-256 latent factors

Prediction: r̂_ui = u_u · v_i  (dot product of user u's and item i's embedding rows)

Training: minimize Σ over observed (u,i) of (r_ui - u_u · v_i)² + λ(‖u_u‖² + ‖v_i‖²)
  via ALS (Alternating Least Squares) or SGD
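
A minimal sketch of the SGD variant on a toy explicit-feedback matrix. The data, learning rate, and factor count below are illustrative; production systems typically run ALS on implicit feedback via a library such as Spark MLlib or implicit.

  import numpy as np

  # Toy observed ratings as (user, item, rating) triples; '?' cells are simply absent.
  ratings = [(0, 0, 5), (0, 2, 3), (0, 4, 1),
             (1, 1, 4), (1, 3, 5),
             (2, 0, 1), (2, 3, 3), (2, 4, 5)]
  n_users, n_items, k = 3, 5, 8        # k latent factors (64-256 at real scale)
  lam, lr, epochs = 0.05, 0.01, 200    # L2 regularization, learning rate

  rng = np.random.default_rng(0)
  U = rng.normal(scale=0.1, size=(n_users, k))   # user embeddings
  V = rng.normal(scale=0.1, size=(n_items, k))   # item embeddings

  for _ in range(epochs):
      for u, i, r in ratings:
          err = r - U[u] @ V[i]                    # error on an observed cell
          U[u] += lr * (err * V[i] - lam * U[u])   # gradient step on the regularized loss
          V[i] += lr * (err * U[u] - lam * V[i])

  print(U @ V.T)   # reconstructed matrix; unobserved cells become predictions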

Two-Tower Neural Model

User Tower                Item Tower
────────────              ──────────
user_id embedding         item_id embedding
+ user features           + item features
+ recent history          + content features
     │                         │
  Dense layers              Dense layers
     │                         │
  user vector (256d) ──── item vector (256d)
                    cosine similarity → score

Training: contrastive loss (positive pairs from interactions,
          negative sampling from random items)

Two-tower models allow pre-computing item embeddings offline. At serving time, only the user tower runs, and ANN search finds nearest item vectors in milliseconds.
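
A minimal PyTorch sketch of the two towers, using in-batch negatives as the sampled-softmax form of the negative sampling described above. Layer sizes, feature dimensions, and the temperature are illustrative assumptions.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class Tower(nn.Module):
      """Embed an id plus a dense feature vector into a shared 256-d space."""
      def __init__(self, n_ids, n_feats, dim=256):
          super().__init__()
          self.id_emb = nn.Embedding(n_ids, 64)
          self.mlp = nn.Sequential(nn.Linear(64 + n_feats, 512), nn.ReLU(),
                                   nn.Linear(512, dim))
      def forward(self, ids, feats):
          x = torch.cat([self.id_emb(ids), feats], dim=-1)
          return F.normalize(self.mlp(x), dim=-1)   # unit vectors: dot product = cosine

  user_tower = Tower(n_ids=10_000, n_feats=16)
  item_tower = Tower(n_ids=50_000, n_feats=32)

  # One training step: each row's positive item; other rows in the batch act as negatives.
  u = user_tower(torch.randint(0, 10_000, (128,)), torch.randn(128, 16))
  v = item_tower(torch.randint(0, 50_000, (128,)), torch.randn(128, 32))
  logits = u @ v.T / 0.05                             # temperature-scaled similarity matrix
  loss = F.cross_entropy(logits, torch.arange(128))   # contrastive loss over the batch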

Approximate Nearest Neighbor (ANN) Search

Problem: find the top-K item vectors closest to the user vector
         among 10M item embeddings in < 10ms

Algorithms:
  HNSW (Hierarchical Navigable Small World):
    - Multi-layer graph; traverse from top layer to bottom
    - ~95% recall at 10x speedup vs brute force
    - Memory: ~n × d × 4 bytes = 10M × 256 × 4 ≈ 10GB for float32 vectors, plus graph links

  FAISS (Facebook AI Similarity Search):
    - Supports IVF (inverted file index): cluster items, search top clusters
    - IVFFlat: exact within clusters, approximate overall
    - GPU acceleration for batch serving

Services: Pinecone, Weaviate, Milvus, Elasticsearch kNN
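
A minimal FAISS sketch of the IVF retrieval path described above. The catalog size, nlist, and nprobe values are illustrative; vectors are L2-normalized so inner product equals cosine similarity.

  import numpy as np
  import faiss

  d, n_items, nlist = 256, 200_000, 1024            # embedding dim, catalog size, clusters
  item_vecs = np.random.rand(n_items, d).astype("float32")
  faiss.normalize_L2(item_vecs)

  quantizer = faiss.IndexFlatIP(d)
  index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
  index.train(item_vecs)                            # learn the coarse clusters
  index.add(item_vecs)
  index.nprobe = 32                                 # clusters probed per query

  user_vec = np.random.rand(1, d).astype("float32")
  faiss.normalize_L2(user_vec)
  scores, item_ids = index.search(user_vec, 1000)   # ~1000 candidates for the ranker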

Ranking Model

Input features per (user, item) pair:
  User features:  age, country, device, account_age, preferences
  Item features:  category, popularity, recency, avg_rating
  Interaction:    historical CTR for this user × category
  Context:        time_of_day, surface (homepage vs search)
  Cross features: user_category_affinity, item_user_overlap

Model: LightGBM (fast, interpretable) or
       Deep & Cross Network (DCN) for feature crosses

Output: P(click), P(purchase), P(watch_completion)
        → weighted combination = ranking score

Serving: pre-score candidates in batch for returning users;
         score in real time when fresh personalization signals arrive
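
A sketch of the ranking step assuming two LightGBM classifiers (click and purchase heads) blended with hand-set weights. The feature matrix, labels, and weights below are synthetic placeholders.

  import numpy as np
  import lightgbm as lgb

  # Logged impressions: per-(user, item) feature rows with binary outcome labels.
  rng = np.random.default_rng(0)
  X = rng.random((10_000, 20))
  y_click = (rng.random(10_000) < 0.1).astype(int)
  y_purchase = (rng.random(10_000) < 0.02).astype(int)

  click_model = lgb.LGBMClassifier(n_estimators=200).fit(X, y_click)
  purchase_model = lgb.LGBMClassifier(n_estimators=200).fit(X, y_purchase)

  def rank(candidate_features, w_click=1.0, w_purchase=5.0):
      """Score ~1000 candidates and return their indices sorted by blended score."""
      p_click = click_model.predict_proba(candidate_features)[:, 1]
      p_purchase = purchase_model.predict_proba(candidate_features)[:, 1]
      return np.argsort(-(w_click * p_click + w_purchase * p_purchase))

  order = rank(rng.random((1000, 20)))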

Cold Start Problem

New User Cold Start

  • Onboarding signals: ask users to rate seed items or select interests
  • Demographic-based: serve popular items in user’s country/age cohort
  • Session-based: update recommendations after the first 2-3 interactions using a session-context model (RNN/Transformer over the click sequence)

New Item Cold Start

  • Content-based: embed the item using text/image features; find the nearest existing items in embedding space (see the sketch after this list)
  • Exploration injection: insert new items into the ranked list with a boosted score for their first N impressions to gather interaction data
  • Warm-up period: use content features for ranking; switch to collaborative features after 100+ interactions
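
A sketch of the content-based fallback for new items, assuming precomputed content embeddings and a scikit-learn brute-force neighbor index; the embedding dimension and neighbor count are illustrative.

  import numpy as np
  from sklearn.neighbors import NearestNeighbors

  # Content embeddings (e.g., from a text/image encoder) for the existing catalog.
  catalog_embeddings = np.random.rand(100_000, 128).astype("float32")
  nn_index = NearestNeighbors(n_neighbors=20, metric="cosine").fit(catalog_embeddings)

  def neighbors_for_new_item(new_item_embedding):
      """Map a brand-new item onto its closest existing items so it can borrow
      their collaborative signals until it has interactions of its own."""
      _, idx = nn_index.kneighbors(new_item_embedding.reshape(1, -1))
      return idx[0]

  similar_items = neighbors_for_new_item(np.random.rand(128).astype("float32"))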

Real-Time Personalization

User clicks item → Kafka event
  → Feature pipeline: update user session features (Redis, TTL 30min)
  → Stream processor: rerank current recommendation slate
  → A/B framework: route to correct model variant
  → Cache invalidation: bust user's cached recommendations

Architecture:
  Flink consumer → user session store (Redis)
                 → online feature store (Feast/Tecton)
                 → trigger re-ranking if session changed significantly
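
A minimal sketch of the session-feature update path using kafka-python and redis-py. The topic name, event schema, and key layout are assumptions; the 30-minute TTL matches the session window above.

  import json
  import redis
  from kafka import KafkaConsumer

  r = redis.Redis(host="localhost", port=6379)
  consumer = KafkaConsumer("user-clicks",                  # hypothetical topic name
                           bootstrap_servers="localhost:9092",
                           value_deserializer=lambda v: json.loads(v))

  SESSION_TTL = 30 * 60   # 30-minute sliding session window

  for event in consumer:
      click = event.value                  # e.g. {"user_id": ..., "item_id": ..., "ts": ...}
      key = f"session:{click['user_id']}"
      r.rpush(key, click["item_id"])       # append to the user's recent-click list
      r.expire(key, SESSION_TTL)           # refresh the sliding expiry
      # Downstream: push updated features to the online feature store and trigger
      # re-ranking if the session has drifted far enough from the cached slate.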

Diversity and Serendipity

Pure relevance optimization creates filter bubbles and repetitive recommendations. Post-processing injects diversity:

  • MMR (Maximal Marginal Relevance): iteratively select the next item maximizing λ × relevance – (1-λ) × similarity to already-selected items (sketched after this list)
  • Category caps: max 2 items per category in the top-10
  • Freshness injection: reserve slots for items newer than 7 days
  • Exploration slots (ε-greedy): 5% of slots serve exploratory items to combat popularity bias
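
A sketch of greedy MMR re-ranking over candidate relevance scores and embeddings; λ = 0.7 and cosine similarity are illustrative choices.

  import numpy as np

  def mmr(relevance, item_vecs, n, lam=0.7):
      """Greedily pick items that trade off relevance against similarity
      to the items already selected."""
      item_vecs = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
      sim = item_vecs @ item_vecs.T                     # pairwise cosine similarity
      selected, remaining = [], list(range(len(relevance)))
      while remaining and len(selected) < n:
          if selected:
              redundancy = sim[remaining][:, selected].max(axis=1)
          else:
              redundancy = np.zeros(len(remaining))
          score = lam * relevance[remaining] - (1 - lam) * redundancy
          best = remaining[int(np.argmax(score))]
          selected.append(best)
          remaining.remove(best)
      return selected

  # Usage: re-rank the top-50 ranked candidates down to a diverse top-10.
  top10 = mmr(np.random.rand(50), np.random.rand(50, 256), n=10)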

Offline Evaluation vs Online Metrics

Metric                         Stage        Measures
Recall@K, NDCG@K               Offline      Ranking quality on held-out interactions
AUC-ROC                        Offline      Discriminative power of ranker
CTR, CVR                       Online A/B   User engagement
Session length, D7 retention   Online A/B   Long-term user satisfaction
Catalog coverage               Online       Diversity of items surfaced
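
A sketch of Recall@K and NDCG@K for a single user's held-out interactions, assuming binary relevance.

  import numpy as np

  def recall_at_k(ranked_items, relevant_items, k=10):
      """Fraction of the user's held-out items that appear in the top-k."""
      return len(set(ranked_items[:k]) & set(relevant_items)) / len(relevant_items)

  def ndcg_at_k(ranked_items, relevant_items, k=10):
      """Discounted gain of hits in the top-k, normalized by the ideal ordering."""
      relevant = set(relevant_items)
      dcg = sum(1.0 / np.log2(i + 2)
                for i, item in enumerate(ranked_items[:k]) if item in relevant)
      ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
      return dcg / ideal

  print(recall_at_k([3, 7, 1, 9], [1, 2], k=3), ndcg_at_k([3, 7, 1, 9], [1, 2], k=3))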

Selection bias: offline metrics only evaluate items the user was actually shown (the logging policy's choices). Use inverse propensity scoring or counterfactual evaluation to correct for this.
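
A minimal sketch of a capped inverse propensity scoring estimator; the rewards, logged propensities, and new-policy probabilities are assumed to come from impression logs.

  import numpy as np

  def ips_estimate(rewards, logged_propensities, new_policy_probs, cap=10.0):
      """Re-weight logged rewards by how likely the new policy is to take the
      logged action relative to the logging policy; cap weights to control variance."""
      weights = np.minimum(new_policy_probs / logged_propensities, cap)
      return float(np.mean(weights * rewards))

  # Usage: estimate a new ranker's expected reward from logs of the old one.
  print(ips_estimate(np.array([1, 0, 1]), np.array([0.5, 0.2, 0.1]), np.array([0.4, 0.3, 0.3])))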

System Design Summary

  • Scale: Two-tower model for retrieval (ANN over 10M item embeddings), LightGBM/DCN for ranking ~1000 candidates
  • Latency: ANN search < 10ms, ranking < 50ms, post-processing < 10ms ≈ 70ms total, within the 100ms p99 budget
  • Freshness: Flink pipeline for real-time user session features; model retrained daily
  • Infrastructure: Kafka → Flink → Feast (feature store) → Serving layer (TF Serving / Triton)
