System Design Interview: Design a Fraud Detection System

What Is a Fraud Detection System?

A fraud detection system identifies and blocks fraudulent transactions, account takeovers, and abuse in real time. Examples: Stripe Radar, PayPal fraud detection, Google reCAPTCHA. Core challenges: sub-100ms decisions on transactions, handling highly imbalanced data (fraud is 0.1% of transactions), and adapting to adversarial actors who continuously evolve their tactics.

    System Requirements

    Functional

    • Score each transaction in real time: allow, flag for review, or block
    • Account takeover detection: unusual login patterns
    • Rules engine: configurable business rules without code deploys
    • Case management: human review of flagged transactions
    • Model retraining pipeline as new fraud patterns emerge

    Non-Functional

    • 10K transactions/second, decision in <100ms
    • False positive rate <1% (share of legitimate transactions incorrectly blocked)
    • High recall on fraud (miss as little fraud as possible)

    Decision Pipeline

    Transaction arrives
           │
      [Blocklist check] ── O(1), known bad cards/IPs → block
           │
      [Rules engine] ── configurable rules, O(ms) → allow/block/flag
           │
      [ML model] ── gradient boosting or neural net, O(10ms) → fraud score
           │
      [Decision] ── score threshold → allow/review queue/block
           │
      [Velocity checks] ── post-decision async update of counters
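    The layered flow above can be sketched in a few lines. This is an illustrative skeleton, not a production implementation: the `Transaction` shape, the blocklist entries, and the threshold values are assumptions chosen for the example.

```python
# Minimal sketch of the layered decision pipeline: cheap checks first,
# falling through to the ML model only when no rule fires.
from dataclasses import dataclass

@dataclass
class Transaction:
    card_id: str
    ip: str
    amount: float

BLOCKLIST = {"card:stolen-123", "ip:10.0.0.66"}      # known bad cards/IPs
REVIEW_THRESHOLD, BLOCK_THRESHOLD = 0.5, 0.9          # illustrative score cutoffs

def decide(tx: Transaction, rules, score_fn) -> str:
    # 1. Blocklist check: O(1) set membership
    if f"card:{tx.card_id}" in BLOCKLIST or f"ip:{tx.ip}" in BLOCKLIST:
        return "block"
    # 2. Rules engine: first matching rule wins; None means "no opinion"
    for rule in rules:
        action = rule(tx)
        if action is not None:
            return action
    # 3. ML model: fraud score in [0, 1], thresholded into three outcomes
    score = score_fn(tx)
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"
```

    Note that the velocity-counter update is deliberately absent here: as the diagram shows, it happens asynchronously after the decision so it never adds to the latency budget.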
    

    Feature Engineering

    Fraud models rely on real-time features computed at decision time:

    • Transaction features: amount, merchant category, currency, time of day
    • Velocity features: transactions in last 1/5/60 minutes for this card/account/IP
    • Historical features: average transaction amount for this user (last 30 days), most common merchant categories
    • Network features: how many accounts share this device fingerprint, this IP, this email domain
    • Behavioral features: typing speed, mouse movement entropy (bot vs human)

    Velocity features require real-time counters. Store them in Redis: INCR tx_count:{card_id}:{minute_bucket}, setting a 60-minute TTL (EXPIRE) when a bucket key is first created. To get 5-minute velocity, read the last 5 buckets and sum them.
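    A sketch of the bucketing scheme, with a plain dict standing in for Redis so the logic is visible; in production `record` would be an INCR plus EXPIRE and `velocity` a multi-key read.

```python
# Minute-bucket velocity counters. A dict stands in for Redis here.
counters = {}  # key -> count (Redis stand-in)

def bucket(ts: float) -> int:
    return int(ts // 60)  # one bucket per minute of epoch time

def record(card_id: str, ts: float) -> None:
    # In Redis: INCR tx_count:{card_id}:{bucket}, EXPIRE 3600 on first write
    key = f"tx_count:{card_id}:{bucket(ts)}"
    counters[key] = counters.get(key, 0) + 1

def velocity(card_id: str, ts: float, minutes: int) -> int:
    # Sum the current bucket and the previous (minutes - 1) buckets
    b = bucket(ts)
    return sum(
        counters.get(f"tx_count:{card_id}:{b - i}", 0)
        for i in range(minutes)
    )
```

    The same key pattern works per account or per IP by swapping the key prefix.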

    Rules Engine

    Rules are configured by fraud analysts without code deploys. Examples:

    • Block if: country = high-risk AND amount > $500 AND account_age < 7 days
    • Flag if: 3+ transactions in 10 minutes from different countries
    • Allow: if user has 2-year history AND verified bank account AND amount < $100

    Implement as a decision tree evaluated against the transaction feature vector. Store rules in a DB; load into memory on change (hot reload). Rules run before ML to handle obvious cases cheaply.
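    One way to keep rules data-driven is to store each rule as a list of (feature, operator, value) conditions plus an action. The field names and rule set below are illustrative, mirroring the examples above.

```python
# Data-driven rules engine sketch: rules are rows loaded from a DB,
# evaluated in order against the transaction feature vector.
import operator

OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt, "ge": operator.ge}

RULES = [  # would be loaded from the rules DB and hot-reloaded on change
    {"action": "block",
     "when": [("country_risk", "eq", "high"), ("amount", "gt", 500),
              ("account_age_days", "lt", 7)]},
    {"action": "allow",
     "when": [("account_age_days", "ge", 730), ("bank_verified", "eq", True),
              ("amount", "lt", 100)]},
]

def evaluate(features: dict, rules=RULES):
    """Return the first matching rule's action, or None to fall through to ML."""
    for rule in rules:
        if all(OPS[op](features[f], v) for f, op, v in rule["when"]):
            return rule["action"]
    return None
```

    Returning None when no rule matches is what lets the pipeline fall through to the ML model for the ambiguous middle ground.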

    ML Model

    Gradient Boosted Decision Trees (XGBoost, LightGBM) work well for tabular fraud data: handle missing values, require no feature normalization, interpretable feature importances. Training data is highly imbalanced (1K fraud vs 999K legit). Techniques: oversampling fraud (SMOTE), undersampling legit, class_weight parameter. Evaluate with precision-recall curve (not accuracy — 99.9% accuracy by always predicting “legit” is meaningless). Optimize for F1 or business-specific cost function (cost of false positive = lost revenue, cost of false negative = fraud loss).
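    The "accuracy is meaningless" point is worth making concrete. A toy evaluation in pure Python: on 1,000 transactions with 1% fraud, a model that always predicts "legit" scores 99% accuracy while catching zero fraud.

```python
# Why accuracy misleads on imbalanced fraud data.
def metrics(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [True] * 10 + [False] * 990   # 1% fraud
always_legit = [False] * 1000          # degenerate "model"
acc, p, r, f1 = metrics(y_true, always_legit)
# acc is 0.99 yet recall is 0.0: the model catches no fraud at all
```

    This is why the evaluation centers on the precision-recall trade-off (or a cost function over false positives and false negatives) rather than accuracy.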

    Feature Store

    Features must be consistent between training and serving. A feature store provides: offline features (batch computed, e.g., 30-day average amount per user) and online features (real-time computed, e.g., velocity in last 5 minutes). At serving time: fetch both from the feature store, concatenate, feed to model. This prevents training-serving skew — the most common ML production failure mode.
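    The serving-time assembly step can be sketched as follows. The store contents and feature names are invented for illustration; the key idea is that both feature groups are merged into one vector in a fixed column order, matching the order used at training time.

```python
# Serving-time feature assembly: offline (batch) and online (real-time)
# features fetched from the feature store and concatenated in fixed order.
OFFLINE = {"user:42": {"avg_amount_30d": 35.0, "top_category": 5}}  # batch-computed
ONLINE = {"card:c1": {"tx_count_5m": 3, "tx_count_60m": 7}}         # real-time counters

FEATURE_ORDER = ["avg_amount_30d", "top_category", "tx_count_5m", "tx_count_60m"]

def build_feature_vector(user_id: str, card_id: str) -> list:
    features = {}
    features.update(OFFLINE.get(f"user:{user_id}", {}))
    features.update(ONLINE.get(f"card:{card_id}", {}))
    # A fixed column order keeps serving vectors aligned with training
    # columns, which is exactly what prevents training-serving skew.
    return [features.get(name, 0) for name in FEATURE_ORDER]
```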

    Feedback Loop

    Fraud decisions generate labels: a blocked transaction later confirmed as fraud (true positive) or as legitimate (false positive). These labels feed back into the training pipeline. Weekly retraining cycle: collect last 7 days of labeled decisions, retrain model, A/B test against production model (shadow mode), promote if better. Alert on model performance degradation: if recall drops below threshold, retrain immediately.
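    The promotion and alerting logic reduces to two small checks. This is a sketch under assumed policy: promote only if the shadow-mode candidate beats production on recall without giving up precision, and trigger an emergency retrain when recall drops below a floor. The tolerance and floor values are illustrative.

```python
# Promotion gate for the weekly retraining cycle (shadow-mode comparison).
def should_promote(prod: dict, candidate: dict,
                   precision_tolerance: float = 0.01) -> bool:
    # Promote only if recall improves and precision does not regress
    # beyond the tolerance.
    better_recall = candidate["recall"] > prod["recall"]
    precision_ok = candidate["precision"] >= prod["precision"] - precision_tolerance
    return better_recall and precision_ok

def needs_emergency_retrain(prod: dict, recall_floor: float = 0.80) -> bool:
    # Alert path: recall below the floor triggers immediate retraining.
    return prod["recall"] < recall_floor
```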

    Account Takeover Detection

    Signals: login from new device/IP, login from different country than usual, password reset followed immediately by transaction, simultaneous sessions from different geolocations. On suspicious login: require step-up authentication (SMS OTP, email verification) before allowing transactions.
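    These signals compose naturally into a step-up decision. A sketch, with invented field names and a simple "any signal triggers step-up" policy; a real system would likely weight signals instead.

```python
# Account-takeover signal checks: each suspicious login signal is collected,
# and any hit forces step-up authentication before transactions are allowed.
def login_signals(login: dict, profile: dict) -> list:
    signals = []
    if login["device_id"] not in profile["known_devices"]:
        signals.append("new_device")
    if login["country"] != profile["usual_country"]:
        signals.append("unusual_country")
    if login["seconds_since_password_reset"] < 300:
        signals.append("recent_password_reset")
    return signals

def requires_step_up(login: dict, profile: dict) -> bool:
    # Any suspicious signal -> SMS OTP / email verification before transacting
    return bool(login_signals(login, profile))
```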

    Interview Tips

    • Multi-layer pipeline: blocklist → rules → ML — cheap checks first.
    • Feature store prevents training-serving skew — a key production ML concept.
    • Imbalanced data: always discuss SMOTE, class weights, and evaluation metric (not accuracy).
    • Feedback loop closes the system — without it, the model decays as fraud patterns evolve.