System Design Interview: Design a Fraud Detection System

What Is a Fraud Detection System?

A fraud detection system identifies and blocks fraudulent transactions, account takeovers, and abuse in real time. Examples: Stripe Radar, PayPal fraud detection, Google reCAPTCHA. Core challenges: sub-100ms decisions on transactions, handling highly imbalanced data (fraud is 0.1% of transactions), and adapting to adversarial actors who continuously evolve their tactics.

    System Requirements

    Functional

    • Score each transaction in real time: allow, flag for review, or block
    • Account takeover detection: unusual login patterns
    • Rules engine: configurable business rules without code deploys
    • Case management: human review of flagged transactions
    • Model retraining pipeline as new fraud patterns emerge

    Non-Functional

    • 10K transactions/second, decision in <100ms
    • False positive rate <1% (legitimate transactions blocked)
    • High recall on fraud (miss as little fraud as possible)

    Decision Pipeline

    Transaction arrives
           │
      [Blocklist check] ── O(1), known bad cards/IPs → block
           │
      [Rules engine] ── configurable rules, O(ms) → allow/block/flag
           │
      [ML model] ── gradient boosting or neural net, O(10ms) → fraud score
           │
      [Decision] ── score threshold → allow/review queue/block
           │
      [Velocity checks] ── post-decision async update of counters
    
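The layered pipeline above can be sketched as a single decision function. This is an illustrative sketch, not a production implementation: the field names, thresholds, and the shape of `rules` and `model_score` are assumptions for the example.

```python
def decide(tx, blocklist, rules, model_score):
    """Layered fraud decision: cheapest checks first, ML model last."""
    # O(1) blocklist lookup on known-bad cards/IPs
    if tx["card_id"] in blocklist or tx["ip"] in blocklist:
        return "block"
    # Configurable rules can short-circuit before the model runs;
    # a rule returns "allow"/"block"/"flag" or None to fall through
    for rule in rules:
        verdict = rule(tx)
        if verdict in ("allow", "block", "flag"):
            return verdict
    # ML score thresholds (illustrative): block high, review middle, allow low
    score = model_score(tx)
    if score > 0.9:
        return "block"
    if score > 0.5:
        return "review"
    return "allow"
```

Ordering matters: the blocklist and rules handle obvious cases in microseconds, so the ~10ms model call only runs on the ambiguous remainder.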

    Feature Engineering

    Fraud models rely on real-time features computed at decision time:

    • Transaction features: amount, merchant category, currency, time of day
    • Velocity features: transactions in last 1/5/60 minutes for this card/account/IP
    • Historical features: average transaction amount for this user (last 30 days), most common merchant categories
    • Network features: how many accounts share this device fingerprint, this IP, this email domain
    • Behavioral features: typing speed, mouse movement entropy (bot vs human)

    Velocity features require real-time counters. Store them in Redis: INCR tx_count:{card_id}:{minute_bucket}, with a TTL slightly longer than the longest window (e.g., 75 minutes for a 60-minute window, so buckets survive until they fall out of every query). To get 5-minute velocity, sum the last 5 minute buckets.
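A minimal in-memory stand-in for this counter scheme (the Redis equivalents are noted in comments; the class and method names are ours, not a standard API):

```python
import time
from collections import defaultdict

class VelocityCounter:
    """In-memory sketch of the Redis scheme:
    INCR vel:{entity}:{minute_bucket} + EXPIRE, then sum the last N buckets."""

    def __init__(self):
        self.buckets = defaultdict(int)  # (entity_id, minute_bucket) -> count

    def record(self, entity_id, now=None):
        minute = int((now if now is not None else time.time()) // 60)
        # Redis: INCR vel:{entity_id}:{minute}; EXPIRE 75 minutes
        self.buckets[(entity_id, minute)] += 1

    def velocity(self, entity_id, window_minutes, now=None):
        minute = int((now if now is not None else time.time()) // 60)
        # Redis: MGET over the last `window_minutes` bucket keys, then sum
        return sum(self.buckets.get((entity_id, minute - i), 0)
                   for i in range(window_minutes))
```

The same counter is kept per card, account, email, IP, and device fingerprint, so one transaction triggers several `record` calls.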

    Rules Engine

    Rules are configured by fraud analysts without code deploys. Examples:

    • Block if: country = high-risk AND amount > $500 AND account_age < 7 days
    • Flag if: 3+ transactions in 10 minutes from different countries
    • Allow: if user has 2-year history AND verified bank account AND amount < $100

    Implement as a decision tree evaluated against the transaction feature vector. Store rules in a DB; load into memory on change (hot reload). Rules run before ML to handle obvious cases cheaply.
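A toy version of the three example rules above, expressed as data plus a first-match evaluator. In a real engine the conditions would live in a DB as a DSL (not Python lambdas) so analysts can edit them; the field names and the placeholder country set here are assumptions.

```python
HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes for illustration

# Each rule: (action, predicate over the transaction feature vector).
RULES = [
    ("block", lambda tx: tx["country"] in HIGH_RISK_COUNTRIES
                          and tx["amount"] > 500
                          and tx["account_age_days"] < 7),
    ("flag",  lambda tx: tx["tx_last_10min"] >= 3
                          and tx["countries_last_10min"] > 1),
    ("allow", lambda tx: tx["account_age_days"] >= 730
                          and tx["bank_verified"]
                          and tx["amount"] < 100),
]

def evaluate_rules(tx):
    """First matching rule wins; None means fall through to the ML model."""
    for action, predicate in RULES:
        if predicate(tx):
            return action
    return None
```

Because rules short-circuit, the obvious blocks and allows never pay the cost of a model call.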

    ML Model

    Gradient Boosted Decision Trees (XGBoost, LightGBM) work well for tabular fraud data: handle missing values, require no feature normalization, interpretable feature importances. Training data is highly imbalanced (1K fraud vs 999K legit). Techniques: oversampling fraud (SMOTE), undersampling legit, class_weight parameter. Evaluate with precision-recall curve (not accuracy — 99.9% accuracy by always predicting “legit” is meaningless). Optimize for F1 or business-specific cost function (cost of false positive = lost revenue, cost of false negative = fraud loss).
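The evaluation and cost-function logic can be made concrete with two small helpers (pure Python; the $5/$100 cost figures are illustrative assumptions, not universal constants):

```python
def pr_metrics(tp, fp, fn):
    """Precision, recall, F1 from confusion counts (true negatives unused)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def business_cost(fp, fn, cost_fp=5.0, cost_fn=100.0):
    """Asymmetric cost: a false positive loses revenue, a false
    negative incurs the fraud loss (example dollar values)."""
    return fp * cost_fp + fn * cost_fn
```

With this cost function, a threshold that trades a few extra false positives for fewer false negatives usually wins, which is why fraud systems tune for recall rather than raw accuracy.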

    Feature Store

    Features must be consistent between training and serving. A feature store provides: offline features (batch computed, e.g., 30-day average amount per user) and online features (real-time computed, e.g., velocity in last 5 minutes). At serving time: fetch both from the feature store, concatenate, feed to model. This prevents training-serving skew — the most common ML production failure mode.
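A dict-based sketch of the serving-time merge (real feature stores like Feast expose a similar lookup; the store layout and feature names here are assumptions):

```python
def build_feature_vector(entity_ids, offline_store, online_store):
    """Merge batch (offline) and real-time (online) features under the
    SAME feature names used at training time -- the guard against skew."""
    features = {}
    # Offline: batch-computed, e.g. 30-day average amount per user
    features.update(offline_store.get(entity_ids["user_id"], {}))
    # Online: real-time, e.g. 5-minute velocity per card
    features.update(online_store.get(entity_ids["card_id"], {}))
    return features
```

The key property is that training reads the same named features from the same definitions, so the model never sees a feature computed two different ways.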

    Feedback Loop

    Fraud decisions generate labels: a blocked transaction later confirmed as fraud (true positive) or as legitimate (false positive). These labels feed back into the training pipeline. Weekly retraining cycle: collect the last 7 days of labeled decisions, retrain the model, run it in shadow mode against the production model (score live traffic without acting on the scores), and promote it if it performs better. Alert on model performance degradation: if recall drops below threshold, retrain immediately.
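The promotion gate at the end of the cycle can be as simple as a guarded comparison (thresholds here are illustrative assumptions):

```python
def should_promote(candidate, production, min_recall_gain=0.01,
                   max_precision_loss=0.005):
    """Promote the shadow-mode candidate only if recall improves
    meaningfully without giving up precision."""
    return (candidate["recall"] >= production["recall"] + min_recall_gain
            and candidate["precision"] >= production["precision"] - max_precision_loss)
```

The asymmetry (require a recall gain, merely bound the precision loss) mirrors the fraud cost structure: missed fraud is more expensive than an occasional extra review.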

    Account Takeover Detection

    Signals: login from new device/IP, login from different country than usual, password reset followed immediately by transaction, simultaneous sessions from different geolocations. On suspicious login: require step-up authentication (SMS OTP, email verification) before allowing transactions.
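The "simultaneous sessions from different geolocations" signal is often implemented as an impossible-travel check: compute the great-circle distance between two logins and flag if the implied speed exceeds what an airliner could cover. A sketch (the 900 km/h cutoff and the login record shape are assumptions):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(login_a, login_b, max_speed_kmh=900.0):
    """True if the implied travel speed between two logins exceeds
    a plausible maximum (roughly airliner cruise speed)."""
    dist = haversine_km(login_a["lat"], login_a["lon"],
                        login_b["lat"], login_b["lon"])
    hours = abs(login_b["ts"] - login_a["ts"]) / 3600.0
    return hours > 0 and dist / hours > max_speed_kmh
```

A positive result would trigger the step-up authentication path rather than an outright block, since VPNs and shared accounts produce legitimate-looking "teleports."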

    Interview Tips

    • Multi-layer pipeline: blocklist → rules → ML — cheap checks first.
    • Feature store prevents training-serving skew — a key production ML concept.
    • Imbalanced data: always discuss SMOTE, class weights, and evaluation metric (not accuracy).
    • Feedback loop closes the system — without it, the model decays as fraud patterns evolve.

    FAQ

    What is a feature store and why does it prevent training-serving skew in fraud detection?

    Training-serving skew is the most common production ML failure: the model is trained on features computed one way but served with features computed differently, so it behaves unexpectedly in production. In fraud detection, a model trained on "transactions in last 5 minutes" computed from historical logs must see the exact same feature at serving time, but the production system computes velocity from Redis counters, not logs. If the computation differs even slightly (e.g., different time bucketing), the model receives out-of-distribution features and performs poorly. A feature store solves this by providing a single source of feature definitions shared between training and serving. The offline store (e.g., S3 + Spark) holds historical feature values for training; the online store (e.g., Redis, DynamoDB) serves low-latency feature values for inference. The same feature pipeline code writes to both, so when the model requests "5-minute transaction count for card X," it gets the same value whether training or serving. This is foundational to reliable ML systems: Uber, DoorDash, and Stripe all cite feature stores as critical infrastructure.

    How do you evaluate a fraud detection model and why is accuracy the wrong metric?

    Accuracy is misleading for imbalanced fraud data. If 0.1% of transactions are fraud, a model that always predicts "not fraud" is 99.9% accurate but catches zero fraud. The correct metrics:

    • Precision = TP / (TP + FP): how many flagged transactions are actually fraud. Low precision means many legitimate transactions blocked (false positives, bad UX).
    • Recall = TP / (TP + FN): what fraction of actual fraud was caught. Low recall means fraud slips through.
    • F1 score = 2 × precision × recall / (precision + recall): the harmonic mean, balancing both.

    In practice, use a precision-recall curve and choose the operating point based on business cost. The costs are asymmetric: a false positive (blocking a good transaction) costs roughly $5 in lost revenue and customer churn, while a false negative (missing fraud) costs $100+ in chargebacks and liability. This asymmetry means you should accept somewhat lower precision to achieve high recall. Use AUCPR (area under the precision-recall curve), not AUROC, as the overall model quality metric for imbalanced datasets.

    How do velocity checks catch fraud and how do you implement them at 10K transactions/second?

    Velocity checks flag unusual transaction rates for a card, account, or IP address. Examples: a card used 5 times in 2 minutes (rapid sequential fraud), the same IP making 100 transactions in an hour (bot activity), a card used in 3 countries in 30 minutes (physically impossible travel). Implement with Redis sliding-window counters: each transaction increments a counter per entity and minute bucket (key = vel:{entity_type}:{entity_id}:{minute_bucket}, value = count, via INCR with TTL = 75 minutes). To get 5-minute velocity, sum the last 5 minute buckets, fetched in a single MGET. Lookups for 5 entities (card, account, email, IP, device) across 5 time windows come to 25 Redis operations per transaction; at 10K TPS that is 250K Redis operations/second, which a 3-node Redis cluster with replication handles easily (~500K ops/sec). The resulting velocity feature vector (5 entities × 5 windows = 25 features) is computed in ~2ms and feeds directly into the ML model as input features.
