What Is a Fraud Detection System?
A fraud detection system identifies and blocks fraudulent transactions, account takeovers, and abuse in real time. Examples: Stripe Radar, PayPal's fraud detection, Google reCAPTCHA. Core challenges: sub-100ms decisions on transactions, handling highly imbalanced data (fraud is often ~0.1% of transactions), and adapting to adversarial actors who continuously evolve their tactics.
System Requirements
Functional
- Score each transaction in real time: allow, flag for review, or block
- Account takeover detection: unusual login patterns
- Rules engine: configurable business rules without code deploys
- Case management: human review of flagged transactions
- Model retraining pipeline as new fraud patterns emerge
Non-Functional
- 10K transactions/second, decision in <100ms
- False positive rate <1% (legitimate transactions blocked)
- High recall on fraud (miss as little fraud as possible)
Decision Pipeline
Transaction arrives
│
[Blocklist check] ── O(1), known bad cards/IPs → block
│
[Rules engine] ── configurable rules, ~1ms → allow/block/flag
│
[ML model] ── gradient boosting or neural net, ~10ms → fraud score
│
[Decision] ── score threshold → allow/review queue/block
│
[Velocity checks] ── post-decision async update of counters
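The layered pipeline above can be sketched as a single dispatch function. This is a minimal illustration, not a real API: the blocklist contents, the sample rule, the score thresholds, and the `score_transaction`/`apply_rules` names are all invented for the example, and the model score is passed in rather than computed.

```python
# Layered fraud decision: cheap checks first, ML inference last.
# All names and thresholds below are illustrative.

BLOCKLIST = {"card:4111-x", "ip:203.0.113.7"}  # known bad cards/IPs

def apply_rules(tx):
    """Configurable rules layer; returns 'allow', 'block', 'flag', or None."""
    if tx["amount"] > 500 and tx["account_age_days"] < 7:
        return "block"
    return None  # no rule fired; fall through to the model

def score_transaction(tx, model_score):
    # 1. O(1) blocklist check: cheapest test first
    if f"card:{tx['card_id']}" in BLOCKLIST or f"ip:{tx['ip']}" in BLOCKLIST:
        return "block"
    # 2. Rules engine: handles obvious cases before paying for inference
    verdict = apply_rules(tx)
    if verdict is not None:
        return verdict
    # 3. ML score thresholds -> allow / review queue / block
    if model_score >= 0.9:
        return "block"
    if model_score >= 0.5:
        return "review"
    return "allow"
```

The ordering is the point: each layer only sees traffic the cheaper layers could not decide.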
Feature Engineering
Fraud models rely on real-time features computed at decision time:
- Transaction features: amount, merchant category, currency, time of day
- Velocity features: transactions in last 1/5/60 minutes for this card/account/IP
- Historical features: average transaction amount for this user (last 30 days), most common merchant categories
- Network features: how many accounts share this device fingerprint, this IP, this email domain
- Behavioral features: typing speed, mouse movement entropy (bot vs human)
Velocity features require real-time counters. Store in Redis: INCR tx_count:{card_id}:{minute_bucket}, with a TTL slightly longer than the longest window (e.g., 75 minutes for the 60-minute window, so the oldest bucket doesn't expire mid-query). Sum the last 5 minute buckets to get 5-minute velocity.
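A minimal in-memory sketch of that bucketed-counter pattern. The `VelocityCounter` class is invented for illustration; in production the dict operations become Redis INCR/EXPIRE and MGET calls against the keys described above.

```python
import time
from collections import defaultdict

class VelocityCounter:
    """In-memory stand-in for Redis minute-bucket counters."""

    def __init__(self):
        self.buckets = defaultdict(int)  # (card_id, minute_bucket) -> count

    def record(self, card_id, now=None):
        ts = now if now is not None else time.time()
        bucket = int(ts // 60)                 # 1-minute buckets
        self.buckets[(card_id, bucket)] += 1   # Redis: INCR + EXPIRE 75m

    def velocity(self, card_id, window_minutes, now=None):
        ts = now if now is not None else time.time()
        bucket = int(ts // 60)
        # Redis: MGET of the last N bucket keys, summed client-side
        return sum(self.buckets.get((card_id, bucket - i), 0)
                   for i in range(window_minutes))
```

The same structure works for any entity (account, IP, device): just change what the key identifies.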
Rules Engine
Rules are configured by fraud analysts without code deploys. Examples:
- Block if: country = high-risk AND amount > $500 AND account_age < 7 days
- Flag if: 3+ transactions in 10 minutes from different countries
- Allow if: user has 2-year history AND verified bank account AND amount < $100
Implement as a decision tree evaluated against the transaction feature vector. Store rules in a DB; load into memory on change (hot reload). Rules run before ML to handle obvious cases cheaply.
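One data-driven shape such an engine can take, assuming rules are held as ordered condition/action pairs: first match wins, no match falls through to the model. The rule names, condition fields, and the `evaluate_rules` helper are illustrative; a real engine deserializes conditions from the DB rather than storing Python lambdas.

```python
# Rules as data: in production these rows live in a DB and are
# hot-reloaded into memory when a fraud analyst changes them.
RULES = [  # evaluated in order; first match wins
    {"name": "high_risk_new_account",
     "when": lambda f: (f["country_risk"] == "high" and f["amount"] > 500
                        and f["account_age_days"] < 7),
     "action": "block"},
    {"name": "trusted_small_tx",
     "when": lambda f: (f["account_age_days"] > 730 and f["bank_verified"]
                        and f["amount"] < 100),
     "action": "allow"},
]

def evaluate_rules(features):
    for rule in RULES:
        if rule["when"](features):
            return rule["action"], rule["name"]
    return None, None  # no rule fired; fall through to the ML model
```

Returning the matched rule's name matters in practice: analysts need to know *which* rule fired when reviewing cases.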
ML Model
Gradient Boosted Decision Trees (XGBoost, LightGBM) work well for tabular fraud data: handle missing values, require no feature normalization, interpretable feature importances. Training data is highly imbalanced (1K fraud vs 999K legit). Techniques: oversampling fraud (SMOTE), undersampling legit, class_weight parameter. Evaluate with precision-recall curve (not accuracy — 99.9% accuracy by always predicting “legit” is meaningless). Optimize for F1 or business-specific cost function (cost of false positive = lost revenue, cost of false negative = fraud loss).
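The metric arithmetic can be made concrete. A short sketch with invented counts; the class-weight knob shown is XGBoost's `scale_pos_weight`, one instance of the class-weighting technique mentioned above.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Class weighting for imbalance: XGBoost's scale_pos_weight is typically
# set to (negative count / positive count), so each fraud example counts
# roughly as much as ~1000 legit ones in the loss.
n_legit, n_fraud = 999_000, 1_000
scale_pos_weight = n_legit / n_fraud
```

Note how the degenerate "always legit" model (tp = 0) scores 0 on precision, recall, and F1 despite its 99.9% accuracy.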
Feature Store
Features must be consistent between training and serving. A feature store provides: offline features (batch computed, e.g., 30-day average amount per user) and online features (real-time computed, e.g., velocity in last 5 minutes). At serving time: fetch both from the feature store, concatenate, feed to model. This prevents training-serving skew — the most common ML production failure mode.
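A toy sketch of serving-time assembly, assuming dict-backed stores: `OFFLINE_FEATURES`, `ONLINE_FEATURES`, and `fetch_feature_vector` are invented stand-ins for the offline and online stores, and the feature names are illustrative. The key property is that both stores use the same feature names and definitions.

```python
OFFLINE_FEATURES = {  # batch-computed, e.g., nightly Spark job over history
    "user_42": {"avg_amount_30d": 62.5, "top_merchant_category": "grocery"},
}
ONLINE_FEATURES = {  # real-time, e.g., Redis velocity counters
    "card_7": {"tx_count_5m": 3, "tx_count_60m": 11},
}

def fetch_feature_vector(user_id, card_id):
    """Concatenate offline + online features into one model input."""
    features = {}
    features.update(OFFLINE_FEATURES.get(user_id, {}))
    features.update(ONLINE_FEATURES.get(card_id, {}))
    return features
```

Training reads the same feature names from the offline store's historical values, so the model never sees a feature computed two different ways.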
Feedback Loop
Fraud decisions generate labels: a blocked transaction later confirmed as fraud (true positive) or as legitimate (false positive), and missed fraud that surfaces later as a chargeback (false negative). These labels feed back into the training pipeline. Weekly retraining cycle: collect the last 7 days of labeled decisions, retrain the model, shadow-test against the production model (score live traffic without acting on the scores), promote if better. Alert on model performance degradation: if recall drops below a threshold, retrain immediately.
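The promote-if-better gate can be sketched as a simple check. `should_promote` and its thresholds are illustrative; real systems compare full precision-recall curves and business cost, not two scalars.

```python
def should_promote(champion_recall, challenger_recall,
                   challenger_precision, min_precision=0.95):
    """Promote the retrained challenger only if it improves recall
    without dropping below the precision floor (false-positive budget)."""
    return (challenger_recall > champion_recall
            and challenger_precision >= min_precision)
```

The precision floor is the important subtlety: a challenger that "catches more fraud" by blocking far more legitimate traffic should not ship.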
Account Takeover Detection
Signals: login from new device/IP, login from different country than usual, password reset followed immediately by transaction, simultaneous sessions from different geolocations. On suspicious login: require step-up authentication (SMS OTP, email verification) before allowing transactions.
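A sketch of how those signals could drive the step-up decision. The signal names, the block-on-impossible-travel rule, and the one-signal threshold are all illustrative choices, not a standard policy.

```python
# Account-takeover signals observed for a single login attempt.
ATO_SIGNALS = ("new_device", "new_country", "recent_password_reset",
               "impossible_travel")

def login_action(signals):
    """signals: dict of signal name -> bool for this login attempt."""
    if signals.get("impossible_travel"):
        return "block"          # physically impossible travel: deny outright
    score = sum(1 for s in ATO_SIGNALS if signals.get(s))
    if score >= 1:
        return "step_up_auth"   # require SMS OTP / email verification
    return "allow"
```

Step-up authentication is the middle ground: it adds friction only for suspicious logins instead of blocking them outright.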
Interview Tips
- Multi-layer pipeline: blocklist → rules → ML — cheap checks first.
- Feature store prevents training-serving skew — a key production ML concept.
- Imbalanced data: always discuss SMOTE, class weights, and evaluation metric (not accuracy).
- Feedback loop closes the system — without it, the model decays as fraud patterns evolve.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is a feature store and why does it prevent training-serving skew in fraud detection?",
"acceptedAnswer": { "@type": "Answer", "text": "Training-serving skew is the most common production ML failure: the model is trained on features computed one way but served using features computed differently, causing the model to behave unexpectedly in production. In fraud detection: a model trained on \"transactions in last 5 minutes\" computed from historical logs must see the exact same feature at serving time — but the production system computes velocity from Redis counters, not logs. If the computation differs even slightly (e.g., different time bucketing), the model receives out-of-distribution features and performs poorly. A feature store solves this by providing a single source of feature definitions shared between training and serving. Offline store (e.g., S3 + Spark): stores historical feature values for training. Online store (e.g., Redis, DynamoDB): serves low-latency feature values for inference. The same feature pipeline code writes to both stores. When the model requests \"5-minute transaction count for card X,\" it gets the same value whether training or serving. This is foundational to reliable ML systems — Uber, DoorDash, and Stripe all cite feature stores as critical infrastructure." }
},
{
"@type": "Question",
"name": "How do you evaluate a fraud detection model and why is accuracy the wrong metric?",
"acceptedAnswer": { "@type": "Answer", "text": "Accuracy is misleading for imbalanced fraud data. If 0.1% of transactions are fraud, a model that always predicts \"not fraud\" is 99.9% accurate but catches zero fraud. The correct metrics: Precision = true positives / (true positives + false positives). How many flagged transactions are actually fraud? Low precision = many legitimate transactions blocked (false positives, bad UX). Recall = true positives / (true positives + false negatives). What fraction of actual fraud did we catch? Low recall = fraud slips through. F1 Score = 2 * precision * recall / (precision + recall). Harmonic mean, balances both. In practice: use a precision-recall curve and choose the operating point based on business cost. The cost asymmetry: false positive (blocking a good transaction) costs ~$5 in lost revenue and customer churn. False negative (missing fraud) costs $100+ in chargebacks and liability. This asymmetry means you should accept somewhat lower precision to achieve high recall. Use AUCPR (area under precision-recall curve) as the overall model quality metric for imbalanced datasets, not AUROC." }
},
{
"@type": "Question",
"name": "How do velocity checks catch fraud and how do you implement them at 10K transactions/second?",
"acceptedAnswer": { "@type": "Answer", "text": "Velocity checks flag unusual transaction rates for a card, account, or IP address. Examples: card used 5 times in 2 minutes (rapid sequential fraud), same IP making 100 transactions in an hour (bot activity), card used in 3 countries in 30 minutes (physically impossible travel). Implementation with Redis sliding-window counters: for each transaction, increment one counter per entity and minute bucket, key = \"vel:{entity_type}:{entity_id}:{minute_bucket}\", value = count, with TTL = 75 minutes (slightly longer than the longest 60-minute window). To get 5-minute velocity: fetch the last 5 minute buckets in a single MGET round trip and sum them. Lookups for 5 entities (card, account, email, IP, device) across 5 time windows = 25 Redis operations per transaction. At 10K TPS: 250K Redis operations/second. A 3-node Redis cluster with replication handles 500K ops/sec easily. The velocity feature vector (5 entities × 5 windows = 25 features) is computed in ~2ms and feeds directly into the ML model as input features." }
}
]
}