What Is a Feature Store?
A feature store is a centralized platform for computing, storing, serving, and sharing ML features.

Without a feature store:
- Every team re-computes the same features (wasted work).
- Training data uses features computed differently from serving time (training-serving skew).
- Features built for one model are unavailable to other models.

With a feature store:
- Shared feature definitions across teams.
- Consistency between training and serving.
- Low-latency online lookup for real-time inference.

Key components: the feature registry (metadata, definitions), the offline store (historical features for training), the online store (latest features for real-time serving), and feature pipelines (computation).
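The registry component above can be sketched as a small in-memory catalog. This is an illustrative sketch, not any specific product's API; the class and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FeatureDefinition:
    """One registry entry: metadata plus the canonical computation.
    (Illustrative names, not a real feature-store API.)"""
    name: str        # e.g. "user_lifetime_value"
    entity: str      # entity the feature keys on, e.g. "user"
    source: str      # table or stream the feature is computed from
    definition: str  # canonical SQL/Python definition, shared by all pipelines

class FeatureRegistry:
    """Single source of truth for feature definitions."""

    def __init__(self):
        self._defs = {}

    def register(self, fd: FeatureDefinition):
        self._defs[fd.name] = fd

    def get(self, name: str) -> FeatureDefinition:
        return self._defs[name]
```

Both the training pipeline and the serving layer would look definitions up here, which is what makes the shared-definition guarantee possible.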
Offline Store
The offline store provides historical features for model training and batch inference. It typically lives in a data warehouse (BigQuery, Snowflake, Redshift) or a data lake (S3 + Parquet). Features are stored as time series: (entity_id, timestamp, feature_value).

Point-in-time correct joins: when creating a training dataset, for each training label at timestamp T, fetch feature values as they existed at time T, not the current values. This prevents data leakage (using future information during training). Implementation: for each (entity, label_timestamp) row in the training set, find the most recent feature row with timestamp <= label_timestamp. As a correlated subquery against a labels table: SELECT l.entity_id, l.label_timestamp, (SELECT f.feature_value FROM features f WHERE f.entity_id = l.entity_id AND f.timestamp <= l.label_timestamp ORDER BY f.timestamp DESC LIMIT 1) AS feature_value FROM labels l.
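The same point-in-time lookup can be sketched in plain Python, assuming feature rows are kept sorted by timestamp per entity (a minimal stand-in for the offline store's temporal join):

```python
from bisect import bisect_right

def point_in_time_join(labels, features):
    """For each (entity_id, label_ts) label, return the latest feature
    value with feature_ts <= label_ts, so no future data leaks in.

    labels:   list of (entity_id, label_ts) pairs
    features: dict entity_id -> list of (feature_ts, value), sorted by ts
    """
    out = []
    for entity_id, label_ts in labels:
        rows = features.get(entity_id, [])
        # Index of the last feature row at or before the label timestamp.
        i = bisect_right([ts for ts, _ in rows], label_ts) - 1
        out.append(rows[i][1] if i >= 0 else None)
    return out
```

A label earlier than any feature row gets None, the correct behavior: that feature simply did not exist yet at label time.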
Online Store
The online store serves the latest feature values at low latency for real-time model inference.
- Storage: Redis (sub-millisecond reads), DynamoDB, or Cassandra.
- Schema: {entity_id -> {feature_name: value}}.
- Latency target: under 5ms p99 for a feature lookup.
- Write path: a streaming feature pipeline computes features and writes them to the online store, or a sync job periodically copies the latest values from the offline store to the online store.
- Freshness: real-time features (user clicked this item 5 minutes ago) require streaming pipelines; slowly-changing features (user age, account tier) can tolerate daily batch updates.
- Pre-fetch: for high-traffic entities, pre-populate the online store before the model server requests them.
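The read/write contract above can be sketched with an in-memory dictionary standing in for Redis (a hypothetical sketch; a real deployment would use a networked key-value store):

```python
class OnlineStore:
    """Minimal in-memory stand-in for a Redis-style online store:
    only the latest value per (entity, feature) is kept."""

    def __init__(self):
        self._data = {}  # entity_id -> {feature_name: value}

    def write(self, entity_id, features):
        # Upsert path used by the streaming pipeline or the batch sync job:
        # new values overwrite old ones for the same feature name.
        self._data.setdefault(entity_id, {}).update(features)

    def get(self, entity_id, feature_names):
        # Point lookup by entity ID, as the model server does at inference.
        row = self._data.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}
```

Missing features come back as None rather than raising, mirroring the common serving-layer choice of letting the model fall back to a default value.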
Feature Computation Pipelines
Batch features: computed by scheduled jobs (Apache Spark, dbt). Examples: user lifetime value, 30-day purchase count. Run daily or hourly; written to the offline store, with the latest values synced to the online store.

Streaming features: computed from real-time event streams (Kafka + Flink/Spark Streaming). Examples: items viewed in the last 5 minutes, current cart value, real-time fraud signals. Written directly to the online store.

On-demand features: computed at serving time from raw data rather than pre-computed. Example: distance between user and restaurant (requires live GPS). Computed in the feature serving layer using a fast in-memory lookup.

The feature store supports all three types; the choice depends on the freshness requirement and computation cost.
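A streaming feature like "items viewed in the last 5 minutes" reduces to a sliding-window count maintained incrementally as events arrive. A minimal sketch of the state a Flink or Kafka Streams job would keep (timestamps here are plain seconds; a real job would use event-time watermarks):

```python
from collections import deque

class WindowCount:
    """Sliding-window event count, updated incrementally per event."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # event timestamps, oldest first

    def update(self, ts):
        """Record one event and return the fresh feature value."""
        self.events.append(ts)
        return self.value(ts)

    def value(self, now):
        """Current count of events strictly within the last `window` seconds."""
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```

Each `update` would be followed by a write of the new value to the online store, which is what keeps the feature seconds-fresh.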
Training-Serving Skew Prevention
Training-serving skew: the model is trained on features computed one way and served with features computed differently. This is a top cause of model performance degradation in production. Prevention:
1. Use the same feature definitions for both training and serving. The feature registry stores the canonical definition (SQL, Spark, or Python code), and both the training pipeline and the serving layer run that same definition.
2. Log served features: when the model serves a prediction, log the feature values used, and use these logged features as training data for future model versions. This ensures training data matches the serving distribution exactly.
3. Shadow evaluation: during training data generation, compute features using both the current pipeline and the legacy pipeline, and alert on discrepancies.
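Points (1) and (2) can be sketched together: one canonical feature function called by both paths, with the serving path logging exactly what the model saw. The feature and helper names are illustrative assumptions.

```python
SERVED_LOG = []  # (2) log of served feature values, reused as future training data

def days_since_last_purchase(now_ts, last_purchase_ts):
    """(1) Canonical definition, the single source of truth in the registry."""
    return (now_ts - last_purchase_ts) // 86400  # whole days

def training_features(label_ts, last_purchase_ts):
    # Training pipeline calls the exact same definition as serving.
    return {"days_since_last_purchase": days_since_last_purchase(label_ts, last_purchase_ts)}

def serving_features(now_ts, last_purchase_ts):
    feats = {"days_since_last_purchase": days_since_last_purchase(now_ts, last_purchase_ts)}
    SERVED_LOG.append(feats)  # record exactly what the model saw
    return feats
```

Because both paths call one function, a change to the definition automatically applies to both, which is the core skew-prevention guarantee.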
Interview Tips
- Feature stores solve the dual problem of offline (training) and online (serving) consistency — explain both stores.
- Point-in-time correctness is the critical concept that prevents data leakage in training.
- Training-serving skew is the #1 production ML bug — feature stores prevent it by sharing definitions.
- Platforms: Feast (open-source), Tecton (managed), Vertex AI Feature Store (GCP), SageMaker Feature Store (AWS).
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is point-in-time correctness and why does it matter for ML training?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Point-in-time correctness means that when creating a training dataset, each training example uses only feature values that were available at the time the label was generated, with no future information. Violation example: predicting whether a user will purchase in the next 7 days. If you join user features using their current values (computed today) rather than their values at the time of prediction, you leak future signals (account upgrades, recent purchases made after the label window). This causes the model to appear accurate in offline evaluation but fail in production. Point-in-time join: for each (entity, label_timestamp) pair, look up the feature value with the latest timestamp <= label_timestamp. Feature stores provide this via temporal joins on the offline store."
}
},
{
"@type": "Question",
"name": "What is the difference between the online store and offline store in a feature store?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The offline store stores historical feature values with timestamps — it is used for model training and batch scoring. Data format: Parquet files in S3 or rows in a data warehouse. Query pattern: range queries over time (give me all feature values for user X between Jan 1 and Mar 1). High throughput, high latency (seconds to minutes). The online store stores only the latest feature value for each entity — it is used for real-time model inference. Data format: key-value store (Redis, DynamoDB). Query pattern: point lookups by entity ID. Low latency (sub-millisecond to 5ms), high throughput (millions of QPS). Write path: streaming pipelines write to both stores simultaneously. Batch pipelines write to the offline store, then a sync job copies the latest values to the online store."
}
},
{
"@type": "Question",
"name": "How does a feature store prevent training-serving skew?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Training-serving skew occurs when features are computed differently at training time vs serving time. Common causes: data preprocessing logic in the training notebook differs from the production feature pipeline; raw data schema changes after training data was generated; time zone bugs in timestamp handling. Prevention: (1) Single source of truth: the feature registry stores the canonical feature definition as code (SQL, PySpark, or Python). Both the training pipeline and serving layer execute this exact code. (2) Logged features: at serving time, log the exact feature values sent to the model. Use these logged features (not recomputed ones) as training data for the next model version. (3) Integration tests: compare feature values computed by the batch pipeline vs the online pipeline for the same entity at the same timestamp. Alert on discrepancies."
}
},
{
"@type": "Question",
"name": "How do you serve features at low latency for real-time ML inference?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Low-latency feature serving requires: (1) Online store in fast key-value storage: Redis for sub-millisecond reads (p99 under 1ms), with each entity_id mapping to a hash of its latest feature values, e.g. {age: 28, days_since_purchase: 3, account_tier: GOLD}. (2) Batch pre-fetching: for features that are needed for every request (e.g., user profile features), pre-fetch and cache in the model server memory for the duration of the request. (3) Feature caching at the model server: cache frequently accessed entity features for 60 seconds in local memory to avoid a Redis call on every inference. (4) Asynchronous feature loading: while the user request is being validated, begin loading features in parallel. By the time validation completes, features may already be ready. Target: feature lookup adds under 5ms to inference latency."
}
},
{
"@type": "Question",
"name": "What are streaming features and when do you need them?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Streaming features are computed from real-time event streams with very low latency (seconds to minutes of freshness). Examples: items a user viewed in the last 5 minutes (recency signal for recommendations), current fraud score based on the last 10 transactions, real-time cart value, number of failed login attempts in the last hour. Computed using a stream processing framework (Apache Flink, Spark Structured Streaming, or Kafka Streams). The feature is computed incrementally as events arrive and written to the online store immediately. Contrast with batch features (computed daily, which is fine for slowly-changing features like lifetime purchase count or account age). Use streaming when: the feature value changes significantly within minutes, and stale values would meaningfully hurt model accuracy (e.g., fraud detection, real-time recommendations)."
}
}
]
}
Asked at: Databricks, Netflix, Uber, LinkedIn