What Is a Feature Store?
A feature store is a centralized platform for computing, storing, serving, and sharing ML features.

Without a feature store:
- Every team re-computes the same features (wasted work).
- Training data uses features computed differently from serving time (training-serving skew).
- Features built for one model are unavailable to other models.

With a feature store:
- Shared feature definitions across teams.
- Consistency between training and serving.
- Low-latency online lookup for real-time inference.

Key components: the feature registry (metadata, definitions), the offline store (historical features for training), the online store (latest features for real-time serving), and feature pipelines (computation).
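The registry component above can be sketched as a small in-memory catalog. This is an illustrative sketch, not any specific product's API; the class and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FeatureDefinition:
    """One registry entry: metadata plus the canonical computation.
    (Illustrative names, not a real feature-store API.)"""
    name: str        # e.g. "user_lifetime_value"
    entity: str      # entity the feature keys on, e.g. "user"
    source: str      # table or stream the feature is computed from
    definition: str  # canonical SQL/Python definition, shared by all pipelines

class FeatureRegistry:
    """Single source of truth for feature definitions."""

    def __init__(self):
        self._defs = {}

    def register(self, fd: FeatureDefinition):
        self._defs[fd.name] = fd

    def get(self, name: str) -> FeatureDefinition:
        return self._defs[name]
```

Both the training pipeline and the serving layer would look definitions up here, which is what makes the shared-definition guarantee possible.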
Offline Store
The offline store provides historical features for model training and batch inference. It typically lives in a data warehouse (BigQuery, Snowflake, Redshift) or a data lake (S3 + Parquet). Features are stored as time series: (entity_id, timestamp, feature_value).

Point-in-time correct joins: when creating a training dataset, for each training label at timestamp T, fetch feature values as they existed at time T, not the current values. This prevents data leakage (using future information during training). Implementation: for each (entity, label_timestamp) row in the training set, find the most recent feature row with timestamp <= label_timestamp. As a correlated subquery against a labels table: SELECT l.entity_id, l.label_timestamp, (SELECT f.feature_value FROM features f WHERE f.entity_id = l.entity_id AND f.timestamp <= l.label_timestamp ORDER BY f.timestamp DESC LIMIT 1) AS feature_value FROM labels l.
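The same point-in-time lookup can be sketched in plain Python, assuming feature rows are kept sorted by timestamp per entity (a minimal stand-in for the offline store's temporal join):

```python
from bisect import bisect_right

def point_in_time_join(labels, features):
    """For each (entity_id, label_ts) label, return the latest feature
    value with feature_ts <= label_ts, so no future data leaks in.

    labels:   list of (entity_id, label_ts) pairs
    features: dict entity_id -> list of (feature_ts, value), sorted by ts
    """
    out = []
    for entity_id, label_ts in labels:
        rows = features.get(entity_id, [])
        # Index of the last feature row at or before the label timestamp.
        i = bisect_right([ts for ts, _ in rows], label_ts) - 1
        out.append(rows[i][1] if i >= 0 else None)
    return out
```

A label earlier than any feature row gets None, the correct behavior: that feature simply did not exist yet at label time.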
Online Store
The online store serves the latest feature values at low latency for real-time model inference.
- Storage: Redis (sub-millisecond reads), DynamoDB, or Cassandra.
- Schema: {entity_id -> {feature_name: value}}.
- Latency target: under 5ms p99 for a feature lookup.
- Write path: a streaming feature pipeline computes features and writes them to the online store, or a sync job periodically copies the latest values from the offline store to the online store.
- Freshness: real-time features (user clicked this item 5 minutes ago) require streaming pipelines; slowly-changing features (user age, account tier) can tolerate daily batch updates.
- Pre-fetch: for high-traffic entities, pre-populate the online store before the model server requests them.
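The read/write contract above can be sketched with an in-memory dictionary standing in for Redis (a hypothetical sketch; a real deployment would use a networked key-value store):

```python
class OnlineStore:
    """Minimal in-memory stand-in for a Redis-style online store:
    only the latest value per (entity, feature) is kept."""

    def __init__(self):
        self._data = {}  # entity_id -> {feature_name: value}

    def write(self, entity_id, features):
        # Upsert path used by the streaming pipeline or the batch sync job:
        # new values overwrite old ones for the same feature name.
        self._data.setdefault(entity_id, {}).update(features)

    def get(self, entity_id, feature_names):
        # Point lookup by entity ID, as the model server does at inference.
        row = self._data.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}
```

Missing features come back as None rather than raising, mirroring the common serving-layer choice of letting the model fall back to a default value.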
Feature Computation Pipelines
Batch features: computed by scheduled jobs (Apache Spark, dbt). Examples: user lifetime value, 30-day purchase count. Run daily or hourly; written to the offline store, with the latest values synced to the online store.

Streaming features: computed from real-time event streams (Kafka + Flink/Spark Streaming). Examples: items viewed in the last 5 minutes, current cart value, real-time fraud signals. Written directly to the online store.

On-demand features: computed at serving time from raw data rather than pre-computed. Example: distance between user and restaurant (requires live GPS). Computed in the feature serving layer using a fast in-memory lookup.

The feature store supports all three types; the choice depends on the freshness requirement and computation cost.
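A streaming feature like "items viewed in the last 5 minutes" reduces to a sliding-window count maintained incrementally as events arrive. A minimal sketch of the state a Flink or Kafka Streams job would keep (timestamps here are plain seconds; a real job would use event-time watermarks):

```python
from collections import deque

class WindowCount:
    """Sliding-window event count, updated incrementally per event."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # event timestamps, oldest first

    def update(self, ts):
        """Record one event and return the fresh feature value."""
        self.events.append(ts)
        return self.value(ts)

    def value(self, now):
        """Current count of events strictly within the last `window` seconds."""
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)
```

Each `update` would be followed by a write of the new value to the online store, which is what keeps the feature seconds-fresh.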
Training-Serving Skew Prevention
Training-serving skew: the model is trained on features computed one way and served with features computed differently. This is a top cause of model performance degradation in production. Prevention:
1. Use the same feature definitions for both training and serving. The feature registry stores the canonical definition (SQL, Spark, or Python code), and both the training pipeline and the serving layer run that same definition.
2. Log served features: when the model serves a prediction, log the feature values used, and use these logged features as training data for future model versions. This ensures training data matches the serving distribution exactly.
3. Shadow evaluation: during training data generation, compute features using both the current pipeline and the legacy pipeline, and alert on discrepancies.
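Points (1) and (2) can be sketched together: one canonical feature function called by both paths, with the serving path logging exactly what the model saw. The feature and helper names are illustrative assumptions.

```python
SERVED_LOG = []  # (2) log of served feature values, reused as future training data

def days_since_last_purchase(now_ts, last_purchase_ts):
    """(1) Canonical definition, the single source of truth in the registry."""
    return (now_ts - last_purchase_ts) // 86400  # whole days

def training_features(label_ts, last_purchase_ts):
    # Training pipeline calls the exact same definition as serving.
    return {"days_since_last_purchase": days_since_last_purchase(label_ts, last_purchase_ts)}

def serving_features(now_ts, last_purchase_ts):
    feats = {"days_since_last_purchase": days_since_last_purchase(now_ts, last_purchase_ts)}
    SERVED_LOG.append(feats)  # record exactly what the model saw
    return feats
```

Because both paths call one function, a change to the definition automatically applies to both, which is the core skew-prevention guarantee.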
Interview Tips
- Feature stores solve the dual problem of offline (training) and online (serving) consistency — explain both stores.
- Point-in-time correctness is the critical concept that prevents data leakage in training.
- Training-serving skew is the #1 production ML bug — feature stores prevent it by sharing definitions.
- Platforms: Feast (open-source), Tecton (managed), Vertex AI Feature Store (GCP), SageMaker Feature Store (AWS).
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is point-in-time correctness and why does it matter for ML training?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Point-in-time correctness means that when creating a training dataset, each training example uses only feature values that were available at the time the label was generated, with no future information. Violation example: predicting whether a user will purchase in the next 7 days. If you join user features using their current values (computed today) rather than their values at the time of prediction, you leak future signals (account upgrades, recent purchases made after the label window). This causes the model to appear accurate in offline evaluation but fail in production. Point-in-time join: for each (entity, label_timestamp) pair, look up the feature value with the latest timestamp <= label_timestamp. Feature stores provide this via temporal joins on the offline store."
}
},
{
"@type": "Question",
"name": "What is the difference between the online store and offline store in a feature store?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The offline store stores historical feature values with timestamps — it is used for model training and batch scoring. Data format: Parquet files in S3 or rows in a data warehouse. Query pattern: range queries over time (give me all feature values for user X between Jan 1 and Mar 1). High throughput, high latency (seconds to minutes). The online store stores only the latest feature value for each entity — it is used for real-time model inference. Data format: key-value store (Redis, DynamoDB). Query pattern: point lookups by entity ID. Low latency (sub-millisecond to 5ms), high throughput (millions of QPS). Write path: streaming pipelines write to both stores simultaneously. Batch pipelines write to the offline store, then a sync job copies the latest values to the online store."
}
},
{
"@type": "Question",
"name": "How does a feature store prevent training-serving skew?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Training-serving skew occurs when features are computed differently at training time vs serving time. Common causes: data preprocessing logic in the training notebook differs from the production feature pipeline; raw data schema changes after training data was generated; time zone bugs in timestamp handling. Prevention: (1) Single source of truth: the feature registry stores the canonical feature definition as code (SQL, PySpark, or Python). Both the training pipeline and serving layer execute this exact code. (2) Logged features: at serving time, log the exact feature values sent to the model. Use these logged features (not recomputed ones) as training data for the next model version. (3) Integration tests: compare feature values computed by the batch pipeline vs the online pipeline for the same entity at the same timestamp. Alert on discrepancies."
}
},
{
"@type": "Question",
"name": "How do you serve features at low latency for real-time ML inference?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Low-latency feature serving requires: (1) Online store in fast key-value storage: Redis for sub-millisecond reads (p99 under 1ms), with each entity_id mapping to a hash of its latest feature values, e.g. {age: 28, days_since_purchase: 3, account_tier: GOLD}. (2) Batch pre-fetching: for features that are needed for every request (e.g., user profile features), pre-fetch and cache in the model server memory for the duration of the request. (3) Feature caching at the model server: cache frequently accessed entity features for 60 seconds in local memory to avoid a Redis call on every inference. (4) Asynchronous feature loading: while the user request is being validated, begin loading features in parallel. By the time validation completes, features may already be ready. Target: feature lookup adds under 5ms to inference latency."
}
},
{
"@type": "Question",
"name": "What are streaming features and when do you need them?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Streaming features are computed from real-time event streams with very low latency (seconds to minutes of freshness). Examples: items a user viewed in the last 5 minutes (recency signal for recommendations), current fraud score based on the last 10 transactions, real-time cart value, number of failed login attempts in the last hour. Computed using a stream processing framework (Apache Flink, Spark Structured Streaming, or Kafka Streams). The feature is computed incrementally as events arrive and written to the online store immediately. Contrast with batch features (computed daily, which is fine for slowly-changing features like lifetime purchase count or account age). Use streaming when: the feature value changes significantly within minutes, and stale values would meaningfully hurt model accuracy (e.g., fraud detection, real-time recommendations)."
}
}
]
}
Asked at: Databricks, Netflix, Uber, LinkedIn