What is Event Sourcing?
In traditional systems, the database stores the current state: the orders table has one row per order showing its current status. In event sourcing, the database stores a log of immutable events that caused the current state: OrderPlaced, PaymentReceived, OrderShipped, OrderCancelled. The current state is derived by replaying all events for a given entity. The event log is the source of truth. Events are append-only — never updated or deleted. This is the same idea as a bank ledger: the ledger records every transaction; the account balance is computed by summing all transactions.
Event Store Design
The event store is an append-only log partitioned by aggregate_id (e.g., order_id). Schema: event_id (UUID), aggregate_id, aggregate_type (ORDER, ACCOUNT), event_type (OrderPlaced, PaymentReceived), event_data (JSON), sequence_number (per aggregate, monotonic), occurred_at. The sequence_number ensures events are replayed in order and enables optimistic concurrency control: when writing a new event, assert that the latest sequence_number for this aggregate is N-1. If another process wrote event N first, the write fails with a conflict error (retry or fail fast). This prevents two processes from writing conflicting events simultaneously.
Rebuilding State (Replay)
To get the current state of an aggregate: load all events for that aggregate_id ordered by sequence_number. Apply each event to an initial empty state, transitioning through the state machine. This is the apply() pattern:
```python
class Order:
    """Aggregate whose state is derived entirely from its events."""

    def __init__(self):
        self.status = None
        self.items = []
        self.tracking_number = None

    def apply(self, event):
        if event.type == "OrderPlaced":
            self.status = "PENDING"
            self.items = event.data["items"]
        elif event.type == "PaymentReceived":
            self.status = "PAID"
        elif event.type == "OrderShipped":
            self.status = "SHIPPED"
            self.tracking_number = event.data["tracking_number"]
        elif event.type == "OrderCancelled":
            self.status = "CANCELLED"
```
Snapshots: replaying thousands of events per request is slow. Periodically take a snapshot: serialize the current state and store it with the latest sequence_number. On load: start from the latest snapshot, then replay only events after that sequence_number. Snapshot every 100-500 events.
CQRS: Command Query Responsibility Segregation
CQRS separates the write model (commands that change state) from the read model (queries that return data). The write side: accepts commands (PlaceOrder, CancelOrder), validates business rules, emits events to the event store. The read side: maintains denormalized “projection” tables optimized for specific queries. A projection consumes events from the event store (via Kafka or polling) and updates the read model. Example: an OrderSummaryProjection consumes all order events and maintains a denormalized orders_summary table with precomputed totals, statuses, and customer info — optimized for the order list page query. Different projections can serve different use cases without changing the write model.
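The OrderSummaryProjection mentioned above might look like the following sketch, with an in-memory dict standing in for the denormalized orders_summary table (the class name and event payload shapes are assumptions for illustration):

```python
class OrderSummaryProjection:
    """Consumes order events and maintains one denormalized summary row
    per order, optimized for the order-list page query."""

    def __init__(self):
        # order_id -> summary row; a real projection writes to a table
        self.summaries = {}

    def handle(self, order_id, event_type, data):
        row = self.summaries.setdefault(
            order_id, {"status": None, "total": 0, "item_count": 0})
        if event_type == "OrderPlaced":
            row["status"] = "PENDING"
            row["item_count"] = len(data["items"])
            row["total"] = sum(item["price"] for item in data["items"])
        elif event_type == "PaymentReceived":
            row["status"] = "PAID"
        elif event_type == "OrderShipped":
            row["status"] = "SHIPPED"
        elif event_type == "OrderCancelled":
            row["status"] = "CANCELLED"
```

The order-list page then reads `summaries` directly, with no joins or aggregation at query time.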
Projections and Read Models
Each projection is a separate read database optimized for its query pattern. Projections are eventually consistent: there is a lag between an event being written and the projection being updated (typically milliseconds to seconds). The projection processor tracks its position (last processed event sequence) and can always rebuild from scratch by replaying all events. This makes projections resilient: if a projection becomes corrupted or needs to change its schema, drop and rebuild it from the event log. Multiple projections from the same event stream: order events feed into an OrderSummary projection, an InventoryProjection (track items reserved), an AnalyticsProjection (daily order counts), and a SearchProjection (Elasticsearch documents).
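Position tracking and rebuild-from-scratch can be sketched with a small processor wrapper. This is a simplification: a durable implementation persists `position` transactionally with each read-model update so the checkpoint and the projection never diverge. The class and method names are illustrative:

```python
class ProjectionProcessor:
    """Tracks its position in the event log and can rebuild from scratch."""

    def __init__(self, handler):
        self.handler = handler   # callable applied to each new event
        self.position = 0        # last processed event sequence (checkpoint)

    def process(self, event_log):
        """Consume only events past the checkpoint, then advance it."""
        for seq, event in enumerate(event_log, start=1):
            if seq <= self.position:
                continue         # already processed on an earlier run
            self.handler(event)
            self.position = seq

    def rebuild(self, event_log):
        """Drop the checkpoint and replay the full log (after the caller
        has dropped or reset the read model itself)."""
        self.position = 0
        self.process(event_log)
```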
When to Use Event Sourcing
Use event sourcing when: audit trails are a core requirement (financial systems, compliance), you need to replay history to debug or backfill analytics, business processes are naturally modeled as state machines, you anticipate needing multiple read models in the future. Avoid when: the domain is CRUD-heavy with no meaningful state transitions (a simple blog), the team is unfamiliar with the pattern (steep learning curve), eventual consistency in read models is unacceptable.
Interview Tips
- Event sourcing vs. change data capture (CDC): CDC captures changes from an existing relational database (via Debezium reading the transaction log). Event sourcing is designed from the start around events. CDC is a retrofit; event sourcing is first-class.
- Idempotent event processing: projections must handle duplicate events (at-least-once delivery). Track processed event_ids; skip duplicates.
- Temporal queries: “what was the state of this order at 3pm yesterday?” — replay events up to that timestamp. Impossible in a state-only system without audit tables.
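The last two points above can be sketched together: duplicate suppression by tracking processed event_ids, and a temporal query that replays only events up to a cutoff. The in-memory set stands in for what would be a table with a unique constraint on event_id; function names are illustrative:

```python
processed_ids = set()   # in production: a durable table keyed by event_id

def handle_once(event, apply_fn):
    """At-least-once delivery means duplicates arrive; apply each event once."""
    if event["event_id"] in processed_ids:
        return False            # duplicate: skip
    apply_fn(event)
    processed_ids.add(event["event_id"])
    return True

def state_at(events, as_of, apply_fn, initial):
    """Temporal query: fold only events that occurred at or before as_of."""
    state = initial
    for e in sorted(events, key=lambda ev: ev["sequence_number"]):
        if e["occurred_at"] <= as_of:
            state = apply_fn(state, e)
    return state
```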
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the difference between event sourcing and traditional state storage?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Traditional state storage persists the current state of each entity (e.g., the orders table stores one row with the current order status). When the status changes, the row is overwritten. History is lost unless you add audit tables manually. Event sourcing persists the sequence of events that led to the current state (OrderPlaced, PaymentReceived, OrderShipped). The current state is computed by replaying events. History is always available. You can query 'what was the state at time T?' by replaying events up to T. The trade-off: event sourcing is more complex (replay logic, snapshot management, eventual consistency in projections) but provides a complete audit trail, temporal queries, and the ability to derive new read models from the same event history."
      }
    },
    {
      "@type": "Question",
      "name": "How do snapshots work in event sourcing and when should you use them?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A snapshot is a serialized checkpoint of an aggregate's state at a specific sequence number. Stored alongside events: snapshot_id, aggregate_id, sequence_number, state_json, created_at. On load: find the latest snapshot for the aggregate, deserialize the state, then replay only events with sequence_number > snapshot.sequence_number. Without snapshots: replaying 10,000 events on every read adds significant latency. With snapshots every 100 events: maximum 100 events are replayed, plus one snapshot deserialization. When to snapshot: when the number of events per aggregate exceeds a few hundred, or when replay latency becomes noticeable. Snapshot frequency is tunable per aggregate type — high-frequency aggregates (like orders with many state transitions) benefit more."
      }
    },
    {
      "@type": "Question",
      "name": "How does CQRS improve scalability compared to a standard CRUD architecture?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "In CRUD, the same data model and database serve both writes (commands) and reads (queries). Complex read queries compete with write transactions, causing contention. The schema is a compromise between write normalization and read query efficiency. CQRS separates these: the write side uses a normalized model (or event store) optimized for consistency and transaction integrity. The read side uses denormalized projection tables optimized for specific queries — often pre-joined, pre-aggregated, and indexed exactly for the query patterns. The read and write sides can scale independently: read replicas, different databases (PostgreSQL writes, Elasticsearch reads), or different services. The trade-off is eventual consistency: the read model lags behind the write model by milliseconds to seconds."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle eventual consistency between the event store and read projections?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Projections consume events asynchronously (via Kafka, polling, or database triggers) and update the read model. During the lag window (milliseconds to seconds), the read model may not reflect the latest write. Strategies: (1) Accept eventual consistency for non-critical reads (analytics dashboards, search indexes). (2) For the user who just took an action: use a 'read-your-own-writes' token — pass the latest event sequence number in the response; the frontend uses it to poll until the projection catches up. (3) For critical reads (payment confirmation): query the write model directly (bypass the projection) for the specific aggregate just modified. (4) Projection catch-up indicator: track projection lag as a metric; alert when it exceeds a threshold (e.g., 5 seconds)."
      }
    },
    {
      "@type": "Question",
      "name": "What are common pitfalls when implementing event sourcing?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "(1) Schema evolution: events are immutable. When the event schema changes (add a field, rename a field), old events still have the old schema. Solution: upcasters — functions that transform old event versions to the current version during replay. Plan for this from day one. (2) Event explosion: modeling too fine-grained events (e.g., FieldUpdated instead of OrderUpdated). This floods the event store with noise. Model events at the business level (meaningful state transitions), not the field level. (3) Projection rebuild time: if you have 1 billion events, rebuilding a projection takes hours. Mitigate with snapshots and parallel rebuild workers. (4) Forgetting idempotency: at-least-once event delivery means projections can receive duplicate events. Always check if the event was already processed before applying it."
      }
    }
  ]
}