What is Event Sourcing?
In traditional systems, the database stores the current state: the orders table has one row per order showing its current status. In event sourcing, the database stores a log of immutable events that caused the current state: OrderPlaced, PaymentReceived, OrderShipped, OrderCancelled. The current state is derived by replaying all events for a given entity. The event log is the source of truth. Events are append-only — never updated or deleted. This is the same idea as a bank ledger: the ledger records every transaction; the account balance is computed by summing all transactions.
Event Store Design
The event store is an append-only log partitioned by aggregate_id (e.g., order_id). Schema: event_id (UUID), aggregate_id, aggregate_type (ORDER, ACCOUNT), event_type (OrderPlaced, PaymentReceived), event_data (JSON), sequence_number (per aggregate, monotonic), occurred_at. The sequence_number ensures events are replayed in order and enables optimistic concurrency control: when writing a new event, assert that the latest sequence_number for this aggregate is N-1. If another process wrote event N first, the write fails with a conflict error (retry or fail fast). This prevents two processes from writing conflicting events simultaneously.
Rebuilding State (Replay)
To get the current state of an aggregate: load all events for that aggregate_id ordered by sequence_number. Apply each event to an initial empty state, transitioning through the state machine. This is the apply() pattern:
```python
class Order:
    """Aggregate whose state is derived entirely from its events."""

    def __init__(self):
        self.status = None
        self.items = []
        self.tracking_number = None

    def apply(self, event):
        if event.type == "OrderPlaced":
            self.status = "PENDING"
            self.items = event.data["items"]
        elif event.type == "PaymentReceived":
            self.status = "PAID"
        elif event.type == "OrderShipped":
            self.status = "SHIPPED"
            self.tracking_number = event.data["tracking_number"]
        elif event.type == "OrderCancelled":
            self.status = "CANCELLED"
```
Snapshots: replaying thousands of events per request is slow. Periodically take a snapshot: serialize the current state and store it with the latest sequence_number. On load: start from the latest snapshot, then replay only events after that sequence_number. Snapshot every 100-500 events.
CQRS: Command Query Responsibility Segregation
CQRS separates the write model (commands that change state) from the read model (queries that return data). The write side: accepts commands (PlaceOrder, CancelOrder), validates business rules, emits events to the event store. The read side: maintains denormalized “projection” tables optimized for specific queries. A projection consumes events from the event store (via Kafka or polling) and updates the read model. Example: an OrderSummaryProjection consumes all order events and maintains a denormalized orders_summary table with precomputed totals, statuses, and customer info — optimized for the order list page query. Different projections can serve different use cases without changing the write model.
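The OrderSummaryProjection mentioned above might look like the following sketch, with an in-memory dict standing in for the denormalized orders_summary table (the class name and event payload shapes are assumptions for illustration):

```python
class OrderSummaryProjection:
    """Consumes order events and maintains one denormalized summary row
    per order, optimized for the order-list page query."""

    def __init__(self):
        # order_id -> summary row; a real projection writes to a table
        self.summaries = {}

    def handle(self, order_id, event_type, data):
        row = self.summaries.setdefault(
            order_id, {"status": None, "total": 0, "item_count": 0})
        if event_type == "OrderPlaced":
            row["status"] = "PENDING"
            row["item_count"] = len(data["items"])
            row["total"] = sum(item["price"] for item in data["items"])
        elif event_type == "PaymentReceived":
            row["status"] = "PAID"
        elif event_type == "OrderShipped":
            row["status"] = "SHIPPED"
        elif event_type == "OrderCancelled":
            row["status"] = "CANCELLED"
```

The order-list page then reads `summaries` directly, with no joins or aggregation at query time.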
Projections and Read Models
Each projection is a separate read database optimized for its query pattern. Projections are eventually consistent: there is a lag between an event being written and the projection being updated (typically milliseconds to seconds). The projection processor tracks its position (last processed event sequence) and can always rebuild from scratch by replaying all events. This makes projections resilient: if a projection becomes corrupted or needs to change its schema, drop and rebuild it from the event log. Multiple projections from the same event stream: order events feed into an OrderSummary projection, an InventoryProjection (track items reserved), an AnalyticsProjection (daily order counts), and a SearchProjection (Elasticsearch documents).
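Position tracking and rebuild-from-scratch can be sketched with a small processor wrapper. This is a simplification: a durable implementation persists `position` transactionally with each read-model update so the checkpoint and the projection never diverge. The class and method names are illustrative:

```python
class ProjectionProcessor:
    """Tracks its position in the event log and can rebuild from scratch."""

    def __init__(self, handler):
        self.handler = handler   # callable applied to each new event
        self.position = 0        # last processed event sequence (checkpoint)

    def process(self, event_log):
        """Consume only events past the checkpoint, then advance it."""
        for seq, event in enumerate(event_log, start=1):
            if seq <= self.position:
                continue         # already processed on an earlier run
            self.handler(event)
            self.position = seq

    def rebuild(self, event_log):
        """Drop the checkpoint and replay the full log (after the caller
        has dropped or reset the read model itself)."""
        self.position = 0
        self.process(event_log)
```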
When to Use Event Sourcing
Use event sourcing when: audit trails are a core requirement (financial systems, compliance), you need to replay history to debug or backfill analytics, business processes are naturally modeled as state machines, you anticipate needing multiple read models in the future. Avoid when: the domain is CRUD-heavy with no meaningful state transitions (a simple blog), the team is unfamiliar with the pattern (steep learning curve), eventual consistency in read models is unacceptable.
Interview Tips
- Event sourcing vs. change data capture (CDC): CDC captures changes from an existing relational database (via Debezium reading the transaction log). Event sourcing is designed from the start around events. CDC is a retrofit; event sourcing is first-class.
- Idempotent event processing: projections must handle duplicate events (at-least-once delivery). Track processed event_ids; skip duplicates.
- Temporal queries: “what was the state of this order at 3pm yesterday?” — replay events up to that timestamp. Impossible in a state-only system without audit tables.
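The last two points above can be sketched together: duplicate suppression by tracking processed event_ids, and a temporal query that replays only events up to a cutoff. The in-memory set stands in for what would be a table with a unique constraint on event_id; function names are illustrative:

```python
processed_ids = set()   # in production: a durable table keyed by event_id

def handle_once(event, apply_fn):
    """At-least-once delivery means duplicates arrive; apply each event once."""
    if event["event_id"] in processed_ids:
        return False            # duplicate: skip
    apply_fn(event)
    processed_ids.add(event["event_id"])
    return True

def state_at(events, as_of, apply_fn, initial):
    """Temporal query: fold only events that occurred at or before as_of."""
    state = initial
    for e in sorted(events, key=lambda ev: ev["sequence_number"]):
        if e["occurred_at"] <= as_of:
            state = apply_fn(state, e)
    return state
```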
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the difference between event sourcing and traditional state storage?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Traditional state storage persists the current state of each entity (e.g., the orders table stores one row with the current order status). When the status changes, the row is overwritten. History is lost unless you add audit tables manually. Event sourcing persists the sequence of events that led to the current state (OrderPlaced, PaymentReceived, OrderShipped). The current state is computed by replaying events. History is always available. You can query 'what was the state at time T?' by replaying events up to T. The trade-off: event sourcing is more complex (replay logic, snapshot management, eventual consistency in projections) but provides a complete audit trail, temporal queries, and the ability to derive new read models from the same event history."
      }
    },
    {
      "@type": "Question",
      "name": "How do snapshots work in event sourcing and when should you use them?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A snapshot is a serialized checkpoint of an aggregate's state at a specific sequence number. Stored alongside events: snapshot_id, aggregate_id, sequence_number, state_json, created_at. On load: find the latest snapshot for the aggregate, deserialize the state, then replay only events with sequence_number > snapshot.sequence_number. Without snapshots: replaying 10,000 events on every read adds significant latency. With snapshots every 100 events: maximum 100 events are replayed, plus one snapshot deserialization. When to snapshot: when the number of events per aggregate exceeds a few hundred, or when replay latency becomes noticeable. Snapshot frequency is tunable per aggregate type — high-frequency aggregates (like orders with many state transitions) benefit more."
      }
    },
    {
      "@type": "Question",
      "name": "How does CQRS improve scalability compared to a standard CRUD architecture?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "In CRUD, the same data model and database serve both writes (commands) and reads (queries). Complex read queries compete with write transactions, causing contention. The schema is a compromise between write normalization and read query efficiency. CQRS separates these: the write side uses a normalized model (or event store) optimized for consistency and transaction integrity. The read side uses denormalized projection tables optimized for specific queries — often pre-joined, pre-aggregated, and indexed exactly for the query patterns. The read and write sides can scale independently: read replicas, different databases (PostgreSQL writes, Elasticsearch reads), or different services. The trade-off is eventual consistency: the read model lags behind the write model by milliseconds to seconds."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle eventual consistency between the event store and read projections?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Projections consume events asynchronously (via Kafka, polling, or database triggers) and update the read model. During the lag window (milliseconds to seconds), the read model may not reflect the latest write. Strategies: (1) Accept eventual consistency for non-critical reads (analytics dashboards, search indexes). (2) For the user who just took an action: use a 'read-your-own-writes' token — pass the latest event sequence number in the response; the frontend uses it to poll until the projection catches up. (3) For critical reads (payment confirmation): query the write model directly (bypass the projection) for the specific aggregate just modified. (4) Projection catch-up indicator: track projection lag as a metric; alert when it exceeds a threshold (e.g., 5 seconds)."
      }
    },
    {
      "@type": "Question",
      "name": "What are common pitfalls when implementing event sourcing?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "(1) Schema evolution: events are immutable. When the event schema changes (add a field, rename a field), old events still have the old schema. Solution: upcasters — functions that transform old event versions to the current version during replay. Plan for this from day one. (2) Event explosion: modeling too fine-grained events (e.g., FieldUpdated instead of OrderUpdated). This floods the event store with noise. Model events at the business level (meaningful state transitions), not the field level. (3) Projection rebuild time: if you have 1 billion events, rebuilding a projection takes hours. Mitigate with snapshots and parallel rebuild workers. (4) Forgetting idempotency: at-least-once event delivery means projections can receive duplicate events. Always check if the event was already processed before applying it."
      }
    }
  ]
}