The Problem with Distributed Transactions
In a monolith, a database transaction is atomic: either all changes commit or all roll back. In microservices, an operation spans multiple services with separate databases. Example: placing an order requires: (1) deduct inventory, (2) charge payment, (3) create order record, (4) send confirmation email. If step 3 fails after step 2 succeeds, money is charged but no order exists. Maintaining atomicity across multiple independent databases is the distributed transaction problem.
Two-Phase Commit (2PC)
2PC uses a coordinator to achieve atomicity across multiple participants. Phase 1 (Prepare): coordinator sends “prepare” to all participants. Each participant executes the transaction locally, writes to a durable log, and replies “ready” or “abort”. Phase 2 (Commit): if all replied “ready”, coordinator sends “commit” — all participants finalize. If any replied “abort”, coordinator sends “rollback” — all abort.
Problems: Blocking protocol — if the coordinator fails after prepare but before commit, participants are stuck holding locks indefinitely (they have said “ready” and cannot unilaterally abort). Performance: two network round-trips for every transaction; participants hold locks during both phases. Use in practice: 2PC is used within a single database cluster (PostgreSQL, MySQL for distributed setups) but avoided across microservices. XA protocol is the standard 2PC API supported by most databases.
Saga Pattern
A saga is a sequence of local transactions. Each step has a compensating transaction that undoes its effect. If step N fails, execute compensating transactions for steps N-1, N-2, …, 1 in reverse. No locks held across services — each service completes its local transaction immediately. Eventual consistency: the system may be in an intermediate state between steps.
Two coordination models: Choreography (event-driven): each service publishes an event after its local transaction. Other services listen and react. Order service publishes OrderCreated → Inventory service decrements stock, publishes InventoryReserved → Payment service charges, publishes PaymentCharged → Notification service sends email. On failure: Payment publishes PaymentFailed → Inventory publishes InventoryReleased (compensating). Pro: decoupled. Con: hard to track saga state, distributed logic.
Orchestration: a central saga orchestrator sends commands to each service and waits for responses. Orchestrator tracks state: STARTED → INVENTORY_RESERVED → PAYMENT_CHARGED → ORDER_CREATED → COMPLETED. On failure at any step, orchestrator sends compensating commands. Pro: clear state, easy to monitor and debug. Con: orchestrator is a central point (but not a single point of failure if stateless with persistent state in a DB).
The Outbox Pattern
The dual-write problem: after a local transaction, you must publish an event to Kafka. If the event publish fails after the transaction commits, the event is lost (other services never know the step completed). Outbox pattern: write the event to an outbox table inside the same database transaction as the business operation. A separate outbox processor reads the table and publishes to Kafka, deleting rows after successful publish. This gives exactly-once event publishing relative to the database transaction: if the transaction commits, the event is in the outbox; the publisher will eventually deliver it. If the transaction rolls back, the event is never in the outbox.
Outbox processor: poll the outbox table every 100ms, or use CDC (Change Data Capture) like Debezium to stream changes from the database’s WAL (write-ahead log) to Kafka. CDC is more efficient (no polling) and has lower latency. The outbox table: (id, event_type, aggregate_id, payload JSON, created_at, published_at). Index on published_at IS NULL for efficient polling.
Idempotency Across Services
Sagas may retry failed steps. Each service must be idempotent: receiving the same command twice produces the same result as receiving it once. Technique: include an idempotency_key (saga_id + step_name) in each command. Service stores (idempotency_key, result) in a table. On receipt: check if this key was already processed. If yes, return the stored result. If no, process and store. This allows safe retries without side effects (no double charges, no double reservations).
Interview Tips
- 2PC vs Saga: 2PC gives strong consistency but is slow and blocking. Saga gives eventual consistency but requires idempotency and compensation logic. Use Saga for microservices.
- Compensating transactions: not always possible (cannot un-send an email). Use best-effort compensation (send a “sorry” email) or prevent the issue (only send the email at the last step).
- Outbox pattern is essential for reliable event publishing — always use it when a database transaction must trigger an event.
Asked at: Stripe Interview Guide
Asked at: Shopify Interview Guide
Asked at: Uber Interview Guide
Asked at: Coinbase Interview Guide