System Design Interview: Distributed Transactions, 2PC, and the Saga Pattern

System Design Interview: Distributed Transactions, 2PC, and the Saga Pattern

When a business operation spans multiple microservices or databases, maintaining consistency is hard. ACID transactions do not cross service boundaries. This guide covers the two main patterns for distributed transactions — Two-Phase Commit (2PC) and the Saga Pattern — along with the trade-offs that make one or the other the right choice.

Why Distributed Transactions Are Hard

Consider an e-commerce order: deduct inventory, charge the credit card, create the order record. If these live in separate services, any step can fail after the previous ones succeeded. Without coordination:

  • Inventory deducted, payment fails → customer cannot pay, item held indefinitely
  • Payment charged, order creation crashes → money taken, no order created

We need atomicity across services: either all steps succeed or all are rolled back.

Two-Phase Commit (2PC)

2PC uses a coordinator to achieve distributed atomicity:

Phase 1 — Prepare:
  Coordinator → "Can you commit?" → Service A, Service B, Service C
  Each service: acquire locks, write to WAL, respond "Yes" or "No"

Phase 2 — Commit/Abort:
  If all Yes: Coordinator → "Commit!" → all services apply the changes, release locks
  If any No:  Coordinator → "Abort!"  → all services roll back

Strengths: strong consistency, atomic commit across all participants.

Weaknesses:

  • Blocking: if the coordinator crashes after Phase 1 but before Phase 2, participants hold locks indefinitely — the system is stuck
  • Availability: all participants must be available. One unavailable service blocks the entire transaction
  • Latency: two round trips across all participants adds significant overhead
  • Not microservice-friendly: requires all services to implement the 2PC protocol and share a coordinator

Use 2PC only when you control all participants and can tolerate reduced availability. Used by: distributed databases (Spanner, CockroachDB) internally, XA transactions in Java EE.

The Saga Pattern

A Saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction that undoes it if something later fails.

Order Flow:
  Step 1: Reserve inventory  → compensate: release inventory
  Step 2: Charge payment     → compensate: issue refund
  Step 3: Create order       → compensate: cancel order

Success path: 1 → 2 → 3 ✓
Failure at Step 3: execute compensations in reverse: refund (2) → release inventory (1)

Each step is a local ACID transaction on a single service. No distributed lock. No coordinator blocking.

Choreography-Based Saga

Services communicate via events. Each service listens for events, performs its step, and publishes the next event:

Order Service publishes: OrderPlaced
  → Inventory Service listens, reserves stock, publishes: InventoryReserved
  → Payment Service listens, charges card, publishes: PaymentCharged
  → Order Service listens, creates order record, publishes: OrderCreated

Failure: Payment Service publishes: PaymentFailed
  → Inventory Service listens, releases stock, publishes: InventoryReleased
  → Order Service listens, marks order as cancelled

Pros: loose coupling, no central coordinator, each service only knows about its own step.
Cons: hard to track overall workflow state, difficult to debug, event chains can become complex.

Orchestration-Based Saga

A central Saga Orchestrator explicitly tells each service what to do:

OrderSagaOrchestrator:
  1. Send "ReserveInventory" to Inventory Service → await result
  2. Send "ChargePayment" to Payment Service → await result
  3. Send "CreateOrder" to Order Service → await result

On failure at step N:
  For i from N-1 down to 1: send compensation command to service i

Pros: workflow is visible and testable in one place, easier to debug.
Cons: orchestrator becomes a central point of coupling (but not failure, since it is stateless).

Saga vs. 2PC: When to Use Which

Criterion 2PC Saga
Consistency Strong (ACID) Eventual (BASE)
Availability Lower (all must be up) Higher (partial failures handled)
Complexity Protocol complexity Compensation logic complexity
Cross-service Difficult Natural fit
Best for Same database cluster Microservices, long-running workflows

Idempotency in Sagas

Saga steps must be idempotent because they may be retried on failure. Use idempotency keys:

# Charge payment step:
INSERT INTO payments (saga_id, step, status)
VALUES ('saga_123', 'charge', 'processing')
ON CONFLICT (saga_id, step) DO NOTHING;
-- If this saga_id + step already exists, it was already processed → skip
-- Return the previously stored result

Interview Tips

  • Always start with why standard transactions don't work across services — sets the context
  • Explain the coordinator failure scenario in 2PC — this is the blocking problem interviewers probe
  • For Sagas: define “compensating transaction” clearly — it is the undo of a step, not a rollback
  • Recommend orchestration over choreography for complex workflows — it is easier to debug
  • Emphasize idempotency: “Saga steps must be designed so that running them twice produces the same result”

  • Coinbase Interview Guide
  • Airbnb Interview Guide
  • DoorDash Interview Guide
  • Uber Interview Guide
  • Shopify Interview Guide
  • Stripe Interview Guide
  • Scroll to Top