System Design Interview: Distributed Transactions, 2PC, and the Saga Pattern

When a business operation spans multiple microservices or databases, maintaining consistency is hard. ACID transactions do not cross service boundaries. This guide covers the two main patterns for distributed transactions — Two-Phase Commit (2PC) and the Saga Pattern — along with the trade-offs that make one or the other the right choice.

Why Distributed Transactions Are Hard

Consider an e-commerce order: deduct inventory, charge the credit card, create the order record. If these live in separate services, any step can fail after the previous ones succeeded. Without coordination:

  • Inventory deducted, payment fails → stock stays held for an order that was never paid for
  • Payment charged, order creation crashes → the customer is charged but no order exists

We need atomicity across services: either all steps succeed or all are rolled back.

Two-Phase Commit (2PC)

2PC uses a coordinator to achieve distributed atomicity:

Phase 1 — Prepare:
  Coordinator → "Can you commit?" → Service A, Service B, Service C
  Each service: acquire locks, write to WAL, respond "Yes" or "No"

Phase 2 — Commit/Abort:
  If all Yes: Coordinator → "Commit!" → all services apply the changes, release locks
  If any No:  Coordinator → "Abort!"  → all services roll back
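
The two phases above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and a real participant would acquire locks and write to its WAL in `prepare`.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant (no real locks or WAL)."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A "Yes" vote is a promise to commit if asked later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the single global decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

The sketch ignores timeouts, crashes, and durable logging of the decision, which are exactly where the difficulties of 2PC show up in practice.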

Strengths: strong consistency, atomic commit across all participants.

Weaknesses:

  • Blocking: if the coordinator crashes after Phase 1 but before Phase 2, participants that voted "Yes" cannot decide on their own and must hold their locks until the coordinator recovers
  • Availability: all participants must be available. One unavailable service blocks the entire transaction
  • Latency: two round trips across all participants add significant overhead
  • Not microservice-friendly: requires all services to implement the 2PC protocol and share a coordinator
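
The blocking weakness can be made concrete with a small participant state machine. This is a sketch under stated assumptions: the `State` enum, the `locks_held` flag, and modelling a silent coordinator as `decision=None` are all illustrative choices, not part of any real 2PC library.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # voted Yes, locks held
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.locks_held = False

    def prepare(self) -> bool:
        self.state = State.PREPARED
        self.locks_held = True
        return True

    def on_decision(self, decision: Optional[str]) -> None:
        """Apply the coordinator's decision; None models a coordinator that never answers."""
        if decision is None:
            if self.state is State.IDLE:
                # Has not voted yet, so aborting unilaterally is safe.
                self.state = State.ABORTED
            # If PREPARED: keep waiting and keep the locks, because other
            # participants may already have been told to commit.
            return
        self.state = State.COMMITTED if decision == "commit" else State.ABORTED
        self.locks_held = False
```

A participant that has not yet voted can give up safely, while one that has voted "Yes" stays in `PREPARED` with its locks until a decision arrives.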

Use 2PC only when you control all participants and can tolerate reduced availability. Used by: distributed databases (Spanner, CockroachDB) internally, XA transactions in Java EE.

The Saga Pattern

A Saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction that undoes it if something later fails.

Order Flow:
  Step 1: Reserve inventory  → compensate: release inventory
  Step 2: Charge payment     → compensate: issue refund
  Step 3: Create order       → compensate: cancel order

Success path: 1 → 2 → 3 ✓
Failure at Step 3: execute compensations in reverse: refund (2) → release inventory (1)

Each step is a local ACID transaction on a single service. No distributed lock. No coordinator blocking.
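
As a rough illustration of the order flow above, a saga can be modelled as a list of (name, action, compensation) triples. The `run_saga` helper and the lambda-based steps are hypothetical stand-ins for calls to real services; the sketch also assumes compensations themselves do not fail.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    completed: List[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            # Earlier steps are already committed locally, so undo them
            # with their compensations, newest first.
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_create_order() -> None:
    raise RuntimeError("order service unavailable")


steps: List[Step] = [
    ("reserve_inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge_payment", lambda: log.append("charge"), lambda: log.append("refund")),
    ("create_order", fail_create_order, lambda: log.append("cancel")),
]
succeeded = run_saga(steps)
# log is now ["reserve", "charge", "refund", "release"]
```

Note that the failed step's own compensation ("cancel") is never run, since that step did not commit anything.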

Choreography-Based Saga

Services communicate via events. Each service listens for events, performs its step, and publishes the next event:

Order Service publishes: OrderPlaced
  → Inventory Service listens, reserves stock, publishes: InventoryReserved
  → Payment Service listens, charges card, publishes: PaymentCharged
  → Order Service listens, creates order record, publishes: OrderCreated

Failure: Payment Service publishes: PaymentFailed
  → Inventory Service listens, releases stock, publishes: InventoryReleased
  → Order Service listens, marks order as cancelled

Pros: loose coupling, no central coordinator, each service only knows about its own step.
Cons: hard to track overall workflow state, difficult to debug, event chains can become complex.
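
The event chain above can be simulated with a tiny in-process event bus. This is only a sketch: `EventBus` is a hypothetical synchronous stand-in for a real message broker, `wire_services` and the `card_ok` flag are illustrative, and each service is reduced to a one-line handler.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical synchronous bus; a real system would use a broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


def wire_services(bus: EventBus, card_ok: bool) -> None:
    # Inventory Service: reserve on OrderPlaced, release on PaymentFailed.
    bus.subscribe("OrderPlaced", lambda p: bus.publish("InventoryReserved", p))
    bus.subscribe("PaymentFailed", lambda p: bus.publish("InventoryReleased", p))
    # Payment Service: charge once stock is reserved.
    bus.subscribe(
        "InventoryReserved",
        lambda p: bus.publish("PaymentCharged" if card_ok else "PaymentFailed", p),
    )
    # Order Service: create the order, or mark it cancelled after a release.
    bus.subscribe("PaymentCharged", lambda p: bus.publish("OrderCreated", p))
    bus.subscribe("InventoryReleased", lambda p: p.update(status="cancelled"))
```

No component in this sketch knows the whole workflow; the overall order of events only becomes visible by inspecting `bus.history`, which mirrors the debugging difficulty listed under the cons.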

Orchestration-Based Saga

A central Saga Orchestrator explicitly tells each service what to do:

OrderSagaOrchestrator:
  1. Send "ReserveInventory" to Inventory Service → await result
  2. Send "ChargePayment" to Payment Service → await result
  3. Send "CreateOrder" to Order Service → await result

On failure at step N:
  For i from N-1 down to 1: send compensation command to service i

Pros: workflow is visible and testable in one place, easier to debug.
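Cons: the orchestrator becomes a central point of coupling. It need not be a single point of failure, provided saga progress is kept in a durable store so that orchestrator instances themselves are stateless and can be restarted or replicated.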
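
One way to keep the orchestrator process itself stateless is to record saga progress in an external store and resume from it after a restart. The sketch below assumes a plain dict (`store`) in place of a durable database, commands that return `True` on success, and compensations that always succeed; the class layout is illustrative rather than a reference design.

```python
from typing import Callable, Dict, List, Tuple

Command = Callable[[], bool]  # returns True on success
Step = Tuple[str, Command, Command]  # (name, command, compensation)


class OrderSagaOrchestrator:
    """Sketch only: `store` is a dict standing in for a durable saga log."""

    def __init__(self, steps: List[Step], store: Dict[str, dict]) -> None:
        self.steps = steps
        self.store = store

    def run(self, saga_id: str) -> str:
        record = self.store.setdefault(saga_id, {"next_step": 0, "status": "running"})
        if record["status"] != "running":
            return record["status"]  # already finished; do not redo work
        # Resume from the last recorded step after a restart.
        for i in range(record["next_step"], len(self.steps)):
            _, command, _ = self.steps[i]
            if not command():
                self._compensate(i)
                record["status"] = "compensated"
                return record["status"]
            record["next_step"] = i + 1
        record["status"] = "completed"
        return record["status"]

    def _compensate(self, failed_index: int) -> None:
        # Steps before the failed one are committed; undo them newest first.
        for i in range(failed_index - 1, -1, -1):
            _, _, compensation = self.steps[i]
            compensation()
```

Because a crash can happen between a command succeeding and its progress being recorded, a resumed saga may resend a command, which is why the steps themselves need to be idempotent.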

Saga vs. 2PC: When to Use Which

Criterion      2PC                       Saga
Consistency    Strong (ACID)             Eventual (BASE)
Availability   Lower (all must be up)    Higher (partial failures handled)
Complexity     Protocol complexity       Compensation logic complexity
Cross-service  Difficult                 Natural fit
Best for       Same database cluster     Microservices, long-running workflows

Idempotency in Sagas

Saga steps must be idempotent because they may be retried on failure. Use idempotency keys:

-- Charge payment step (requires a unique constraint on (saga_id, step)):
INSERT INTO payments (saga_id, step, status)
VALUES ('saga_123', 'charge', 'processing')
ON CONFLICT (saga_id, step) DO NOTHING;
-- If this saga_id + step already exists, it was already processed → skip
-- and return the previously stored result
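
The same idea can be exercised end to end with Python's built-in `sqlite3` module. This is a sketch with assumptions: the `charge_once` helper and the `amount_cents` column are hypothetical, and it uses SQLite's `INSERT OR IGNORE`, which plays the role of `ON CONFLICT ... DO NOTHING` here.

```python
import sqlite3


def charge_once(conn: sqlite3.Connection, saga_id: str, amount_cents: int) -> bool:
    """Return True if this call recorded the charge, False if it was a retry."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO payments (saga_id, step, status, amount_cents) "
        "VALUES (?, 'charge', 'processing', ?)",
        (saga_id, amount_cents),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " saga_id TEXT NOT NULL,"
    " step TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL,"
    " PRIMARY KEY (saga_id, step))"  # the idempotency key
)
```

Calling `charge_once(conn, "saga_123", 500)` twice leaves a single row in `payments`; only the first call should go on to contact the payment provider.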

Interview Tips

  • Always start with why standard transactions don't work across services — sets the context
  • Explain the coordinator failure scenario in 2PC — this is the blocking problem interviewers probe
  • For Sagas: define “compensating transaction” clearly: it is a new local transaction that semantically undoes an already committed step, not a database rollback
  • Recommend orchestration over choreography for complex workflows — it is easier to debug
  • Emphasize idempotency: “Saga steps must be designed so that running them twice produces the same result”
