The Problem with Distributed Transactions
In a monolith with a single database, ACID transactions guarantee atomicity: either all operations succeed or none do. In a microservices architecture, a single business operation (e.g., placing an order: reserve inventory, charge payment, notify shipping) spans multiple services with separate databases, and traditional ACID transactions do not work across service boundaries. Three common solutions: Two-Phase Commit (2PC), the Saga pattern, and eventual consistency with idempotent operations.
Two-Phase Commit (2PC)
Phase 1 (Prepare): the Coordinator sends PREPARE to all participants. Each participant durably logs that it is prepared and replies YES or NO. Phase 2 (Commit/Abort): if all reply YES, the Coordinator sends COMMIT; participants commit and release their locks. If any replies NO, the Coordinator sends ABORT and all participants roll back.
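The two phases can be sketched as follows. This is a minimal in-memory model, not a real coordinator: the `Participant` class and its methods are illustrative, and a production implementation would durably log every state transition before sending each message.

```python
# Minimal sketch of a 2PC coordinator with in-memory participants.
# Real systems write each state transition to a durable log first.

class Participant:
    """A resource manager that can prepare, commit, or roll back one transaction."""
    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare  # simulates whether Phase 1 succeeds
        self.state = "idle"

    def prepare(self):
        # Phase 1: durably record "prepared" before voting YES.
        if self.will_prepare:
            self.state = "prepared"
            return "YES"
        return "NO"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (Prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (Commit/Abort): commit only if every vote is YES.
    if all(v == "YES" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

Note where the blocking problem lives: between the two loops, every participant that voted YES is holding locks, and if the coordinator dies at that point, none of them can unilaterally decide.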
Problems with 2PC: (1) Blocking protocol: if the Coordinator crashes after Phase 1 but before Phase 2, participants are stuck holding locks indefinitely (this is why 2PC is called a blocking protocol). (2) Single point of failure: the Coordinator is critical for progress. (3) Poor performance: two network round trips plus a durable write at every step. (4) Works poorly across a WAN (high latency, unreliable links). In practice, 2PC is used within a single data center with databases that support it (PostgreSQL supports 2PC via PREPARE TRANSACTION).
Three-Phase Commit (3PC)
Adds a pre-commit phase between Prepare and Commit to make the protocol non-blocking. Rarely used in practice — adds complexity and latency. Real-world systems use Paxos or Raft for consensus instead of 3PC.
Saga Pattern
A saga is a sequence of local transactions with compensating transactions for rollback. No global lock — each step commits independently. Two implementations:
Choreography: each service publishes an event on success; the next service listens and acts. On failure: the failing service publishes a failure event; upstream services listen and execute compensating transactions. Pro: no central coordinator. Con: difficult to trace and reason about for complex flows.
Orchestration: a Saga Orchestrator sends commands to each service and handles failures centrally. Easier to monitor. The orchestrator maintains saga state (which steps completed, which are pending) in a database. Used by Uber (Money saga), Netflix (subscription workflows).
Saga limitations: no isolation (intermediate states are visible to other transactions), compensations must be designed carefully (some operations are hard to compensate — e.g., sent emails cannot be unsent).
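An orchestration-style saga with compensations can be sketched like this. The step names and actions below are hypothetical stand-ins for service calls; the point is the control flow: each step commits locally, and on failure the orchestrator runs the compensations for completed steps in reverse order.

```python
# Sketch of an orchestration-style saga. Each step is a local transaction
# paired with a compensating transaction; on failure, completed steps are
# compensated in reverse order. Step names and bodies are illustrative.

def run_saga(steps):
    """steps: list of (name, action, compensation). Returns 'completed' or 'compensated'."""
    done = []
    for name, action, compensation in steps:
        try:
            action()
            done.append((name, compensation))
        except Exception:
            # Roll the saga back: compensate completed steps, newest first.
            for _, comp in reversed(done):
                comp()
            return "compensated"
    return "completed"


log = []  # records side effects so we can observe the saga's behavior

def charge_payment():
    raise RuntimeError("card declined")  # simulate a failure mid-saga

steps = [
    ("reserve_inventory", lambda: log.append("reserved"),
                          lambda: log.append("released")),
    ("charge_payment",    charge_payment,
                          lambda: log.append("refunded")),
    ("notify_shipping",   lambda: log.append("notified"),
                          lambda: log.append("cancelled")),
]
```

Running `run_saga(steps)` reserves inventory, fails on payment, and releases the reservation, ending in the `"compensated"` state. Note that only steps that completed are compensated: the failed payment charge and the never-attempted shipping step are not.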
Idempotency and At-Least-Once Delivery
In distributed systems, messages may be delivered more than once (network retries). Services must be idempotent: processing the same message twice produces the same result as processing it once. Implement with idempotency keys: store (idempotency_key, result) in a database. On receipt: check if key exists. If yes, return stored result. If no, process and store. This is the foundation of reliable distributed processing.
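The check-then-store pattern above can be sketched with an in-memory dict standing in for the database table. This is a simplification: in a real database you would enforce atomicity with a unique constraint (e.g., INSERT ... ON CONFLICT DO NOTHING) so two concurrent requests with the same key cannot both process.

```python
# Idempotency-key sketch. A dict stands in for the database table;
# a real implementation would use a unique constraint so that the
# check-and-store is atomic under concurrent requests.

_results = {}  # idempotency_key -> stored result

def process_once(idempotency_key, operation):
    """Run operation() at most once per key; replay the stored result on retries."""
    if idempotency_key in _results:
        return _results[idempotency_key]  # duplicate delivery: return cached result
    result = operation()
    _results[idempotency_key] = result
    return result


calls = []  # counts how many times the real side effect actually ran

def charge():
    calls.append(1)
    return "charged $10"
```

Calling `process_once("key-1", charge)` twice returns the same result both times, but `charge()` executes only once; the retry is absorbed by the stored result.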
Eventual Consistency
In many systems, strict consistency is unnecessary. Eventual consistency: after all updates propagate, all replicas converge to the same value. Acceptable for: social media feeds (showing a post to some users before others), DNS propagation (takes minutes), product catalog (pricing may lag by seconds). Not acceptable for: bank account balances, inventory reservation (must be exact), payment processing.
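One simple mechanism by which replicas converge is a last-write-wins (LWW) register, sketched below. This is a toy model (real systems also need clock handling and tie-breaking), but it shows the convergence property: once both replicas have exchanged state, they hold the same value.

```python
# Toy last-write-wins (LWW) register: each replica keeps (timestamp, value)
# and merging keeps the newer write, so after all updates propagate,
# every replica converges to the same value.

class LWWRegister:
    def __init__(self):
        self.ts, self.value = 0, None

    def write(self, ts, value):
        if ts > self.ts:  # keep only the newer write
            self.ts, self.value = ts, value

    def merge(self, other):
        # Anti-entropy: adopt the other replica's state if it is newer.
        self.write(other.ts, other.value)


a, b = LWWRegister(), LWWRegister()
a.write(1, "v1")         # earlier write lands on replica a
b.write(2, "v2")         # later write lands on replica b
a.merge(b); b.merge(a)   # gossip both ways: replicas converge
```

Before the merge, readers of `a` and `b` see different values (the "eventual" part); after the merge, both hold the later write.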
Choosing the Right Approach
| Approach | Consistency | Availability | Complexity | Use When |
|---|---|---|---|---|
| 2PC | Strong | Low | Medium | Same datacenter, short transactions |
| Saga (choreography) | Eventual | High | Medium | Independent services, simple flows |
| Saga (orchestration) | Eventual | High | High | Complex multi-step workflows |
| Eventual consistency | Eventual | Very High | Low | Non-critical reads, high-scale |
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the main problem with Two-Phase Commit (2PC)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "2PC has two critical problems. (1) Blocking: after Phase 1 (all participants say YES), if the Coordinator crashes before sending Phase 2, all participants are stuck holding locks indefinitely — they cannot commit (no COMMIT message) or roll back (they said YES). Other transactions are blocked until the Coordinator recovers. This is the fundamental flaw of 2PC. (2) Single point of failure: the Coordinator is critical for progress. Solutions: use persistent logging so the Coordinator can recover its state; use Paxos/Raft for the Coordinator to handle leader failure. In practice, 2PC is acceptable within a single datacenter where network reliability is high and Coordinator recovery is fast (seconds). Across data centers, use Saga instead."
      }
    },
    {
      "@type": "Question",
      "name": "What is the difference between saga choreography and orchestration?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Choreography: each service listens for domain events and publishes its own events. No central coordinator. Example: Order Service publishes OrderCreated -> Inventory Service listens, reserves stock, publishes InventoryReserved -> Payment Service listens, charges, publishes PaymentCharged -> Order Service listens and confirms. Advantages: loose coupling, no SPOF. Disadvantages: logic is distributed across services (hard to trace the full flow), no single place to see saga state, circular event chains are possible. Orchestration: a Saga Orchestrator sends explicit commands (RESERVE_INVENTORY, CHARGE_PAYMENT) and handles success/failure. The orchestrator stores saga state. Advantages: clear ownership, easy to monitor, explicit error handling. Disadvantages: orchestrator is a new SPOF and coupling point. Prefer orchestration for complex long-running workflows."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement idempotency in a distributed system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Idempotency means that processing the same operation multiple times has the same effect as processing it once. Implementation: the caller generates a unique idempotency_key (UUID) for each logical operation. The server stores (idempotency_key, response) in a database with a unique constraint. On receipt: SELECT by idempotency_key. If found, return the stored response immediately. If not found: process the operation, store the result with the key, return the result. The SELECT + INSERT must be atomic (use INSERT … ON CONFLICT DO NOTHING and check if a row was inserted). TTL: expire idempotency keys after 24 hours or a business-appropriate window. Critical for: payment processing (retry on network failure should not double-charge), inventory reservation, and any state-changing operation in a distributed system."
      }
    },
    {
      "@type": "Question",
      "name": "When should you use eventual consistency instead of strong consistency?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use eventual consistency when: (1) the data is non-critical for correctness (social media likes, view counts, analytics metrics) — a few seconds of lag is invisible to users. (2) You need maximum availability — eventual consistency allows writes to proceed even when some replicas are unreachable. (3) Geographic distribution — eventual consistency allows low-latency writes to the nearest datacenter without waiting for cross-region acknowledgment. Use strong consistency when: (1) correctness is critical (account balances, inventory counts, order status). (2) Multiple services must agree on the same value simultaneously. (3) Users expect to immediately see their own writes (profile updates, settings changes). CAP theorem: you cannot have both availability and strong consistency in the presence of network partitions — eventual consistency chooses availability."
      }
    },
    {
      "@type": "Question",
      "name": "How does the outbox pattern ensure reliable event publishing in a saga?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The dual-write problem: a service updates its database and publishes an event to Kafka. If the service crashes between the DB write and the Kafka publish, the event is never sent and the saga stalls. Outbox pattern: write the event to an outbox table in the same database transaction as the state change. A separate relay process reads unpublished outbox events and publishes to Kafka, then marks them as published. Even if the relay crashes: on restart, it re-reads unpublished outbox events and republishes (at-least-once delivery). Consumers must be idempotent to handle duplicate events. The relay can use: polling (simple, slightly higher latency), or CDC (Change Data Capture) with Debezium (reads the database WAL, near real-time, no polling overhead)."
      }
    }
  ]
}