Low-Level Design: Payment Processor — Idempotency, State Machine, and Retry Handling

Core Entities

PaymentIntent: intent_id (UUID), merchant_id, customer_id, amount (integer cents), currency, status (CREATED, PROCESSING, REQUIRES_ACTION, SUCCEEDED, FAILED, CANCELLED, REFUNDED), payment_method_id, idempotency_key, created_at, updated_at, metadata (JSON).

PaymentMethod: method_id, customer_id, type (CARD, BANK_TRANSFER, WALLET), card_fingerprint, last4, expiry_month, expiry_year, billing_address_id, is_default.

Charge: charge_id, intent_id, processor (STRIPE, ADYEN, BRAINTREE), processor_charge_id, amount, status, created_at, processor_response (JSON).

Refund: refund_id, charge_id, amount, reason, status (PENDING, SUCCEEDED, FAILED), created_at.

WebhookEvent: event_id, source, event_type, payload (JSON), received_at, processed_at, status (PENDING, PROCESSED, FAILED).

Idempotency

Idempotency prevents double charges when clients retry on network failure. Every payment creation request must include an idempotency_key (client-generated UUID). Server logic:

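from uuid import uuid4


class PaymentService:
    def create_intent(self, request: CreateIntentRequest) -> PaymentIntent:
        # Keys are scoped per merchant, so every lookup uses both fields.
        # (Assumes the request carries merchant_id, as PaymentIntent does.)
        cache_key = (request.merchant_id, request.idempotency_key)

        # Fast path: check the idempotency cache first
        cached = self.idempotency_store.get(cache_key)
        if cached:
            return cached  # Return exact same response as original

        with self.db.transaction():
            # Re-check inside the transaction. This assumes a unique
            # constraint on (merchant_id, idempotency_key), which is what
            # finally stops two concurrent requests from both inserting.
            existing = self.repo.find_by_idempotency_key(
                request.merchant_id, request.idempotency_key
            )
            if existing:
                return existing

            intent = PaymentIntent(
                intent_id=uuid4(),
                merchant_id=request.merchant_id,
                idempotency_key=request.idempotency_key,
                amount=request.amount,
                status=IntentStatus.CREATED,
                # ... remaining fields elided
            )
            self.repo.save(intent)

        # Cache the result for 24 hours
        self.idempotency_store.set(cache_key, intent, ttl=86400)
        return intent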

Idempotency key uniqueness: scoped per merchant. The same key used by two different merchants is valid, because each merchant has its own namespace. If a key is reused, validate that the request body matches the original request. If the body differs, return 422 Unprocessable Entity: that is a client error, not a retry.
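The body check for a reused key can be sketched as follows. This is a minimal illustration, not the article's implementation: IdempotencyStore, StoredResult, fingerprint, and IdempotencyConflict are hypothetical names, the store is an in-memory dict standing in for the real database or cache, and hashing a canonical JSON form of the body is one assumed way to compare requests.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


class IdempotencyConflict(Exception):
    """Key reused with a different request body (would map to HTTP 422)."""


def fingerprint(body: dict) -> str:
    # Canonical JSON, so that key order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class StoredResult:
    body_hash: str
    response: dict


class IdempotencyStore:
    """In-memory stand-in; entries are keyed by (merchant_id, key)."""

    def __init__(self) -> None:
        self._entries = {}

    def lookup(self, merchant_id: str, key: str, body: dict) -> Optional[dict]:
        entry = self._entries.get((merchant_id, key))
        if entry is None:
            return None  # First time this merchant has used the key
        if entry.body_hash != fingerprint(body):
            raise IdempotencyConflict(f"key {key!r} reused with a different body")
        return entry.response  # Genuine retry: replay the original response

    def record(self, merchant_id: str, key: str, body: dict, response: dict) -> None:
        self._entries[(merchant_id, key)] = StoredResult(fingerprint(body), response)
```

Storing only a hash of the body keeps the record small while still letting the server tell a genuine retry from a key collision. A TTL is omitted here for brevity.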

Payment State Machine

CREATED → PROCESSING (charge attempt started) → SUCCEEDED (charge confirmed) or FAILED (charge declined).
CREATED → CANCELLED (cancelled before processing).
SUCCEEDED → REFUNDED (full refund).
PROCESSING → REQUIRES_ACTION (3DS authentication needed) → PROCESSING (customer completes 3DS).

Each transition is persisted atomically: UPDATE payment_intents SET status=:new, updated_at=NOW() WHERE intent_id=:id AND status=:expected. If rows_affected=0, a concurrent update happened: re-read the current status and retry, or return a conflict. All transitions are logged in an events table for audit. Invalid transitions are rejected with a 409 Conflict response.

State machine validation: define VALID_TRANSITIONS = {CREATED: [PROCESSING, CANCELLED], PROCESSING: [SUCCEEDED, FAILED, REQUIRES_ACTION], …}. Assert before any state update.
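The validation and the conditional UPDATE described above can be combined in one function. The sketch below makes several assumptions: SQLite stands in for the production database (so CURRENT_TIMESTAMP replaces NOW()), the intent_events table and the exception names are hypothetical, and the transition map entries beyond the two shown in the text are filled in from the transitions listed in the prose.

```python
import sqlite3
from enum import Enum


class IntentStatus(str, Enum):
    CREATED = "CREATED"
    PROCESSING = "PROCESSING"
    REQUIRES_ACTION = "REQUIRES_ACTION"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    REFUNDED = "REFUNDED"


S = IntentStatus
# Statuses without an entry are treated as terminal.
VALID_TRANSITIONS = {
    S.CREATED: {S.PROCESSING, S.CANCELLED},
    S.PROCESSING: {S.SUCCEEDED, S.FAILED, S.REQUIRES_ACTION},
    S.REQUIRES_ACTION: {S.PROCESSING},
    S.SUCCEEDED: {S.REFUNDED},
}


class InvalidTransition(Exception):
    """Transition not allowed by the state machine (would map to 409)."""


class ConcurrentUpdate(Exception):
    """The row was no longer in the expected status."""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE payment_intents ("
        "intent_id TEXT PRIMARY KEY, status TEXT NOT NULL, updated_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE intent_events ("
        "intent_id TEXT, from_status TEXT, to_status TEXT)"
    )


def transition(conn, intent_id: str, expected: IntentStatus, new: IntentStatus) -> None:
    # 1. Validate against the state machine before touching the database.
    if new not in VALID_TRANSITIONS.get(expected, set()):
        raise InvalidTransition(f"{expected.value} -> {new.value}")
    # 2. Compare-and-set: the row changes only if it is still in `expected`.
    cur = conn.execute(
        "UPDATE payment_intents SET status = ?, updated_at = CURRENT_TIMESTAMP "
        "WHERE intent_id = ? AND status = ?",
        (new.value, intent_id, expected.value),
    )
    if cur.rowcount == 0:
        conn.rollback()
        raise ConcurrentUpdate(intent_id)
    # 3. Audit row, committed together with the status change.
    conn.execute(
        "INSERT INTO intent_events (intent_id, from_status, to_status) "
        "VALUES (?, ?, ?)",
        (intent_id, expected.value, new.value),
    )
    conn.commit()
```

Writing the audit row in the same transaction as the status change means the events table cannot drift from the intent's actual history.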

Retry Logic and Processor Failover

Transient failures: network timeouts, processor rate limits (HTTP 429), temporary unavailability.

Retry strategy: exponential backoff with jitter. First retry after 1s, second after 2s, third after 4s, doubling each time (8s, then 16s) up to 5 retries. Jitter: add ±20% random variance to each delay to prevent a thundering herd (all retries hitting the processor simultaneously).

Non-retriable failures: card declined (insufficient funds, invalid card number). Do not retry; return FAILED immediately.

Processor failover: maintain two processors (primary and backup, e.g., Stripe primary, Adyen backup). If the primary fails 3 consecutive times for a single intent, fail over to the backup. Track failover events in the Charge table.

Reconciliation: after a processor failover, reconcile the charge status with the primary processor once it recovers, to detect a charge that actually succeeded but timed out.

Dead letter queue: payments that still fail after all retries are placed in a DLQ for manual review and customer notification.
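The backoff schedule and the split between retriable and non-retriable failures might look like the sketch below. TransientError, PermanentError, backoff_delay, and call_with_retries are hypothetical names chosen for illustration; the sleep and rng parameters exist only so the behaviour can be exercised without real waiting. Failover to a backup processor is left out.

```python
import random
import time


class TransientError(Exception):
    """Timeout, HTTP 429, temporary unavailability: worth retrying."""


class PermanentError(Exception):
    """Card declined and similar: never retried."""


def backoff_delay(attempt: int, base: float = 1.0, jitter: float = 0.2,
                  rng=random.random) -> float:
    """Delay before retry number `attempt` (1-based): 1s, 2s, 4s, ... with jitter."""
    delay = base * 2 ** (attempt - 1)
    # rng() is in [0, 1), so the factor falls in [1 - jitter, 1 + jitter).
    factor = 1 + jitter * (2 * rng() - 1)
    return delay * factor


def call_with_retries(operation, max_retries: int = 5, sleep=time.sleep):
    """Run `operation`, retrying transient failures up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # Out of retries: the caller can route this to the DLQ
            sleep(backoff_delay(attempt + 1))
        # PermanentError is deliberately not caught: it propagates at once.
```

A caller would wrap the processor request, for example `call_with_retries(lambda: submit_charge(intent))`, where submit_charge is whatever function performs the charge attempt and raises one of the two error types.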

Frequently Asked Questions

What is an idempotency key and how does it prevent double charges?

An idempotency key is a client-generated unique identifier (UUID) included in a payment creation request. It allows the client to safely retry failed requests without causing duplicate operations.

Scenario: the client sends "charge $100" with idempotency key K. The request reaches the server and the charge is created, but the network drops before the response arrives, so the client doesn't know whether the charge succeeded. Without idempotency: client retries → two charges of $100. With idempotency: client retries with the same key K → server detects it has already processed key K → returns the original result without creating a new charge.

Server implementation: store (idempotency_key, merchant_id, response) in the database with a 24-hour TTL. On receipt of a request, check the store first. If found: return the cached response immediately. If not found: process the request, store the result, return it. Important: validate that the request body matches the original request for the key. If the body differs, return 422 (the client is trying to use the same key for a different operation, which is likely a bug).

How do you handle webhook delivery and ensure exactly-once processing?

Webhooks are HTTP callbacks that notify your system of external events (for example, Stripe notifying you of charge.succeeded). Challenges: delivery failures (your server is down), duplicate delivery (the webhook provider retries), and out-of-order delivery.

At-least-once delivery from providers: Stripe, for example, retries failed webhooks for up to 3 days with exponential backoff. You must handle duplicates.

Idempotent webhook processing: store each received event_id in a WebhookEvents table. Before processing: INSERT … ON CONFLICT DO NOTHING. Check rows_affected: if 0, the event was already recorded, so skip it. The event_id uniqueness constraint prevents duplicate processing even under concurrent delivery.

Signature verification: validate the webhook signature (HMAC of the payload with the webhook secret) before processing. This rejects spoofed webhooks.

Async processing: acknowledge the webhook immediately (HTTP 200) and process asynchronously. Slow processing causes the provider to retry, because it assumes delivery failed.

Reconciliation: periodically query the payment processor's API to find any events you may have missed or failed to process.

How do you implement 3D Secure (3DS) authentication in the payment flow?

3D Secure adds a bank authentication step for card payments (OTP, biometric). The payment flow becomes asynchronous:

(1) Client submits payment. Server creates a PaymentIntent (status=CREATED).
(2) Server submits the charge to the payment processor. If 3DS is required, the processor returns requires_action with a redirect URL. Server updates the intent status to REQUIRES_ACTION.
(3) Client is redirected to the bank's 3DS challenge page. User completes authentication. Bank redirects back to the merchant with a result.
(4) Server receives the result via webhook (payment_intent.succeeded or payment_intent.payment_failed). Alternatively, the client polls the PaymentIntent status endpoint.
(5) Server updates the intent status to SUCCEEDED or FAILED.

State machine: CREATED → PROCESSING → REQUIRES_ACTION → PROCESSING (after 3DS) → SUCCEEDED/FAILED.

Timeout: if 3DS is not completed within 15 minutes, expire the intent (status → FAILED), release any holds, and notify the user.

3DS exemptions: low-risk transactions under €30 (SCA exemptions in Europe) can skip 3DS for better conversion.

How do you design the reconciliation process between your system and the payment processor?

Reconciliation detects discrepancies between your payment records and the processor's records.

Daily reconciliation batch: download the processor's settlement file (a CSV with all charges and their statuses for the day) and compare it with your database. For each processor charge_id in the settlement file, find the matching charge in your DB by processor_charge_id. Compare status: if the processor says succeeded but your DB says processing, you missed a webhook, so update your DB. Compare amount: if the processor captured $95 but you expected $100, there was a partial capture or a refund you missed, so investigate.

Discrepancies:

(1) Processor succeeded, your DB shows pending: missed webhook. Update the status and trigger fulfillment if needed.
(2) Your DB shows succeeded, processor has no record: the charge may have been reversed by the bank. Check with the processor and refund if necessary.
(3) Amount mismatch: investigate manually, since this is a potential data integrity issue.

Automated vs manual: auto-resolve discrepancies within a tolerance band. Flag larger discrepancies (>$1 or >0.1% of charge amount) for manual review. Alert the finance team about unresolved discrepancies older than 48 hours.

What is the difference between authorization and capture in card payments?

Authorization: the merchant requests approval, via the card network, to charge a specific amount. The bank checks whether the card is valid and whether there is sufficient credit or funds. It places a hold on the customer's available balance for up to 7 days. No money moves yet. The merchant receives an authorization code. Use case: when you authorize before you know the final amount (hotel check-in, car rental, restaurants that add tips).

Capture: the merchant submits the actual charge against the authorization. Money moves. The capture amount can be less than the authorized amount (for example, a partial shipment) but typically not more. The authorization hold is released and replaced by the actual charge.

Timing: capture must happen within 7 days (varies by card network). After that the authorization expires and you must re-authorize.

Auth-capture in practice: most e-commerce payments authorize at order placement and capture at shipment. This ensures you only charge for shipped items. A single API call can do both (immediate capture), which is appropriate for digital goods that are delivered instantly.

Voiding an authorization: if the order is cancelled before capture, void the authorization. This releases the hold immediately instead of waiting for it to expire.
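The webhook answer above describes two checks that run before any business logic: signature verification and event_id deduplication. The sketch below shows one way they could fit together, under stated assumptions: SQLite stands in for the production database, the HMAC is a plain SHA-256 hex digest of the raw payload (real providers define their own signing schemes and header formats), the 400 response for a bad signature is an assumption, and sign, claim_event, handle_webhook, and the enqueue callback are hypothetical names.

```python
import hashlib
import hmac
import sqlite3


def sign(secret: bytes, payload: bytes) -> str:
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign(secret, payload).encode(), signature_hex.encode())


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE webhook_events ("
        "event_id TEXT PRIMARY KEY, source TEXT, event_type TEXT, payload BLOB, "
        "received_at TEXT DEFAULT CURRENT_TIMESTAMP, processed_at TEXT, "
        "status TEXT DEFAULT 'PENDING')"
    )


def claim_event(conn, event_id: str, source: str, event_type: str, payload: bytes) -> bool:
    """Record the event; return True only for the first delivery of event_id."""
    cur = conn.execute(
        "INSERT INTO webhook_events (event_id, source, event_type, payload) "
        "VALUES (?, ?, ?, ?) ON CONFLICT(event_id) DO NOTHING",
        (event_id, source, event_type, payload),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows affected means a duplicate delivery


def handle_webhook(conn, secret: bytes, event_id: str, event_type: str,
                   payload: bytes, signature_hex: str, enqueue) -> int:
    """Return the HTTP status to send back to the provider."""
    if not verify_signature(secret, payload, signature_hex):
        return 400  # Spoofed or corrupted: reject before storing anything
    if claim_event(conn, event_id, "processor", event_type, payload):
        enqueue(event_id)  # Hand off to a background worker
    return 200  # Acknowledge duplicates too, so the provider stops retrying
```

The handler only records and enqueues the event, which keeps the response fast. A worker would later mark the row PROCESSED or FAILED.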
