Core Components
An e-commerce system handles product catalog, inventory tracking, shopping cart, order placement, payment processing, and fulfillment. Shopify processes $200B+ GMV per year; Amazon handles 1.6M orders per day. Key challenges: inventory consistency (preventing overselling), high read volume on product pages, flash sale load spikes, and order status tracking across distributed systems.
Product Catalog Service
The product catalog is read-heavy (customers browse and search; writes happen infrequently when merchants update listings). Data model: products have attributes (name, description, price, category, images) and variants (color, size combinations, each with its own SKU and inventory count). Storage: Elasticsearch for full-text search and faceted filtering (filter by category, price range, brand). MySQL/PostgreSQL for authoritative product data. Redis for product page caching (TTL 5 minutes — merchant updates are infrequent). CDN for product images (served from S3 via CloudFront).
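The Redis layer above is a classic read-through cache. A minimal sketch, using an in-memory dict with a 5-minute TTL as a stand-in for Redis (in production this would be redis-py `GET`/`SETEX`; `load_from_db` is a hypothetical loader for the authoritative SQL row):

```python
import time

class TTLCache:
    """In-memory stand-in for the Redis product-page cache."""
    def __init__(self, ttl_seconds=300):  # 5-minute TTL, per the design
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] < time.time():
            return None  # miss or expired
        return entry[0]

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

def get_product_page(cache, product_id, load_from_db):
    """Read-through: serve from cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    page = cache.get(key)
    if page is None:
        page = load_from_db(product_id)  # authoritative MySQL/PostgreSQL read
        cache.set(key, page)
    return page
```

Merchant updates become visible within one TTL window at most; for faster propagation, the update path can also delete the cache key explicitly.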
Search query flow: user searches “blue running shoes size 10” → Elasticsearch returns top-50 matching SKUs with facet aggregations (brands, price ranges, ratings) → product service fetches display data for top-10 → cached result rendered in < 200ms. For category pages, results are pre-computed and cached.
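The query-plus-facets flow can be sketched as an Elasticsearch request body (expressed here as a Python dict; the field names `title`, `description`, `size`, `brand`, and `price` are an assumed schema, not a real index mapping):

```python
# Hypothetical index schema; field names are illustrative only.
search_body = {
    "query": {
        "bool": {
            # Full-text relevance match on the query terms.
            "must": {"multi_match": {"query": "blue running shoes",
                                     "fields": ["title", "description"]}},
            # Exact filters don't affect scoring and are cacheable.
            "filter": [{"term": {"size": "10"}}],
        }
    },
    "aggs": {  # facet aggregations for the filter sidebar
        "brands": {"terms": {"field": "brand"}},
        "price_ranges": {"range": {"field": "price",
                                   "ranges": [{"to": 50},
                                              {"from": 50, "to": 100},
                                              {"from": 100}]}},
    },
    "size": 50,  # top-50 matching SKUs, per the flow above
}
```

A single request returns both the hits and the facet counts, so the sidebar and result list render from one round trip.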
Inventory Management and Oversell Prevention
Inventory is the most consistency-critical component — an item can only be sold once. Challenge: high concurrent demand (flash sales, limited edition drops) requires preventing overselling without sacrificing availability.
Database-level inventory: inventory stored as a count in MySQL (quantity_available). On purchase: UPDATE inventory SET quantity_available = quantity_available - 1 WHERE sku_id = ? AND quantity_available > 0. If the update affects 0 rows, the item is out of stock — transaction is rolled back. Pessimistic locking (SELECT FOR UPDATE) serializes concurrent updates to the same SKU. Works for moderate concurrency but becomes a bottleneck at high QPS (> 1,000 concurrent buyers for a single SKU).
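The conditional decrement can be demonstrated end to end with SQLite standing in for MySQL (a sketch; real code would wrap this in the same transaction as the order insert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku_id TEXT PRIMARY KEY, quantity_available INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('sku123', 1)")

def purchase(conn, sku_id):
    """Atomic conditional decrement: rowcount 0 means out of stock."""
    cur = conn.execute(
        "UPDATE inventory SET quantity_available = quantity_available - 1 "
        "WHERE sku_id = ? AND quantity_available > 0",
        (sku_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True if this transaction won the unit

print(purchase(conn, "sku123"))  # True  (last unit sold)
print(purchase(conn, "sku123"))  # False (out of stock)
```

The WHERE clause is what prevents overselling: two racing buyers both issue the UPDATE, but only one sees a nonzero rowcount.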
Redis atomic decrement for flash sales: pre-load inventory counts into Redis (SET inventory:sku123 100). On each purchase attempt: DECR inventory:sku123. If result < 0, INCR back and reject (out of stock). Redis processes 100K+ commands/second on a single instance — handles extreme flash sale spikes without database contention. After the sale, reconcile Redis counts back to the database asynchronously.
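The DECR-then-INCR-back pattern, sketched with a plain dict standing in for Redis (the real calls are redis-py `decr`/`incr`; Redis's single-threaded command execution is what makes the decrement atomic — this Python version is not thread-safe and only illustrates the logic):

```python
def try_flash_purchase(counts, sku):
    """Mirror of the Redis pattern: DECR, and if the result went negative,
    INCR back and reject the purchase."""
    counts[sku] -= 1          # DECR inventory:{sku}
    if counts[sku] < 0:
        counts[sku] += 1      # INCR back: this buyer lost the race
        return False
    return True

counts = {"sku123": 2}        # SET inventory:sku123 2 before the sale
results = [try_flash_purchase(counts, "sku123") for _ in range(4)]
print(results)  # [True, True, False, False]
```

Exactly as many purchases succeed as there are units, regardless of how many attempts arrive.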
Reservation system: when an item is added to cart, reserve the inventory for 15 minutes (similar to airline seat hold). Create a reservation record in the database and decrement available inventory. If the user doesn’t complete purchase within 15 minutes, a background job releases the reservation and increments inventory back. This prevents the “item in cart but sold out at checkout” problem.
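A sketch of the reservation lifecycle, with in-memory structures standing in for the reservation table and inventory counts (the `now` parameter makes expiry testable; a real background job would run `release_expired` on a schedule):

```python
import time

RESERVATION_TTL = 15 * 60  # 15-minute hold, per the design

def reserve(inventory, reservations, sku, user, now=None):
    """Decrement available stock and record a hold that expires later."""
    now = time.time() if now is None else now
    if inventory.get(sku, 0) <= 0:
        return False  # nothing left to hold
    inventory[sku] -= 1
    reservations.append({"sku": sku, "user": user,
                         "expires_at": now + RESERVATION_TTL})
    return True

def release_expired(inventory, reservations, now=None):
    """Background job: return lapsed holds to available inventory."""
    now = time.time() if now is None else now
    live = []
    for r in reservations:
        if r["expires_at"] <= now:
            inventory[r["sku"]] += 1  # hold lapsed; stock is sellable again
        else:
            live.append(r)
    reservations[:] = live
```

On successful checkout the reservation is deleted instead of expiring, so the decrement becomes permanent.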
Shopping Cart
Shopping cart data is session-scoped, user-specific, frequently read and written, and doesn't require strong consistency. Storage: Redis hash per user (HSET cart:{user_id} sku_id quantity) with TTL of 7 days (abandoned cart expiration). For logged-out users, store in browser localStorage and sync to Redis on login (merge carts if both exist). Cart is lightweight — no inventory check at add-to-cart time (inventory checked at checkout to avoid unnecessary locks).
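The login-time cart merge can be sketched as a per-SKU combine. The policy below sums quantities; that is an assumption — taking the max per SKU is another common choice:

```python
def merge_carts(local_cart, server_cart):
    """Merge a guest (localStorage) cart into the server-side Redis cart
    on login. Both carts map sku_id -> quantity; quantities are summed."""
    merged = dict(server_cart)
    for sku, qty in local_cart.items():
        merged[sku] = merged.get(sku, 0) + qty
    return merged

print(merge_carts({"sku1": 2, "sku2": 1}, {"sku1": 1}))
# {'sku1': 3, 'sku2': 1}
```

The merged result would then be written back with a single HSET and the guest cart cleared.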
Order Processing (Checkout Flow)
Checkout is the most critical path — money changes hands:
- Client submits checkout (cart contents + payment info + shipping address).
- Inventory check: atomically verify all items are in stock. For each SKU, use SELECT FOR UPDATE or Redis DECR. If any item is out of stock, abort and notify the user.
- Payment authorization: call payment processor (Stripe) with the total amount. Block until authorization succeeds (typically 2-3 seconds). If declined, release inventory reservations.
- Order record creation: write the order to the database with status = CONFIRMED. Release the reservation records.
- Fulfillment dispatch: publish an OrderPlaced event to Kafka. Warehouse management system (WMS) consumes the event, picks the item, and updates status to SHIPPED with tracking number.
- Notification: order confirmation email sent asynchronously via SES/SendGrid.
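The steps above can be sketched as an orchestration with the failure paths made explicit. All service objects (`inventory_svc`, `payment_svc`, `orders`, `events`) are hypothetical stubs used only to show the control flow:

```python
def checkout(cart, payment_info, inventory_svc, payment_svc, orders, events):
    """Happy path: reserve stock -> authorize payment -> record order ->
    emit OrderPlaced. Any failure releases what was taken before it."""
    # 1. Atomically verify/reserve every line item.
    if not inventory_svc.reserve_all(cart):
        return {"status": "OUT_OF_STOCK"}
    # 2. Authorize the charge; on decline, give the stock back.
    auth = payment_svc.authorize(payment_info, total(cart))
    if not auth["approved"]:
        inventory_svc.release_all(cart)
        return {"status": "PAYMENT_DECLINED"}
    # 3. Durably record the order before telling anyone about it.
    order_id = orders.create(cart, auth, status="CONFIRMED")
    # 4. Fire-and-forget: fulfillment (WMS) and email consume this event.
    events.publish("OrderPlaced", {"order_id": order_id})
    return {"status": "CONFIRMED", "order_id": order_id}

def total(cart):
    return sum(item["price"] * item["qty"] for item in cart)
```

Note the ordering of step 3 before step 4: the OrderPlaced event is published only after the order row is durable, so consumers never see an order the database doesn't know about.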
Flash Sale Architecture
A flash sale (10,000 units available at noon, expected 500K concurrent requests): standard architecture fails — the database cannot handle 500K concurrent inventory updates. Solution:
- Pre-load inventory into Redis before the sale starts.
- Rate limit the checkout endpoint aggressively (token bucket, 10K requests/second).
- Virtual queue: at sale start, all requests are queued. Issue queue tokens to clients. A worker dequeues tokens one by one and processes purchases. Clients poll or receive SSE notification when their token is processed.
- Redis DECR for atomic inventory: each dequeued token attempts inventory decrement — if < 0, sale is over. The first 10K successful decrements complete purchases.
- CDN caches the sold-out page — once inventory = 0, all subsequent requests hit the CDN edge with “sold out” response, never reaching origin.
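The queue-drain step can be sketched as a worker loop over issued tokens. In production the queue is a durable system (e.g. Kafka or a Redis list) and the decrement is the Redis DECR described above; plain Python stands in for both here:

```python
from collections import deque

def run_flash_sale(tokens, units):
    """Dequeue tokens in issue order; the first `units` tokens win."""
    queue = deque(tokens)
    winners, sold_out = [], []
    while queue:
        token = queue.popleft()
        if units > 0:               # atomic DECR in the real system
            units -= 1
            winners.append(token)
        else:
            sold_out.append(token)  # client is shown the cached sold-out page
    return winners, sold_out

winners, losers = run_flash_sale(["t1", "t2", "t3", "t4", "t5"], units=3)
print(winners)  # ['t1', 't2', 't3']
```

Because purchases are serialized through the queue, the database only ever sees the small stream of winning orders, not the 500K-request stampede.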
Order Status and Fulfillment
Orders go through states: PENDING → CONFIRMED → PICKED → SHIPPED → DELIVERED → (optionally RETURNED). These transitions are stored in an order_events table (append-only, like event sourcing) rather than updating a single status field — preserves full history for customer service and analytics. Status is derived from the latest event. Shipping events (carrier tracking updates) arrive via webhooks from FedEx/UPS APIs and trigger order_events inserts. Customers receive push notifications (mobile) or emails at each transition.
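Deriving status from the append-only log can be sketched as "latest event wins." Each event here is a `(timestamp, status)` tuple mirroring a row in order_events:

```python
def current_status(order_events):
    """Derive the order's status from its append-only event log:
    the event with the greatest timestamp determines the status."""
    if not order_events:
        return None
    return max(order_events, key=lambda e: e[0])[1]

events = [(1, "PENDING"), (2, "CONFIRMED"), (3, "PICKED"), (4, "SHIPPED")]
print(current_status(events))  # SHIPPED
```

Because rows are never updated in place, a late-arriving carrier webhook simply appends another event, and the full history remains queryable for support and analytics.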
Capacity Planning
100M daily active users × 10 product page views each = 1B page views/day = ~12K page views/second. With 90% CDN cache hit rate: 1,200 requests/second reach origin — manageable with 10 product service instances. Order processing: 5M orders/day = 58 orders/second — trivially handled. Peak (Black Friday 10× spike): 120K page views/second — CDN absorbs 90%; 12K requests/second to origin — scale product service to 100 instances ahead of the event. Use auto-scaling on CPU utilization metric.
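The arithmetic above, written out so the rounded figures can be checked:

```python
DAU = 100_000_000
views_per_user = 10
seconds_per_day = 86_400
cdn_hit_rate = 0.90

page_views_per_sec = DAU * views_per_user / seconds_per_day
origin_rps = page_views_per_sec * (1 - cdn_hit_rate)
orders_per_sec = 5_000_000 / seconds_per_day
peak_origin_rps = page_views_per_sec * 10 * (1 - cdn_hit_rate)  # 10x spike

print(round(page_views_per_sec))   # 11574 -> the "~12K/s" above
print(round(origin_rps))           # 1157  -> "~1,200 rps to origin"
print(round(orders_per_sec, 1))    # 57.9  -> "58 orders/second"
print(round(peak_origin_rps))      # 11574 -> "~12K rps to origin" at peak
```

The peak-origin figure matching steady-state total page views is the key takeaway: the CDN turns a 10x traffic spike into roughly the same origin load as an ordinary day's full traffic.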