What Is a Stock Exchange System?
A stock exchange matching engine pairs buy and sell orders in real time, executing trades at the best available price. Core requirements: strict price-time priority (FIFO at each price level), sub-millisecond matching latency, full audit trail, and deterministic replay. NYSE and NASDAQ process 8–10 billion shares per day; modern exchanges target single-digit microsecond matching latency.
Order Types
- Market order: execute immediately at best available price
- Limit order: execute only at specified price or better
- Stop order: becomes a market order when a trigger price is reached
- IOC (Immediate or Cancel): fill as much as possible immediately, cancel remainder
- FOK (Fill or Kill): fill entire order immediately or cancel entirely
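The time-in-force variants can be illustrated with a short sketch. Assuming ask levels that already cross the order's limit price, a hypothetical `execute` helper (illustrative name, not a standard API) shows how FOK refuses partial liquidity while IOC takes what it can and cancels the rest:

```python
from dataclasses import dataclass

@dataclass
class Fill:
    qty: int
    price: float

def execute(order_type, want_qty, available):
    """Illustrate IOC vs. FOK against a list of (qty, price) ask levels
    that are assumed to already cross the order's limit price."""
    total = sum(q for q, _ in available)
    if order_type == "FOK" and total < want_qty:
        return []                      # all-or-nothing: cancel entirely
    fills, remaining = [], want_qty
    for qty, price in available:       # best price first
        if remaining == 0:
            break
        take = min(qty, remaining)
        fills.append(Fill(take, price))
        remaining -= take
    # IOC: any unfilled remainder is cancelled, never rested on the book
    return fills
```

With 60 shares available, a 100-share FOK cancels outright, while the same IOC fills 60 and cancels the remaining 40.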
Order Book Data Structure
The order book maintains all outstanding limit orders organized by price level. Two sides:
- Bids: buy orders, sorted by price descending (best bid = highest price buyer will pay)
- Asks: sell orders, sorted by price ascending (best ask = lowest price seller will accept)
Data structure: price-level map using a sorted structure (TreeMap / SortedDict) keyed by price. Each price level holds a queue (FIFO) of orders. Operations:
add_order(order) → O(log P) # P = number of price levels
cancel_order(order_id) → O(1) # with order_id → order pointer map
match_order(order) → O(M log P) # M = number of fills
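A minimal sketch of one book side, using a sorted price list plus per-level FIFO deques in place of a TreeMap/SortedDict; `BookSide`, `add`, `cancel`, and `best` are illustrative names. Cancellation is O(1) via the order_id → order map, with cancelled orders purged lazily:

```python
import bisect
from collections import deque

class BookSide:
    """One side of an order book: sorted price levels, FIFO queue per level."""
    def __init__(self, is_bid):
        self.is_bid = is_bid
        self.prices = []   # sorted ascending
        self.levels = {}   # price -> deque of order dicts (time priority)
        self.by_id = {}    # order_id -> order dict, for O(1) cancel

    def add(self, order_id, price, qty):
        order = {"id": order_id, "price": price, "qty": qty, "live": True}
        if price not in self.levels:
            bisect.insort(self.prices, price)  # O(log P) search (list shift aside)
            self.levels[price] = deque()
        self.levels[price].append(order)       # FIFO within the price level
        self.by_id[order_id] = order

    def cancel(self, order_id):
        self.by_id.pop(order_id)["live"] = False  # O(1); removed lazily in best()

    def best(self):
        """Best bid = highest price; best ask = lowest price."""
        while self.prices:
            price = self.prices[-1] if self.is_bid else self.prices[0]
            queue = self.levels[price]
            while queue and not queue[0]["live"]:
                queue.popleft()                # skip cancelled orders
            if queue:
                return price
            del self.levels[price]             # level empty: drop it
            if self.is_bid:
                self.prices.pop()
            else:
                self.prices.pop(0)
        return None
```

This is a sketch of the interface, not a production structure; a real engine would use a balanced tree or a price-ladder array rather than a Python list.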
Matching Algorithm
Price-time priority (most common): orders at better prices fill first; at the same price, earlier orders fill first.
def match(incoming_buy_order, asks):
    while incoming_buy_order.qty > 0 and asks:
        best_ask_price, best_ask_queue = asks.peekmin()
        if incoming_buy_order.price < best_ask_price:
            break  # best ask is above the buy limit: no cross
        fill against best_ask_queue in FIFO order (oldest resting order first)
        drop filled orders; pop the price level when its queue empties
    if incoming_buy_order.qty > 0:
        add to bids[incoming_buy_order.price]  # rest the unfilled remainder
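The matching loop above can be made runnable with ordinary dicts. This sketch (`match_buy` is a hypothetical helper) finds the best price with `min()` on each pass instead of a tree map, which is fine for illustration; `asks` maps price → deque of `[order_id, qty]` entries:

```python
from collections import deque

def match_buy(price_limit, qty, asks):
    """Greedy price-time-priority match of an incoming buy against the asks.
    Returns the executed fills and the unfilled remainder."""
    fills = []
    while qty > 0 and asks:
        best = min(asks)               # lowest ask price
        if price_limit < best:
            break                      # book no longer crosses
        queue = asks[best]
        order = queue[0]               # oldest order at this level first
        take = min(qty, order[1])
        fills.append((order[0], best, take))
        qty -= take
        order[1] -= take
        if order[1] == 0:
            queue.popleft()            # resting order fully filled
        if not queue:
            del asks[best]             # price level exhausted
    return fills, qty
```

A 120-share buy at 10.1 against 50+50 shares resting at 10.1 fills both resting orders in arrival order and leaves 20 shares unfilled, since the 10.2 level is above the limit.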
Performance Optimizations
For sub-millisecond latency:
- CPU pinning: matching engine thread pinned to a dedicated CPU core — no context switching
- DPDK networking: bypass kernel network stack, process packets directly in userspace
- Lock-free data structures: SPSC (single-producer, single-consumer) ring buffers for order ingestion
- Memory pre-allocation: object pool for Order structs, so the hot path never allocates and never pauses for GC (Java GC pauses are fatal at this scale; engines are typically written in C++ or Rust)
- Cache locality: order book stored in contiguous memory; L1 cache hit vs. heap allocation matters at microsecond latency
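The object-pool idea can be sketched even in Python, which still has a GC, so this only demonstrates the pattern; production engines do it in C++ or Rust with fixed arrays. `Order` and `OrderPool` are illustrative names:

```python
class Order:
    __slots__ = ("order_id", "price", "qty")  # fixed layout, no per-instance dict

class OrderPool:
    """Pre-allocate all Order objects up front; acquire/release recycle them
    so the hot path performs no allocation."""
    def __init__(self, capacity):
        self.free = [Order() for _ in range(capacity)]  # allocate once, at startup

    def acquire(self, order_id, price, qty):
        order = self.free.pop()        # O(1), no allocation on the hot path
        order.order_id, order.price, order.qty = order_id, price, qty
        return order

    def release(self, order):
        self.free.append(order)        # recycle instead of freeing
```

The point is that allocation cost and collector pauses are paid once at startup, never during matching.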
Trade Matching Log and Replay
Every order event (submit, cancel, modify, fill) is appended to an immutable trade log (journal). The journal is the single source of truth. The in-memory order book is derived state — it can be fully reconstructed by replaying the journal from the beginning. Benefits: disaster recovery (rebuild from log), audit compliance (full history), and deterministic simulation (backtest by replaying historical order flow).
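A toy sketch of journal-then-replay, with an in-memory list standing in for an fsync'd append-only file (`Journal` and `apply_event` are hypothetical names). The derived book can be discarded and rebuilt identically from the log:

```python
import json

class Journal:
    """Append-only event log; the in-memory book is derived state."""
    def __init__(self):
        self.entries = []

    def append(self, event):
        self.entries.append(json.dumps(event))  # durable write happens first

    def replay(self, apply):
        for line in self.entries:
            apply(json.loads(line))             # deterministic reconstruction

def apply_event(book, event):
    """Minimal derived state: order_id -> outstanding qty."""
    if event["type"] == "submit":
        book[event["id"]] = event["qty"]
    elif event["type"] == "cancel":
        book.pop(event["id"], None)
```

Because every event is journaled before it is applied, replaying the journal into an empty book reproduces the live book exactly.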
Market Data Distribution
After each match, the exchange broadcasts market data: Level 1 (best bid/ask), Level 2 (full order book depth), and time-and-sales (the trade tape). It is disseminated via multicast UDP to market data subscribers. UDP is used instead of TCP because occasional packet loss is preferable to added latency: receivers detect missed data from gaps in the sequence numbers and request retransmission.
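Receiver-side gap detection can be sketched as a sequence-number tracker (`GapDetector` is an illustrative name; real feeds such as NASDAQ's MoldUDP64 carry per-packet sequence numbers). Sequence numbers are assumed to start at 1 and increase by 1 per message:

```python
class GapDetector:
    """Track feed sequence numbers; record gaps so the consumer can
    request retransmission of the missed range."""
    def __init__(self):
        self.expected = 1
        self.gaps = []

    def on_message(self, seq):
        if seq > self.expected:
            # messages expected..seq-1 were lost on the UDP feed
            self.gaps.append((self.expected, seq - 1))
        if seq >= self.expected:
            self.expected = seq + 1  # ignore duplicates and reordered older seqs
```

Seeing messages 1, 2, 5, 6 yields a single recorded gap (3, 4), which the subscriber would request from the retransmission service.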
Interview Framework
- Order book: TreeMap of price levels, each level is a FIFO queue
- Matching: price-time priority, greedy match against opposite side
- Durability: append-only journal before modifying in-memory book
- Latency: single-threaded matching, CPU pinning, kernel bypass
- Market data: multicast UDP broadcast of level 1/2 and trades
Frequently Asked Questions

Q: How does an order book work, and what data structure implements it efficiently?
A: An order book maintains outstanding limit orders organized by price and time. Two sides: bids (buy orders, sorted descending by price) and asks (sell orders, sorted ascending). Best bid = highest buy price; best ask = lowest sell price. When best bid >= best ask, a trade executes. Implementation: the outer structure is a sorted dictionary (TreeMap in Java, SortedDict in Python) mapping price → queue of orders. Each queue is FIFO, which maintains time priority at the same price. Operations: add_order O(log P) where P = price levels; cancel_order O(1) with an order_id → order pointer hash map; match O(M log P) where M = fills. For ultra-low latency (microseconds), use an array indexed by price (a price ladder) over a bounded price range, giving O(1) access to any level; most stocks trade within a bounded range (e.g., 10,000 price ticks), making array indexing practical. Java TreeMap + LinkedList per level is the standard interview-level answer.

Q: How does the matching engine ensure exactly-once trade execution?
A: The matching engine is single-threaded to avoid races: all orders are processed sequentially on one thread, so no locks are needed and there is no concurrent modification. The sequence: (1) receive the order from the inbound queue, (2) write it to the append-only journal (WAL, write-ahead log) before modifying the order book, (3) apply it to the in-memory book, (4) publish trade executions to the outbound queue. Writing the journal before mutating state is the durability guarantee: if the engine crashes after the journal write but before applying to the book, on restart it replays the journal and re-applies; if it crashes before the journal write, the operation never happened and there is no partial state. The journal is the ground truth. This is the same WAL pattern used by PostgreSQL and Kafka. For distributed exchanges with a hot standby, the primary streams the journal to the standby; on failover, the standby replays any unconfirmed journal entries before going live.

Q: Why do high-frequency trading systems use DPDK and CPU pinning?
A: The standard network path is NIC → kernel interrupt → kernel network stack → kernel buffer → syscall → user space. Each step adds overhead: interrupt latency (microseconds), kernel scheduler latency (context switches), and memory copies between kernel and user space, typically 50–200 microseconds in total for a packet to reach the application. DPDK (Data Plane Development Kit) bypasses the kernel entirely: the NIC DMAs packets directly into user-space ring buffers backed by hugepages, and the application polls the ring buffer in a busy loop with no interrupts and no context switches, cutting latency to 1–5 microseconds. CPU pinning (core isolation) dedicates CPU cores to the matching engine and NIC polling so the OS scheduler never preempts those threads, eliminating scheduler latency (up to 10 ms on a loaded system). Combined with NUMA-aware memory allocation (packets and order book on the same NUMA node as the CPU), these techniques bring end-to-end latency from order receipt to execution under 10 microseconds. Languages: C++ or Rust; Java GC pauses (1–100 ms) are incompatible with microsecond latency targets.