Randomized Algorithms Interview Patterns: Reservoir Sampling, QuickSelect, Skip Lists (2025)

Why Randomized Algorithms?

Randomized algorithms make random choices to achieve good expected performance, simplify implementation, or defeat adversarial worst-case inputs that hurt deterministic algorithms. Two flavors: Las Vegas algorithms are always correct but have random runtime (randomized QuickSort); Monte Carlo algorithms run in bounded time but are only probabilistically correct (approximate counting). Interview favorites: reservoir sampling, QuickSelect, randomized QuickSort.

Reservoir Sampling

Sample k items uniformly at random from a stream of unknown length n, using O(k) memory. Algorithm: fill the reservoir with the first k items. For each subsequent item at 0-indexed position i (i >= k): generate a random integer j in [0, i]; if j < k, replace reservoir[j] with the new item. Proof of uniformity: each item ends up in the final sample with probability exactly k/n. For k=1 (numbering items 1..n): item i is chosen when it arrives with probability 1/i, and survives each later item t with probability (t-1)/t, so its survival probability is i/(i+1) * ... * (n-1)/n = i/n; overall probability = (1/i) * (i/n) = 1/n. Generalizes to weighted reservoir sampling, where each item's inclusion probability is proportional to its weight.

import random
def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)  # uniform over [0, i], inclusive
            if j < k:
                reservoir[j] = item  # new item enters with probability k/(i+1)
    return reservoir
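
The weighted generalization mentioned above can be sketched with the Efraimidis–Srirakis "A-Res" scheme: assign each item the key u^(1/w) for a uniform random u and keep the k largest keys. The function name and the (item, weight) stream format here are assumptions for illustration, not part of the original text:

```python
import heapq
import random

def weighted_reservoir_sample(stream, k):
    """A-Res sketch: stream yields (item, weight) pairs with weight > 0.
    Items are kept with probability proportional to their weight."""
    heap = []  # min-heap of (key, item); holds the k largest keys seen so far
    for item, w in stream:
        key = random.random() ** (1.0 / w)  # heavier items tend to get larger keys
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))  # evict the smallest key
    return [item for _, item in heap]
```

With k=1 this reduces to picking each item with probability w_i / Σw, matching the unweighted algorithm when all weights are equal.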

QuickSelect: k-th Largest Element

Find the k-th largest element in an unsorted array. Average O(n), worst case O(n^2). Partition around a pivot in descending order: elements larger than the pivot go left, smaller go right. After partitioning: if the pivot lands at index k-1 (0-indexed from the largest), return it; if its index is > k-1, recurse on the left subarray; otherwise recurse on the right. Random pivot selection avoids the deterministic worst case of O(n^2) on sorted inputs. Expected recurrence: T(n) = T(n/2) + O(n) = O(n), since a random pivot splits the array roughly in half in expectation. For guaranteed O(n): median-of-medians pivot selection (interview overkill). LC 215. Alternative: a min-heap of size k in O(n log k) — simpler and often preferred in interviews for clarity.

def find_kth_largest(nums, k):
    def partition(lo, hi):
        # Random pivot, moved to the end; partition in descending order so
        # elements >= pivot end up to the left of the pivot's final position.
        pivot_idx = random.randint(lo, hi)
        nums[pivot_idx], nums[hi] = nums[hi], nums[pivot_idx]
        pivot = nums[hi]
        i = lo
        for j in range(lo, hi):
            if nums[j] >= pivot:
                nums[i], nums[j] = nums[j], nums[i]
                i += 1
        nums[i], nums[hi] = nums[hi], nums[i]
        return i  # pivot's final index: nums[i] is the (i+1)-th largest

    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        p = partition(lo, hi)
        if p == k - 1: return nums[p]  # pivot is exactly the k-th largest
        elif p > k - 1: hi = p - 1     # answer lies in the left part
        else: lo = p + 1               # answer lies in the right part
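
The min-heap alternative mentioned above can be sketched with the standard-library heapq module (one reasonable implementation, not the only one):

```python
import heapq

def find_kth_largest_heap(nums, k):
    """Maintain a min-heap of the k largest elements seen so far;
    the heap's root is then the k-th largest overall.
    O(n log k) time, O(k) space."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the current minimum, push x
    return heap[0]
```

Unlike QuickSelect this never mutates the input and works on a stream, at the cost of the log k factor.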

Randomized QuickSort Analysis

Expected time O(n log n) regardless of input order. Each element is compared to the pivot at most once per partition level. Expected number of comparisons: sum over all pairs (i, j) of Pr[i and j are compared]. Two elements with ranks i < j in sorted order are compared iff one of them is chosen as a pivot before any element ranked between them — among those j-i+1 elements each is equally likely to be picked first, so the probability is 2/(j-i+1). Summing over all pairs gives O(n log n) by the harmonic series. Random pivot selection is the key: it defeats adversarial inputs that force O(n^2) on deterministic pivots (first or last element).
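
The pairwise argument can be checked empirically. The sketch below (an instrumented QuickSort written for this check; the helper name is an invention) counts element-vs-pivot comparisons and compares the average against the analytic sum of 2/(j-i+1) over all rank pairs:

```python
import random

def quicksort_comparisons(values):
    """Randomized QuickSort on a copy of `values`, returning the number
    of element-vs-pivot comparisons performed."""
    a = list(values)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        p = random.randint(lo, hi)  # random pivot defeats adversarial order
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            count += 1              # one comparison against the pivot
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return count

# Analytic expectation: sum over rank pairs i < j of 2 / (j - i + 1);
# grouping pairs by their gap d = j - i gives sum_{d=1}^{n-1} 2(n-d)/(d+1).
n = 200
expected = sum(2.0 * (n - d) / (d + 1) for d in range(1, n))
measured = sum(quicksort_comparisons(range(n)) for _ in range(100)) / 100
```

For already-sorted input like range(n), a deterministic last-element pivot would make roughly n^2/2 comparisons; the measured average stays close to the O(n log n) expectation.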

Skip Lists

A probabilistic sorted data structure with O(log n) expected insert, delete, and search. Structure: multiple linked-list levels; the bottom level holds all elements, and each element is promoted to the next level up with probability 1/2, so higher levels skip ahead. Search: start at the top level, advance right while the next element is < target, and drop down a level when the next element would overshoot. Expected height: O(log n). Skip lists are simpler to implement than balanced BSTs and support range queries naturally. Used in Redis sorted sets (for O(log n) rank queries) and LevelDB's memtable. For interviews: explain the probabilistic leveling and why the geometric distribution of level heights gives the O(log n) expected complexity.
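
A minimal skip list along these lines might look like the sketch below (search and insert only; MAX_LEVEL, the class names, and the coin-flip helper are illustrative choices, not a production design):

```python
import random

class SkipNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # forward[i] = next node at level i

class SkipList:
    MAX_LEVEL = 16  # caps the height; enough for ~2^16 elements in expectation
    P = 0.5         # promotion probability per level

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel, no value
        self.level = 1                              # levels currently in use

    def _random_level(self):
        # Geometric distribution: level L with probability P^(L-1) * (1-P).
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, target):
        node = self.head
        for i in range(self.level - 1, -1, -1):   # top level down
            while node.forward[i] and node.forward[i].value < target:
                node = node.forward[i]            # advance right
        node = node.forward[0]                    # candidate at the bottom level
        return node is not None and node.value == target

    def insert(self, value):
        # Record the rightmost node visited at each level (the splice points).
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(value, lvl)
        for i in range(lvl):                      # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

Deletion follows the same descent, unlinking the node from every level where it appears.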

Monte Carlo vs Las Vegas

Las Vegas algorithms: always produce the correct answer; randomness affects runtime only. Examples: randomized QuickSort (always correctly sorts), randomized QuickSelect (always finds the correct k-th element), randomized hashing. Expected runtime is the performance guarantee. Monte Carlo algorithms: always run in bounded time; correctness is probabilistic. Examples: Miller-Rabin primality test (fast but has a small error probability per round, repeat to reduce error), random projection (approximate nearest neighbor), Monte Carlo integration. Reduce error by running multiple independent rounds (error decreases exponentially). For interviews: QuickSelect and reservoir sampling are Las Vegas; Miller-Rabin is Monte Carlo.
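
Miller-Rabin, the canonical Monte Carlo example above, can be sketched as follows (a standard textbook formulation; the round count of 20 is an illustrative choice):

```python
import random

def miller_rabin(n, rounds=20):
    """Monte Carlo primality test. Never misclassifies a prime; declares a
    composite 'probably prime' with probability <= 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):          # dispatch small primes and their multiples
        if n % p == 0:
            return n == p
    d, r = n - 1, 0                 # write n - 1 = d * 2^r with d odd
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)   # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base reveals nothing
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is definitely composite
    return True                          # probably prime
```

Each independent round catches a composite with probability at least 3/4, which is why the error probability shrinks exponentially in the number of rounds.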
