Why Randomized Algorithms?
Randomized algorithms use random choices to achieve good expected performance, to simplify implementation, or to defeat adversarial worst-case inputs that hurt deterministic algorithms. Two flavors: Las Vegas algorithms (always correct, random runtime, e.g. randomized QuickSort) and Monte Carlo algorithms (always fast, probabilistically correct, e.g. approximate counting). Interview favorites: reservoir sampling, QuickSelect, and randomized QuickSort.
Reservoir Sampling
Sample k items uniformly at random from a stream of unknown length n, using O(k) memory. Algorithm: fill the reservoir with the first k items. For each subsequent item i (0-indexed, i >= k): generate a random integer j in [0, i]; if j < k, replace reservoir[j] with item i. Proof of uniformity: each item ends up in the final sample with probability exactly k/n. For k=1 (items numbered 1..n): item i is selected at its own step with probability 1/i, and survives each later step j > i with probability (j-1)/j, so its survival probability telescopes to i/(i+1) * … * (n-1)/n = i/n. Overall probability = 1/i * i/n = 1/n. Generalizes to weighted reservoir sampling (each item's inclusion probability proportional to its weight).
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Random index in [0, i] (inclusive); item i replaces a
            # reservoir slot with probability k / (i + 1).
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
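As a sanity check (a quick simulation, not part of the algorithm), the uniformity claim can be verified empirically: over many runs, each item should land in the sample with frequency close to k/n. The function is repeated here so the snippet runs standalone:

```python
import random
from collections import Counter

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

random.seed(0)
trials = 20000
counts = Counter()
for _ in range(trials):
    for x in reservoir_sample(range(10), 2):
        counts[x] += 1

# Each of the 10 items should appear with frequency near k/n = 2/10.
for item in range(10):
    freq = counts[item] / trials
    assert abs(freq - 0.2) < 0.03, (item, freq)
```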
QuickSelect: k-th Largest Element
Find the k-th largest element in an unsorted array. Average O(n), worst case O(n^2). Partition around a pivot: elements larger than or equal to the pivot go left, smaller go right (a descending partition, so position p holds the (p+1)-th largest). After partitioning: if the pivot's position p equals k-1 (0-indexed from the largest), return the pivot; if p > k-1, recurse on the left subarray; otherwise recurse on the right. Random pivot selection avoids the worst-case O(n^2) behavior on already-sorted inputs. Expected recurrence: T(n) = T(n/2) + O(n), which solves to O(n) via the geometric series n + n/2 + n/4 + …. For guaranteed O(n): use median-of-medians pivot selection (interview overkill). LC 215. Alternative: a min-heap of size k in O(n log k) — simpler and often preferred in interviews for clarity.
def find_kth_largest(nums, k):
    def partition(lo, hi):
        # Random pivot, swapped to the end; partition in descending
        # order so elements >= pivot end up left of its final position.
        pivot_idx = random.randint(lo, hi)
        nums[pivot_idx], nums[hi] = nums[hi], nums[pivot_idx]
        pivot = nums[hi]
        i = lo
        for j in range(lo, hi):
            if nums[j] >= pivot:
                nums[i], nums[j] = nums[j], nums[i]
                i += 1
        nums[i], nums[hi] = nums[hi], nums[i]
        return i

    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        p = partition(lo, hi)
        if p == k - 1:
            return nums[p]
        elif p > k - 1:
            hi = p - 1
        else:
            lo = p + 1
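The min-heap alternative mentioned above can be sketched with heapq (the function name is illustrative):

```python
import heapq

def find_kth_largest_heap(nums, k):
    # Maintain a min-heap of the k largest elements seen so far;
    # the root is then the k-th largest overall. O(n log k) time, O(k) space.
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)  # discard the smallest of the k+1
    return heap[0]

print(find_kth_largest_heap([3, 2, 1, 5, 6, 4], 2))  # 5
```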
Randomized QuickSort Analysis
Expected time O(n log n) regardless of input order. Each element is compared to the pivot at most once per partition level. Expected number of comparisons: sum over all pairs (i,j) of Pr[i and j are compared]. Two elements i and j (where i < j in sorted order) are compared iff one of them is chosen as the pivot before any element between them. Probability = 2/(j-i+1). Total expected comparisons = sum over pairs = O(n log n) by harmonic series. Random pivot selection is the key — it breaks adversarial inputs that cause O(n^2) on deterministic pivots (first or last element).
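The analysis above can be illustrated with a minimal randomized QuickSort sketch (a non-in-place version for clarity, rather than the usual in-place partition):

```python
import random

def quicksort(nums):
    # Randomized QuickSort: a uniformly random pivot makes the expected
    # comparison count O(n log n) for every input, including sorted ones.
    if len(nums) <= 1:
        return nums
    pivot = random.choice(nums)
    less = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    greater = [x for x in nums if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```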
Skip Lists
A probabilistic sorted data structure with O(log n) expected insert, delete, and search. Structure: multiple linked list levels. Bottom level has all elements. Each higher level skips ahead (each element promoted with probability 1/2). Search: start at the top level, advance right while next element <= target, drop down a level when overshooting. Expected height: O(log n). Skip lists are simpler to implement than balanced BSTs and support range queries naturally. Used in Redis sorted sets (for O(log n) rank queries), LevelDB (memtable). For interviews: explain the probabilistic leveling and the O(log n) expected complexity from the geometric distribution of level heights.
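A minimal sketch of the structure described above, with coin-flip promotion and insert/search only (no delete, and `MAX_LEVEL` is an arbitrary cap chosen here for illustration):

```python
import random

class SkipNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # one forward pointer per level

class SkipList:
    MAX_LEVEL = 16

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel head
        self.level = 1

    def _random_level(self):
        # Promote with probability 1/2: heights are geometrically
        # distributed, giving expected height O(log n).
        lvl = 1
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, value):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
            update[i] = node  # last node before the insert point at level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(value, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def search(self, value):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].value < value:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.value == value

sl = SkipList()
for v in [5, 1, 9, 3]:
    sl.insert(v)
print(sl.search(3), sl.search(4))  # True False
```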
Monte Carlo vs Las Vegas
Las Vegas algorithms: always produce the correct answer; randomness affects runtime only. Examples: randomized QuickSort (always correctly sorts), randomized QuickSelect (always finds the correct k-th element), randomized hashing. Expected runtime is the performance guarantee. Monte Carlo algorithms: always run in bounded time; correctness is probabilistic. Examples: Miller-Rabin primality test (fast but has a small error probability per round, repeat to reduce error), random projection (approximate nearest neighbor), Monte Carlo integration. Reduce error by running multiple independent rounds (error decreases exponentially). For interviews: QuickSelect and reservoir sampling are Las Vegas; Miller-Rabin is Monte Carlo.
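As a concrete Monte Carlo example, here is a sketch of Miller-Rabin (a minimal implementation using the standard decomposition n - 1 = 2^r * d with d odd; the small-prime pre-check is an added convenience, not part of the core test):

```python
import random

def is_probably_prime(n, rounds=20):
    # Miller-Rabin: Monte Carlo. Bounded running time; a composite n
    # passes each round with probability at most 1/4, so the error
    # probability is at most (1/4) ** rounds. Primes always pass.
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

print(is_probably_prime(561))   # False (Carmichael number, 3 * 11 * 17)
print(is_probably_prime(1009))  # True
```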
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does reservoir sampling guarantee uniform probability?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Reservoir sampling guarantees that each element has equal probability k/n of being in the final sample, even though n is unknown during processing. Proof for k=1 (select 1 item from n): item i is selected with probability 1/i (it wins the random draw at step i). It survives each subsequent step j > i with probability (j-1)/j (the new item does NOT replace it). Product of survival probabilities: (i/(i+1)) * ((i+1)/(i+2)) * … * ((n-1)/n) = i/n. Combined with selection probability 1/i: overall probability = 1/i * i/n = 1/n. Every item has probability 1/n — uniform. For k > 1, the same telescoping product argument applies, giving each item probability k/n."
      }
    },
    {
      "@type": "Question",
      "name": "What is the expected time complexity of QuickSelect and why?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "QuickSelect finds the k-th smallest element in expected O(n) time with random pivot selection. At each step, the pivot splits the array into two parts. We recurse only on the relevant part (the one containing the k-th element). Expected recurrence: if the pivot lands in the middle half (probability 1/2), we recurse on at most 3n/4 elements. T(n) = n + T(3n/4) with probability 1/2, giving T(n) = O(n) in expectation (geometric series: n + 3n/4 + 9n/16 + … = 4n). More precisely, expected comparisons = 2n + o(n). Compare to worst case O(n^2) when the pivot is always the minimum or maximum — random pivot selection makes this event exponentially unlikely."
      }
    },
    {
      "@type": "Question",
      "name": "What is a skip list and how does it compare to a balanced BST?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A skip list is a probabilistic sorted linked list with multiple levels. The bottom level has all elements. Each element is promoted to the next level with probability 1/2. Higher levels act as express lanes for faster traversal. Expected height: O(log n). Search: descend from the top level, move right as long as the next node is less than or equal to the target, drop down when the next node exceeds it. Expected O(log n) per operation. Compared to balanced BSTs (AVL, red-black): skip lists are simpler to implement correctly (no rotations), support concurrent operations more easily (lock-coupling on fewer nodes), and support range queries naturally. Red-black trees have deterministic O(log n) while skip lists have expected O(log n). Redis sorted sets use skip lists."
      }
    },
    {
      "@type": "Question",
      "name": "What is the difference between Monte Carlo and Las Vegas algorithms?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Las Vegas algorithms always produce the correct output; only their running time is random. Examples: randomized QuickSort (always sorts correctly, expected O(n log n)), QuickSelect (always finds the correct element), randomized hashing. The guarantee is on expected runtime. Monte Carlo algorithms always run in bounded deterministic time but may produce incorrect results with some probability. Examples: Miller-Rabin primality test (may incorrectly classify a composite as prime with probability at most 1/4 per round), approximate counting, Monte Carlo integration. Reduce error by repeating: k independent Monte Carlo rounds reduce error probability to (1/4)^k. For interviews: randomized sorting and selection are Las Vegas; probabilistic primality tests are Monte Carlo."
      }
    },
    {
      "@type": "Question",
      "name": "How is randomized hashing used to prevent worst-case hash collisions?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A deterministic hash function can be attacked: an adversary knowing the hash function can craft inputs that all map to the same bucket, causing O(n) lookup time (hash DoS). Solution: universal hashing. At startup, randomly choose a hash function from a family of universal hash functions — the adversary does not know which one was chosen. A family H is universal if for any two distinct keys x,y: Pr[h(x)==h(y)] <= 1/m for a random h in H (m = table size). This guarantees expected O(1) per lookup even for adversarial inputs. In practice: Python dictionaries use a randomized hash seed (PYTHONHASHSEED) per process invocation, making hash DoS attacks impractical."
      }
    }
  ]
}