System Design Interview: Design a Search Autocomplete (Typeahead)

Problem Statement

Design a search autocomplete (typeahead) system like Google Search suggestions or Amazon product search. As a user types, the system returns the top 5 completions ranked by query frequency/relevance within 100ms. Handle 100M unique queries per day, billions of historical queries for ranking, and real-time updates to suggestions.
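Before picking a data structure, it helps to turn the stated scale into request rates. A quick back-of-envelope sketch (the 20-characters-per-query average is an assumption, not from the problem statement; peaks typically run 2-5x the average):

```python
# Back-of-envelope from the stated scale
queries_per_day = 100_000_000
avg_qps = queries_per_day / 86_400            # average search submissions per second
chars_per_query = 20                          # assumed average query length
autocomplete_qps = avg_qps * chars_per_query  # one suggestion request per keystroke

print(f"{avg_qps:,.0f} search QPS, {autocomplete_qps:,.0f} autocomplete QPS")
```

The key insight: autocomplete traffic is a large multiple of search traffic, because every keystroke triggers a request.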

Core Data Structure: Trie

from dataclasses import dataclass, field
import heapq

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    is_end:   bool = False
    frequency: int = 0  # Query count for the complete word ending at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, frequency: int = 1):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.frequency += frequency

    def get_suggestions(self, prefix: str, top_k: int = 5) -> list[tuple[int, str]]:
        """Return top_k (frequency, word) tuples for the given prefix."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # Prefix not found
            node = node.children[char]

        # DFS from the prefix node to collect all completions
        results = []
        self._dfs(node, prefix, results)

        # Return top-k by frequency (max-heap)
        return heapq.nlargest(top_k, results, key=lambda x: x[0])

    def _dfs(self, node: TrieNode, current: str,
             results: list):
        if node.is_end:
            results.append((node.frequency, current))
        for char, child in node.children.items():
            self._dfs(child, current + char, results)

Problem with the naive trie: the DFS from the prefix node visits every word under that prefix — O(number of completions). A short prefix like "a" can match millions of historical queries, far too slow for a 100ms latency budget. Solution: cache the top-k completions at each trie node.

Optimized Trie with Cached Top-K

@dataclass
class OptimizedTrieNode:
    children: dict = field(default_factory=dict)
    # Cache top-5 completions at this node: [(frequency, word), ...]
    top_k: list = field(default_factory=list)
    TOP_K = 5  # Unannotated, so a plain class attribute — not a dataclass field

class OptimizedTrie:
    def __init__(self):
        self.root = OptimizedTrieNode()

    def insert(self, word: str, frequency: int):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = OptimizedTrieNode()
            node = node.children[char]
            # Update the cached top-k at every prefix node along the path
            self._update_top_k(node, word, frequency)

    def _update_top_k(self, node: OptimizedTrieNode,
                      word: str, frequency: int):
        # Drop any stale entry for this word so re-inserts don't duplicate it
        node.top_k = [(f, w) for f, w in node.top_k if w != word]
        node.top_k.append((frequency, word))
        # Keep only the TOP_K highest-frequency entries
        node.top_k = heapq.nlargest(
            OptimizedTrieNode.TOP_K, node.top_k, key=lambda x: x[0]
        )

    def get_suggestions(self, prefix: str) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        return [word for _, word in node.top_k]

# get_suggestions is now O(|prefix|) — just traverse the trie path
# No DFS needed — suggestions pre-cached at each node
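The per-node cache update can be exercised in isolation. A minimal sketch of how a bounded top-k list behaves as frequencies arrive (the words and counts are illustrative):

```python
import heapq

TOP_K = 5

def update_top_k(top_k: list, word: str, frequency: int) -> list:
    # Drop any stale entry for this word, add the new count, keep the k best
    entries = [(f, w) for f, w in top_k if w != word]
    entries.append((frequency, word))
    return heapq.nlargest(TOP_K, entries, key=lambda x: x[0])

cache = []
for word, freq in [("python", 900), ("pytorch", 700), ("pycharm", 400),
                   ("pygame", 300), ("pyspark", 250), ("pytest", 100)]:
    cache = update_top_k(cache, word, freq)

# pytest (100) is evicted: only the 5 highest-frequency words remain
print([w for _, w in cache])
```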

Distributed Architecture

A single trie can’t fit all queries for a global service. Distributed approach:

Data Collection Pipeline

import time

class QueryLogger:
    """Logs search queries for aggregation."""
    def log_query(self, query: str, user_id: str):
        # Append to Kafka topic: autocomplete.raw_queries
        # (`kafka` is an already-configured producer client, e.g. confluent-kafka)
        kafka.produce("autocomplete.raw_queries", {
            "query":     query.lower().strip(),
            "user_id":   user_id,
            "timestamp": time.time()
        })

# Offline aggregation (Apache Spark, daily batch):
# 1. Read raw_queries from Kafka / data lake
# 2. Count frequency of each query in the last 7 days (recency-weighted)
# 3. Filter: remove profanity, spam, low-frequency queries (< 10 occurrences)
# 4. Build trie from aggregated (query, frequency) pairs
# 5. Serialize trie and distribute to autocomplete servers

# Recency weighting: queries from today count more than queries from 7 days ago
# weight = frequency * (1 / days_ago_factor)
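The recency weighting in step 2 can be sketched as an exponential decay over daily counts (the 0.9 decay factor is an assumed value, tuned in practice):

```python
def recency_weighted_score(daily_counts: list[int], decay: float = 0.9) -> float:
    """daily_counts[0] is today; daily_counts[i] is the count from i days ago."""
    return sum(count * (decay ** days_ago)
               for days_ago, count in enumerate(daily_counts))

# A query searched 100x/day for 7 days scores ~521.7, not a flat 700 —
# older days contribute progressively less
steady = recency_weighted_score([100] * 7)

# All 700 searches today would score the full 700
spike = recency_weighted_score([700, 0, 0, 0, 0, 0, 0])
```

The same total volume scores higher when it is recent, which is exactly the behavior wanted for trending queries.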

Trie Partitioning

Partition the trie by prefix to distribute across servers:

  • Server 1: a-m (all queries starting with letters a through m)
  • Server 2: n-z (all queries starting with n through z)
  • A ZooKeeper-backed config service maps first letter → server address
  • API Gateway routes query to the correct server based on the first character of the prefix
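The routing step above can be sketched as a first-character lookup (the shard names and the a-m/n-z split are illustrative; in production the map would be loaded from the config service and kept up to date):

```python
import string

# first letter -> shard id; loaded from ZooKeeper/config service in production
SHARD_MAP = {c: ("shard-1" if c <= "m" else "shard-2")
             for c in string.ascii_lowercase}

def route(prefix: str) -> str:
    first = prefix.lower()[0]
    # Non-alphabetic prefixes (digits, symbols) fall through to a default shard
    return SHARD_MAP.get(first, "shard-default")
```

Note that a fixed two-way split is rarely balanced in practice — prefixes are Zipf-distributed (far more queries start with "s" than "x"), so real systems split hot prefixes further (e.g. "sa-sm", "sn-sz").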

Read Path

import json
import redis

class AutocompleteService:
    def __init__(self, trie_shards: dict, cache: redis.Redis):
        self.trie_shards = trie_shards  # first_char -> TrieServer
        self.cache = cache

    def get_suggestions(self, prefix: str) -> list[str]:
        prefix = prefix.lower().strip()
        if not prefix:
            return []

        # 1. Cache check (Redis, TTL=60s)
        cache_key = f"ac:{prefix}"
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)

        # 2. Route to appropriate trie shard
        shard = self.trie_shards.get(prefix[0], self.trie_shards['default'])
        suggestions = shard.get_suggestions(prefix)

        # 3. Cache result
        self.cache.setex(cache_key, 60, json.dumps(suggestions))
        return suggestions

Real-Time Updates

The daily batch is too slow for trending queries (a celebrity news event should surface in minutes). Two-tier approach:

  • Base trie: Built daily from 7 days of historical data. Stable, accurate frequency counts.
  • Realtime layer: Kafka Streams or Flink consumes the raw query stream in real-time, maintaining a rolling 1-hour frequency count in Redis. Results merged with base trie at query time: merged_score = base_freq + alpha * realtime_freq.
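The query-time merge can be sketched as follows (alpha and the candidate frequencies are illustrative; base counts would come from the trie shard and realtime counts from Redis):

```python
def merge_suggestions(base: dict[str, int], realtime: dict[str, int],
                      alpha: float = 2.0, top_k: int = 5) -> list[str]:
    # merged_score = base_freq + alpha * realtime_freq
    candidates = set(base) | set(realtime)
    scored = [(base.get(q, 0) + alpha * realtime.get(q, 0), q)
              for q in candidates]
    scored.sort(reverse=True)
    return [q for _, q in scored[:top_k]]

base = {"taylor swift tour": 900, "taylor swift albums": 700}
realtime = {"taylor swift announcement": 400}  # trending in the last hour
merged = merge_suggestions(base, realtime)
```

With alpha = 2.0, the trending query (score 800) outranks a historically stronger one (700) while the all-time leader (900) stays on top — alpha controls how aggressively trends surface.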

Filtering and Personalization

import re

class SuggestionFilter:
    def __init__(self, profanity_list: set, spam_patterns: list):
        self.blocked_words = profanity_list
        self.spam_patterns = [re.compile(p) for p in spam_patterns]

    def filter(self, suggestions: list[str],
               user_history: list[str]) -> list[str]:
        # 1. Remove blocked content and spam-pattern matches
        clean = [s for s in suggestions
                 if not any(w in s for w in self.blocked_words)
                 and not any(p.search(s) for p in self.spam_patterns)]

        # 2. Boost suggestions that match user's search history
        def score(suggestion: str) -> float:
            base = 1.0
            if suggestion in user_history:
                base *= 1.5  # Personalization boost
            return base

        return sorted(clean, key=score, reverse=True)[:5]

Interview Checklist

  • Data structure: trie with top-k cached at each node for O(|prefix|) query time
  • Scale: partition trie by first character across multiple servers
  • Ranking: query frequency, recency weighting, personalization
  • Pipeline: daily batch Spark job for base trie + real-time Flink for trending
  • Cache: Redis at query layer (TTL 60s), reduces trie server load
  • Filtering: profanity filter, spam detection before serving suggestions
  • Latency budget: 100ms total — 5ms cache hit, 20ms trie lookup on miss

