Problem Scope
Given a prefix typed by the user, return the top-K (typically 5-10) suggestions ranked by relevance. Used in search bars (Google, Amazon), hashtag suggestions (Twitter), address completion (Google Maps), and command palettes (VS Code). Latency requirement: < 100ms for suggestions to feel instant. Scale: Google autocomplete handles millions of queries per second globally.
Data Structure: Trie
A trie (prefix tree) stores strings so that all strings sharing a prefix share a common path from the root. Each node represents a character, and the path from the root to a node spells out a prefix. The key optimization: each trie node stores a list of the top-K queries (by frequency) that pass through it. A prefix lookup then navigates to the node matching the prefix in O(L), where L = prefix length, and returns the stored top-K list directly — no further traversal needed. Update: when a query's frequency increases, refresh the top-K lists at each of its ancestor nodes (up to L ancestor nodes). Memory: a single trie for all English search queries is roughly 50GB, which does not fit in one machine's RAM, so shard by prefix: all queries starting with "a" go to one shard, "b" to another, etc.
Ranking Signals
Simple ranking: by query frequency (most popular queries first). Better: personalized ranking that combines several signals. (1) Global frequency — “apple” is searched millions of times. (2) Recency — trending queries (from the last hour) are boosted. (3) User context — the user’s location, language, and search history. (4) Completion quality — prefer completions that form complete phrases over fragments. Score = log(frequency) + recency_boost + personalization_score. Recency boost: a query searched 1000 times in the last hour gets a higher boost than one searched 1000 times over the last year. Compute the recency boost with exponential decay: boost = count_last_hour * decay_factor^hours_since, so a burst of searches fades as it ages. For personalization: serve base suggestions from the global trie; re-rank using the user’s personal search history stored in their profile (Redis hash).
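The scoring formula above, sketched in Python. The decay factor, the flat personalization bonus, and the relative weights are assumptions that would need tuning against click data:

```python
import math

DECAY_FACTOR = 0.9  # illustrative: boost shrinks ~10% per hour of age


def recency_boost(count_last_hour, hours_since, decay_factor=DECAY_FACTOR):
    # Exponential decay: recent bursts dominate, old bursts fade.
    return count_last_hour * decay_factor ** hours_since


def score(query, global_freq, count_last_hour, hours_since, user_history=None):
    # score = log(frequency) + recency_boost + personalization_score
    s = math.log(max(global_freq, 1))
    s += recency_boost(count_last_hour, hours_since)
    # Personalization (assumption): flat bonus if the user searched this before.
    if user_history and query in user_history:
        s += 5.0
    return s
```

With these weights a trending burst of a few hundred searches outweighs the log of even a very large global count, which matches the intent that breaking-news queries surface quickly.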
Architecture
Client sends the prefix on each keystroke (debounce: wait 100ms after the last keystroke before sending). Load balancer routes the request to one of the autocomplete servers. Each autocomplete server holds a portion of the trie in memory (sharded by prefix). Cache: LRU cache on each server for the most common prefixes (the top 1000 prefixes cover ~80% of traffic). On cache hit: O(1) response. On miss: trie lookup. Response: array of {query, score, display_text}.
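The per-server read path (LRU cache in front of the trie) can be sketched as follows; `trie_lookup` stands in for the real shard lookup and is an assumption:

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, prefix):
        if prefix not in self.data:
            return None
        self.data.move_to_end(prefix)      # mark as most recently used
        return self.data[prefix]

    def put(self, prefix, suggestions):
        self.data[prefix] = suggestions
        self.data.move_to_end(prefix)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


def handle_request(prefix, cache, trie_lookup):
    hit = cache.get(prefix)
    if hit is not None:
        return hit                         # cache hit: O(1)
    suggestions = trie_lookup(prefix)      # cache miss: O(L) trie walk
    cache.put(prefix, suggestions)
    return suggestions
```

Since the top prefixes cover most traffic, even a small cache absorbs the bulk of lookups before they reach the trie.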
Data pipeline: search query logs → Kafka → stream processor counts query frequencies in a 1-hour sliding window → updates a Redis sorted set per prefix (ZADD prefix:app score query). The autocomplete service reads from Redis for real-time frequency data. Nightly batch job builds the full trie from historical query frequencies and pushes to all autocomplete servers. Real-time events (breaking news) propagate via the streaming path to update Redis sorted sets within seconds.
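A stdlib-only simulation of the streaming step above: count queries in a 1-hour sliding window and emit per-prefix (key, query, score) updates — the equivalent of issuing `ZADD prefix:<p> score query` against Redis. Window handling is deliberately simplified (events carry explicit timestamps); all names are illustrative:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600


class SlidingWindowCounter:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()            # (timestamp, query)
        self.counts = defaultdict(int)   # query -> count within the window

    def add(self, timestamp, query):
        self.events.append((timestamp, query))
        self.counts[query] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]


def prefix_updates(counter, max_prefix_len=3):
    """Yield one (redis_key, query, score) tuple per prefix — mirrors ZADD."""
    for query, count in counter.counts.items():
        for i in range(1, min(len(query), max_prefix_len) + 1):
            yield (f"prefix:{query[:i]}", query, count)
```

In production the counter would live inside the stream processor and the tuples would be written to Redis; here the same logic is shown in-process.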
Handling Typos and Fuzzy Matching
Users make typos: “aple” instead of “apple.” Approaches: (1) Pre-generate common typos and index them. Map “aple” → “apple” in a typo dictionary. (2) Edit distance matching: for short prefixes (< 5 chars), check all strings within edit distance 1 — too slow for large tries beyond that. (3) Phonetic matching (Soundex, Metaphone): "Smith" and "Smyth" have the same phonetic code. (4) N-gram indexing: index all character bigrams and trigrams of each query. On a misspelled prefix: look up matching n-grams and return results sorted by overlap score. For most systems: pre-computed typo dictionary + prefix trie is sufficient. Deep learning-based query correction (as used by Google) can handle complex misspellings but requires significant infrastructure.
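Approach (2) can be sketched by generating every string within edit distance 1 of the typed prefix and intersecting with the set of known prefixes — which makes the cost explosion obvious: a length-L lowercase prefix yields on the order of 54L + 25 candidates, which is why the text limits this to short prefixes. Names are illustrative:

```python
import string

ALPHABET = string.ascii_lowercase


def edits1(prefix):
    """All strings within edit distance 1: deletes, transposes, replaces, inserts."""
    splits = [(prefix[:i], prefix[i:]) for i in range(len(prefix) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def fuzzy_candidates(prefix, known_prefixes):
    """Exact match wins; otherwise return known prefixes one edit away."""
    if prefix in known_prefixes:
        return [prefix]
    return sorted(edits1(prefix) & known_prefixes)
```

The same `edits1` generator can also be run offline to build the pre-computed typo dictionary of approach (1).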
Interview Tips
- Store top-K at each trie node to avoid full subtree traversal — trades memory for query speed.
- Separate the fast read path (trie in memory) from the slow update path (background aggregation). Never update the trie on every query.
- Global frequency + recency is usually sufficient for non-personalized autocomplete. Personalization adds complexity but improves relevance.
- Debouncing client-side reduces server load by 5-10x without noticeably affecting UX.
Asked at: LinkedIn, Meta, Twitter/X, Snap