Problem Scope
Given a prefix typed by the user, return the top-K (typically 5-10) suggestions ranked by relevance. Used in search bars (Google, Amazon), hashtag suggestions (Twitter), address completion (Google Maps), and command palettes (VS Code). Latency requirement: < 100ms for suggestions to feel instant. Scale: Google autocomplete handles millions of queries per second globally.
Data Structure: Trie
A trie (prefix tree) stores strings so that all strings sharing a prefix share a common path from the root. Each node represents a character; the path from the root to a node spells out a prefix.
- Storing top-K suggestions at each node: each trie node stores the top-K queries (by frequency) that pass through it.
- Prefix lookup: navigate to the node matching the prefix in O(L), where L is the prefix length, then return the stored top-K list directly; no further traversal is needed.
- Update: when a query's frequency increases, update the top-K lists at its ancestor nodes (up to L ancestors).
- Memory: a single trie for all English search queries is roughly 50GB, too large to fit in one machine's RAM. Shard by prefix: all queries starting with "a" go to one shard, "b" to another, and so on.
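A minimal in-memory sketch of this design, assuming a small K and in-place list maintenance (class and method names here are illustrative, not from the source):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # list of (frequency, query), sorted descending

class AutocompleteTrie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, frequency):
        """Walk the query's path, updating the top-K list at every ancestor node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, query, frequency)

    def _update_top_k(self, node, query, frequency):
        # Drop any stale entry for this query, re-insert, re-sort, trim to K.
        node.top_k = [(f, q) for f, q in node.top_k if q != query]
        node.top_k.append((frequency, query))
        node.top_k.sort(reverse=True)
        del node.top_k[self.k:]

    def suggest(self, prefix):
        """O(L) walk to the prefix node, then O(K) to return the stored list."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [q for _, q in node.top_k]
```

A real system would use a heap or skip the per-insert sort, but the trade-off is the same: O(L + K) reads paid for with extra memory and O(L) update fan-out.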
Ranking Signals
Simple ranking: by query frequency, most popular first. Better: personalized ranking, combining several signals:
- Global frequency: "apple" is searched millions of times.
- Recency: trending queries (from the last hour) are boosted.
- User context: the user's location, language, and search history.
- Completion quality: prefer completions that form complete phrases over fragments.
Score = log(frequency) + recency_boost + personalization_score. Compute the recency boost with an exponential decay, boost = count_last_hour * decay_factor^hours_since, so a query searched 1000 times in the last hour gets a higher boost than one searched 1000 times over the last year. For personalization: serve base suggestions from the global trie, then re-rank using the user's personal search history stored in their profile (a Redis hash).
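The scoring formula above can be written out directly; the decay factor here is an assumed placeholder that would be tuned per product:

```python
import math

DECAY_FACTOR = 0.9  # assumed per-hour decay rate; tune per product

def recency_boost(count_last_hour, hours_since, decay_factor=DECAY_FACTOR):
    """Exponential decay: recent spikes dominate, old spikes fade."""
    return count_last_hour * decay_factor ** hours_since

def score(global_frequency, count_last_hour, hours_since, personalization=0.0):
    """score = log(frequency) + recency_boost + personalization_score."""
    return (math.log(global_frequency)
            + recency_boost(count_last_hour, hours_since)
            + personalization)
```

With this shape, 1000 searches in the last hour (hours_since = 0) contribute a boost of 1000, while the same 1000 searches a day ago contribute under 100, which is exactly the trending behavior described above.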
Architecture
The client sends the prefix as the user types, debounced: wait 100ms after the last keystroke before sending. A load balancer routes the request to one of the autocomplete servers; each server holds a portion of the trie in memory (sharded by prefix). Each server also keeps an LRU cache for the most common prefixes (the top 1000 prefixes cover ~80% of traffic): on a cache hit, the response is O(1); on a miss, it falls back to a trie lookup. The response is an array of {query, score, display_text}.
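A toy version of the per-server LRU prefix cache, built on `collections.OrderedDict` (the class name and capacity are illustrative assumptions):

```python
from collections import OrderedDict

class PrefixCache:
    """Tiny LRU cache for the hottest prefixes (assumed capacity of 1000)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = OrderedDict()  # prefix -> cached suggestion list

    def get(self, prefix):
        if prefix not in self.entries:
            return None  # cache miss: caller falls back to the trie lookup
        self.entries.move_to_end(prefix)  # mark as most recently used
        return self.entries[prefix]

    def put(self, prefix, suggestions):
        self.entries[prefix] = suggestions
        self.entries.move_to_end(prefix)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

Because the top prefixes cover most traffic, even a small cache like this absorbs the bulk of lookups before they reach the trie.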
Data pipeline: search query logs → Kafka → stream processor counts query frequencies in a 1-hour sliding window → updates a Redis sorted set per prefix (ZADD prefix:app score query). The autocomplete service reads from Redis for real-time frequency data. Nightly batch job builds the full trie from historical query frequencies and pushes to all autocomplete servers. Real-time events (breaking news) propagate via the streaming path to update Redis sorted sets within seconds.
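An illustrative in-memory stand-in for the per-prefix Redis sorted sets: a real deployment would issue ZADD on write and a reverse-range read, while the dict below only mimics those semantics for the stream-processor output described above:

```python
from collections import defaultdict

# Stand-in for Redis sorted sets keyed by prefix; production would run
# ZADD prefix:<p> <score> <query> for each windowed count update.
sorted_sets = defaultdict(dict)  # "prefix:<p>" -> {query: score}

def record_query_count(query, count, max_prefix_len=3):
    """What the stream processor emits for one sliding-window count."""
    for i in range(1, min(len(query), max_prefix_len) + 1):
        prefix = query[:i]
        # ZADD semantics: setting a member's score overwrites the old one.
        sorted_sets[f"prefix:{prefix}"][query] = count

def top_queries(prefix, k=5):
    """Equivalent of reading the top-k members of the set, highest score first."""
    members = sorted_sets.get(f"prefix:{prefix}", {})
    return sorted(members, key=members.get, reverse=True)[:k]
```

The same two operations (overwrite-score on write, ranked read) are all the autocomplete service needs from Redis on the real-time path.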
Handling Typos and Fuzzy Matching
Users make typos: "aple" instead of "apple." Approaches:
1. Pre-generate common typos and index them: map "aple" → "apple" in a typo dictionary.
2. Edit-distance matching: for short prefixes (under 5 characters), check all strings within edit distance 1. Too slow for large tries.
3. Phonetic matching (Soundex, Metaphone): "Smith" and "Smyth" have the same phonetic code.
4. N-gram indexing: index all character bigrams and trigrams of each query. On a misspelled prefix, look up matching n-grams and return results sorted by overlap score.
For most systems, a pre-computed typo dictionary plus the prefix trie is sufficient. Deep-learning query correction (as used by Google) can handle complex misspellings but requires significant infrastructure.
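Approach (2) can be sketched with Norvig-style edit-distance-1 candidate generation; the function names and the fallback policy are illustrative:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(prefix):
    """All strings within edit distance 1: deletes, transposes, replaces, inserts."""
    splits = [(prefix[:i], prefix[i:]) for i in range(len(prefix) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct_prefix(prefix, known_prefixes):
    """Return a known prefix within one edit, or the input unchanged."""
    candidates = edits1(prefix) & known_prefixes
    return min(candidates) if candidates else prefix
```

For a prefix of length n this generates on the order of 54n + 25 candidates, which is why the technique is only practical for short prefixes, as noted above.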
Interview Tips
- Store top-K at each trie node to avoid full subtree traversal — trades memory for query speed.
- Separate the fast read path (trie in memory) from the slow update path (background aggregation). Never update the trie on every query.
- Global frequency + recency is usually sufficient for non-personalized autocomplete. Personalization adds complexity but improves relevance.
- Debouncing client-side reduces server load by 5-10x without noticeably affecting UX.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Why store top-K suggestions at every trie node instead of traversing the subtree on each query?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Without pre-stored top-K: on each prefix query, traverse the entire subtree rooted at the prefix node to find the K highest-frequency terms. For a prefix like 'a' in English, the subtree contains hundreds of thousands of words, so traversal is O(subtree_size), which is far too slow. With pre-stored top-K: each trie node stores a sorted list of the K best completions (by frequency). On query: navigate to the prefix node in O(L), where L is the prefix length, then return the stored list in O(K). Total: O(L + K) regardless of subtree size. Trade-off: memory (each node stores K items) and update cost (updating a term's frequency requires updating the top-K list at every ancestor node, O(L * log K) per update with a heap). For a system serving millions of queries with infrequent trie updates (daily batch), this trade-off is strongly favorable."
}
},
{
"@type": "Question",
"name": "How do you update the autocomplete trie in real time as search frequencies change?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Two update strategies: (1) Batch rebuild: collect all queries in a 24-hour window, count frequencies, sort by frequency, and rebuild the trie from scratch nightly. Push the new trie to all autocomplete servers with a blue/green deployment (the new trie is loaded while the old one continues serving). Pros: simple, handles deletions and frequency decreases naturally. Cons: up to 24-hour lag before new trends appear. (2) Streaming updates: a stream processor (Flink) counts query frequencies in a 1-hour sliding window. When a term's score changes significantly, update the trie node and its ancestors. Pros: reflects trending queries within minutes (e.g., breaking news). Cons: complex to implement correctly, especially for frequency decreases. Production systems (Google, Bing) use a hybrid approach: a nightly batch baseline plus streaming updates for trending terms only."
}
},
{
"@type": "Question",
"name": "How do you shard the autocomplete trie across multiple servers?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Shard by prefix. Option 1: one shard per first letter (26 shards). Simple but uneven: 's' queries are far more common than 'z'. Option 2: consistent hashing on prefix characters. More even but harder to reason about. Option 3: shard by the first two characters with load-aware assignment. Measure traffic per 2-character prefix, then group prefixes to balance total QPS per shard. With 10 shards and roughly 700 common 2-gram prefixes, each shard handles about 70 prefixes. The client hashes the prefix to find the right shard. All servers in the autocomplete fleet share the same shard routing table (small, easily cached). For global scale: a layer of regional clusters (North America, Europe, Asia) with region-specific frequency data, since trending topics differ by region. Each region builds its own trie from regional query logs."
}
},
{
"@type": "Question",
"name": "How do you handle personalized autocomplete without making the trie too large?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Personalization adds user-specific ranking on top of the global trie. Architecture: the global trie returns the top-50 candidates for the prefix (more than the final 5-10, to leave room for re-ranking). A personalization layer then re-ranks the 50 candidates using the user's personal signals: recent search history (last 30 days), location (boost local business names), and language preference. These are stored in the user's profile (a Redis hash, updated in real time). The re-ranking runs in under 5ms on the autocomplete server, which then returns the top 10 results. This keeps the trie universal (not one per user) while giving each user personalized results. The 50-candidate buffer ensures the globally top-10 results are not always returned; personal items can surface from positions 11-50."
}
},
{
"@type": "Question",
"name": "How does autocomplete handle multi-word queries and semantic understanding?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Trie-based autocomplete matches character-by-character from the left; it cannot understand intent. For multi-word queries like 'restaurants near', the trie stores the full phrase as a key, so the prefix 'restauran' matches all stored phrases starting with it. Semantic understanding (predicting intent from partial input) requires an embedding model: map the partial query to a vector and find the nearest complete queries in vector space. This handles semantic completions ('cheap food' → 'affordable restaurants') that a trie cannot. Implementation: a dual system. A fast trie path serves exact prefix matches (roughly 90% of queries); a slower neural path handles cases where the trie returns few results, and long-tail prefixes. The neural path uses a small BERT-based model with query-completion embeddings, pre-computed nightly for all common queries. Results are combined with a blending score (trie_score * 0.7 + semantic_score * 0.3)."
}
}
]
}