Problem Scope
Given a prefix typed by the user, return the top-K (typically 5-10) suggestions ranked by relevance. Used in search bars (Google, Amazon), hashtag suggestions (Twitter), address completion (Google Maps), and command palettes (VS Code). Latency requirement: < 100ms for suggestions to feel instant. Scale: Google autocomplete handles millions of queries per second globally.
Data Structure: Trie
A trie (prefix tree) stores strings so that all strings sharing a prefix share a common path from the root. Each node represents a character; the path from the root to a node spells out a prefix.
- Storing top-K suggestions at each node: each trie node stores the top-K queries (by frequency) that pass through it.
- Prefix lookup: navigate to the node matching the prefix in O(L), where L is the prefix length, then return the stored top-K list directly; no further traversal is needed.
- Update: when a query's frequency increases, update the top-K lists at its ancestor nodes (up to L ancestors).
- Memory: a single trie for all English search queries is roughly 50GB, too large to fit in one machine's RAM. Shard by prefix: all queries starting with "a" go to one shard, "b" to another, and so on.
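A minimal in-memory sketch of this design, assuming a small K and in-place list maintenance (class and method names here are illustrative, not from the source):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # list of (frequency, query), sorted descending

class AutocompleteTrie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, frequency):
        """Walk the query's path, updating the top-K list at every ancestor node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, query, frequency)

    def _update_top_k(self, node, query, frequency):
        # Drop any stale entry for this query, re-insert, re-sort, trim to K.
        node.top_k = [(f, q) for f, q in node.top_k if q != query]
        node.top_k.append((frequency, query))
        node.top_k.sort(reverse=True)
        del node.top_k[self.k:]

    def suggest(self, prefix):
        """O(L) walk to the prefix node, then O(K) to return the stored list."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [q for _, q in node.top_k]
```

A real system would use a heap or skip the per-insert sort, but the trade-off is the same: O(L + K) reads paid for with extra memory and O(L) update fan-out.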
Ranking Signals
Simple ranking: by query frequency, most popular first. Better: personalized ranking, combining several signals:
- Global frequency: "apple" is searched millions of times.
- Recency: trending queries (from the last hour) are boosted.
- User context: the user's location, language, and search history.
- Completion quality: prefer completions that form complete phrases over fragments.
Score = log(frequency) + recency_boost + personalization_score. Compute the recency boost with an exponential decay, boost = count_last_hour * decay_factor^hours_since, so a query searched 1000 times in the last hour gets a higher boost than one searched 1000 times over the last year. For personalization: serve base suggestions from the global trie, then re-rank using the user's personal search history stored in their profile (a Redis hash).
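The scoring formula above can be written out directly; the decay factor here is an assumed placeholder that would be tuned per product:

```python
import math

DECAY_FACTOR = 0.9  # assumed per-hour decay rate; tune per product

def recency_boost(count_last_hour, hours_since, decay_factor=DECAY_FACTOR):
    """Exponential decay: recent spikes dominate, old spikes fade."""
    return count_last_hour * decay_factor ** hours_since

def score(global_frequency, count_last_hour, hours_since, personalization=0.0):
    """score = log(frequency) + recency_boost + personalization_score."""
    return (math.log(global_frequency)
            + recency_boost(count_last_hour, hours_since)
            + personalization)
```

With this shape, 1000 searches in the last hour (hours_since = 0) contribute a boost of 1000, while the same 1000 searches a day ago contribute under 100, which is exactly the trending behavior described above.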
Architecture
The client sends the prefix as the user types, debounced: wait 100ms after the last keystroke before sending. A load balancer routes the request to one of the autocomplete servers; each server holds a portion of the trie in memory (sharded by prefix). Each server also keeps an LRU cache for the most common prefixes (the top 1000 prefixes cover ~80% of traffic): on a cache hit, the response is O(1); on a miss, it falls back to a trie lookup. The response is an array of {query, score, display_text}.
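A toy version of the per-server LRU prefix cache, built on `collections.OrderedDict` (the class name and capacity are illustrative assumptions):

```python
from collections import OrderedDict

class PrefixCache:
    """Tiny LRU cache for the hottest prefixes (assumed capacity of 1000)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = OrderedDict()  # prefix -> cached suggestion list

    def get(self, prefix):
        if prefix not in self.entries:
            return None  # cache miss: caller falls back to the trie lookup
        self.entries.move_to_end(prefix)  # mark as most recently used
        return self.entries[prefix]

    def put(self, prefix, suggestions):
        self.entries[prefix] = suggestions
        self.entries.move_to_end(prefix)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

Because the top prefixes cover most traffic, even a small cache like this absorbs the bulk of lookups before they reach the trie.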
Data pipeline: search query logs → Kafka → stream processor counts query frequencies in a 1-hour sliding window → updates a Redis sorted set per prefix (ZADD prefix:app score query). The autocomplete service reads from Redis for real-time frequency data. Nightly batch job builds the full trie from historical query frequencies and pushes to all autocomplete servers. Real-time events (breaking news) propagate via the streaming path to update Redis sorted sets within seconds.
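An illustrative in-memory stand-in for the per-prefix Redis sorted sets: a real deployment would issue ZADD on write and a reverse-range read, while the dict below only mimics those semantics for the stream-processor output described above:

```python
from collections import defaultdict

# Stand-in for Redis sorted sets keyed by prefix; production would run
# ZADD prefix:<p> <score> <query> for each windowed count update.
sorted_sets = defaultdict(dict)  # "prefix:<p>" -> {query: score}

def record_query_count(query, count, max_prefix_len=3):
    """What the stream processor emits for one sliding-window count."""
    for i in range(1, min(len(query), max_prefix_len) + 1):
        prefix = query[:i]
        # ZADD semantics: setting a member's score overwrites the old one.
        sorted_sets[f"prefix:{prefix}"][query] = count

def top_queries(prefix, k=5):
    """Equivalent of reading the top-k members of the set, highest score first."""
    members = sorted_sets.get(f"prefix:{prefix}", {})
    return sorted(members, key=members.get, reverse=True)[:k]
```

The same two operations (overwrite-score on write, ranked read) are all the autocomplete service needs from Redis on the real-time path.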
Handling Typos and Fuzzy Matching
Users make typos: "aple" instead of "apple." Approaches:
1. Pre-generate common typos and index them: map "aple" → "apple" in a typo dictionary.
2. Edit-distance matching: for short prefixes (under 5 characters), check all strings within edit distance 1. Too slow for large tries.
3. Phonetic matching (Soundex, Metaphone): "Smith" and "Smyth" have the same phonetic code.
4. N-gram indexing: index all character bigrams and trigrams of each query. On a misspelled prefix, look up matching n-grams and return results sorted by overlap score.
For most systems, a pre-computed typo dictionary plus the prefix trie is sufficient. Deep-learning query correction (as used by Google) can handle complex misspellings but requires significant infrastructure.
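Approach (2) can be sketched with Norvig-style edit-distance-1 candidate generation; the function names and the fallback policy are illustrative:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(prefix):
    """All strings within edit distance 1: deletes, transposes, replaces, inserts."""
    splits = [(prefix[:i], prefix[i:]) for i in range(len(prefix) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct_prefix(prefix, known_prefixes):
    """Return a known prefix within one edit, or the input unchanged."""
    candidates = edits1(prefix) & known_prefixes
    return min(candidates) if candidates else prefix
```

For a prefix of length n this generates on the order of 54n + 25 candidates, which is why the technique is only practical for short prefixes, as noted above.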
Interview Tips
- Store top-K at each trie node to avoid full subtree traversal — trades memory for query speed.
- Separate the fast read path (trie in memory) from the slow update path (background aggregation). Never update the trie on every query.
- Global frequency + recency is usually sufficient for non-personalized autocomplete. Personalization adds complexity but improves relevance.
- Debouncing client-side reduces server load by 5-10x without noticeably affecting UX.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Why store top-K suggestions at every trie node instead of traversing the subtree on each query?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Without pre-stored top-K: on each prefix query, traverse the entire subtree rooted at the prefix node to find the K highest-frequency terms. For a prefix like 'a' in English, the subtree contains hundreds of thousands of words, so traversal is O(subtree_size), which is far too slow. With pre-stored top-K: each trie node stores a sorted list of the K best completions (by frequency). On query: navigate to the prefix node in O(L), where L is the prefix length, then return the stored list in O(K). Total: O(L + K) regardless of subtree size. Trade-off: memory (each node stores K items) and update cost (updating a term's frequency requires updating the top-K list at every ancestor node, O(L * log K) per update with a heap). For a system serving millions of queries with infrequent trie updates (daily batch), this trade-off is strongly favorable."
}
},
{
"@type": "Question",
"name": "How do you update the autocomplete trie in real time as search frequencies change?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Two update strategies: (1) Batch rebuild: collect all queries in a 24-hour window, count frequencies, sort by frequency, and rebuild the trie from scratch nightly. Push the new trie to all autocomplete servers with a blue/green deployment (the new trie is loaded while the old one continues serving). Pros: simple, handles deletions and frequency decreases naturally. Cons: up to 24-hour lag before new trends appear. (2) Streaming updates: a stream processor (Flink) counts query frequencies in a 1-hour sliding window. When a term's score changes significantly, update the trie node and its ancestors. Pros: reflects trending queries within minutes (e.g., breaking news). Cons: complex to implement correctly, especially for frequency decreases. Production systems (Google, Bing) use a hybrid approach: a nightly batch baseline plus streaming updates for trending terms only."
}
},
{
"@type": "Question",
"name": "How do you shard the autocomplete trie across multiple servers?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Shard by prefix. Option 1: one shard per first letter (26 shards). Simple but uneven: 's' queries are far more common than 'z'. Option 2: consistent hashing on prefix characters. More even but harder to reason about. Option 3: shard by the first two characters with load-aware assignment. Measure traffic per 2-character prefix, then group prefixes to balance total QPS per shard. With 10 shards and roughly 700 common 2-gram prefixes, each shard handles about 70 prefixes. The client hashes the prefix to find the right shard. All servers in the autocomplete fleet share the same shard routing table (small, easily cached). For global scale: a layer of regional clusters (North America, Europe, Asia) with region-specific frequency data, since trending topics differ by region. Each region builds its own trie from regional query logs."
}
},
{
"@type": "Question",
"name": "How do you handle personalized autocomplete without making the trie too large?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Personalization adds user-specific ranking on top of the global trie. Architecture: the global trie returns the top-50 candidates for the prefix (more than the final 5-10, to leave room for re-ranking). A personalization layer then re-ranks the 50 candidates using the user's personal signals: recent search history (last 30 days), location (boost local business names), and language preference. These are stored in the user's profile (a Redis hash, updated in real time). The re-ranking runs in under 5ms on the autocomplete server, which then returns the top 10 results. This keeps the trie universal (not one per user) while giving each user personalized results. The 50-candidate buffer ensures the globally top-10 results are not always returned; personal items can surface from positions 11-50."
}
},
{
"@type": "Question",
"name": "How does autocomplete handle multi-word queries and semantic understanding?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Trie-based autocomplete matches character-by-character from the left; it cannot understand intent. For multi-word queries like 'restaurants near', the trie stores the full phrase as a key, so the prefix 'restauran' matches all stored phrases starting with it. Semantic understanding (predicting intent from partial input) requires an embedding model: map the partial query to a vector and find the nearest complete queries in vector space. This handles semantic completions ('cheap food' → 'affordable restaurants') that a trie cannot. Implementation: a dual system. A fast trie path serves exact prefix matches (roughly 90% of queries); a slower neural path handles cases where the trie returns few results, and long-tail prefixes. The neural path uses a small BERT-based model with query-completion embeddings, pre-computed nightly for all common queries. Results are combined with a blending score (trie_score * 0.7 + semantic_score * 0.3)."
}
}
]
}