Requirements and Scale
Search autocomplete suggests query completions as the user types. Design constraints: end-to-end latency under 100ms (users expect suggestions to feel instant), relevant results (ranked by global popularity, personalization, or both), and a request volume of billions of queries per day. The core challenge is combining prefix matching (a Trie or an inverted index) with relevance ranking at that scale.
Trie-Based Approach
A Trie (prefix tree) stores all known queries, one character per node. Nodes where a complete query ends are marked and carry that query’s frequency — note that a query can end at an internal node: “app” is a complete query and also a prefix of “apple”. To find suggestions for prefix “app”: traverse to the “app” node, then DFS/BFS through its subtree to collect the top-K most frequent completions.
Problem: a naive traversal scans every completion in the subtree — expensive for popular short prefixes like “a”. Optimization: cache the top-K completions at every Trie node, so a lookup just traverses to the prefix node and returns the cached list in O(prefix_length). The trade-off is write cost: when a query’s frequency changes, the cached lists on every node along that query’s path (root to terminal node) must be refreshed — O(prefix_length * K) per update.
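The cached top-K idea can be sketched in a few dozen lines of Python. This is an illustrative in-memory version (class and method names are ours, not from any production system): each node keeps a small sorted list of (frequency, query) pairs, refreshed along the insertion path.

```python
class TrieNode:
    __slots__ = ("children", "freq", "top_k")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # > 0 only where a complete query ends
        self.top_k = []     # cached (freq, query) pairs, highest frequency first


class AutocompleteTrie:
    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def add(self, query, freq):
        # Walk the path, then refresh the cached top-K on every node touched.
        path = [self.root]
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            path.append(node)
        node.freq += freq
        entry = (node.freq, query)
        for n in path:
            # Drop any stale entry for this query, insert the new one, re-trim.
            n.top_k = sorted(
                [e for e in n.top_k if e[1] != query] + [entry],
                key=lambda e: (-e[0], e[1]),
            )[: self.k]

    def suggest(self, prefix):
        # O(prefix_length): traverse, then return the cached list directly.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top_k]
```

Note that `add` pays the O(prefix_length * K) update cost described above so that `suggest` never has to scan a subtree.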
Memory: holding every query in an in-memory Trie requires significant RAM. As a worst-case bound (no prefix sharing), 1M unique queries of average length 20 is ~20M nodes; at 40 bytes per node, that is ~800MB. Use a compact representation (e.g., a radix/compressed Trie) or limit the Trie to the top-N queries (drop queries with frequency below a threshold).
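The arithmetic behind that figure, written out (the 40-bytes-per-node constant is the text’s assumption; real per-node cost depends on the child-map representation):

```python
# Back-of-envelope Trie memory estimate, using the numbers from the text.
unique_queries = 1_000_000
avg_query_len = 20
bytes_per_node = 40  # assumed per-node overhead (child map, counter, cache ptr)

# Worst case: no prefix sharing at all -> one node per character.
worst_case_nodes = unique_queries * avg_query_len
worst_case_bytes = worst_case_nodes * bytes_per_node
print(f"{worst_case_bytes / 1e6:.0f} MB")
```

Prefix sharing in real query logs reduces the node count well below this bound, which is why the 800MB figure is an upper estimate.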
Inverted Index / Elasticsearch Approach
Store queries in Elasticsearch with their frequencies. Use a prefix query or an edge n-gram tokenizer: index “apple” as “a”, “ap”, “app”, “appl”, “apple”. On a prefix search: look up documents matching the prefix token, sort by frequency, return the top-K. This is simpler to operate than a custom Trie but has higher latency (a network hop) and is less memory-efficient. A good fit for medium-traffic autocomplete where a P99 latency of 50-80ms is acceptable.
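A minimal sketch of the edge n-gram idea in plain Python — a stand-in for what Elasticsearch’s edge_ngram tokenizer does at index time, with illustrative function names:

```python
from collections import defaultdict


def edge_ngrams(term, min_len=1):
    # "apple" -> ["a", "ap", "app", "appl", "apple"]
    return [term[:i] for i in range(min_len, len(term) + 1)]


def build_index(queries_with_freq):
    # prefix token -> list of (freq, query), mimicking an edge-n-gram field
    index = defaultdict(list)
    for query, freq in queries_with_freq.items():
        for gram in edge_ngrams(query):
            index[gram].append((freq, query))
    return index


def suggest(index, prefix, k=10):
    # Exact-match lookup on the pre-expanded prefix tokens, ranked by frequency.
    hits = sorted(index.get(prefix, []), key=lambda e: (-e[0], e[1]))
    return [q for _, q in hits[:k]]
```

The memory cost is visible here: every query is indexed once per character of its length, which is the trade Elasticsearch makes for a fast exact-token lookup at query time.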
Data Collection and Ranking
Query frequency data: log every search query with a timestamp. Aggregate logs with a batch job (Spark) running every hour. Compute query frequency for the rolling 7-day window. Update the Trie or search index with new frequencies. Real-time events (trending searches in the last hour): use a streaming job (Flink) to compute real-time trending queries and inject them as high-frequency candidates. Personalization: blend global popularity with user-specific history. User’s past 90-day query history boosts personalized suggestions. Compute per-user history offline (nightly batch).
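The popularity/personalization blend described above can be expressed as a scoring function. The weights and the recency half-life below are illustrative assumptions, not fixed values from any particular system:

```python
import time


def blended_score(global_freq, personal_freq, last_used_ts,
                  now=None, w_global=0.8, w_personal=0.2, half_life_days=30):
    # Weights and the 30-day half-life are illustrative assumptions.
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_used_ts) / 86400)
    recency = 0.5 ** (age_days / half_life_days)  # exponential recency decay
    return w_global * global_freq + w_personal * personal_freq * recency
```

A candidate the user searched recently gets nearly the full personal boost; one last used months ago decays toward the global score alone.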
System Architecture
Data flow: the client sends each keystroke’s prefix to the Autocomplete Service (a fleet of stateless servers behind a load balancer). The service first checks a distributed cache (Redis) keyed by prefix. On a miss, it queries the Trie servers (a separate cluster holding the full Trie in memory), returns the top-10 suggestions, and caches the result in Redis with a 1-hour TTL. The Trie servers are updated by the data pipeline that processes query logs. Frontend optimizations: debounce requests (send only after ~300ms of no typing) and cancel in-flight requests when new characters are typed. Short prefixes (“a”, “ap”) are extremely popular — pre-compute and cache them aggressively.
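The cache-aside read path can be sketched as follows. `PrefixCache` here is an in-process stand-in for Redis (names are illustrative), and `trie_lookup` represents the call to the Trie server tier:

```python
import time


class PrefixCache:
    # In-process stand-in for Redis: each entry is (value, expiry timestamp).
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, prefix):
        hit = self.store.get(prefix)
        if hit and hit[1] > time.time():
            return hit[0]
        return None

    def put(self, prefix, suggestions):
        self.store[prefix] = (suggestions, time.time() + self.ttl)


def autocomplete(prefix, cache, trie_lookup, k=10):
    # Cache-aside: serve from cache, fall back to the Trie tier on a miss.
    cached = cache.get(prefix)
    if cached is not None:
        return cached
    suggestions = trie_lookup(prefix)[:k]
    cache.put(prefix, suggestions)
    return suggestions
```

The TTL bounds staleness: a prefix whose ranking changes is served from the old cached list for at most one hour before the next miss refreshes it.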
Interview Tips
- The Trie is the canonical data structure for this problem. Know how to build it, traverse it for top-K completions, and explain the cached top-K optimization.
- Distinguish between the offline component (building the Trie from logs) and the online component (serving suggestions in real time).
- Personalization vs global: personalization requires per-user query history at read time. Global ranking uses a shared Trie. Discuss the trade-off explicitly.
Frequently Asked Questions

How does a Trie support search autocomplete efficiently?
A Trie stores all known search queries character by character; each path from the root to a marked node represents a complete query. For autocomplete on prefix “app”, traverse from the root following “a” -> “p” -> “p”. From the “app” node, find all completions via DFS/BFS through its descendants. Optimization: cache the top-K most frequent completions at every node, so a lookup becomes O(prefix_length) — traverse to the prefix node and return the cached list. When a query’s frequency changes, refresh the cached lists on every node along its root-to-leaf path: O(prefix_length * K) per update. Space: a Trie for 1M queries of average length 20 uses ~800MB at 40 bytes/node. Mitigate by storing only queries above a frequency threshold (e.g., queried more than 100 times).

What is the data pipeline for keeping autocomplete suggestions fresh?
Two components: offline (historical popularity) and real-time (trending). Offline: log every search query with a timestamp to Kafka; aggregate hourly with a Spark job that computes frequency over a rolling 7-day window; apply incremental updates to the Trie or search index. A full Trie rebuild runs weekly to drop low-frequency queries and absorb structural changes. Real-time: a Flink streaming job reads the Kafka query log and maintains per-query counts in a sliding 1-hour window; trending queries (a frequency spike above 2x baseline) are injected as high-priority suggestions with a freshness boost. Personalization: a nightly job computes each user’s rolling 90-day query history and stores it in a user preference store; at serve time, blend global suggestions (80%) with personalized ones (20%).

How do you scale autocomplete to handle billions of queries per day?
Scale the serving layer: a fleet of autocomplete servers, each holding the full Trie in memory (800MB-2GB). The servers are stateless — any server can handle any request — so a load balancer distributes traffic. Prefix caching: popular short prefixes (“a”, “th”, “ho”) receive millions of requests per hour; cache their suggestions in Redis with a 1-hour TTL. Caching the top 1,000 prefixes can absorb 60-70% of traffic. Updates: Trie changes are pushed to all servers via a Kafka topic, and each server applies them in order. This is an eventually consistent fan-out, so servers may briefly show slightly different suggestions during updates. For even higher scale, shard the Trie by first character (26 shards), each shard serving only queries that start with its letter.

How do you implement typo tolerance in search autocomplete?
Typo tolerance handles common misspellings (the user types “appel” but means “apple”). Approaches: (1) Edit-distance matching: find Trie entries within edit distance 1 or 2 of the prefix — expensive for large Tries, but feasible with pruning (abandon a branch once the edit distance exceeds the threshold). (2) Phonetic matching: Soundex or Metaphone normalize words to phonetic codes; store the codes alongside queries and match by code. (3) N-gram index: index query trigrams (“apple” -> “app”, “ppl”, “ple”); on a typo, compute the input’s trigrams and find queries sharing them — fast, but with false positives. (4) ML spelling correction: train a spell-check model on query logs (common corrections) and run correction before the Trie lookup. Production search engines typically combine several of these approaches.

How do you personalize autocomplete suggestions for individual users?
Personalization blends global popularity with the user’s own search history. Signals: queries the user typed in the past 90 days (strong), queries whose results the user clicked (strong), queries typed but abandoned (weak). Storage: per-user query history in a key-value store (Redis or DynamoDB) keyed by user_id. On a request: fetch the user’s recent queries matching the prefix; score them as personal_score = frequency_in_user_history * recency_weight; blend as final_score = 0.7 * global_score + 0.3 * personal_score; return the top-K by final score. Privacy: users must be able to clear their search history, which deletes their personalization data; anonymous users get global suggestions only. Personalization adds 5-15ms of latency — acceptable given the improvement in relevance.