Requirements and Scale
Search autocomplete suggests query completions as the user types. Design constraints: end-to-end latency under 100ms (users expect suggestions to feel instant), relevant results (ranked by global popularity, personalization, or both), and a request volume of billions of queries per day. The core challenge is combining prefix matching (a Trie or an inverted index) with relevance ranking at that scale.
Trie-Based Approach
A Trie (prefix tree) stores all known queries, one character per node. Nodes where a complete query ends are marked and carry that query’s frequency — note that a query can end at an internal node: “app” is a complete query and also a prefix of “apple”. To find suggestions for prefix “app”: traverse to the “app” node, then DFS/BFS through its subtree to collect the top-K most frequent completions.
Problem: a naive traversal scans every completion in the subtree — expensive for popular short prefixes like “a”. Optimization: cache the top-K completions at every Trie node, so a lookup just traverses to the prefix node and returns the cached list in O(prefix_length). The trade-off is write cost: when a query’s frequency changes, the cached lists on every node along that query’s path (root to terminal node) must be refreshed — O(prefix_length * K) per update.
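The cached top-K idea can be sketched in a few dozen lines of Python. This is an illustrative in-memory version (class and method names are ours, not from any production system): each node keeps a small sorted list of (frequency, query) pairs, refreshed along the insertion path.

```python
class TrieNode:
    __slots__ = ("children", "freq", "top_k")

    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # > 0 only where a complete query ends
        self.top_k = []     # cached (freq, query) pairs, highest frequency first


class AutocompleteTrie:
    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def add(self, query, freq):
        # Walk the path, then refresh the cached top-K on every node touched.
        path = [self.root]
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            path.append(node)
        node.freq += freq
        entry = (node.freq, query)
        for n in path:
            # Drop any stale entry for this query, insert the new one, re-trim.
            n.top_k = sorted(
                [e for e in n.top_k if e[1] != query] + [entry],
                key=lambda e: (-e[0], e[1]),
            )[: self.k]

    def suggest(self, prefix):
        # O(prefix_length): traverse, then return the cached list directly.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top_k]
```

Note that `add` pays the O(prefix_length * K) update cost described above so that `suggest` never has to scan a subtree.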
Memory: holding every query in an in-memory Trie requires significant RAM. As a worst-case bound (no prefix sharing), 1M unique queries of average length 20 is ~20M nodes; at 40 bytes per node, that is ~800MB. Use a compact representation (e.g., a radix/compressed Trie) or limit the Trie to the top-N queries (drop queries with frequency below a threshold).
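The arithmetic behind that figure, written out (the 40-bytes-per-node constant is the text’s assumption; real per-node cost depends on the child-map representation):

```python
# Back-of-envelope Trie memory estimate, using the numbers from the text.
unique_queries = 1_000_000
avg_query_len = 20
bytes_per_node = 40  # assumed per-node overhead (child map, counter, cache ptr)

# Worst case: no prefix sharing at all -> one node per character.
worst_case_nodes = unique_queries * avg_query_len
worst_case_bytes = worst_case_nodes * bytes_per_node
print(f"{worst_case_bytes / 1e6:.0f} MB")
```

Prefix sharing in real query logs reduces the node count well below this bound, which is why the 800MB figure is an upper estimate.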
Inverted Index / Elasticsearch Approach
Store queries in Elasticsearch with their frequencies. Use a prefix query or an edge n-gram tokenizer: index “apple” as “a”, “ap”, “app”, “appl”, “apple”. On a prefix search: look up documents matching the prefix token, sort by frequency, return the top-K. This is simpler to operate than a custom Trie but has higher latency (a network hop) and is less memory-efficient. A good fit for medium-traffic autocomplete where a P99 latency of 50-80ms is acceptable.
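A minimal sketch of the edge n-gram idea in plain Python — a stand-in for what Elasticsearch’s edge_ngram tokenizer does at index time, with illustrative function names:

```python
from collections import defaultdict


def edge_ngrams(term, min_len=1):
    # "apple" -> ["a", "ap", "app", "appl", "apple"]
    return [term[:i] for i in range(min_len, len(term) + 1)]


def build_index(queries_with_freq):
    # prefix token -> list of (freq, query), mimicking an edge-n-gram field
    index = defaultdict(list)
    for query, freq in queries_with_freq.items():
        for gram in edge_ngrams(query):
            index[gram].append((freq, query))
    return index


def suggest(index, prefix, k=10):
    # Exact-match lookup on the pre-expanded prefix tokens, ranked by frequency.
    hits = sorted(index.get(prefix, []), key=lambda e: (-e[0], e[1]))
    return [q for _, q in hits[:k]]
```

The memory cost is visible here: every query is indexed once per character of its length, which is the trade Elasticsearch makes for a fast exact-token lookup at query time.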
Data Collection and Ranking
Query frequency data: log every search query with a timestamp. Aggregate logs with a batch job (Spark) running every hour. Compute query frequency for the rolling 7-day window. Update the Trie or search index with new frequencies. Real-time events (trending searches in the last hour): use a streaming job (Flink) to compute real-time trending queries and inject them as high-frequency candidates. Personalization: blend global popularity with user-specific history. User’s past 90-day query history boosts personalized suggestions. Compute per-user history offline (nightly batch).
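The popularity/personalization blend described above can be expressed as a scoring function. The weights and the recency half-life below are illustrative assumptions, not fixed values from any particular system:

```python
import time


def blended_score(global_freq, personal_freq, last_used_ts,
                  now=None, w_global=0.8, w_personal=0.2, half_life_days=30):
    # Weights and the 30-day half-life are illustrative assumptions.
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_used_ts) / 86400)
    recency = 0.5 ** (age_days / half_life_days)  # exponential recency decay
    return w_global * global_freq + w_personal * personal_freq * recency
```

A candidate the user searched recently gets nearly the full personal boost; one last used months ago decays toward the global score alone.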
System Architecture
Data flow: the client sends each keystroke’s prefix to the Autocomplete Service (a fleet of stateless servers behind a load balancer). The service first checks a distributed cache (Redis) keyed by prefix. On a miss, it queries the Trie servers (a separate cluster holding the full Trie in memory), returns the top-10 suggestions, and caches the result in Redis with a 1-hour TTL. The Trie servers are updated by the data pipeline that processes query logs. Frontend optimizations: debounce requests (send only after ~300ms of no typing) and cancel in-flight requests when new characters are typed. Short prefixes (“a”, “ap”) are extremely popular — pre-compute and cache them aggressively.
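The cache-aside read path can be sketched as follows. `PrefixCache` here is an in-process stand-in for Redis (names are illustrative), and `trie_lookup` represents the call to the Trie server tier:

```python
import time


class PrefixCache:
    # In-process stand-in for Redis: each entry is (value, expiry timestamp).
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, prefix):
        hit = self.store.get(prefix)
        if hit and hit[1] > time.time():
            return hit[0]
        return None

    def put(self, prefix, suggestions):
        self.store[prefix] = (suggestions, time.time() + self.ttl)


def autocomplete(prefix, cache, trie_lookup, k=10):
    # Cache-aside: serve from cache, fall back to the Trie tier on a miss.
    cached = cache.get(prefix)
    if cached is not None:
        return cached
    suggestions = trie_lookup(prefix)[:k]
    cache.put(prefix, suggestions)
    return suggestions
```

The TTL bounds staleness: a prefix whose ranking changes is served from the old cached list for at most one hour before the next miss refreshes it.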
Interview Tips
- The Trie is the canonical data structure for this problem. Know how to build it, traverse it for top-K completions, and explain the cached top-K optimization.
- Distinguish between the offline component (building the Trie from logs) and the online component (serving suggestions in real time).
- Personalization vs global: personalization requires per-user query history at read time. Global ranking uses a shared Trie. Discuss the trade-off explicitly.
Frequently Asked Questions

How does a Trie support search autocomplete efficiently?
A Trie stores all known search queries character by character; each path from the root to a marked node represents a complete query. For autocomplete on prefix “app”, traverse from the root following “a” -> “p” -> “p”. From the “app” node, find all completions via DFS/BFS through its descendants. Optimization: cache the top-K most frequent completions at every node, so a lookup becomes O(prefix_length) — traverse to the prefix node and return the cached list. When a query’s frequency changes, refresh the cached lists on every node along its root-to-leaf path: O(prefix_length * K) per update. Space: a Trie for 1M queries of average length 20 uses ~800MB at 40 bytes/node. Mitigate by storing only queries above a frequency threshold (e.g., queried more than 100 times).

What is the data pipeline for keeping autocomplete suggestions fresh?
Two components: offline (historical popularity) and real-time (trending). Offline: log every search query with a timestamp to Kafka; aggregate hourly with a Spark job that computes frequency over a rolling 7-day window; apply incremental updates to the Trie or search index. A full Trie rebuild runs weekly to drop low-frequency queries and absorb structural changes. Real-time: a Flink streaming job reads the Kafka query log and maintains per-query counts in a sliding 1-hour window; trending queries (a frequency spike above 2x baseline) are injected as high-priority suggestions with a freshness boost. Personalization: a nightly job computes each user’s rolling 90-day query history and stores it in a user preference store; at serve time, blend global suggestions (80%) with personalized ones (20%).

How do you scale autocomplete to handle billions of queries per day?
Scale the serving layer: a fleet of autocomplete servers, each holding the full Trie in memory (800MB-2GB). The servers are stateless — any server can handle any request — so a load balancer distributes traffic. Prefix caching: popular short prefixes (“a”, “th”, “ho”) receive millions of requests per hour; cache their suggestions in Redis with a 1-hour TTL. Caching the top 1,000 prefixes can absorb 60-70% of traffic. Updates: Trie changes are pushed to all servers via a Kafka topic, and each server applies them in order. This is an eventually consistent fan-out, so servers may briefly show slightly different suggestions during updates. For even higher scale, shard the Trie by first character (26 shards), each shard serving only queries that start with its letter.

How do you implement typo tolerance in search autocomplete?
Typo tolerance handles common misspellings (the user types “appel” but means “apple”). Approaches: (1) Edit-distance matching: find Trie entries within edit distance 1 or 2 of the prefix — expensive for large Tries, but feasible with pruning (abandon a branch once the edit distance exceeds the threshold). (2) Phonetic matching: Soundex or Metaphone normalize words to phonetic codes; store the codes alongside queries and match by code. (3) N-gram index: index query trigrams (“apple” -> “app”, “ppl”, “ple”); on a typo, compute the input’s trigrams and find queries sharing them — fast, but with false positives. (4) ML spelling correction: train a spell-check model on query logs (common corrections) and run correction before the Trie lookup. Production search engines typically combine several of these approaches.

How do you personalize autocomplete suggestions for individual users?
Personalization blends global popularity with the user’s own search history. Signals: queries the user typed in the past 90 days (strong), queries whose results the user clicked (strong), queries typed but abandoned (weak). Storage: per-user query history in a key-value store (Redis or DynamoDB) keyed by user_id. On a request: fetch the user’s recent queries matching the prefix; score them as personal_score = frequency_in_user_history * recency_weight; blend as final_score = 0.7 * global_score + 0.3 * personal_score; return the top-K by final score. Privacy: users must be able to clear their search history, which deletes their personalization data; anonymous users get global suggestions only. Personalization adds 5-15ms of latency — acceptable given the improvement in relevance.