Requirements
A typeahead service returns query suggestions as the user types. Google Search autocomplete, Twitter’s hashtag completion, and Stripe’s dashboard search are examples. Requirements: return the top 5-10 suggestions for a given prefix within 100ms; rank suggestions by a combination of popularity and personalization; reflect trending queries within minutes. At Google scale: 5 billion searches per day, 10+ keystrokes per query = 50 billion autocomplete requests per day (roughly 600K requests per second on average, well over a million at peak). Even at medium scale (50M daily active users), keystroke-driven traffic reaches tens of thousands of requests per second.
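A quick back-of-the-envelope check of those rates (the per-user and per-query figures are assumptions taken from the requirements, not measurements):

```python
# Back-of-the-envelope request-rate check; inputs are assumptions from the
# requirements above, not measured values.
SECONDS_PER_DAY = 86_400

def avg_qps(requests_per_day: float) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY

# Google scale: 5B searches/day, ~10 keystrokes (autocomplete requests) per query.
google_per_day = 5e9 * 10
print(f"Google scale: {avg_qps(google_per_day):,.0f} QPS average")   # ~578,700

# Medium scale: 50M DAU, assuming ~10 searches/user/day and ~10 keystrokes/search.
medium_per_day = 50e6 * 10 * 10
print(f"Medium scale: {avg_qps(medium_per_day):,.0f} QPS average")   # ~57,870
```

Peak traffic typically runs 2-3x the average, which is where the million-plus QPS figure at Google scale comes from.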
Data Structure: Trie with Top-K Precomputation
A trie maps prefixes to matching completions. Naive trie: each leaf is a query string with a frequency count. Finding the top-K for a prefix requires DFS over the entire subtree — O(subtree size). Too slow for the serving path. Optimization: at each trie node, precompute and store the top K most popular completions passing through that node. Serving is then O(prefix length) — just traverse to the prefix node and return its stored top-K list. Tradeoff: inserting or updating a query’s frequency requires updating the top-K list at every ancestor node — O(L * K) per update, where L = query length and K = list size. This is acceptable because updates happen asynchronously (not in the serving path). The trie is built offline and served from memory. In-memory trie for 10M unique queries * avg 30 chars * ~40 bytes per node: ~12GB (an upper bound, since shared prefixes reduce the node count) — fits in a large server or can be partitioned across several.
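A minimal sketch of the structure, assuming a plain dict-based Python trie (names like `TrieNode`, `update`, and `suggest` are illustrative, not from any library):

```python
import heapq
from dataclasses import dataclass, field

K = 10  # completions kept per node

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    top_k: list = field(default_factory=list)   # (frequency, query) pairs

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def update(self, query: str, freq: int) -> None:
        """Async index path: O(len(query) * K), since every ancestor's
        top-K list is refreshed."""
        node = self.root
        self._merge(node, query, freq)              # empty prefix
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            self._merge(node, query, freq)

    @staticmethod
    def _merge(node: TrieNode, query: str, freq: int) -> None:
        # Drop any stale entry for this query, add the new count, keep the top K.
        node.top_k = [(f, q) for f, q in node.top_k if q != query]
        node.top_k.append((freq, query))
        node.top_k = heapq.nlargest(K, node.top_k)

    def suggest(self, prefix: str) -> list[str]:
        """Serving path: O(len(prefix)) traversal, then return the stored list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top_k]

trie = Trie()
trie.update("apple", 120)
trie.update("app store", 300)
print(trie.suggest("ap"))   # ['app store', 'apple']
```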
Architecture
Two paths: serving (real-time, read-only) and index building (async, write). Serving path: client types a character → request hits the autocomplete API → API queries the in-memory trie service → returns top-K suggestions in < 5ms. Client-side debouncing: send requests at most every 100ms (not on every keystroke). Client-side caching: cache results per prefix for 60 seconds. Together these two optimizations typically cut actual server requests by 80-90%. Index building path: query logs → batch aggregation job (Spark, hourly) → recompute the prefix-to-top-K mapping → serialize new trie → atomic swap (write the new trie to a new memory region, then swap the pointer). Zero-downtime updates: the old trie continues serving until the swap. Personalization: blend global popularity (80% weight) with the user's past queries (20% weight) to rerank the global top-K before returning to the client. Personalization happens at the API layer, not in the trie itself.
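Two pieces of that path sketched in code, assuming the trie interface above: an atomically swappable trie reference for zero-downtime index updates, and the API-layer rerank that blends global and personal scores. The 0.8/0.2 weights and helper names are illustrative, and the rerank assumes global and personal scores have been normalized to comparable ranges.

```python
import threading

class TrieService:
    """Holds the current in-memory trie; index builds swap in a replacement."""
    def __init__(self, trie):
        self._trie = trie
        self._lock = threading.Lock()

    def swap(self, new_trie) -> None:
        # The new trie is built fully off to the side; only the reference
        # changes here, so in-flight reads keep using the old trie.
        with self._lock:
            self._trie = new_trie

    def current(self):
        return self._trie

def personalize(global_top_k: list[tuple[str, float]],
                user_history: dict[str, float],
                k: int = 10,
                global_weight: float = 0.8,
                personal_weight: float = 0.2) -> list[str]:
    """Rerank the global top-K using the user's own past-query scores."""
    def score(item):
        query, global_score = item
        return global_weight * global_score + personal_weight * user_history.get(query, 0.0)
    return [q for q, _ in sorted(global_top_k, key=score, reverse=True)][:k]
```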
Trie Partitioning and Scale
A single trie server can handle hundreds of thousands of QPS (reads from memory are fast). For Google scale: partition the trie by prefix range. Shard by the first character or first two characters (26 or 676 shards). Each shard handles queries starting with its assigned prefixes. Query routing: the API layer routes based on the query prefix; the mapping is deterministic, so the prefix “ap” always goes to the same shard (via a static range map or consistent hashing on the prefix bucket). Replication: each shard has 3 replicas (active-active). Reads are load-balanced; writes go to all replicas. Hotspot: common prefixes like “a”, “b”, “the” are extremely hot. Solution: assign more servers to hot prefix ranges, or add a result cache (Redis) in front of the trie for the top 10,000 most frequent prefixes (covers ~60% of traffic via Zipf’s law).
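A sketch of the routing and hot-prefix cache at the API layer (single-character sharding for brevity; two-character sharding follows the same idea, and `cache` stands in for any get/set store such as a Redis wrapper):

```python
import string

NUM_SHARDS = 26  # one shard per first character; use two characters (676) at larger scale

def shard_for(prefix: str) -> int:
    """Deterministic routing: the same prefix always maps to the same shard."""
    first = prefix[0].lower()
    # Non-alphabetic first characters fall into shard 0 as a catch-all.
    return string.ascii_lowercase.index(first) if first in string.ascii_lowercase else 0

def lookup(prefix: str, cache, shards) -> list[str]:
    # Result cache in front of the trie for the hottest prefixes (Zipf's law).
    cached = cache.get(prefix)
    if cached is not None:
        return cached
    result = shards[shard_for(prefix)].suggest(prefix)
    cache.set(prefix, result, ttl=60)
    return result
```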
Trending Query Updates
A query that goes viral (breaking news, a meme) should appear in autocomplete within minutes, not hours. Real-time pipeline: query stream → Kafka → Flink sliding window (count queries per prefix per 5-minute window) → merge with historical counts → update trie. Sliding window merging: new_count = historical_count * decay_factor + realtime_count. Decay factor (e.g., 0.95 per hour) ensures recent queries get higher weight than old ones. Trie update is still async (not in serving path) but now refreshes every 5 minutes instead of hourly. Safety: new queries don’t appear until they’ve accumulated enough frequency (minimum count threshold) to avoid surfacing rare/offensive one-off queries. Blocklist: maintain a blocklist of phrases that should never appear in autocomplete (profanity, dangerous content). Applied as a filter layer after trie lookup, before returning to the client.
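A sketch of the count merge and the safety filters (the decay rate, minimum threshold, and blocklist contents are placeholder values):

```python
DECAY_PER_HOUR = 0.95   # historical counts lose ~5% weight per hour
MIN_COUNT = 50          # don't surface queries below this combined count
BLOCKLIST = {"example banned phrase"}

def merged_count(historical: float, realtime: int, hours_since_rebuild: float) -> float:
    """new_count = historical_count * decay_factor + realtime_count."""
    return historical * (DECAY_PER_HOUR ** hours_since_rebuild) + realtime

def safe_suggestions(candidates: list[tuple[str, float]]) -> list[str]:
    """Applied as a filter after trie lookup, before returning results to the client."""
    return [query for query, count in candidates
            if count >= MIN_COUNT and query.lower() not in BLOCKLIST]
```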
Interview Tips
- Client-side caching and debouncing are essential — mention them early to show you understand the full system, not just the backend.
- Trie with precomputed top-K per node is the standard answer; know the tradeoff with naive DFS.
- Separate the serving path (read-only, in-memory, < 5ms) from the index-building path (async, batch or stream, minutes latency).
Asked at: LinkedIn, Twitter/X, Snap, Shopify