What Is a Typeahead / Search Suggestion System?
A typeahead (also called autocomplete or search suggestion) system predicts what a user is typing and suggests completions in real time. Google Search, Amazon product search, and Twitter search all use one. The system must respond within 100ms per keystroke, handle millions of concurrent users, and return the most relevant suggestions (not merely the next entries in alphabetical order).
System Requirements
Functional
- Return top 5–10 search suggestions for a given prefix in <100ms
- Suggestions ranked by search frequency (most popular first)
- Support a billion-scale query corpus (for scale, Google handles ~8.5B searches/day)
- Personalized suggestions (factor in user’s search history)
- Trending queries: newly popular searches surface quickly
Non-Functional
- Latency: <100ms p99 (users stop typing if suggestions lag)
- Availability: high, but suggestions are not on the critical path; search remains usable if the suggestion service fails or degrades
- Freshness: trending queries appear in suggestions within 10–15 minutes
Core Data Structure: Trie
A Trie (prefix tree) is the canonical data structure for autocomplete. Each node represents one character. Traversing from root to a node spells a prefix. At each node, store the top-K (e.g., 10) most popular queries in its subtree — precomputed and cached.
class TrieNode:
    def __init__(self):
        self.children = {}      # char -> TrieNode
        self.top_queries = []   # [(frequency, query), ...], max 10, sorted by frequency desc

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def search(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return []
            node = node.children[c]
        return node.top_queries  # O(len(prefix)) lookup, O(1) return
By pre-caching top queries at every node, lookups are O(prefix_length) — no subtree traversal needed at query time. The trie is built offline and served read-only.
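The offline build is where the pre-caching happens: as each (query, frequency) pair is inserted, the pair is pushed into the cached top-K list of every prefix node along its path. A minimal sketch (the dict-based node layout, `build_trie`, and size-K min-heap maintenance are illustrative choices, not a canonical implementation):

```python
import heapq

def build_trie(query_counts, k=10):
    """Offline build. Each node is {'children': {...}, 'top': [...]};
    'top' is a size-k min-heap of (freq, query) covering the node's subtree."""
    root = {'children': {}, 'top': []}
    for query, freq in query_counts.items():
        node = root
        for ch in query:
            node = node['children'].setdefault(ch, {'children': {}, 'top': []})
            # Update the cached top-k at every prefix node along the path
            heapq.heappush(node['top'], (freq, query))
            if len(node['top']) > k:
                heapq.heappop(node['top'])  # evict the least frequent
    return root

def search(root, prefix):
    """O(len(prefix)) walk; the cached list is returned without subtree traversal."""
    node = root
    for ch in prefix:
        if ch not in node['children']:
            return []
        node = node['children'][ch]
    return [q for _, q in sorted(node['top'], reverse=True)]
```

Note that ranking work is paid once at build time; the read path only walks the prefix and sorts at most K cached entries.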
Trie at Scale: Why You Can’t Fit It in Memory
A Trie for all English queries would have billions of nodes — too large for one server’s RAM. Solutions:
- Shard by prefix: queries starting with ‘a’ go to shard 1, ‘b’ to shard 2, etc. More granular: shard by first two characters (26² = 676 shards). The router maps prefix → shard server.
- Serialize to Redis/disk: serialize the Trie as a hash map: prefix → [top queries]. Redis GET “se” → [“search”, “seattle”, “send”, …]. Simple, fast, easily sharded.
- Approximate with top-N prefix table: for each prefix observed in search logs, precompute top-10 queries. Store as key-value: “se” → […]. Much simpler than a real Trie, equally effective in practice.
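The prefix-table variant above can be built in one pass over the frequency counts. A sketch, with a plain dict standing in for Redis (in production each entry would be a key like `suggest:se`; `build_prefix_table` and `max_prefix_len` are hypothetical names):

```python
def build_prefix_table(query_counts, k=10, max_prefix_len=20):
    """Flatten search-log counts into prefix -> top-k query list.
    Capping prefix length bounds the key count; few users type 20+ chars
    before picking a suggestion."""
    candidates = {}
    for query, freq in query_counts.items():
        for i in range(1, min(len(query), max_prefix_len) + 1):
            candidates.setdefault(query[:i], []).append((freq, query))
    # Keep only the k most frequent queries per prefix
    table = {}
    for prefix, entries in candidates.items():
        entries.sort(reverse=True)
        table[prefix] = [q for _, q in entries[:k]]
    return table
```

A batch job would emit this table and bulk-load it into Redis; serving is then a single GET per keystroke, and sharding falls out of normal key hashing.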
Ranking Suggestions
Pure frequency isn’t enough. Ranking factors:
- Query frequency: #1 signal — “weather” is searched billions of times
- Recency: trending queries decay over time. Exponential decay: score = frequency * e^(-λ * age_in_hours)
- Personalization: user’s own recent searches weighted higher. “py” → “python tutorial” for a developer, “pyrotechnics” for a fireworks retailer
- Geography: “starbucks” in Seattle vs. “starbucks near me”
- Spell correction: “gogle” → suggest “google”
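The recency decay above is straightforward to apply at scoring time. A minimal sketch (the decay rate of 0.05/hour is an illustrative choice, not a canonical value):

```python
import math

def decayed_score(frequency, age_in_hours, lam=0.05):
    """score = frequency * e^(-lam * age_in_hours).
    Larger lam forgets old queries faster; lam is tuned empirically."""
    return frequency * math.exp(-lam * age_in_hours)
```

With these numbers, a query searched 400 times in the last hour outscores one searched 1,000 times a day ago, which is exactly the behavior that lets trending queries surface.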
Trie Update Pipeline
The Trie can’t be updated in real time: a single query’s frequency change would invalidate the cached top-K lists at every ancestor node along its path. Instead:
- Raw query logs → Kafka → aggregation service counts query frequencies (sliding window, last 7 days)
- Weekly (or daily) batch job rebuilds the Trie from the frequency table
- New Trie deployed as a snapshot to Trie servers (blue/green swap)
- For trending queries (<15 minutes freshness): maintain a separate “trending” Redis sorted set updated every 5 minutes. Merge trending results with Trie results at query time.
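The query-time merge in the last step can be a simple dedup with a few reserved slots. A sketch, where `trending` plays the role of the Redis sorted set (here just a score-ordered list of queries; `trending_slots` is a hypothetical tuning knob):

```python
def merge_suggestions(trie_results, trending, limit=10, trending_slots=2):
    """Overlay trending queries on the batch-built trie results:
    reserve a few top slots for trending, dedupe, fill the rest from the trie."""
    merged, seen = [], set()
    for q in trending[:trending_slots]:
        if q not in seen:
            merged.append(q)
            seen.add(q)
    for q in trie_results:
        if len(merged) >= limit:
            break
        if q not in seen:
            merged.append(q)
            seen.add(q)
    return merged[:limit]
```

Capping the trending slots keeps a burst of new queries from crowding out stable, high-frequency suggestions.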
Frontend Optimization
Reduce backend requests:
- Debounce: only send a request after 200ms of typing inactivity — avoids sending a request for every keystroke
- Cancel previous: abort in-flight requests when user types more (fetch AbortController)
- Browser cache: cache responses keyed by prefix. “se”, “sea”, “sear” each cached independently. Common prefixes are re-used across sessions.
- Prefetch: when user focuses the search bar, pre-fetch top 10 global suggestions (empty prefix) — these show as soon as user starts typing.
API Design
GET /suggest?q=sea&limit=10&locale=en_US&session_id=abc123
Response:
{
  "suggestions": [
    { "query": "seattle weather", "score": 0.98 },
    { "query": "search engine", "score": 0.87 },
    ...
  ],
  "latency_ms": 42
}
Interview Tips
- The Trie with cached top-K at each node is the key optimization — distinguish it from a naive Trie that requires subtree traversal on every lookup.
- Explain the update pipeline clearly: offline rebuild + separate trending overlay for freshness.
- Frontend optimizations (debouncing, browser caching) are often overlooked but show product thinking.
- If asked about personalization: the simplest approach is to fetch both global Trie suggestions and a user’s recent query history matching the prefix, then rank and merge.
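The simple personalization approach in the last tip can be sketched in a few lines (the function name and the rank-history-first policy are illustrative assumptions; a real system would score rather than hard-order):

```python
def personalized_suggest(prefix, global_results, user_history, limit=10):
    """Merge global trie suggestions with the user's recent queries:
    history entries matching the prefix rank first, then global results,
    deduped in order."""
    personal = [q for q in user_history if q.startswith(prefix)]
    merged, seen = [], set()
    for q in personal + global_results:
        if q not in seen:
            merged.append(q)
            seen.add(q)
        if len(merged) == limit:
            break
    return merged
```

This keeps the hot path cheap: the global list comes from the shared trie/prefix table, and only the small per-user history lookup is personalized.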