Problem Statement
Design a search autocomplete (typeahead) system like Google Search suggestions or Amazon product search. As a user types, the system returns the top 5 completions ranked by query frequency/relevance within 100ms. Handle 100M unique queries per day, billions of historical queries for ranking, and real-time updates to suggestions.
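Before diving into the design, a quick back-of-envelope estimate frames the load. This is a rough sketch: the requests-per-query multiplier, peak factor, and byte sizes are assumptions for illustration, not numbers from the prompt.

```python
# Back-of-envelope capacity estimate (assumed averages, for illustration only).
DAILY_QUERIES = 100_000_000   # unique queries/day, from the problem statement
AVG_REQUESTS_PER_QUERY = 4    # assumption: ~4 autocomplete calls as the user types
SECONDS_PER_DAY = 86_400

autocomplete_qps = DAILY_QUERIES * AVG_REQUESTS_PER_QUERY / SECONDS_PER_DAY
peak_qps = autocomplete_qps * 2          # assumption: peak traffic is ~2x average

AVG_QUERY_BYTES = 30                     # assumption: ~30 bytes per query string
daily_log_gb = DAILY_QUERIES * AVG_QUERY_BYTES / 1e9

print(f"avg autocomplete QPS: {autocomplete_qps:,.0f}")  # ~4,630
print(f"peak QPS:             {peak_qps:,.0f}")          # ~9,259
print(f"daily raw log size:   {daily_log_gb:.1f} GB")    # 3.0 GB
```

Even the peak estimate is modest per shard; the harder constraints are trie size and the 100ms latency budget.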
Core Data Structure: Trie
from dataclasses import dataclass, field
import heapq

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    is_end: bool = False
    frequency: int = 0  # Total queries for the complete word ending at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, frequency: int = 1):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.frequency += frequency

    def get_suggestions(self, prefix: str, top_k: int = 5) -> list[tuple[int, str]]:
        """Return top_k (frequency, word) tuples for the given prefix."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # Prefix not found
            node = node.children[char]
        # DFS from the prefix node to collect all completions
        results = []
        self._dfs(node, prefix, results)
        # Return the top-k results by frequency
        return heapq.nlargest(top_k, results, key=lambda x: x[0])

    def _dfs(self, node: TrieNode, current: str, results: list):
        if node.is_end:
            results.append((node.frequency, current))
        for char, child in node.children.items():
            self._dfs(child, current + char, results)
Problem with the naive trie: the DFS from the prefix node visits every word sharing that prefix, so the query cost is O(number of completions under the prefix). For a common prefix over billions of historical queries, this blows the 100ms budget. Solution: cache the top-k completions at each trie node.
Optimized Trie with Cached Top-K
@dataclass
class OptimizedTrieNode:
    children: dict = field(default_factory=dict)
    # Cache top-5 completions at this node: [(frequency, word), ...]
    top_k: list = field(default_factory=list)
    TOP_K = 5  # Class attribute (unannotated, so not a dataclass field)

class OptimizedTrie:
    def __init__(self):
        self.root = OptimizedTrieNode()

    def insert(self, word: str, frequency: int):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = OptimizedTrieNode()
            node = node.children[char]
            # Update the top-k cache at every prefix node along the path
            self._update_top_k(node, word, frequency)

    def _update_top_k(self, node: OptimizedTrieNode,
                      word: str, frequency: int):
        # Drop any stale entry for this word, add the new one, trim to TOP_K
        node.top_k = [entry for entry in node.top_k if entry[1] != word]
        node.top_k.append((frequency, word))
        node.top_k = heapq.nlargest(
            OptimizedTrieNode.TOP_K, node.top_k, key=lambda x: x[0]
        )

    def get_suggestions(self, prefix: str) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        return [word for _, word in node.top_k]

# get_suggestions is now O(|prefix|) — just traverse the trie path
# No DFS needed — suggestions are pre-cached at each node
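To sanity-check the cached top-k behavior, here is a compact, self-contained restatement of the optimized trie. `MiniTrie` and the frequencies below are illustrative only, not part of the design.

```python
import heapq
from dataclasses import dataclass, field

# Compact restatement of the optimized trie so this snippet runs standalone.
@dataclass
class Node:
    children: dict = field(default_factory=dict)
    top_k: list = field(default_factory=list)

class MiniTrie:
    def __init__(self, k: int = 5):
        self.root, self.k = Node(), k

    def insert(self, word: str, freq: int):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
            # Drop any stale entry for this word, then re-rank the cache.
            node.top_k = [e for e in node.top_k if e[1] != word]
            node.top_k.append((freq, word))
            node.top_k = heapq.nlargest(self.k, node.top_k)

    def suggest(self, prefix: str) -> list[str]:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [w for _, w in node.top_k]

t = MiniTrie(k=3)
for word, freq in [("python", 100), ("pytorch", 80),
                   ("pandas", 40), ("python tutorial", 60)]:
    t.insert(word, freq)
print(t.suggest("py"))  # ['python', 'pytorch', 'python tutorial']
```

Note that `"pandas"` is evicted from the top-3 at the `p` node but still served for the prefix `"pa"` — each node caches its own top-k.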
Distributed Architecture
A single trie can’t fit all queries for a global service. Distributed approach:
Data Collection Pipeline
import json
import time

class QueryLogger:
    """Logs search queries for aggregation."""

    def __init__(self, producer):
        self.producer = producer  # e.g. a confluent_kafka.Producer

    def log_query(self, query: str, user_id: str):
        # Append to the Kafka topic: autocomplete.raw_queries
        self.producer.produce("autocomplete.raw_queries", json.dumps({
            "query": query.lower().strip(),
            "user_id": user_id,
            "timestamp": time.time()
        }))

# Offline aggregation (Apache Spark, daily batch):
# 1. Read raw_queries from Kafka / the data lake
# 2. Count the frequency of each query over the last 7 days (recency-weighted)
# 3. Filter: remove profanity, spam, and low-frequency queries (< 10 occurrences)
# 4. Build the trie from the aggregated (query, frequency) pairs
# 5. Serialize the trie and distribute it to the autocomplete servers

# Recency weighting: queries from today count more than queries from 7 days ago
# weight = count * decay_factor ** days_ago  (e.g., decay_factor = 0.9)
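The recency-weighted aggregation (steps 2-3 above) can be sketched in plain Python; the Spark job would compute the same thing per partition. The decay factor of 0.9 is an illustrative choice, and the 10-occurrence threshold mirrors the filter step above.

```python
from collections import defaultdict

def aggregate(logs, decay=0.9, min_count=10):
    """logs: iterable of (query, days_ago, count) rows.
    Returns recency-weighted frequencies, dropping rare queries."""
    weighted = defaultdict(float)
    raw = defaultdict(int)
    for query, days_ago, count in logs:
        weighted[query] += count * (decay ** days_ago)
        raw[query] += count
    # Drop low-frequency queries before building the trie.
    return {q: w for q, w in weighted.items() if raw[q] >= min_count}

logs = [("taylor swift", 0, 500), ("taylor swift", 6, 100),
        ("obscure typo", 0, 3)]
print(aggregate(logs))
# 'taylor swift' scores ~553.14 (= 500*0.9**0 + 100*0.9**6);
# 'obscure typo' is filtered out (only 3 occurrences)
```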
Trie Partitioning
Partition the trie by prefix to distribute across servers:
- Server 1: a-m (all queries starting with letters a through m)
- Server 2: n-z (all queries starting with n through z)
- A ZooKeeper-backed config service maps first letter → server address
- API Gateway routes query to the correct server based on the first character of the prefix
Read Path
import json
import redis

class AutocompleteService:
    def __init__(self, trie_shards: dict, cache: redis.Redis):
        self.trie_shards = trie_shards  # first_char -> TrieServer
        self.cache = cache

    def get_suggestions(self, prefix: str) -> list[str]:
        prefix = prefix.lower().strip()
        if not prefix:
            return []
        # 1. Cache check (Redis, TTL=60s)
        cache_key = f"ac:{prefix}"
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)
        # 2. Route to the appropriate trie shard
        shard = self.trie_shards.get(prefix[0], self.trie_shards['default'])
        suggestions = shard.get_suggestions(prefix)
        # 3. Cache the result
        self.cache.setex(cache_key, 60, json.dumps(suggestions))
        return suggestions
Real-Time Updates
The daily batch is too slow for trending queries (a celebrity news event should surface in minutes). Two-tier approach:
- Base trie: Built daily from 7 days of historical data. Stable, accurate frequency counts.
- Realtime layer: Kafka Streams or Flink consumes the raw query stream in real-time, maintaining a rolling 1-hour frequency count in Redis. Results merged with base trie at query time:
merged_score = base_freq + alpha * realtime_freq.
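The query-time merge can be sketched as follows. The value of `alpha`, the dictionary shapes, and the example data are assumptions for illustration; in production `base` would come from the shard's cached top-k and `realtime` from the Redis rolling counts.

```python
def merge_suggestions(base, realtime, alpha=3.0, k=5):
    """base: {query: base_freq} from the daily trie's top-k.
    realtime: {query: rolling 1-hour count} from Redis.
    Returns the top-k queries by merged score."""
    scores = {}
    for query, freq in base.items():
        scores[query] = freq + alpha * realtime.get(query, 0)
    # A trending query may not appear in the base top-k at all.
    for query, count in realtime.items():
        scores.setdefault(query, alpha * count)
    return sorted(scores, key=scores.get, reverse=True)[:k]

base = {"taylor swift tour": 900, "taylor swift albums": 700}
realtime = {"taylor swift grammy": 400}
print(merge_suggestions(base, realtime, k=2))
# ['taylor swift grammy', 'taylor swift tour']
```

With `alpha=3.0`, the hour-old trending query (score 1200) outranks both stable historical queries, which is the point of the two-tier design.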
Filtering and Personalization
class SuggestionFilter:
    def __init__(self, profanity_list: set, spam_patterns: list):
        self.blocked_words = profanity_list
        self.spam_patterns = spam_patterns  # regex patterns (matching omitted below for brevity)

    def filter(self, suggestions: list[str],
               user_history: list[str]) -> list[str]:
        # 1. Remove blocked content
        clean = [s for s in suggestions
                 if not any(w in s for w in self.blocked_words)]
        # 2. Boost suggestions that match the user's search history
        def score(suggestion: str) -> float:
            base = 1.0
            if suggestion in user_history:
                base *= 1.5  # Personalization boost
            return base
        return sorted(clean, key=score, reverse=True)[:5]
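A quick standalone check of the filter-and-boost behavior. This is a compact restatement of the filter with made-up inputs; note that Python's stable sort keeps the original (frequency) order among suggestions with equal scores.

```python
# Compact restatement of the suggestion filter so this snippet runs standalone.
class SuggestionFilter:
    def __init__(self, blocked: set):
        self.blocked = blocked

    def filter(self, suggestions, user_history):
        # Drop anything containing a blocked word.
        clean = [s for s in suggestions
                 if not any(w in s for w in self.blocked)]
        # Boost suggestions from the user's own history.
        def score(s):
            return 1.5 if s in user_history else 1.0
        return sorted(clean, key=score, reverse=True)[:5]

f = SuggestionFilter(blocked={"badword"})
out = f.filter(["python snake", "python tutorial", "badword site"],
               user_history=["python tutorial"])
print(out)  # ['python tutorial', 'python snake']
```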
Interview Checklist
- Data structure: trie with top-k cached at each node for O(|prefix|) query time
- Scale: partition trie by first character across multiple servers
- Ranking: query frequency, recency weighting, personalization
- Pipeline: daily batch Spark job for base trie + real-time Flink for trending
- Cache: Redis at query layer (TTL 60s), reduces trie server load
- Filtering: profanity filter, spam detection before serving suggestions
- Latency budget: 100ms total — 5ms cache hit, 20ms trie lookup on miss