System Design Interview: Design a Search Autocomplete (Typeahead)

Problem Statement

Design a search autocomplete (typeahead) system like Google Search suggestions or Amazon product search. As a user types, the system returns the top 5 completions ranked by query frequency/relevance within 100ms. Handle 100M unique queries per day, billions of historical queries for ranking, and real-time updates to suggestions.
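Before picking a data structure, it helps to turn the stated scale into request rates. A quick back-of-envelope sketch (the 20-characters-per-query average is an assumption, not from the problem statement; peaks typically run 2-5x the average):

```python
# Back-of-envelope from the stated scale
queries_per_day = 100_000_000
avg_qps = queries_per_day / 86_400            # average search submissions per second
chars_per_query = 20                          # assumed average query length
autocomplete_qps = avg_qps * chars_per_query  # one suggestion request per keystroke

print(f"{avg_qps:,.0f} search QPS, {autocomplete_qps:,.0f} autocomplete QPS")
```

The key insight: autocomplete traffic is a large multiple of search traffic, because every keystroke triggers a request.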

Core Data Structure: Trie

from dataclasses import dataclass, field
import heapq

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    is_end:   bool = False
    frequency: int = 0  # Query count for the complete word ending at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, frequency: int = 1):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.frequency += frequency

    def get_suggestions(self, prefix: str, top_k: int = 5) -> list[tuple[int, str]]:
        """Return top_k (frequency, word) tuples for the given prefix."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # Prefix not found
            node = node.children[char]

        # DFS from the prefix node to collect all completions
        results = []
        self._dfs(node, prefix, results)

        # Return top-k by frequency (max-heap)
        return heapq.nlargest(top_k, results, key=lambda x: x[0])

    def _dfs(self, node: TrieNode, current: str,
             results: list):
        if node.is_end:
            results.append((node.frequency, current))
        for char, child in node.children.items():
            self._dfs(child, current + char, results)

Problem with the naive trie: the DFS from the prefix node visits every word under that prefix — O(number of completions). A short prefix like "a" can match millions of historical queries, far too slow for a 100ms latency budget. Solution: cache the top-k completions at each trie node.

Optimized Trie with Cached Top-K

@dataclass
class OptimizedTrieNode:
    children: dict = field(default_factory=dict)
    # Cache top-5 completions at this node: [(frequency, word), ...]
    top_k: list = field(default_factory=list)
    TOP_K = 5  # Unannotated, so a plain class attribute — not a dataclass field

class OptimizedTrie:
    def __init__(self):
        self.root = OptimizedTrieNode()

    def insert(self, word: str, frequency: int):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = OptimizedTrieNode()
            node = node.children[char]
            # Update the cached top-k at every prefix node along the path
            self._update_top_k(node, word, frequency)

    def _update_top_k(self, node: OptimizedTrieNode,
                      word: str, frequency: int):
        # Drop any stale entry for this word so re-inserts don't duplicate it
        node.top_k = [(f, w) for f, w in node.top_k if w != word]
        node.top_k.append((frequency, word))
        # Keep only the TOP_K highest-frequency entries
        node.top_k = heapq.nlargest(
            OptimizedTrieNode.TOP_K, node.top_k, key=lambda x: x[0]
        )

    def get_suggestions(self, prefix: str) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        return [word for _, word in node.top_k]

# get_suggestions is now O(|prefix|) — just traverse the trie path
# No DFS needed — suggestions pre-cached at each node
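The per-node cache update can be exercised in isolation. A minimal sketch of how a bounded top-k list behaves as frequencies arrive (the words and counts are illustrative):

```python
import heapq

TOP_K = 5

def update_top_k(top_k: list, word: str, frequency: int) -> list:
    # Drop any stale entry for this word, add the new count, keep the k best
    entries = [(f, w) for f, w in top_k if w != word]
    entries.append((frequency, word))
    return heapq.nlargest(TOP_K, entries, key=lambda x: x[0])

cache = []
for word, freq in [("python", 900), ("pytorch", 700), ("pycharm", 400),
                   ("pygame", 300), ("pyspark", 250), ("pytest", 100)]:
    cache = update_top_k(cache, word, freq)

# pytest (100) is evicted: only the 5 highest-frequency words remain
print([w for _, w in cache])
```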

Distributed Architecture

A single trie can’t fit all queries for a global service. Distributed approach:

Data Collection Pipeline

import time

class QueryLogger:
    """Logs search queries for aggregation."""
    def log_query(self, query: str, user_id: str):
        # Append to Kafka topic: autocomplete.raw_queries
        # (`kafka` is an already-configured producer client, e.g. confluent-kafka)
        kafka.produce("autocomplete.raw_queries", {
            "query":     query.lower().strip(),
            "user_id":   user_id,
            "timestamp": time.time()
        })

# Offline aggregation (Apache Spark, daily batch):
# 1. Read raw_queries from Kafka / data lake
# 2. Count frequency of each query in the last 7 days (recency-weighted)
# 3. Filter: remove profanity, spam, low-frequency queries (< 10 occurrences)
# 4. Build trie from aggregated (query, frequency) pairs
# 5. Serialize trie and distribute to autocomplete servers

# Recency weighting: queries from today count more than queries from 7 days ago
# weight = frequency * (1 / days_ago_factor)
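The recency weighting in step 2 can be sketched as an exponential decay over daily counts (the 0.9 decay factor is an assumed value, tuned in practice):

```python
def recency_weighted_score(daily_counts: list[int], decay: float = 0.9) -> float:
    """daily_counts[0] is today; daily_counts[i] is the count from i days ago."""
    return sum(count * (decay ** days_ago)
               for days_ago, count in enumerate(daily_counts))

# A query searched 100x/day for 7 days scores ~521.7, not a flat 700 —
# older days contribute progressively less
steady = recency_weighted_score([100] * 7)

# All 700 searches today would score the full 700
spike = recency_weighted_score([700, 0, 0, 0, 0, 0, 0])
```

The same total volume scores higher when it is recent, which is exactly the behavior wanted for trending queries.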

Trie Partitioning

Partition the trie by prefix to distribute across servers:

  • Server 1: a-m (all queries starting with letters a through m)
  • Server 2: n-z (all queries starting with n through z)
  • A ZooKeeper-backed config service maps first letter → server address
  • API Gateway routes query to the correct server based on the first character of the prefix
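The routing step above can be sketched as a first-character lookup (the shard names and the a-m/n-z split are illustrative; in production the map would be loaded from the config service and kept up to date):

```python
import string

# first letter -> shard id; loaded from ZooKeeper/config service in production
SHARD_MAP = {c: ("shard-1" if c <= "m" else "shard-2")
             for c in string.ascii_lowercase}

def route(prefix: str) -> str:
    first = prefix.lower()[0]
    # Non-alphabetic prefixes (digits, symbols) fall through to a default shard
    return SHARD_MAP.get(first, "shard-default")
```

Note that a fixed two-way split is rarely balanced in practice — prefixes are Zipf-distributed (far more queries start with "s" than "x"), so real systems split hot prefixes further (e.g. "sa-sm", "sn-sz").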

Read Path

import json
import redis

class AutocompleteService:
    def __init__(self, trie_shards: dict, cache: redis.Redis):
        self.trie_shards = trie_shards  # first_char -> TrieServer
        self.cache = cache

    def get_suggestions(self, prefix: str) -> list[str]:
        prefix = prefix.lower().strip()
        if not prefix:
            return []

        # 1. Cache check (Redis, TTL=60s)
        cache_key = f"ac:{prefix}"
        cached = self.cache.get(cache_key)
        if cached:
            return json.loads(cached)

        # 2. Route to appropriate trie shard
        shard = self.trie_shards.get(prefix[0], self.trie_shards['default'])
        suggestions = shard.get_suggestions(prefix)

        # 3. Cache result
        self.cache.setex(cache_key, 60, json.dumps(suggestions))
        return suggestions

Real-Time Updates

The daily batch is too slow for trending queries (a celebrity news event should surface in minutes). Two-tier approach:

  • Base trie: Built daily from 7 days of historical data. Stable, accurate frequency counts.
  • Realtime layer: Kafka Streams or Flink consumes the raw query stream in real-time, maintaining a rolling 1-hour frequency count in Redis. Results merged with base trie at query time: merged_score = base_freq + alpha * realtime_freq.
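The query-time merge can be sketched as follows (alpha and the candidate frequencies are illustrative; base counts would come from the trie shard and realtime counts from Redis):

```python
def merge_suggestions(base: dict[str, int], realtime: dict[str, int],
                      alpha: float = 2.0, top_k: int = 5) -> list[str]:
    # merged_score = base_freq + alpha * realtime_freq
    candidates = set(base) | set(realtime)
    scored = [(base.get(q, 0) + alpha * realtime.get(q, 0), q)
              for q in candidates]
    scored.sort(reverse=True)
    return [q for _, q in scored[:top_k]]

base = {"taylor swift tour": 900, "taylor swift albums": 700}
realtime = {"taylor swift announcement": 400}  # trending in the last hour
merged = merge_suggestions(base, realtime)
```

With alpha = 2.0, the trending query (score 800) outranks a historically stronger one (700) while the all-time leader (900) stays on top — alpha controls how aggressively trends surface.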

Filtering and Personalization

import re

class SuggestionFilter:
    def __init__(self, profanity_list: set, spam_patterns: list):
        self.blocked_words = profanity_list
        self.spam_patterns = [re.compile(p) for p in spam_patterns]

    def filter(self, suggestions: list[str],
               user_history: list[str]) -> list[str]:
        # 1. Remove blocked content and spam-pattern matches
        clean = [s for s in suggestions
                 if not any(w in s for w in self.blocked_words)
                 and not any(p.search(s) for p in self.spam_patterns)]

        # 2. Boost suggestions that match user's search history
        def score(suggestion: str) -> float:
            base = 1.0
            if suggestion in user_history:
                base *= 1.5  # Personalization boost
            return base

        return sorted(clean, key=score, reverse=True)[:5]

Interview Checklist

  • Data structure: trie with top-k cached at each node for O(|prefix|) query time
  • Scale: partition trie by first character across multiple servers
  • Ranking: query frequency, recency weighting, personalization
  • Pipeline: daily batch Spark job for base trie + real-time Flink for trending
  • Cache: Redis at query layer (TTL 60s), reduces trie server load
  • Filtering: profanity filter, spam detection before serving suggestions
  • Latency budget: 100ms total — 5ms cache hit, 20ms trie lookup on miss

