System Design: Typeahead and Search Autocomplete — Trie, Prefix Indexing, and Real-time Suggestions

Requirements and Scale

Search autocomplete suggests query completions as the user types. Design constraints: end-to-end latency under 100ms (users expect suggestions to feel instant), suggestions ranked for relevance (by global popularity, optionally personalized), and a system that handles billions of queries per day. The core challenge is combining prefix matching (a Trie or an inverted index) with relevance ranking at scale.

Trie-Based Approach

A Trie (prefix tree) stores all known queries. Each node represents a character, and the node reached by spelling out a query marks that query’s terminus; note this node can be internal, not just a leaf (“app” terminates inside the path for “apple”). Store the query’s frequency at its terminal node. To find suggestions for the prefix “app”: traverse to the “app” node, then DFS/BFS the subtree below it to collect the top-K most frequent completions.
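
A minimal in-memory sketch of this structure; names like TrieNode.frequency and top_k are illustrative, not from any specific library:

    import heapq

    class TrieNode:
        def __init__(self):
            self.children = {}   # char -> TrieNode
            self.frequency = 0   # > 0 only if a complete query terminates here

    class Trie:
        def __init__(self):
            self.root = TrieNode()

        def insert(self, query, frequency):
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
            node.frequency = frequency

        def top_k(self, prefix, k=10):
            # Walk down to the prefix node; unknown prefix -> no suggestions.
            node = self.root
            for ch in prefix:
                if ch not in node.children:
                    return []
                node = node.children[ch]
            # DFS the subtree, collecting every complete query beneath it.
            completions, stack = [], [(node, prefix)]
            while stack:
                cur, text = stack.pop()
                if cur.frequency > 0:
                    completions.append((cur.frequency, text))
                for ch, child in cur.children.items():
                    stack.append((child, text + ch))
            return [q for _, q in heapq.nlargest(k, completions)]

For example, after insert("app store", 1200) and insert("apple", 900), top_k("app") returns ["app store", "apple"].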

Problem: a naive Trie traversal scans all completions in the subtree, which is expensive for popular short prefixes like “a”. Optimization: cache the top-K completions at every Trie node, so a lookup walks to the prefix node and returns its cached list directly in O(prefix_length). Cache maintenance: when a query’s frequency changes, update the cached lists on the path from that query’s terminal node up to the root, costing O(query_length * K) per update.
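
A sketch of the cached variant, under the assumption that frequencies only increase; a decrease that evicts a cached entry would require rebuilding that node’s list from its subtree:

    class CachedTrieNode:
        def __init__(self):
            self.children = {}
            self.frequency = 0
            self.top = []   # cached top-K (frequency, query) pairs for this subtree

    class CachedTrie:
        def __init__(self, k=10):
            self.root = CachedTrieNode()
            self.k = k

        def update(self, query, frequency):
            # Collect every node on the query's path so each ancestor's
            # cache can be refreshed.
            node, path = self.root, [self.root]
            for ch in query:
                node = node.children.setdefault(ch, CachedTrieNode())
                path.append(node)
            node.frequency = frequency
            # Refresh the cached list at each node on the path: drop any
            # stale entry for this query, insert the new one, keep the k best.
            for n in path:
                entries = [e for e in n.top if e[1] != query]
                entries.append((frequency, query))
                entries.sort(reverse=True)
                n.top = entries[: self.k]

        def suggest(self, prefix):
            # O(prefix_length) lookup: walk to the prefix node, return its cache.
            node = self.root
            for ch in prefix:
                if ch not in node.children:
                    return []
                node = node.children[ch]
            return [q for _, q in node.top]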

Memory: storing all queries in a Trie in memory requires significant RAM. A Trie with 1M unique queries of average length 20 has at most ~20M nodes (shared prefixes reduce this in practice); at 40 bytes per node, that is up to ~800MB. Use a compact representation (e.g. a radix/compressed trie) or limit the Trie to the top-N queries by filtering out queries whose frequency falls below a threshold.

Inverted Index / Elasticsearch Approach

Store queries in Elasticsearch along with their frequencies. Use a prefix query or an edge n-gram tokenizer that indexes “apple” as “a”, “ap”, “app”, “appl”, “apple”. On a prefix search: look up all documents matching the prefix token, sort by frequency, and return the top-K. This is simpler to operate than a custom Trie, but latency is higher (an extra network call) and it is less memory-efficient. Good fit: medium-traffic autocomplete where a P99 latency of 50-80ms is acceptable.
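
A sketch of the index setup and query using the elasticsearch-py 8.x client; the index name autocomplete, the field names query/frequency, and the analyzer names are illustrative assumptions:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Index-time edge n-grams: "apple" is tokenized as "a", "ap", "app",
    # "appl", "apple". At search time the raw prefix is matched as one token.
    es.indices.create(
        index="autocomplete",
        settings={
            "analysis": {
                "tokenizer": {
                    "edge_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "edge_analyzer": {
                        "tokenizer": "edge_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        mappings={
            "properties": {
                "query": {
                    "type": "text",
                    "analyzer": "edge_analyzer",
                    "search_analyzer": "standard",
                },
                "frequency": {"type": "long"},
            }
        },
    )

    # Prefix search: match against the edge-n-gram field, order by frequency.
    resp = es.search(
        index="autocomplete",
        query={"match": {"query": "app"}},
        sort=[{"frequency": "desc"}],
        size=10,
    )
    suggestions = [hit["_source"]["query"] for hit in resp["hits"]["hits"]]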

Data Collection and Ranking

Query frequency data: log every search query with a timestamp. Aggregate the logs with a batch job (Spark) running every hour, computing per-query frequency over a rolling 7-day window, and update the Trie or search index with the new frequencies. Trending searches: a streaming job (Flink) computes real-time trending queries from the last hour and injects them as high-frequency candidates. Personalization: blend global popularity with user-specific history; a user’s past 90 days of queries boost personalized suggestions, with per-user history computed offline in a nightly batch.
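
A minimal PySpark sketch of the hourly aggregation; the S3 paths and the column names (timestamp, query) are assumptions for illustration:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("query-frequency").getOrCreate()

    # Read raw search logs and count each query over a rolling 7-day window.
    logs = spark.read.parquet("s3://search-logs/queries/")
    frequencies = (
        logs
        .where(F.col("timestamp") >= F.expr("current_timestamp() - INTERVAL 7 DAYS"))
        .groupBy(F.lower(F.col("query")).alias("query"))
        .agg(F.count("*").alias("frequency"))
    )

    # The Trie/index builder consumes these counts downstream.
    frequencies.write.mode("overwrite").parquet("s3://autocomplete/frequencies/")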

System Architecture

Data flow: the client types a character -> the request goes to the Autocomplete Service (a fleet of servers behind a load balancer). The service looks up the prefix in a distributed cache (Redis) keyed by prefix. On a cache miss: query the Trie servers (a separate cluster that holds the full Trie in memory), return the top-10 suggestions, and cache the result in Redis with a 1-hour TTL. The Trie servers are updated by the data pipeline that processes query logs. Frontend optimizations: debounce requests (send only after ~300ms of no typing) and cancel in-flight requests when new characters are typed. Prefix caching: short prefixes (“a”, “ap”) are extremely popular; pre-compute and cache these aggressively.
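
A sketch of the service’s read path using redis-py; query_trie_servers is a hypothetical stub standing in for the RPC to the Trie cluster:

    import json
    import redis

    cache = redis.Redis(host="localhost", port=6379)
    CACHE_TTL_SECONDS = 3600   # 1-hour TTL, matching the design above

    def get_suggestions(prefix: str) -> list[str]:
        """Read path: check Redis first, fall back to the Trie cluster on a miss."""
        key = f"ac:{prefix.lower()}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
        suggestions = query_trie_servers(prefix, k=10)
        cache.setex(key, CACHE_TTL_SECONDS, json.dumps(suggestions))
        return suggestions

    def query_trie_servers(prefix: str, k: int) -> list[str]:
        # Hypothetical stub: in production, an RPC to the in-memory Trie
        # cluster (sharded by prefix range).
        raise NotImplementedError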

Interview Tips

  • The Trie is the canonical data structure for this problem. Know how to build it, traverse it for top-K completions, and explain the cached top-K optimization.
  • Distinguish between the offline component (building the Trie from logs) and the online component (serving suggestions in real time).
  • Personalization vs global: personalization requires per-user query history at read time. Global ranking uses a shared Trie. Discuss the trade-off explicitly.

Asked at: Meta, LinkedIn, Apple, Twitter/X, Uber
