System Design Interview: Design a Search Engine (Query Processing and Ranking)

What Is a Search Engine?

A search engine indexes a corpus of documents and retrieves the most relevant results for a query in milliseconds. Examples: Google Search, Elasticsearch, Algolia. Core challenges: inverted index construction at scale, query processing (tokenization, stemming, ranking), and relevance ranking (BM25, PageRank, ML re-ranking). This complements the web crawler design — this post covers the query-time retrieval side.

    System Requirements

    Functional

    • Index documents: tokenize, stem, build inverted index
    • Search: return top-K relevant documents for a query
    • Support: phrase search, boolean operators (AND/OR/NOT), filters
    • Spell correction and query suggestion

    Non-Functional

    • 100B documents indexed, 10K queries/second
    • Search latency under 100 ms at P99
    • Index updates propagated within 60 seconds of document change

    Inverted Index

    An inverted index maps each term to the list of documents containing it (postings list):

    term → [(doc_id, tf, positions), ...]
    
    "python" → [(doc1, tf=3, [5,12,45]), (doc5, tf=1, [7]), ...]
    "interview" → [(doc1, tf=2, [8,20]), (doc3, tf=5, [1,2,3,4,5]), ...]
    

    tf = term frequency (number of occurrences of the term in the document). Positions enable phrase search: the query “python interview” matches only documents where “python” and “interview” occur in adjacent positions. Postings lists are stored as sorted arrays of doc_ids so that lists can be intersected with a fast linear merge.

    Query Processing

    Query: "python interview tips"
    1. Tokenize: ["python", "interview", "tips"]
    2. Normalize: lowercase, stem ("tips" → "tip")
    3. Lookup each term's postings list
    4. Intersect (AND query): merge sorted doc_id lists
    5. Score each surviving document with BM25
    6. Return top-K by score
    

    Postings list intersection: merge two sorted lists in O(N1 + N2). For AND of multiple terms: start with the shortest postings list (most selective term), intersect incrementally. Phrase search: after AND intersection, verify positions are adjacent.
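    The merge intersection, shortest-list-first AND, and adjacency check described above can be sketched as (names are illustrative):

```python
def intersect(p1, p2):
    """Linear merge of two sorted doc_id lists: O(n1 + n2)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(postings_lists):
    """AND of several terms: start from the shortest (most selective) list."""
    lists = sorted(postings_lists, key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result

def phrase_adjacent(positions1, positions2):
    """Phrase check after intersection: does term2 directly follow term1?"""
    following = set(positions2)
    return any(p + 1 in following for p in positions1)
```

    Starting from the shortest list keeps every intermediate result small, so later intersections scan fewer entries.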

    BM25 Ranking Formula

    BM25(d, q) = sum over terms t in q:
        IDF(t) * (tf(t,d) * (k1+1)) / (tf(t,d) + k1*(1 - b + b*|d|/avgdl))
    
    IDF(t) = log((N - df(t) + 0.5) / (df(t) + 0.5))
    

    IDF (Inverse Document Frequency): terms appearing in fewer documents are more informative (higher weight). TF normalization: document length normalization (b parameter, typically 0.75) prevents long documents from dominating just by containing more words. k1 (typically 1.5) controls TF saturation — repeated occurrences have diminishing returns. BM25 is the standard baseline ranking function, outperforming TF-IDF.
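    A direct transcription of the formula, useful for checking that each parameter behaves as described (argument names are illustrative):

```python
import math

def bm25(query_terms, doc_tf, doc_len, avgdl, df, N, k1=1.5, b=0.75):
    """BM25 score of one document for a query.
    doc_tf: {term: tf in this doc}; df: {term: # docs containing term};
    N: corpus size; avgdl: average document length."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
        # Denominator grows with doc length (b) and saturates tf (k1).
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avgdl))
    return score
```

    Sanity checks: a rarer term (lower df) contributes more, and a longer document scores lower for the same tf.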

    Distributed Index Architecture

    100B docs / 1K docs per shard = 100M shards (too many)
    100B docs / 1M docs per shard = 100K shards (more realistic)
    
    Query fan-out:
    Query → Query Router → N index shards (parallel)
                      ← top-K results from each shard
             Merge and re-rank ← aggregate top-K globally
    

    Each shard independently scores and returns its top-K. The query router merges results from all shards (heap merge) and returns the global top-K. Sharding by document ID (random) ensures even distribution. Replication: each shard has 3 replicas for availability and read throughput.
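    The router's merge step is an n-way merge of already-sorted per-shard result lists. A sketch using the standard-library heap merge (shard data is hypothetical):

```python
import heapq
import itertools

def merge_shard_results(shard_results, k):
    """Heap-merge per-shard top-K lists, each sorted by score descending,
    into the global top-K. Elements are (score, doc_id) pairs."""
    merged = heapq.merge(*shard_results, key=lambda pair: -pair[0])
    return list(itertools.islice(merged, k))

shard1 = [(0.9, "d1"), (0.5, "d3")]
shard2 = [(0.8, "d2"), (0.7, "d4")]
top3 = merge_shard_results([shard1, shard2], 3)
# top3 -> [(0.9, "d1"), (0.8, "d2"), (0.7, "d4")]
```

    Because each input list is sorted, the merge never materializes all results; it stops after emitting k items.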

    Index Updates

    New documents: write to a small in-memory “delta index” that is merged with the main index periodically (segment merge, like Lucene/Elasticsearch). Deleted documents: mark as deleted in a tombstone list; filter from results until the next full merge. This write-optimized approach avoids expensive random updates to the main index.
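    The delta-index plus tombstone pattern can be sketched as follows. This is a toy in-memory model (class and method names are illustrative, not Lucene's API); in practice the main index lives on disk as immutable segments:

```python
from collections import defaultdict

class SegmentedIndex:
    """Writes touch only the small in-memory delta; reads consult main + delta
    and filter tombstoned doc_ids until the next segment merge."""

    def __init__(self, main_index=None):
        self.main = main_index or {}    # term -> doc_id list (on disk in practice)
        self.delta = defaultdict(list)  # term -> recently added doc_ids
        self.tombstones = set()         # deleted doc_ids awaiting merge

    def add_document(self, doc_id, tokens):
        for term in set(tokens):
            self.delta[term].append(doc_id)

    def delete_document(self, doc_id):
        self.tombstones.add(doc_id)     # no random write to the main index

    def postings(self, term):
        docs = self.main.get(term, []) + self.delta.get(term, [])
        return [d for d in docs if d not in self.tombstones]

    def merge(self):
        """Periodic segment merge: fold delta into main, drop tombstones."""
        for term, docs in self.delta.items():
            self.main.setdefault(term, []).extend(docs)
        self.main = {t: [d for d in ds if d not in self.tombstones]
                     for t, ds in self.main.items()}
        self.delta.clear()
        self.tombstones.clear()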

    Spell Correction

    Edit distance (Levenshtein) finds dictionary words within distance 1 or 2 of the query term. At scale, a BK-tree (metric-tree search) or SymSpell (precomputed deletion variants, near-constant-time lookup) avoids scanning the whole dictionary. “Did you mean” is shown when the query returns zero or few results and a correction produces significantly more.
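    A baseline implementation of the edit-distance lookup. The brute-force dictionary scan is exactly what BK-trees and SymSpell exist to avoid; it is shown here only to make the mechanism concrete (names are illustrative):

```python
def levenshtein(a, b):
    """Edit distance via the classic DP, kept to one row: O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(term, dictionary, max_dist=2):
    """Return the closest dictionary word within max_dist, else the term."""
    dist, best = min((levenshtein(term, w), w) for w in dictionary)
    return best if dist <= max_dist else term
```

    Example: `correct("pythn", ["python", "java"])` returns `"python"` (one insertion away).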

    ML Re-Ranking

    After BM25 retrieves the top-100 candidates, a neural re-ranker (BERT, cross-encoder) re-scores the query-document pairs using semantic understanding. This is expensive (on the order of 100 ms per query for BERT inference), so it is applied only to the top-100 BM25 candidates rather than the full corpus. The two-stage approach combines BM25 efficiency with ML accuracy.
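    The two-stage pipeline reduces to a small amount of glue code. In this sketch, `bm25_retrieve` and `rerank_score` are stand-ins for the real index lookup and cross-encoder inference call:

```python
def two_stage_search(query, bm25_retrieve, rerank_score, k=10, n_candidates=100):
    """Stage 1: cheap BM25 retrieval for recall.
    Stage 2: expensive model scoring for precision, on candidates only."""
    candidates = bm25_retrieve(query, n_candidates)     # e.g. top-100 by BM25
    rescored = sorted(candidates,
                      key=lambda doc: rerank_score(query, doc),
                      reverse=True)
    return rescored[:k]
```

    The key design point is that the expensive scorer runs on a fixed-size candidate set, so its cost is independent of corpus size.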

    Interview Tips

    • Inverted index is the foundational data structure — describe it before anything else.
    • BM25 > TF-IDF — know the formula and what each term does.
    • Two-stage ranking: BM25 for recall, ML re-ranker for precision.
    • Shard fan-out with per-shard top-K and global merge is the distributed query pattern.