System Design: Search Ranking — Query Processing, Inverted Index, and Relevance Scoring

Core Architecture

A search system takes a text query and returns a ranked list of documents (web pages, products, articles). Components: (1) Crawler/indexer: fetches and processes documents and builds the inverted index. (2) Query processor: parses the query, looks up the index, computes relevance scores, and returns ranked results. (3) Serving layer: a low-latency API that handles millions of queries per second. At Google scale this means hundreds of billions of documents and trillions of queries per year; even application-level search (e-commerce, internal search) applies the same principles at smaller scale.

Inverted Index

An inverted index maps each term to the list of documents containing that term: term → [(doc_id, position, frequency), …]. This “posting list” is the core data structure. Building the index: tokenize documents (split on whitespace and punctuation), normalize (lowercase, stemming: “running” → “run”), remove stop words (the, a, an), and compute term frequency per document. For the query “fast search”: look up the posting lists for “fast” and “search”, then intersect them (AND query) or union them (OR query). Intersection is fastest when lists are processed in order of increasing length (shortest first), since each step can only shrink the candidate set. Index size: with compression, the inverted index is typically 10-30% of the raw text size. Sharding: partition the index by document (each shard holds the full index for a subset of documents). Query fan-out: scatter the query to all shards, gather results, merge-sort by score, and return the top K. Elasticsearch/Lucene use this architecture.
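
A minimal in-memory sketch of the index build and the shortest-first AND intersection, assuming simplified tokenization (no stemming, positions, or compression); all helper names here are hypothetical:

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "an"}

def tokenize(text):
    # Simplified normalization: lowercase + whitespace split + stop-word removal.
    # A real tokenizer also strips punctuation and applies stemming.
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_index(docs):
    """docs: {doc_id: text} -> inverted index {term: {doc_id: term_frequency}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def and_query(index, terms):
    """Intersect posting lists, shortest first, so each step only shrinks the set."""
    postings = sorted((index.get(t, {}) for t in terms), key=len)
    if not postings or not postings[0]:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result.intersection_update(p)
        if not result:
            break  # early exit: the intersection is already empty
    return result

docs = {1: "fast search engine", 2: "a fast car", 3: "search ranking"}
index = build_index(docs)
print(and_query(index, tokenize("fast search")))  # {1}
```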

Relevance Scoring: TF-IDF and BM25

TF-IDF: score(term, doc) = TF(term, doc) * IDF(term). TF (term frequency) = log(1 + count of term in doc). IDF (inverse document frequency) = log(N / df(term)), where N = total documents and df = number of documents containing the term. High IDF = rare term = more discriminative: “the” has IDF ≈ 0 (it appears in almost every document); “photosynthesis” has high IDF. BM25 (Best Match 25): the industry standard. Improvements over TF-IDF: (1) TF saturation: BM25’s term-frequency component behaves like TF / (TF + k1), so more occurrences help, but with diminishing returns (k1 is typically 1.2). (2) Document length normalization: documents longer than average are penalized, with b controlling the strength. score = IDF * TF * (k1 + 1) / (TF + k1 * (1 - b + b * dl/avgdl)), where b = 0.75, dl = document length, and avgdl = average document length. BM25 is the default similarity in Elasticsearch and Solr.
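
A small sketch of the BM25 formula above in Python, using the document’s simple IDF = log(N/df) (real implementations typically use a smoothed IDF variant):

```python
import math

def bm25(tf, df, N, dl, avgdl, k1=1.2, b=0.75):
    """Score one (term, doc) pair with the formula above."""
    idf = math.log(N / df)                # rare term -> high IDF
    length_norm = 1 - b + b * dl / avgdl  # > 1 penalizes longer-than-average docs
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# Term appears 3x in a 120-word doc; 1,000 of 1,000,000 docs contain it,
# and the average doc length is 100 words.
print(round(bm25(tf=3, df=1_000, N=1_000_000, dl=120, avgdl=100), 2))  # ≈ 10.41
```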

Query Processing Pipeline

Query flow: (1) Query parsing: tokenize, normalize, handle operators (AND, OR, NOT, phrase “exact match”, field:value). (2) Query understanding: spell correction, synonym expansion (“car” → also “automobile”), entity recognition (recognizing “iPhone 15” as a product entity). (3) Index lookup: retrieve posting lists for all query terms. (4) Candidate selection: intersect/union posting lists to get the candidate document set. For large posting lists, apply the WAND (Weak AND) algorithm to skip documents that cannot possibly make the top K, without fully scoring them. (5) Scoring: compute BM25 plus additional signals (recency, popularity, click-through rate). (6) Re-ranking: apply an ML model (LambdaMART, BERT-based cross-encoder) to the top-N candidates. (7) Return the top K results with snippets. Latency budget: with a 100ms total target, a typical split is 10ms index lookup, 20ms scoring, 50ms re-ranking, and 20ms serialization and network.
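
A compact sketch of the two-stage scoring in steps (5)-(6): cheap first-pass scoring over all candidates, then an expensive re-ranker over only the top N. Here `bm25_score` and `rerank_model` are hypothetical callables standing in for the components described above:

```python
import heapq

def rank(query, candidates, bm25_score, rerank_model, n=200, k=10):
    """Two-stage ranking: fast first-pass scoring, slow re-rank of a short list."""
    # Stage 1: score every candidate with cheap index-level signals.
    top_n = heapq.nlargest(n, candidates, key=lambda doc: bm25_score(query, doc))
    # Stage 2: apply the expensive model (e.g. a cross-encoder) to just n docs.
    top_n.sort(key=lambda doc: rerank_model(query, doc), reverse=True)
    return top_n[:k]
```

The point of the split is that the re-ranker’s cost is paid per (query, document) pair, so it is only affordable on a few hundred candidates, never on the full candidate set.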

Personalization and Learning to Rank

Pure lexical relevance is insufficient for production search. Ranking signals: (1) Text relevance: BM25 score. (2) Click-through rate (CTR): documents clicked more often for similar queries rank higher. (3) Recency: newer documents rank higher for time-sensitive queries. (4) Authority: PageRank-style link analysis for the web; seller rating for e-commerce. (5) User context: past purchases, browsing history (personalized ranking). Learning to Rank (LTR): train a gradient-boosted model (XGBoost, LightGBM) or a neural model on query-document pairs labeled with relevance judgments (from human raters or click data). Offline evaluation: NDCG (Normalized Discounted Cumulative Gain) measures ranking quality. A/B testing: deploy the new ranking model to 1% of traffic and measure improvements in click-through rate and purchase conversion. Feedback loop: production click data continuously retrains the ranking model.
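
A minimal NDCG@k sketch for offline evaluation. Here `rels` is an assumed list of graded relevance labels for results in ranked order (e.g. 0 = irrelevant to 3 = perfect), and the 2^rel - 1 gain is the common exponential formulation:

```python
import math

def dcg(rels):
    # Gain 2^rel - 1, discounted by log2 of the (1-indexed) position + 1.
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0

# Positions 3 and 4 are swapped relative to the ideal ordering [3, 2, 1, 0].
print(round(ndcg([3, 2, 0, 1]), 3))  # ≈ 0.993
```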

Asked at: LinkedIn, Shopify, Databricks, Twitter/X
