Why String Algorithms?
String matching and processing problems appear regularly in interviews. The naive string search (O(nm) nested loops) is often too slow. Advanced algorithms offer O(n+m) or O(n) solutions. Key algorithms: KMP (Knuth-Morris-Pratt) for pattern matching, Rabin-Karp for multiple pattern search, Z-function for prefix matching, Trie for prefix/autocomplete problems. All rely on preprocessing (the pattern, or the word set in the trie's case) to avoid redundant comparisons.
KMP: Pattern Matching in O(n+m)
KMP preprocesses the pattern to compute the “failure function” (longest proper prefix that is also a suffix) for each position. This lets the algorithm avoid re-examining characters after a mismatch.
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    # Build failure function: fail[i] = length of the longest proper
    # prefix of pattern[0..i] that is also a suffix
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    # Search
    matches = []
    j = 0
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = fail[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = fail[j - 1]
    return matches
The failure function encodes: if a match fails at pattern[j], we can resume from pattern[fail[j-1]] instead of starting over. O(n+m) time, O(m) space. Use for: single pattern search in a text, detecting repeated substrings, checking if a string is a rotation of another (check if s2 is in s1+s1 using KMP).
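As a usage sketch of the rotation check described above, the snippet below reproduces kmp_search so it runs standalone; is_rotation is an illustrative helper name, not part of any library:

```python
def kmp_search(text, pattern):
    # Same failure-function search as above, repeated so this snippet runs alone.
    n, m = len(text), len(pattern)
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    matches, j = [], 0
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = fail[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = fail[j - 1]
    return matches

def is_rotation(s1, s2):
    # s2 is a rotation of s1 iff lengths match and s2 occurs in s1 + s1
    # (non-empty strings assumed for this sketch).
    return len(s1) == len(s2) and bool(kmp_search(s1 + s1, s2))
```

For example, is_rotation("waterbottle", "erbottlewat") is True, since "erbottlewat" occurs in "waterbottlewaterbottle".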
Rabin-Karp: Rolling Hash
Rabin-Karp uses a rolling hash to compare windows of the text to the pattern hash in O(1) per step. O(n+m) average; O(nm) worst case (hash collisions). Better than KMP for multi-pattern search and for finding duplicates. Rolling hash: hash(s[i+1..i+m]) = (hash(s[i..i+m-1]) - s[i] * base^(m-1)) * base + s[i+m]. Mod a large prime to keep values small. Collision: if hashes match, confirm with character comparison (to avoid false positives). Use for: duplicate substring detection (find the longest duplicate substring using binary search + Rabin-Karp hashing), detecting plagiarism (find common long substrings), 2D pattern matching.
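A sketch of the rolling-hash search following the recurrence above; base 256 and the prime modulus are illustrative choices, and rabin_karp_search is this sketch's own name:

```python
def rabin_karp_search(text, pattern):
    """Find all occurrences of pattern in text via a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    base, mod = 256, 1_000_000_007  # illustrative; any large prime modulus works
    high = pow(base, m - 1, mod)    # weight of the leftmost window character

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for i in range(n - m + 1):
        # On a hash match, verify characters to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

The explicit slice comparison on a hash hit is what keeps the output free of false positives even when collisions occur.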
Z-Function: Prefix Matching
Z[i] = length of the longest substring starting at position i that is also a prefix of the string. Z[0] is undefined. Build in O(n). Use for: pattern matching (concatenate pattern + “$” + text; Z[i] == len(pattern) means a match at that position), finding all periods of a string, string compression (find the smallest period).
def z_function(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        # Extend the match past any inherited value
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Track the rightmost match window [l, r)
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z
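One way to apply this to pattern matching, per the concatenation trick described above; z_search is an illustrative helper that inlines the Z computation so it runs standalone, and it assumes the separator occurs in neither string:

```python
def z_search(text, pattern, sep="$"):
    """Return all match positions of pattern in text via the Z-function.

    Assumes `sep` does not occur in either string.
    """
    s = pattern + sep + text
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    m = len(pattern)
    # A Z value of m at concatenated index i means a match at text index i - m - 1.
    return [i - m - 1 for i in range(m + 1, n) if z[i] == m]
```

The separator guarantees no Z value exceeds len(pattern), so the equality test is sufficient.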
Trie for Prefix Problems
A Trie (prefix tree) stores strings by their prefixes. Each node = one character; path from root = a prefix. Insert: O(L) per word. Search: O(L) per query. Applications: autocomplete (return all words with a given prefix by DFS from the prefix node), word search in a grid (LC 212: DFS with a trie to prune non-existent prefixes early), longest word with all prefixes (LC 720), palindrome pairs (LC 336 — insert reversed words, search for palindromes). LC 208 (Implement Trie): standard trie with insert/search/startsWith. LC 211 (Design Add and Search Words with Wildcards): DFS with “.” wildcard matches any character. LC 212 (Word Search II): build a trie of words; DFS on the grid, prune when the current path is no longer a trie prefix.
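A minimal sketch of the LC 208-style trie with dict-based children; class and method names here are this sketch's own, not a fixed API:

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # O(L): walk/create one node per character, mark the last as a word end.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        # Walk the trie; return the node for `prefix`, or None if absent.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        # O(L) exact-match lookup.
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        # O(L) prefix lookup; DFS from the returned node yields completions.
        return self._find(prefix) is not None
```

Autocomplete then follows the pattern described above: _find the prefix node, then DFS beneath it to collect words.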
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "When should you use KMP vs Rabin-Karp for string matching?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "KMP: always O(n+m) worst case. Deterministic. Processes characters one at a time — suitable for streaming. No hash collisions, no false positives. Use when: searching for one pattern in a text, the text arrives as a stream, you need guaranteed linear time. Rabin-Karp: O(n+m) average case, O(nm) worst case (all hash collisions, rare with a good hash). Use when: searching for multiple patterns simultaneously (compute all pattern hashes, compare the rolling window hash to all of them in O(1) per step). Finding duplicate substrings: use binary search on length + Rabin-Karp hashing to check if any substring of length L repeats. Use when: multiple patterns, substring deduplication, 2D pattern matching (Rabin-Karp can extend to 2D grids). In practice: for single pattern matching, KMP is preferred (deterministic). For algorithms problems involving duplicate substrings (LC 1044 Longest Duplicate Substring), Rabin-Karp + binary search is the standard O(n log n) approach. Z-function: better than both for problems involving prefix comparisons (is string B a rotation of A? Is pattern P a prefix of text T at position i?)."
      }
    },
    {
      "@type": "Question",
      "name": "How does the KMP failure function handle partial matches?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The failure function (also called the prefix function or pi-function) for pattern P: fail[i] = length of the longest proper prefix of P[0..i] that is also a suffix. Example: P = “ABCABD”. fail[0]=0, fail[1]=0, fail[2]=0, fail[3]=1 (prefix “A” = suffix “A”), fail[4]=2 (prefix “AB” = suffix “AB”), fail[5]=0. When a mismatch occurs at pattern position j while comparing with text[i]: instead of resetting j to 0, we reset j to fail[j-1]. This “reuses” the matched portion. Why it works: fail[j-1] is the length of the longest prefix of P that is also a suffix of P[0..j-1]. After the mismatch at j, we know the text has P[0..j-1] ending at text[i-1]. The fail value tells us the longest prefix of P that’s still potentially valid. We continue matching from fail[j-1] without re-examining text[i]. The result: each character of the text is examined at most twice — once by i advancing, and potentially once by j backtracking. This gives O(n) for the search phase."
      }
    },
    {
      "@type": "Question",
      "name": "How do you solve the “shortest encoding of words” or “longest common prefix” problems with a Trie?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Shortest encoding of words (LC 820): for each word, check if it is a suffix of another word. Words that are suffixes of others don’t need their own entry in the encoding (they’re already covered). Solution: insert all reversed words into a trie. A word needs its own encoding if and only if it is NOT a prefix of any other (reversed) word — i.e., it does NOT share a trie prefix with any longer word. Words that are leaves in the trie (no children) need independent encoding. Sum = sum of (len(word) + 1) for all trie leaves (the +1 is for the ‘#’ delimiter). Longest common prefix (LC 14): insert all words into a trie. Follow the trie from the root as long as each node has exactly one child (common prefix). Stop when a node has multiple children or is a word endpoint. The path from root to that node = LCP. Alternative for LCP: sort the words; compare only the first and last word in sorted order — the LCP of all words is at most the LCP of the most different pair."
      }
    },
    {
      "@type": "Question",
      "name": "How does the Z-algorithm detect string rotation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A string B is a rotation of string A if B appears in A+A (concatenated with itself). Example: A=“abcde”, B=“cdeab”. A+A=“abcdeabcde”. B=“cdeab” appears starting at index 2. Algorithm: compute if B is a substring of A+A using KMP or Z-function. Z-function approach: construct S = B + “$” + A + A (where “$” is a character not in either string). Compute Z[i] for S. If any Z[i] = len(B) and i > len(B): B is a rotation of A. The “$” separator prevents Z values from extending into the pattern from the text. This is O(n) time and O(n) space. Alternative (hashing): hash(B) == hash(some window of A+A) — O(1) check after O(n) hash precomputation. The rotation detection is a common LC problem (LC 796 Rotate String). Note: B must have the same length as A to be a rotation; check len(A) == len(B) first before the substring check."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement autocomplete with a Trie efficiently?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A Trie supports prefix search natively: traverse from root to the prefix node, then DFS from there to collect all words. Challenge: may collect thousands of words for a short prefix. Optimizations: (1) Limit results: return only the top K suggestions. Store a “popularity” score on each word node. Use a max-heap to efficiently return top K as you DFS. (2) Precomputed top K: for each trie node, precompute and store the top K most popular completions. On prefix search: return the stored list in O(1). Update lazily when word popularity changes. (3) Ternary Search Tree: more memory-efficient than a standard trie for large alphabets. Each node has three children: less-than, equal, greater-than. Supports near-linear prefix search with less memory than a trie. (4) Approximate (fuzzy) autocomplete: allow up to 1-2 edit distance in the prefix (handles typos). Implement via DFS with a budget for edit distance (similar to edit distance DP but on the trie). Prune DFS branches when the edit distance budget is exhausted. For production autocomplete at scale: use Elasticsearch’s completion suggester, which uses an FST (Finite State Transducer) for O(1) amortized prefix queries."
      }
    }
  ]
}