Why String Algorithms?
String matching and processing problems appear regularly in interviews. The naive string search (O(nm) nested loops) is often too slow. Advanced algorithms offer O(n+m) or O(n) solutions. Key algorithms: KMP (Knuth-Morris-Pratt) for pattern matching, Rabin-Karp for multiple pattern search, Z-function for prefix matching, Trie for prefix/autocomplete problems. All rely on preprocessing (the pattern, or the word set in the trie's case) to avoid redundant comparisons.
KMP: Pattern Matching in O(n+m)
KMP preprocesses the pattern to compute the “failure function” (longest proper prefix that is also a suffix) for each position. This lets the algorithm avoid re-examining characters after a mismatch.
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    # Build failure function: fail[i] = length of the longest proper
    # prefix of pattern[0..i] that is also a suffix
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    # Search
    matches = []
    j = 0
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = fail[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = fail[j - 1]
    return matches
The failure function encodes: if a match fails at pattern[j], we can resume from pattern[fail[j-1]] instead of starting over. O(n+m) time, O(m) space. Use for: single pattern search in a text, detecting repeated substrings, checking if a string is a rotation of another (check if s2 is in s1+s1 using KMP).
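As a usage sketch of the rotation check described above, the snippet below reproduces kmp_search so it runs standalone; is_rotation is an illustrative helper name, not part of any library:

```python
def kmp_search(text, pattern):
    # Same failure-function search as above, repeated so this snippet runs alone.
    n, m = len(text), len(pattern)
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    matches, j = [], 0
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = fail[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = fail[j - 1]
    return matches

def is_rotation(s1, s2):
    # s2 is a rotation of s1 iff lengths match and s2 occurs in s1 + s1
    # (non-empty strings assumed for this sketch).
    return len(s1) == len(s2) and bool(kmp_search(s1 + s1, s2))
```

For example, is_rotation("waterbottle", "erbottlewat") is True, since "erbottlewat" occurs in "waterbottlewaterbottle".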
Rabin-Karp: Rolling Hash
Rabin-Karp uses a rolling hash to compare windows of the text to the pattern hash in O(1) per step. O(n+m) average; O(nm) worst case (hash collisions). Better than KMP for multi-pattern search and for finding duplicates. Rolling hash: hash(s[i+1..i+m]) = (hash(s[i..i+m-1]) - s[i] * base^(m-1)) * base + s[i+m]. Mod a large prime to keep values small. Collision: if hashes match, confirm with character comparison (to avoid false positives). Use for: duplicate substring detection (find the longest duplicate substring using binary search + Rabin-Karp hashing), detecting plagiarism (find common long substrings), 2D pattern matching.
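A sketch of the rolling-hash search following the recurrence above; base 256 and the prime modulus are illustrative choices, and rabin_karp_search is this sketch's own name:

```python
def rabin_karp_search(text, pattern):
    """Find all occurrences of pattern in text via a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    base, mod = 256, 1_000_000_007  # illustrative; any large prime modulus works
    high = pow(base, m - 1, mod)    # weight of the leftmost window character

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for i in range(n - m + 1):
        # On a hash match, verify characters to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

The explicit slice comparison on a hash hit is what keeps the output free of false positives even when collisions occur.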
Z-Function: Prefix Matching
Z[i] = length of the longest substring starting at position i that is also a prefix of the string. Z[0] is undefined. Build in O(n). Use for: pattern matching (concatenate pattern + “$” + text; Z[i] == len(pattern) means a match at that position), finding all periods of a string, string compression (find the smallest period).
def z_function(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        # Extend the match past any inherited value
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Track the rightmost match window [l, r)
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z
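One way to apply this to pattern matching, per the concatenation trick described above; z_search is an illustrative helper that inlines the Z computation so it runs standalone, and it assumes the separator occurs in neither string:

```python
def z_search(text, pattern, sep="$"):
    """Return all match positions of pattern in text via the Z-function.

    Assumes `sep` does not occur in either string.
    """
    s = pattern + sep + text
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    m = len(pattern)
    # A Z value of m at concatenated index i means a match at text index i - m - 1.
    return [i - m - 1 for i in range(m + 1, n) if z[i] == m]
```

The separator guarantees no Z value exceeds len(pattern), so the equality test is sufficient.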
Trie for Prefix Problems
A Trie (prefix tree) stores strings by their prefixes. Each node = one character; path from root = a prefix. Insert: O(L) per word. Search: O(L) per query. Applications: autocomplete (return all words with a given prefix by DFS from the prefix node), word search in a grid (LC 212: DFS with a trie to prune non-existent prefixes early), longest word with all prefixes (LC 720), palindrome pairs (LC 336 — insert reversed words, search for palindromes). LC 208 (Implement Trie): standard trie with insert/search/startsWith. LC 211 (Design Add and Search Words with Wildcards): DFS with “.” wildcard matches any character. LC 212 (Word Search II): build a trie of words; DFS on the grid, prune when the current path is no longer a trie prefix.
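A minimal sketch of the LC 208-style trie with dict-based children; class and method names here are this sketch's own, not a fixed API:

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # O(L): walk/create one node per character, mark the last as a word end.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        # Walk the trie; return the node for `prefix`, or None if absent.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        # O(L) exact-match lookup.
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        # O(L) prefix lookup; DFS from the returned node yields completions.
        return self._find(prefix) is not None
```

Autocomplete then follows the pattern described above: _find the prefix node, then DFS beneath it to collect words.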
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "When should you use KMP vs Rabin-Karp for string matching?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "KMP: always O(n+m) worst case. Deterministic. Processes characters one at a time — suitable for streaming. No hash collisions, no false positives. Use when: searching for one pattern in a text, the text arrives as a stream, you need guaranteed linear time. Rabin-Karp: O(n+m) average case, O(nm) worst case (all hash collisions, rare with a good hash). Use when: searching for multiple patterns simultaneously (compute all pattern hashes, compare the rolling window hash to all of them in O(1) per step). Finding duplicate substrings: use binary search on length + Rabin-Karp hashing to check if any substring of length L repeats. Use when: multiple patterns, substring deduplication, 2D pattern matching (Rabin-Karp can extend to 2D grids). In practice: for single pattern matching, KMP is preferred (deterministic). For algorithms problems involving duplicate substrings (LC 1044 Longest Duplicate Substring), Rabin-Karp + binary search is the standard O(n log n) approach. Z-function: better than both for problems involving prefix comparisons (is string B a rotation of A? Is pattern P a prefix of text T at position i?)."
      }
    },
    {
      "@type": "Question",
      "name": "How does the KMP failure function handle partial matches?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The failure function (also called the prefix function or pi-function) for pattern P: fail[i] = length of the longest proper prefix of P[0..i] that is also a suffix. Example: P = “ABCABD”. fail[0]=0, fail[1]=0, fail[2]=0, fail[3]=1 (prefix “A” = suffix “A”), fail[4]=2 (prefix “AB” = suffix “AB”), fail[5]=0. When a mismatch occurs at pattern position j while comparing with text[i]: instead of resetting j to 0, we reset j to fail[j-1]. This “reuses” the matched portion. Why it works: fail[j-1] is the length of the longest prefix of P that is also a suffix of P[0..j-1]. After the mismatch at j, we know the text has P[0..j-1] ending at text[i-1]. The fail value tells us the longest prefix of P that’s still potentially valid. We continue matching from fail[j-1] without re-examining text[i]. The result: each character of the text is examined at most twice — once by i advancing, and potentially once by j backtracking. This gives O(n) for the search phase."
      }
    },
    {
      "@type": "Question",
      "name": "How do you solve the “shortest encoding of words” or “longest common prefix” problems with a Trie?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Shortest encoding of words (LC 820): for each word, check if it is a suffix of another word. Words that are suffixes of others don’t need their own entry in the encoding (they’re already covered). Solution: insert all reversed words into a trie. A word needs its own encoding if and only if it is NOT a prefix of any other (reversed) word — i.e., it does NOT share a trie prefix with any longer word. Words that are leaves in the trie (no children) need independent encoding. Sum = sum of (len(word) + 1) for all trie leaves (the +1 is for the ‘#’ delimiter). Longest common prefix (LC 14): insert all words into a trie. Follow the trie from the root as long as each node has exactly one child (common prefix). Stop when a node has multiple children or is a word endpoint. The path from root to that node = LCP. Alternative for LCP: sort the words; compare only the first and last word in sorted order — the LCP of all words is at most the LCP of the most different pair."
      }
    },
    {
      "@type": "Question",
      "name": "How does the Z-algorithm detect string rotation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A string B is a rotation of string A if B appears in A+A (concatenated with itself). Example: A=“abcde”, B=“cdeab”. A+A=“abcdeabcde”. B=“cdeab” appears starting at index 2. Algorithm: compute if B is a substring of A+A using KMP or Z-function. Z-function approach: construct S = B + “$” + A + A (where “$” is a character not in either string). Compute Z[i] for S. If any Z[i] = len(B) and i > len(B): B is a rotation of A. The “$” separator prevents Z values from extending into the pattern from the text. This is O(n) time and O(n) space. Alternative (hashing): hash(B) == hash(some window of A+A) — O(1) check after O(n) hash precomputation. The rotation detection is a common LC problem (LC 796 Rotate String). Note: B must have the same length as A to be a rotation; check len(A) == len(B) first before the substring check."
      }
    },
    {
      "@type": "Question",
      "name": "How do you implement autocomplete with a Trie efficiently?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A Trie supports prefix search natively: traverse from root to the prefix node, then DFS from there to collect all words. Challenge: may collect thousands of words for a short prefix. Optimizations: (1) Limit results: return only the top K suggestions. Store a “popularity” score on each word node. Use a max-heap to efficiently return top K as you DFS. (2) Precomputed top K: for each trie node, precompute and store the top K most popular completions. On prefix search: return the stored list in O(1). Update lazily when word popularity changes. (3) Ternary Search Tree: more memory-efficient than a standard trie for large alphabets. Each node has three children: less-than, equal, greater-than. Supports near-linear prefix search with less memory than a trie. (4) Approximate (fuzzy) autocomplete: allow up to 1-2 edit distance in the prefix (handles typos). Implement via DFS with a budget for edit distance (similar to edit distance DP but on the trie). Prune DFS branches when the edit distance budget is exhausted. For production autocomplete at scale: use Elasticsearch’s completion suggester, which uses an FST (Finite State Transducer) for O(1) amortized prefix queries."
      }
    }
  ]
}