Why String Algorithms?
String matching and processing problems appear regularly in interviews. The naive string search (O(nm) nested loops) is often insufficient. Advanced algorithms offer O(n+m) or O(n) solutions. Key algorithms: KMP (Knuth-Morris-Pratt) for pattern matching, Rabin-Karp for multiple pattern search, Z-function for prefix matching, Trie for prefix/autocomplete problems. All are based on preprocessing the pattern to avoid redundant comparisons.
KMP: Pattern Matching in O(n+m)
KMP preprocesses the pattern to compute the “failure function” (longest proper prefix that is also a suffix) for each position. This lets the algorithm avoid re-examining characters after a mismatch.
def kmp_search(text, pattern):
n, m = len(text), len(pattern)
# Build failure function
fail = [0] * m
j = 0
for i in range(1, m):
while j > 0 and pattern[i] != pattern[j]:
j = fail[j-1]
if pattern[i] == pattern[j]:
j += 1
fail[i] = j
# Search
matches = []
j = 0
for i in range(n):
while j > 0 and text[i] != pattern[j]:
j = fail[j-1]
if text[i] == pattern[j]:
j += 1
if j == m:
matches.append(i - m + 1)
j = fail[j-1]
return matches
The failure function encodes: if a match fails at pattern[j], we can resume from pattern[fail[j-1]] instead of starting over. O(n+m) time, O(m) space. Use for: single pattern search in a text, detecting repeated substrings, checking if a string is a rotation of another (check if s2 is in s1+s1 using KMP).
Rabin-Karp: Rolling Hash
Rabin-Karp uses a rolling hash to compare windows of the text to the pattern hash in O(1) per step. O(n+m) average; O(nm) worst case (hash collisions). Better than KMP for multi-pattern search and for finding duplicates. Rolling hash: hash(s[i+1..i+m]) = (hash(s[i..i+m-1]) – s[i] * base^(m-1)) * base + s[i+m]. Mod a large prime to keep values small. Collision: if hashes match, confirm with character comparison (to avoid false positives). Use for: duplicate substring detection (find the longest duplicate substring using binary search + Rabin-Karp hashing), detecting plagiarism (find common long substrings), 2D pattern matching.
Z-Function: Prefix Matching
Z[i] = length of the longest substring starting at position i that is also a prefix of the string. Z[0] is undefined. Build in O(n). Use for: pattern matching (concatenate pattern + “$” + text; Z[i] == len(pattern) means a match at that position), finding all periods of a string, string compression (find the smallest period).
def z_function(s):
n = len(s)
z = [0] * n
l, r = 0, 0
for i in range(1, n):
if i < r:
z[i] = min(r - i, z[i - l])
while i + z[i] r:
l, r = i, i + z[i]
return z
Trie for Prefix Problems
A Trie (prefix tree) stores strings by their prefixes. Each node = one character; path from root = a prefix. Insert: O(L) per word. Search: O(L) per query. Applications: autocomplete (return all words with a given prefix by DFS from the prefix node), word search in a grid (LC 212: DFS with a trie to prune non-existent prefixes early), longest word with all prefixes (LC 720), palindrome pairs (LC 336 — insert reversed words, search for palindromes). LC 208 (Implement Trie): standard trie with insert/search/startsWith. LC 211 (Design Add and Search Words with Wildcards): DFS with “.” wildcard matches any character. LC 212 (Word Search II): build a trie of words; DFS on the grid, prune when the current path is no longer a trie prefix.
Asked at: Meta Interview Guide
Asked at: Apple Interview Guide
Asked at: Coinbase Interview Guide
Asked at: LinkedIn Interview Guide