Naive String Matching
For each position i in the text (0 to n-m): check if text[i:i+m] == pattern. O(n*m) in the worst case, e.g., pattern "aaaaaab" searched in a text "aaaaa…a" of all a's: every alignment matches the first m-1 characters and fails only at the last. Acceptable for short patterns or when the pattern rarely partially matches. All the advanced algorithms below improve on this by avoiding re-comparison of characters already known to match.
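A minimal Python sketch of the naive scan described above (function name is illustrative):

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return all start indices where pattern occurs in text. O(n*m) worst case."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):          # try every alignment of the pattern
        if text[i:i + m] == pattern:    # up to m character comparisons each
            matches.append(i)
    return matches

naive_search("abracadabra", "abra")     # -> [0, 7]
```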
KMP (Knuth-Morris-Pratt)
KMP preprocesses the pattern to build a failure function (lps = longest proper prefix which is also a suffix). This lets the search skip characters on a mismatch instead of restarting from the beginning. Build lps: lps[0] = 0 always. For i from 1 to m-1, maintain j = length of the current matched prefix (starting from lps[i-1]): compare pattern[i] with pattern[j]. On match: j += 1 and lps[i] = j. On mismatch with j > 0: fall back with j = lps[j-1] and retry. On mismatch with j == 0: lps[i] = 0. O(m) preprocessing. Search: two pointers, i into the text and j into the pattern. Match: advance both. Mismatch with j > 0: j = lps[j-1] (reuse the already-matched prefix). Mismatch with j == 0: advance i only. When j == m: a match starts at i - m. O(n) search. Total: O(n+m). Key insight: lps tells us the longest prefix of the pattern that is also a suffix of the matched portion, so we can resume from there instead of starting over.
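The two phases above can be sketched in Python like this (a standard formulation, not the only one):

```python
def build_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    j = 0                                # length of the current matched prefix
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]               # fall back to a shorter prefix
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
    return lps

def kmp_search(text: str, pattern: str) -> list[int]:
    lps, matches, j = build_lps(pattern), [], 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]               # reuse the matched prefix on mismatch
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):            # full match ends at index i
            matches.append(i - j + 1)
            j = lps[j - 1]               # continue, allowing overlapping matches
    return matches
```

On the practice example from the tips below, `kmp_search("ababababab", "ababab")` finds the overlapping matches at 0, 2, and 4.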
Rabin-Karp
Uses hashing to find pattern matches. Compute the hash of the pattern and the hash of each text window of size m. Compare hashes first (O(1)); verify character-by-character only when the hashes match (the full comparison is O(m) but rare). Rolling hash: instead of recomputing the hash from scratch for each window (O(m)), update it incrementally: new_hash = (old_hash - text[i] * base^(m-1)) * base + text[i+m], with all arithmetic taken modulo a large prime to keep values bounded. This gives an O(1) hash update. Time: O(n+m) expected, O(nm) worst case (if all hashes collide). Rabin-Karp excels at multi-pattern search: compute hashes for all patterns; on a text hash match, check which pattern actually matches. This finds all K patterns simultaneously in O(n + sum(m_i)) expected time. Use case: plagiarism detection (searching for many document fragments at once).
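A sketch of the single-pattern rolling hash; the base and modulus here are illustrative choices, not canonical:

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 10**9 + 7) -> list[int]:
    """Rolling-hash search. base/mod are example parameters."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)         # weight of the char leaving the window
    p_hash = t_hash = 0
    for i in range(m):                   # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # verify character-by-character only on a hash hit (collisions possible)
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:                    # roll the window one character right
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches
```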
Z-Algorithm
The Z-array: z[i] = length of the longest substring starting at i that matches a prefix of the string. z[0] is defined as n (the whole string matches itself). Construction in O(n): maintain a window [l, r], the rightmost Z-box found so far. For each i: if i < r, initialize z[i] = min(r - i, z[i - l]) (copy the value from the mirrored position inside the box); then extend z[i] by direct character comparison; if i + z[i] > r, update the Z-box to l = i, r = i + z[i]. Pattern search: concatenate pattern + "$" + text ($ is a separator not in the alphabet). Compute the Z-array. Positions where z[i] >= len(pattern) indicate a match (the $ separator prevents matches spanning the boundary). O(n+m) time. Simpler to implement than KMP in some cases; identical complexity.
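The Z-box construction and the concatenation trick can be sketched as follows (assuming "$" occurs in neither string):

```python
def z_array(s: str) -> list[int]:
    """z[i] = length of the longest substring starting at i that is a prefix of s."""
    n = len(s)
    z = [0] * n
    z[0] = n                             # the whole string matches itself
    l = r = 0                            # rightmost Z-box seen so far: [l, r)
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # copy from the mirrored position
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                    # extend by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]           # record the new rightmost Z-box
    return z

def z_search(text: str, pattern: str, sep: str = "$") -> list[int]:
    """Find pattern via the Z-array of pattern + sep + text."""
    m = len(pattern)
    z = z_array(pattern + sep + text)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]
```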
Boyer-Moore
Boyer-Moore is often the fastest algorithm in practice for large alphabets. Two heuristics. Bad character: on a mismatch of text character c against pattern position j, shift the pattern to align the last occurrence of c in the pattern with that text position; if c does not occur in the pattern, shift the whole pattern past it. Good suffix: when a suffix of the pattern matched before the mismatch, shift to align the next occurrence of that suffix within the pattern. Preprocessing: O(m + alphabet_size). Search: O(n/m) best case (it can skip characters entirely), O(nm) worst case (rare). Boyer-Moore variants are used in grep and other text search tools for their practical speed on natural-language text with large alphabets (26+ characters).
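A common simplification is Boyer-Moore-Horspool, which keeps only the bad-character heuristic (the good-suffix rule is omitted here); a sketch, with the shift keyed on the last character of the current window:

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore-Horspool: bad-character heuristic only.
    Full Boyer-Moore adds the good-suffix rule on top of this."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # For each char in pattern[:-1], distance from its last occurrence to the
    # pattern's end; characters not in the pattern get the full shift m.
    shift = {ch: m - idx - 1 for idx, ch in enumerate(pattern[:-1])}
    matches, i = [], 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1                                  # compare right to left
        if j < 0:
            matches.append(i)                       # full window matched
        i += shift.get(text[i + m - 1], m)          # skip ahead by the table
    return matches
```

Because mismatches on characters absent from the pattern shift by the full m, long windows are skipped without examining every text character, which is where the sublinear best case comes from.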
Interview Tips
- KMP is the most commonly asked string matching algorithm. Know how to build the failure function and use it in the search phase. Practice on “ababab” pattern in “ababababab” text.
- Rabin-Karp is asked less often but is the go-to when the problem involves multiple patterns or substring hashing (LeetCode “Repeated DNA Sequences”).
- For most interview problems, the naive O(nm) solution is acceptable. Mention KMP for optimization if the interviewer asks for better complexity.
Asked at: Meta, Apple, LinkedIn, Coinbase