String Matching Algorithm Interview Patterns: KMP, Rabin-Karp, Z-Algorithm, and Boyer-Moore (2025)

Naive String Matching

For each position i in the text (0 to n-m): check whether text[i:i+m] == pattern. O(n*m) in the worst case — e.g., searching for “aaaaaab” in “aaaaa…a” compares m characters at every alignment before failing on the final ‘b’. Acceptable for short patterns or when the pattern rarely partially matches. All of the advanced algorithms below improve on this by avoiding re-comparison of characters already known to match.
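The naive scan can be sketched in a few lines of Python (a minimal illustration; the function name is mine):

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern starts in text. O(n*m) worst case."""
    n, m = len(text), len(pattern)
    # Slide the pattern over every possible alignment and compare the full window.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]
```

The slice comparison hides the inner character loop, but the cost is the same: up to m comparisons per alignment.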

KMP (Knuth-Morris-Pratt)

KMP preprocesses the pattern to build a failure function (lps = longest proper prefix which is also a suffix). This allows skipping characters on mismatch without restarting from the beginning. Build lps: lps[0]=0 always. For i from 1 to m-1, with j initialized to lps[i-1]: compare pattern[i] with pattern[j]. If they match: lps[i] = j+1. If they mismatch and j>0: fall back with j = lps[j-1] and retry. If they mismatch and j==0: lps[i] = 0. O(m) preprocessing. Search: two pointers, i (text) and j (pattern). Match: advance both. Mismatch and j>0: j = lps[j-1] (reuse the matched prefix). Mismatch and j==0: advance i only. When j==m: found a match starting at i-m. O(n) search. Total: O(n+m). Key insight: lps tells us the longest prefix of the pattern that is also a suffix of the matched portion — we can resume from there instead of starting over.
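Both phases above can be sketched as follows (function names are illustrative, not from a library):

```python
def build_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix."""
    m = len(pattern)
    lps = [0] * m
    j = 0  # length of the current matched prefix, i.e. lps[i-1]
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]  # fall back to the next-shorter border
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
    return lps

def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all start indices of pattern in text in O(n+m)."""
    lps = build_lps(pattern)
    matches, j = [], 0
    for i, c in enumerate(text):
        while j > 0 and c != pattern[j]:
            j = lps[j - 1]  # reuse the matched prefix instead of restarting
        if c == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # continue searching for overlapping matches
    return matches
```

Note that after a full match the search falls back via lps as well, so overlapping occurrences are reported.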

Rabin-Karp

Uses hashing to find pattern matches. Compute the hash of the pattern and the hash of each text window of size m. Compare hashes first (O(1)); verify character-by-character only when the hashes match (the full comparison is O(m) but rare). Rolling hash: instead of recomputing the hash from scratch for each window (O(m)), update it incrementally: new_hash = ((old_hash - text[i] * base^(m-1)) * base + text[i+m]) % mod. This gives an O(1) hash update. Time: O(n+m) expected, O(nm) worst case (if all hashes collide). Rabin-Karp excels at multi-pattern search: compute hashes for all patterns; on a text-hash match, check which pattern matched. This finds all K patterns simultaneously in O(n + sum(m_i)) expected time. Use case: plagiarism detection (searching for multiple document fragments at once).
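A sketch of the rolling-hash search, assuming base 31 and mod 10^9+7 as suggested later in the FAQ (names are mine):

```python
def rabin_karp(text: str, pattern: str, base: int = 31, mod: int = 10**9 + 7) -> list[int]:
    """Return all start indices of pattern in text. O(n+m) expected."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    p_hash = w_hash = 0
    for i in range(m):  # O(m) initialization of both hashes
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    high = pow(base, m - 1, mod)  # weight of the outgoing character
    matches = []
    for i in range(n - m + 1):
        # Hash hit: verify character-by-character to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:  # O(1) rolling update: drop text[i], append text[i+m]
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

Python's arbitrary-precision integers make the modular arithmetic forgiving; in fixed-width languages, keep every intermediate product reduced mod the prime to avoid overflow.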

Z-Algorithm

The Z-array: z[i] = length of the longest substring starting at i that matches a prefix of the string. z[0] is defined as n (the whole string matches itself). Construction in O(n): maintain a window [l, r], the rightmost Z-box found so far. For each i: if i < r, initialize z[i] = min(r - i, z[i - l]) by copying from the mirrored position inside the box; then extend z[i] by naive comparison; if i + z[i] > r, update the window to l = i, r = i + z[i]. Pattern search: concatenate pattern + “$” + text ($ is a separator not in the alphabet). Compute the Z-array. Positions where z[i] >= len(pattern) indicate a match (the $ separator prevents matches spanning the boundary). O(n+m) time. Simpler to implement than KMP in some cases; identical complexity.
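The construction and the concatenation trick look like this in Python (a sketch; function names are mine):

```python
def z_array(s: str) -> list[int]:
    """z[i] = length of the longest substring at i matching a prefix of s. O(n)."""
    n = len(s)
    z = [0] * n
    z[0] = n  # by definition, the whole string matches itself
    l = r = 0  # rightmost Z-box seen so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # copy from the mirrored position
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1  # extend naively past the copied value
        if i + z[i] > r:
            l, r = i, i + z[i]  # extension went past r: new rightmost box
    return z

def z_search(text: str, pattern: str, sep: str = "$") -> list[int]:
    """Pattern search via Z-array of pattern + sep + text. O(n+m)."""
    m = len(pattern)
    z = z_array(pattern + sep + text)
    # Offset by m+1 to convert concatenated-string indices to text indices.
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]
```

The separator guarantees no z value in the text portion exceeds m, so `z[i] >= m` means exactly a full-pattern match.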

Boyer-Moore

Boyer-Moore is among the fastest algorithms in practice for large alphabets. It compares the pattern right-to-left and uses two heuristics. Bad character: on a mismatch between a text character c and pattern position j, shift the pattern so its rightmost occurrence of c aligns with c; if c does not occur in the pattern, shift past it entirely. Good suffix: when a suffix of the pattern matched before the mismatch, shift to align the next occurrence of that suffix within the pattern. Preprocessing: O(m + alphabet_size). Search: O(n/m) best case (it can skip characters), O(nm) worst case (rare). Boyer-Moore is used in grep and other text-search tools for its practical speed on natural-language text with large alphabets (26+ characters).
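A simplified sketch using only the bad-character heuristic (the good-suffix table is omitted for brevity, so this is not the full Boyer-Moore; names are mine):

```python
def bm_bad_char_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore with the bad-character heuristic only. Returns all match indices."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of each character in the pattern.
    last = {c: j for j, c in enumerate(pattern)}
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right-to-left
        if j < 0:
            matches.append(s)
            s += 1  # full match: advance by one (good-suffix rule would do better)
        else:
            # Align the rightmost occurrence of the mismatched text character,
            # or jump fully past it if it never occurs in the pattern.
            s += max(1, j - last.get(text[s + j], -1))
    return matches
```

On a mismatched character absent from the pattern, `last.get(..., -1)` makes the shift `j + 1`, skipping the whole scanned region — the source of the sub-linear average case.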

Interview Tips

  • KMP is the most commonly asked string matching algorithm. Know how to build the failure function and use it in the search phase. Practice on “ababab” pattern in “ababababab” text.
  • Rabin-Karp is asked less often but is the go-to when the problem involves multiple patterns or substring hashing (LeetCode “Repeated DNA Sequences”).
  • For most interview problems, the naive O(nm) solution is acceptable. Mention KMP for optimization if the interviewer asks for better complexity.

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does the KMP failure function work?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The failure function (lps = longest proper prefix which is also suffix) for pattern p: lps[i] = the length of the longest proper prefix of p[0..i] that is also a suffix. 'Proper' means not the full string. Example: p='ababab'. lps[0]=0 (by definition). lps[1]=0 ('ab': no prefix 'a' == suffix 'b'). lps[2]=1 ('aba': prefix 'a' == suffix 'a'). lps[3]=2 ('abab': prefix 'ab' == suffix 'ab'). lps[4]=3 ('ababa': 'aba' == 'aba'). lps[5]=4 ('ababab': 'abab' == 'abab'). Construction O(m): two pointers. i starts at 1, j at 0. If p[i]==p[j]: lps[i]=j+1, advance both. If mismatch and j>0: j=lps[j-1] (reuse previously computed failure). If mismatch and j==0: lps[i]=0, advance i. The failure function enables skipping characters on mismatch during search."
}
},
{
"@type": "Question",
"name": "How does Rabin-Karp use rolling hash for efficient pattern search?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Rolling hash updates the window hash in O(1) instead of O(m). Hash formula: hash(s[i:i+m]) = (s[i] * base^(m-1) + s[i+1] * base^(m-2) + ... + s[i+m-1]) % mod. Rolling update: hash(s[i+1:i+m+1]) = (hash(s[i:i+m]) - s[i] * base^(m-1)) * base + s[i+m]. Subtraction removes the outgoing character; multiplication shifts all remaining characters left by one position; addition appends the new character. O(1) per window update after O(m) initialization. Collision handling: two different substrings may have the same hash (false positive). When hashes match: verify character-by-character. With a good mod (prime, e.g., 10^9+7) and base (e.g., 31 for lowercase letters), collisions are rare. Double hashing (two different hash functions) reduces collision probability to near zero. Multi-pattern search: compute hashes for K patterns, store in a hash set. On window hash match: check if it's in the set."
}
},
{
“@type”: “Question”,
“name”: “What problems are solved by the Z-algorithm?”,
“acceptedAnswer”: {
“@type”: “Answer”,
"text": "The Z-array: z[i] = length of the longest substring starting at position i that matches a prefix of the string. The Z-algorithm computes this in O(n) time. Pattern search: concatenate pattern + '$' + text. Compute the Z-array. Positions where z[i] >= len(pattern) are matches (the '$' separator prevents false matches crossing the boundary). This gives O(n+m) pattern search. Other applications: count occurrences of pattern in text (count positions where z[i] >= m). Find the shortest string that, when repeated, forms s: it is the prefix of length k for the smallest k dividing n with z[k] = n-k. Find all periods of a string. The Z-algorithm is equivalent in power to KMP but some find it easier to implement. Construction: maintain a window [l, r] (rightmost Z-box). For each i: initialize z[i] from z[i-l] if i < r, then extend naively. Update [l, r] if the extension goes past r."
}
},
{
"@type": "Question",
"name": "When would you use Boyer-Moore over KMP in practice?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Boyer-Moore is faster in practice for large alphabets (26+ characters, like English text) because its bad character heuristic can skip many characters at once. For a pattern of length m and a large alphabet: average case is O(n/m) — scans only 1/m characters. For a 10-character pattern in English text: BM skips 9 out of 10 characters on average. KMP scans every character in the text exactly once (O(n)) — cannot skip characters. BM is better for: English-language text search, searching long patterns in long texts, applications like grep (command-line text search). KMP is better for: small alphabets (DNA: ACGT), streams where you can't backtrack (can process text online as it arrives), and implementation simplicity. For interview problems: KMP is more commonly expected (the failure function demonstrates algorithmic thinking). BM's preprocessing is more complex but its practical speed advantage makes it the industry choice for grep-style tools."
}
},
{
"@type": "Question",
"name": "How do you find all anagram occurrences using string matching?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Find all starting indices in text where a substring is an anagram of pattern. Sliding window approach (O(n)): use a fixed window of size len(pattern). Maintain frequency counts for pattern (p_count) and the current window (w_count). Compare counts: if p_count == w_count, the window is an anagram. Slide the window: add text[right], remove text[left], update w_count. Compare counts: O(26) per step for 26-character alphabet. Optimize with a 'matches' counter: track how many of the 26 characters have equal frequency in p_count and w_count. Only update the counter when a character transitions from matched to unmatched or vice versa. When matches == 26: found an anagram. O(n) total. This is the standard LeetCode 438 solution. Rabin-Karp alternative: sort the pattern and each window (O(m log m) preprocessing + O(n * m log m) for all windows) — much slower."
}
}
]
}
