String Matching Algorithm Interview Patterns: KMP, Rabin-Karp, Z-Algorithm, and Boyer-Moore (2025)

Naive String Matching

For each position i in the text (0 to n-m): check whether text[i:i+m] == pattern. O(n*m) in the worst case — e.g., searching for “aaaaaab” in “aaaaa…a” compares m characters at every alignment before failing on the final ‘b’. Acceptable for short patterns or when the pattern rarely partially matches. All of the advanced algorithms below improve on this by avoiding re-comparison of characters already known to match.
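The naive scan can be sketched in a few lines of Python (a minimal illustration; the function name is mine):

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern starts in text. O(n*m) worst case."""
    n, m = len(text), len(pattern)
    # Slide the pattern over every possible alignment and compare the full window.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]
```

The slice comparison hides the inner character loop, but the cost is the same: up to m comparisons per alignment.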

KMP (Knuth-Morris-Pratt)

KMP preprocesses the pattern to build a failure function (lps = longest proper prefix which is also a suffix). This allows skipping characters on mismatch without restarting from the beginning. Build lps: lps[0]=0 always. For i from 1 to m-1, with j initialized to lps[i-1]: compare pattern[i] with pattern[j]. If they match: lps[i] = j+1. If they mismatch and j>0: fall back with j = lps[j-1] and retry. If they mismatch and j==0: lps[i] = 0. O(m) preprocessing. Search: two pointers, i (text) and j (pattern). Match: advance both. Mismatch and j>0: j = lps[j-1] (reuse the matched prefix). Mismatch and j==0: advance i only. When j==m: found a match starting at i-m. O(n) search. Total: O(n+m). Key insight: lps tells us the longest prefix of the pattern that is also a suffix of the matched portion — we can resume from there instead of starting over.
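Both phases above can be sketched as follows (function names are illustrative, not from a library):

```python
def build_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix."""
    m = len(pattern)
    lps = [0] * m
    j = 0  # length of the current matched prefix, i.e. lps[i-1]
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]  # fall back to the next-shorter border
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
    return lps

def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all start indices of pattern in text in O(n+m)."""
    lps = build_lps(pattern)
    matches, j = [], 0
    for i, c in enumerate(text):
        while j > 0 and c != pattern[j]:
            j = lps[j - 1]  # reuse the matched prefix instead of restarting
        if c == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # continue searching for overlapping matches
    return matches
```

Note that after a full match the search falls back via lps as well, so overlapping occurrences are reported.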

Rabin-Karp

Uses hashing to find pattern matches. Compute the hash of the pattern and the hash of each text window of size m. Compare hashes first (O(1)); verify character-by-character only when the hashes match (the full comparison is O(m) but rare). Rolling hash: instead of recomputing the hash from scratch for each window (O(m)), update it incrementally: new_hash = ((old_hash - text[i] * base^(m-1)) * base + text[i+m]) % mod. This gives an O(1) hash update. Time: O(n+m) expected, O(nm) worst case (if all hashes collide). Rabin-Karp excels at multi-pattern search: compute hashes for all patterns; on a text-hash match, check which pattern matched. This finds all K patterns simultaneously in O(n + sum(m_i)) expected time. Use case: plagiarism detection (searching for multiple document fragments at once).
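A sketch of the rolling-hash search, assuming base 31 and mod 10^9+7 as suggested later in the FAQ (names are mine):

```python
def rabin_karp(text: str, pattern: str, base: int = 31, mod: int = 10**9 + 7) -> list[int]:
    """Return all start indices of pattern in text. O(n+m) expected."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    p_hash = w_hash = 0
    for i in range(m):  # O(m) initialization of both hashes
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    high = pow(base, m - 1, mod)  # weight of the outgoing character
    matches = []
    for i in range(n - m + 1):
        # Hash hit: verify character-by-character to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:  # O(1) rolling update: drop text[i], append text[i+m]
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

Python's arbitrary-precision integers make the modular arithmetic forgiving; in fixed-width languages, keep every intermediate product reduced mod the prime to avoid overflow.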

Z-Algorithm

The Z-array: z[i] = length of the longest substring starting at i that matches a prefix of the string. z[0] is defined as n (the whole string matches itself). Construction in O(n): maintain a window [l, r], the rightmost Z-box found so far. For each i: if i < r, initialize z[i] = min(r - i, z[i - l]) by copying from the mirrored position inside the box; then extend z[i] by naive comparison; if i + z[i] > r, update the window to l = i, r = i + z[i]. Pattern search: concatenate pattern + “$” + text ($ is a separator not in the alphabet). Compute the Z-array. Positions where z[i] >= len(pattern) indicate a match (the $ separator prevents matches spanning the boundary). O(n+m) time. Simpler to implement than KMP in some cases; identical complexity.
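The construction and the concatenation trick look like this in Python (a sketch; function names are mine):

```python
def z_array(s: str) -> list[int]:
    """z[i] = length of the longest substring at i matching a prefix of s. O(n)."""
    n = len(s)
    z = [0] * n
    z[0] = n  # by definition, the whole string matches itself
    l = r = 0  # rightmost Z-box seen so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # copy from the mirrored position
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1  # extend naively past the copied value
        if i + z[i] > r:
            l, r = i, i + z[i]  # extension went past r: new rightmost box
    return z

def z_search(text: str, pattern: str, sep: str = "$") -> list[int]:
    """Pattern search via Z-array of pattern + sep + text. O(n+m)."""
    m = len(pattern)
    z = z_array(pattern + sep + text)
    # Offset by m+1 to convert concatenated-string indices to text indices.
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]
```

The separator guarantees no z value in the text portion exceeds m, so `z[i] >= m` means exactly a full-pattern match.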

Boyer-Moore

Boyer-Moore is among the fastest algorithms in practice for large alphabets. It compares the pattern right-to-left and uses two heuristics. Bad character: on a mismatch between a text character c and pattern position j, shift the pattern so its rightmost occurrence of c aligns with c; if c does not occur in the pattern, shift past it entirely. Good suffix: when a suffix of the pattern matched before the mismatch, shift to align the next occurrence of that suffix within the pattern. Preprocessing: O(m + alphabet_size). Search: O(n/m) best case (it can skip characters), O(nm) worst case (rare). Boyer-Moore is used in grep and other text-search tools for its practical speed on natural-language text with large alphabets (26+ characters).
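A simplified sketch using only the bad-character heuristic (the good-suffix table is omitted for brevity, so this is not the full Boyer-Moore; names are mine):

```python
def bm_bad_char_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore with the bad-character heuristic only. Returns all match indices."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of each character in the pattern.
    last = {c: j for j, c in enumerate(pattern)}
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right-to-left
        if j < 0:
            matches.append(s)
            s += 1  # full match: advance by one (good-suffix rule would do better)
        else:
            # Align the rightmost occurrence of the mismatched text character,
            # or jump fully past it if it never occurs in the pattern.
            s += max(1, j - last.get(text[s + j], -1))
    return matches
```

On a mismatched character absent from the pattern, `last.get(..., -1)` makes the shift `j + 1`, skipping the whole scanned region — the source of the sub-linear average case.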

Interview Tips

  • KMP is the most commonly asked string matching algorithm. Know how to build the failure function and use it in the search phase. Practice on “ababab” pattern in “ababababab” text.
  • Rabin-Karp is asked less often but is the go-to when the problem involves multiple patterns or substring hashing (LeetCode “Repeated DNA Sequences”).
  • For most interview problems, the naive O(nm) solution is acceptable. Mention KMP for optimization if the interviewer asks for better complexity.

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does the KMP failure function work?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The failure function (lps = longest proper prefix which is also suffix) for pattern p: lps[i] = the length of the longest proper prefix of p[0..i] that is also a suffix. 'Proper' means not the full string. Example: p='ababab'. lps[0]=0 (by definition). lps[1]=0 ('ab': no prefix 'a' == suffix 'b'). lps[2]=1 ('aba': prefix 'a' == suffix 'a'). lps[3]=2 ('abab': prefix 'ab' == suffix 'ab'). lps[4]=3 ('ababa': 'aba' == 'aba'). lps[5]=4 ('ababab': 'abab' == 'abab'). Construction O(m): two pointers. i starts at 1, j at 0. If p[i]==p[j]: lps[i]=j+1, advance both. If mismatch and j>0: j=lps[j-1] (reuse previously computed failure). If mismatch and j==0: lps[i]=0, advance i. The failure function enables skipping characters on mismatch during search."
}
},
{
"@type": "Question",
"name": "How does Rabin-Karp use rolling hash for efficient pattern search?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Rolling hash updates the window hash in O(1) instead of O(m). Hash formula: hash(s[i:i+m]) = (s[i] * base^(m-1) + s[i+1] * base^(m-2) + ... + s[i+m-1]) % mod. Rolling update: hash(s[i+1:i+m+1]) = (hash(s[i:i+m]) - s[i] * base^(m-1)) * base + s[i+m]. Subtraction removes the outgoing character; multiplication shifts all remaining characters left by one position; addition appends the new character. O(1) per window update after O(m) initialization. Collision handling: two different substrings may have the same hash (false positive). When hashes match: verify character-by-character. With a good mod (prime, e.g., 10^9+7) and base (e.g., 31 for lowercase letters), collisions are rare. Double hashing (two different hash functions) reduces collision probability to near zero. Multi-pattern search: compute hashes for K patterns, store in a hash set. On window hash match: check if it's in the set."
}
},
{
“@type”: “Question”,
“name”: “What problems are solved by the Z-algorithm?”,
“acceptedAnswer”: {
“@type”: “Answer”,
"text": "The Z-array: z[i] = length of the longest substring starting at position i that matches a prefix of the string. The Z-algorithm computes this in O(n) time. Pattern search: concatenate pattern + '$' + text. Compute the Z-array. Positions where z[i] >= len(pattern) are matches (the '$' separator prevents false matches crossing the boundary). This gives O(n+m) pattern search. Other applications: count occurrences of pattern in text (count positions where z[i] >= m). Find the shortest string that, when repeated, forms s: it is the prefix of length k for the smallest k dividing n with z[k] = n-k. Find all periods of a string. The Z-algorithm is equivalent in power to KMP but some find it easier to implement. Construction: maintain a window [l, r] (rightmost Z-box). For each i: initialize z[i] from z[i-l] if i < r, then extend naively. Update [l, r] if the extension goes past r."
}
},
{
"@type": "Question",
"name": "When would you use Boyer-Moore over KMP in practice?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Boyer-Moore is faster in practice for large alphabets (26+ characters, like English text) because its bad character heuristic can skip many characters at once. For a pattern of length m and a large alphabet: average case is O(n/m) — scans only 1/m characters. For a 10-character pattern in English text: BM skips 9 out of 10 characters on average. KMP scans every character in the text exactly once (O(n)) — cannot skip characters. BM is better for: English-language text search, searching long patterns in long texts, applications like grep (command-line text search). KMP is better for: small alphabets (DNA: ACGT), streams where you can't backtrack (can process text online as it arrives), and implementation simplicity. For interview problems: KMP is more commonly expected (the failure function demonstrates algorithmic thinking). BM's preprocessing is more complex but its practical speed advantage makes it the industry choice for grep-style tools."
}
},
{
"@type": "Question",
"name": "How do you find all anagram occurrences using string matching?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Find all starting indices in text where a substring is an anagram of pattern. Sliding window approach (O(n)): use a fixed window of size len(pattern). Maintain frequency counts for pattern (p_count) and the current window (w_count). Compare counts: if p_count == w_count, the window is an anagram. Slide the window: add text[right], remove text[left], update w_count. Compare counts: O(26) per step for 26-character alphabet. Optimize with a 'matches' counter: track how many of the 26 characters have equal frequency in p_count and w_count. Only update the counter when a character transitions from matched to unmatched or vice versa. When matches == 26: found an anagram. O(n) total. This is the standard LeetCode 438 solution. Rabin-Karp alternative: sort the pattern and each window (O(m log m) preprocessing + O(n * m log m) for all windows) — much slower."
}
}
]
}
