Question 1

What makes rolling hash O(n) instead of O(n*m) for string matching?

Accepted Answer

Naive string matching compares the pattern against every position in the text, and each comparison takes O(m) time, for O(n*m) total. Rolling hash avoids recomputing the hash from scratch at each position. Polynomial hash: h = (s[0]*p^(m-1) + s[1]*p^(m-2) + ... + s[m-1]) mod MOD. When the window slides right by one: remove the contribution of s[0] (subtract s[0]*p^(m-1)), multiply the result by p so the remaining characters each move up one power, and add s[m] (add s[m]*p^0), all mod MOD. This update is O(1). Across all n-m+1 positions: O(m) to hash the first window plus O(n) for the updates. When a hash matches: do a direct string comparison (O(m)) to confirm. False positives are rare (probability about 1/MOD per position with a large prime MOD). Expected total: O(n) hash updates + O(1) expected spurious string comparisons = O(n + m) average when true matches are few. Worst case is O(n*m): every hash match still costs an O(m) verification, so a text and pattern made of the same repeated character trigger a full comparison at every position, and adversarial input can also force collisions.
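
The update rule described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `rabin_karp` and the defaults `base=256` and `mod=10^9 + 7` are assumptions chosen for the example.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the outgoing (leftmost) character: base^(m-1) mod mod.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window in O(m).
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Verify on a hash match so a collision never yields a false match.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # O(1) slide: drop text[start], shift by base, append text[start + m].
            w_hash = (w_hash - ord(text[start]) * high) % mod
            w_hash = (w_hash * base + ord(text[start + m])) % mod
    return matches
```

For example, `rabin_karp("abracadabra", "abra")` returns `[0, 7]`. Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra `+ mod` correction here; in C++ or Java it would.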

Question 2

How do you handle hash collisions in Rabin-Karp?

Accepted Answer

A hash collision occurs when two different strings produce the same hash value, causing a false positive match. Mitigation: (1) Large modulus: use MOD = 10^9 + 7 (prime). Probability of collision per position ≈ 1/MOD ≈ 10^-9. For n = 10^6 positions: expected 10^-3 false positives, which is near zero. (2) Double hashing: compute two independent hashes with different (BASE, MOD) pairs. A collision requires both hashes to match simultaneously: probability ≈ 1/(MOD1 * MOD2) ≈ 10^-18. (3) Always verify: whenever the hash matches, do a direct string comparison before recording a match. This makes the algorithm always correct (zero false positives) at the cost of O(m) per hash match. In practice, a single large prime MOD is usually enough; double hashing is needed only when the input might be adversarially crafted (competitive programming, not typical interviews).
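
A minimal sketch of double hashing, assuming prefix-hash arrays so that any substring hash is available in O(1). The class name `DoubleHash`, the helper names, and the two (BASE, MOD) pairs are illustrative choices, not values prescribed by the answer above.

```python
# Two (BASE, MOD) pairs; both moduli are prime.
BASE1, MOD1 = 131, 1_000_000_007
BASE2, MOD2 = 137, 998_244_353


def build_prefix_hashes(s, base, mod):
    """Return (prefix, power) where prefix[i] is the hash of s[:i]."""
    prefix = [0] * (len(s) + 1)
    power = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * base + ord(ch)) % mod
        power[i + 1] = (power[i] * base) % mod
    return prefix, power


def substring_hash(prefix, power, mod, lo, hi):
    """Hash of s[lo:hi], derived from the prefix hashes in O(1)."""
    return (prefix[hi] - prefix[lo] * power[hi - lo]) % mod


class DoubleHash:
    """Pairs two independent polynomial hashes of the same string."""

    def __init__(self, s):
        self.h1 = build_prefix_hashes(s, BASE1, MOD1)
        self.h2 = build_prefix_hashes(s, BASE2, MOD2)

    def get(self, lo, hi):
        # Two substrings are treated as equal only if both components agree.
        return (
            substring_hash(*self.h1, MOD1, lo, hi),
            substring_hash(*self.h2, MOD2, lo, hi),
        )
```

With `dh = DoubleHash("abcabc")`, `dh.get(0, 3) == dh.get(3, 6)` holds because both slices are "abc". Fixed bases like these can still be attacked by someone who knows them; picking the bases at random at runtime is a common hardening step.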

Question 3

How do you use rolling hash to find all anagrams of a pattern in a string?

Accepted Answer

Anagram detection with rolling hash requires a hash that is order-independent (same characters = same hash regardless of order). Standard polynomial rolling hash is order-sensitive (position matters): "ab" and "ba" have different hashes. Instead use a frequency-based hash: hash(s) = sum of (char_value^2), or a product of primes (each character maps to a distinct prime, hash = product of character primes, taken mod a large prime to keep it bounded). Both are commutative, so anagrams have the same hash. Sliding window: initialize the hash for the first m characters. Slide right: remove the outgoing character's contribution and add the incoming one. For the sum, that means subtract and add; for the product, divide by the outgoing prime (multiply by its modular inverse) and multiply by the incoming prime. Compare the hash with the pattern hash at each position; verify on match, because different multisets of characters can collide (especially with the sum of squares). Simpler alternative (and preferred in interviews): sliding window with a frequency difference array (26 integers for lowercase). Track how many characters currently match the required frequency. O(n) time, O(1) extra space for a fixed alphabet.
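
The frequency difference approach can be sketched like this, assuming both strings contain only lowercase a-z. The function name `find_anagrams` and the `nonzero` counter are names introduced for the example; the counter tracks how many letters are still out of balance, the mirror image of counting how many already match.

```python
def find_anagrams(text, pattern):
    """Return start indices of every window of text that is an anagram of pattern.

    Assumes lowercase a-z input only.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # diff[c] = (count of c in the current window) - (count of c in pattern).
    diff = [0] * 26
    for ch in pattern:
        diff[ord(ch) - ord("a")] -= 1
    for ch in text[:m]:
        diff[ord(ch) - ord("a")] += 1

    # Number of letters whose counts currently disagree with the pattern.
    nonzero = sum(1 for d in diff if d != 0)
    result = [0] if nonzero == 0 else []

    for end in range(m, n):
        # Add the incoming character, then remove the outgoing one.
        for ch, delta in ((text[end], 1), (text[end - m], -1)):
            idx = ord(ch) - ord("a")
            if diff[idx] == 0:
                nonzero += 1
            diff[idx] += delta
            if diff[idx] == 0:
                nonzero -= 1
        if nonzero == 0:
            result.append(end - m + 1)
    return result
```

For example, `find_anagrams("cbaebabacd", "abc")` returns `[0, 6]`. No verification step is needed because the counts are exact rather than hashed.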

Question 4

What is the difference between rolling hash and suffix arrays for substring problems?

Accepted Answer

Rolling hash: simple to implement. O(n) expected time to check whether a specific pattern occurs in the text, or whether any substring of a given length appears more than once; O(n log n) for binary search + hash (for example, longest repeated substring). Has false positive risk (mitigated by double hashing or by verifying matches). Works online (streaming). Cannot enumerate all unique substrings efficiently. Suffix arrays: O(n log n) or O(n) to build. Supports: finding the longest repeated substring (maximum of the LCP array), counting distinct substrings (n*(n+1)/2 - sum of LCP array), finding all occurrences of a pattern in O(m log n + occurrences) via plain binary search on the suffix array, or O(m + log n + occurrences) with LCP information. No false positives. More complex to implement correctly. Suffix arrays are the theoretically superior tool for offline substring problems, but rolling hash is faster to code in an interview setting and sufficient for most problems.
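
As one illustration of the binary search + hash pattern mentioned above, here is a sketch for the longest repeated substring. It relies on the fact that if some substring of length L repeats, so does one of length L-1, which makes the answer length binary-searchable. The function name, the default base, and the modulus 2^61 - 1 (a Mersenne prime) are assumptions made for the example.

```python
def longest_repeated_substring(s, base=131, mod=(1 << 61) - 1):
    """Return a longest substring that occurs at least twice in s (may overlap)."""
    n = len(s)

    # Prefix hashes give the hash of any substring in O(1).
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * base + ord(ch)) % mod
        power[i + 1] = (power[i] * base) % mod

    def find_repeat(length):
        """Return the start of a repeated substring of this length, or -1."""
        seen = {}  # hash -> list of earlier start positions
        for start in range(n - length + 1):
            h = (prefix[start + length] - prefix[start] * power[length]) % mod
            for prev in seen.get(h, ()):
                # Verify, so a hash collision cannot produce a wrong answer.
                if s[prev:prev + length] == s[start:start + length]:
                    return start
            seen.setdefault(h, []).append(start)
        return -1

    lo, hi, best = 1, n - 1, ""
    while lo <= hi:
        mid = (lo + hi) // 2
        start = find_repeat(mid)
        if start != -1:
            best = s[start:start + mid]
            lo = mid + 1   # a repeat of this length exists; try longer
        else:
            hi = mid - 1   # no repeat at this length; try shorter
    return best
```

For example, `longest_repeated_substring("banana")` returns `"ana"`. A suffix array with an LCP array answers the same question without hashing, at the cost of a more involved build step.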

Rolling Hash and Rabin-Karp: String Matching Interview Patterns (2025)

The String Matching Problem

Rolling Hash (Polynomial Hash)

Rabin-Karp for Multiple Pattern Search

Binary Search + Rolling Hash Pattern

Interview Applications