String Hashing Interview Patterns: Rabin-Karp, Rolling Hash, and Polynomial Hashing (2025)

Why String Hashing?

Naive string matching (check every position in the text for the pattern) is O(n*m) where n = text length, m = pattern length. String hashing enables O(1) substring comparison by reducing a string to a single integer. Combined with a rolling hash, we can compare any substring of length m in O(1) after O(n) preprocessing, giving O(n+m) total for pattern matching. Polynomial hashing: hash(s) = s[0]*p^(m-1) + s[1]*p^(m-2) + … + s[m-1]*p^0 (mod M). p is a prime base (~31 for lowercase letters), M is a large prime (~10^9+7) to reduce collisions.

Rolling Hash (Rabin-Karp Algorithm)

Compute the hash of the first window of size m. Slide the window by removing the leftmost character and adding the new rightmost character in O(1). When the new hash equals the pattern hash: verify character by character (handles hash collisions). O(n+m) average, O(n*m) worst case (all collisions).

def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    if m > n: return []
    BASE, MOD = 31, 10**9 + 7

    def char_val(c): return ord(c) - ord('a') + 1

    # Precompute pattern hash and first window hash
    pat_hash = 0
    win_hash = 0
    power = 1  # BASE^(m-1) mod MOD

    for i in range(m):
        pat_hash = (pat_hash * BASE + char_val(pattern[i])) % MOD
        win_hash = (win_hash * BASE + char_val(text[i])) % MOD
        if i < m - 1:
            power = power * BASE % MOD

    result = []
    for i in range(n - m + 1):
        if win_hash == pat_hash and text[i:i+m] == pattern:
            result.append(i)
        if i < n - m:
            # Roll: remove text[i], add text[i+m]
            win_hash = (win_hash - char_val(text[i]) * power) % MOD
            win_hash = (win_hash * BASE + char_val(text[i+m])) % MOD
            win_hash = (win_hash + MOD) % MOD  # keep positive
    return result

Prefix Hash Array for O(1) Substring Hash

Precompute prefix_hash[i] = hash(s[0..i-1]) and powers[i] = BASE^i mod MOD. Then hash(s[l..r]) = (prefix_hash[r+1] – prefix_hash[l] * powers[r-l+1]) mod MOD. This enables O(1) hash computation for any substring after O(n) preprocessing. Use this for: comparing many pairs of substrings efficiently (LC 1044 Longest Duplicate Substring with binary search + hashing), finding all distinct substrings, or checking if two strings share a common substring of length k.

class StringHasher:
    def __init__(self, s, base=131, mod=10**18+9):
        n = len(s)
        self.mod = mod
        self.base = base
        self.pw = [1] * (n + 1)
        self.h = [0] * (n + 1)
        for i in range(n):
            self.pw[i+1] = self.pw[i] * base % mod
            self.h[i+1] = (self.h[i] * base + ord(s[i])) % mod

    def query(self, l, r):  # hash of s[l..r] inclusive
        return (self.h[r+1] - self.h[l] * self.pw[r-l+1]) % self.mod

Double Hashing to Reduce Collisions

A single hash has a ~1/MOD collision probability per comparison. With n^2 comparisons, expected collisions = n^2/MOD. For n=10^5 and MOD=10^9, expected ~10 false positives per run. Double hashing uses two independent hash functions with different (BASE, MOD) pairs. The combined hash is (hash1, hash2). Collision probability = 1/(MOD1*MOD2) — negligible. Use double hashing when correctness is critical (competitive programming judges use anti-hash tests). Implementation: run two StringHasher instances with different parameters; compare both hashes.

Applications

LC 1044 (Longest Duplicate Substring): binary search on the length L. For each L, use a rolling hash to find all substrings of length L; store hashes in a set; if any hash repeats, L is achievable. O(n log n). LC 1062 (Longest Repeating Substring): same pattern without the binary search guarantee. LC 187 (Repeated DNA Sequences): fixed window (10 chars), rolling hash — find all windows that appear more than once. LC 28 (Find Needle in Haystack): Rabin-Karp is one solution (KMP is another). LC 718 (Maximum Length of Repeated Subarray): 2D extension — rolling hash on rows or binary search + 2D hash.

Asked at: Meta Interview Guide

Asked at: Apple Interview Guide

Asked at: Coinbase Interview Guide

Asked at: LinkedIn Interview Guide

Scroll to Top