Trie and String Matching Interview Patterns
Tries (prefix trees) and string matching algorithms are a distinct category in coding interviews. They appear in autocomplete, spell checker, word search, and IP routing problems. This guide covers the Trie data structure, its key operations, and the classic string matching algorithms you need to know.
Trie Data Structure
A Trie stores strings by their prefixes. Each node represents one character; the path from root to a node spells out a prefix.
class TrieNode:
def __init__(self):
self.children = {} # char -> TrieNode
self.is_end = False # marks end of a word
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
def search(self, word: str) -> bool:
node = self.root
for ch in word:
if ch not in node.children: return False
node = node.children[ch]
return node.is_end
def starts_with(self, prefix: str) -> bool:
node = self.root
for ch in prefix:
if ch not in node.children: return False
node = node.children[ch]
return True
Time: O(m) for insert/search/prefix, where m = word length. Space: O(total characters across all words).
Autocomplete (Design Search Autocomplete System — LeetCode 642)
Each TrieNode stores the top-3 matching sentences for its prefix:
class AutocompleteSystem:
def __init__(self, sentences, times):
self.root = TrieNode()
self.curr_input = []
for s, t in zip(sentences, times):
self._insert(s, t)
def _insert(self, s, count):
node = self.root
for ch in s:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
# keep top 3 by count at each node
node.top3 = sorted(node.top3 + [(count, s)],
key=lambda x: (-x[0], x[1]))[:3]
node.is_end = True
def input(self, c):
if c == '#':
self._insert(''.join(self.curr_input), 1)
self.curr_input = []
return []
self.curr_input.append(c)
node = self.root
for ch in self.curr_input:
if ch not in node.children: return []
node = node.children[ch]
return [s for _, s in node.top3]
Word Search II (LeetCode 212) — Trie + Backtracking
Find all words from a dictionary that exist in a 2D grid. Build a Trie from the dictionary, then DFS from each cell:
def find_words(board, words):
trie = Trie()
for w in words: trie.insert(w)
result = set()
rows, cols = len(board), len(board[0])
def dfs(node, r, c, path):
if node.is_end: result.add(path)
if not (0 <= r < rows and 0 <= c < cols): return
ch = board[r][c]
if ch not in node.children or ch == '#': return
board[r][c] = '#' # mark visited
next_node = node.children[ch]
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
dfs(next_node, r+dr, c+dc, path+ch)
board[r][c] = ch # restore
for r in range(rows):
for c in range(cols):
dfs(trie.root, r, c, '')
return list(result)
# Trie pruning avoids re-exploring dead-end prefixes
KMP (Knuth-Morris-Pratt) Pattern Matching
Find all occurrences of pattern P in text T in O(n+m) time (vs O(n*m) naive).
def kmp_search(text, pattern):
def build_lps(pattern):
# lps[i] = length of longest proper prefix of pattern[:i+1]
# that is also a suffix
lps = [0] * len(pattern)
length, i = 0, 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
elif length:
length = lps[length - 1] # don't increment i
else:
lps[i] = 0
i += 1
return lps
lps = build_lps(pattern)
matches = []
i = j = 0 # i = text index, j = pattern index
while i < len(text):
if text[i] == pattern[j]:
i += 1; j += 1
if j == len(pattern):
matches.append(i - j)
j = lps[j - 1]
elif i < len(text) and text[i] != pattern[j]:
if j: j = lps[j - 1]
else: i += 1
return matches
The LPS (Longest Proper Prefix which is also Suffix) array lets KMP skip redundant comparisons by reusing already-matched prefix information.
Rabin-Karp Rolling Hash
Use hashing to find pattern occurrences in O(n+m) average. Useful for multi-pattern search.
def rabin_karp(text, pattern):
BASE, MOD = 26, 10**9 + 7
n, m = len(text), len(pattern)
if m > n: return []
def char_val(c): return ord(c) - ord('a')
# Compute hash of pattern and first window
p_hash = t_hash = 0
high = pow(BASE, m-1, MOD)
for i in range(m):
p_hash = (p_hash * BASE + char_val(pattern[i])) % MOD
t_hash = (t_hash * BASE + char_val(text[i])) % MOD
matches = []
for i in range(n - m + 1):
if t_hash == p_hash and text[i:i+m] == pattern: # verify to avoid hash collision
matches.append(i)
if i < n - m:
t_hash = (t_hash - char_val(text[i]) * high) % MOD
t_hash = (t_hash * BASE + char_val(text[i+m])) % MOD
return matches
IP Address Routing (Longest Prefix Match)
Routers use a bitwise Trie to find the longest matching prefix for an IP address. Each bit of the IP (0 or 1) is one level in the Trie. This is how CIDR routing tables work — lookup is O(32) for IPv4, O(128) for IPv6.
Common Trie Problems
- LeetCode 208: Implement Trie
- LeetCode 211: Design Add and Search Words (wildcard '.' matching)
- LeetCode 212: Word Search II (Trie + backtracking)
- LeetCode 642: Design Search Autocomplete System
- LeetCode 648: Replace Words (prefix replacement)
- LeetCode 720: Longest Word in Dictionary
Interview Tips
- Always implement Trie with a dict of children, not a fixed 26-char array — handles Unicode, faster to write
- For Word Search II, the key insight is using Trie instead of a set to enable prefix pruning
- KMP is often asked conceptually — explain the LPS table intuition even if you can't code it perfectly under pressure
- Rabin-Karp is the answer when you need to find multiple patterns simultaneously (K patterns, O(n+Km) total)
- For autocomplete system design, mention storing top-K results at each Trie node to avoid DFS on every query