Advanced Trie Interview Patterns: Word Search, Palindrome Pairs, and Replace Words (2025)

Trie Recap and When to Use

A Trie (prefix tree) stores strings character by character. Insert and search are both O(L) per word where L = word length. The key advantage over a hash set: the trie supports prefix operations efficiently — find all words with a given prefix, count words with a prefix, or prune DFS early when no words match. Use a Trie when: the problem involves prefix matching, autocomplete, or searching for words with a shared structure. Often a better fit than sorting + binary search when multiple prefix queries are needed.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.word = None  # store word at end node for easy retrieval

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_end = True
        node.word = word

    def search(self, word) -> bool:
        node = self.root
        for c in word:
            if c not in node.children: return False
            node = node.children[c]
        return node.is_end

Pattern 1: Word Search II (LC 212)

Find all words from a dictionary that exist in a 2D character grid. Naive approach: for each word, DFS from every cell — O(words * m * n * 4^L). With a Trie: build a trie of all dictionary words, then DFS on the grid; at each cell, check whether the current path is a trie prefix. If not: prune the DFS immediately. If the node is a word end: record the found word. This finds all words in one DFS pass — O(m * n * 4^L) instead of O(words * m * n * 4^L). Optimization: mark a trie node as already found after it's added to results (delete it from the trie or set a flag) — this avoids re-adding the same word if it appears multiple times in the grid. Mark cells as visited during DFS (set to '#', restore after backtracking).

def find_words(board, words):
    trie = Trie()
    for w in words: trie.insert(w)
    m, n = len(board), len(board[0])
    result = []

    def dfs(node, r, c):
        char = board[r][c]
        if char not in node.children: return
        next_node = node.children[char]
        if next_node.word:
            result.append(next_node.word)
            next_node.word = None  # deduplicate
        board[r][c] = '#'  # mark visited
        for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
            nr, nc = r+dr, c+dc
            if 0 <= nr < m and 0 <= nc < n and board[nr][nc] != '#':
                dfs(next_node, nr, nc)
        board[r][c] = char  # restore

    for r in range(m):
        for c in range(n):
            dfs(trie.root, r, c)
    return result

Pattern 2: Palindrome Pairs (LC 336)

Find all pairs (i, j), i != j, where words[i] + words[j] is a palindrome. Brute force: O(n^2 * L). Trie approach: insert all reversed words into a trie. For each word W, traverse the trie with W's characters. At each trie node: if the trie path ends a word AND the remaining suffix of W (from the current position) is a palindrome, we have found a pair — the matched reversed word mirrors W's prefix, and the palindromic middle takes care of itself. Also handle the symmetric split (W's prefix is the palindrome) and the full-reversal case: if W exactly matches a reversed word other than itself, that's a pair. Total O(n * L^2) — much better than O(n^2 * L). The index bookkeeping is fiddly, so in practice this is easier to implement with a hash map (store {reversed word: index}) than with an explicit Trie.
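A sketch of the hash-map variant described above (`palindrome_pairs` is an illustrative name, not a library function): split each word at every cut point, and look up the reversed complement of the non-palindromic half.

```python
def palindrome_pairs(words):
    # Map each word's reversal to its index for O(1) complement lookup.
    rev = {w[::-1]: i for i, w in enumerate(words)}

    def is_pal(s):
        return s == s[::-1]

    res = []
    for i, w in enumerate(words):
        for cut in range(len(w) + 1):
            pre, suf = w[:cut], w[cut:]
            # Case 1: words[i] + rev(pre) — needs suf to be a palindrome.
            if is_pal(suf) and pre in rev and rev[pre] != i:
                res.append([i, rev[pre]])
            # Case 2: rev(suf) + words[i] — needs pre to be a palindrome.
            # cut > 0 avoids double-counting the full-reversal pair from Case 1.
            if cut > 0 and is_pal(pre) and suf in rev and rev[suf] != i:
                res.append([rev[suf], i])
    return res
```

Each word is split L + 1 ways, and each split does an O(L) palindrome check, giving the O(n * L^2) bound from above.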

Pattern 3: Replace Words (LC 648)

Replace words in a sentence with their shortest root from a dictionary. Build a trie from the dictionary. For each word in the sentence: traverse the trie character by character. The first time we hit an is_end node: replace the word with the root (the path taken so far). If no root found: keep the original word. O(n * L) where n = sentence words, L = max word length. This is the canonical “find the shortest prefix that exists in a set” operation — O(L) per word with a trie vs O(L^2) with a hash set (checking each prefix one by one).
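The traversal above can be sketched with a plain nested-dict trie (a convention of this sketch: the `'$'` key marks a stored root; `replace_words` is an illustrative name):

```python
def replace_words(roots, sentence):
    # Build a nested-dict trie; node['$'] holds the full root ending there.
    trie = {}
    for root in roots:
        node = trie
        for c in root:
            node = node.setdefault(c, {})
        node['$'] = root

    def shortest_root(word):
        node = trie
        for c in word:
            if c not in node:
                return word          # fell off the trie: no root matches
            node = node[c]
            if '$' in node:
                return node['$']     # first end hit = shortest root
        return word

    return ' '.join(shortest_root(w) for w in sentence.split())
```

Because we stop at the first `is_end`-style marker while descending, the shortest matching root always wins, even when one root is a prefix of another.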

Pattern 4: Map Sum Pairs (LC 677) and Prefix Count

Insert key-value pairs into a trie. Query: sum of values of all keys with a given prefix. Augment each trie node with a sum (total value of all words in its subtree). On insert: update sum for every node on the path (not just the leaf). On prefix query: traverse to the prefix node, return its sum. O(L) per operation. Similar pattern: count words with prefix — store count instead of sum. This generalized “aggregate over subtree” pattern is powerful: precompute any aggregate at each node and keep it updated on insert/delete.

FAQ

How is a Trie different from a HashMap for prefix matching?

A HashMap gives O(1) lookup for exact matches but has no prefix support without iterating all keys: to find every word starting with "ap", you check the prefix against each of n keys, O(n * L) total. A Trie navigates to the prefix node in O(P), and every word under that node shares the prefix, so keys that don't match are never examined: traverse 'a' -> 'p', then DFS the subtree, O(subtree size). The key advantage is that the trie's structure IS the prefix index; shared prefixes are physically shared in the tree, which is memory-efficient for natural language with many shared prefixes. A trie also supports counting all words with a given prefix (store a count at each node: O(P)) and finding the longest common prefix (traverse until the first branching node: O(LCP length)); a HashMap can do neither without O(n) scans. Use a HashMap when you only need exact lookups or the key space has no meaningful prefix structure (random UUIDs, hashed values); use a Trie for prefix operations, autocomplete, and word validation with prefix constraints.

How does the trie optimize the Word Search II grid DFS?

Word Search II (LC 212) asks for all dictionary words in a 2D grid. Naive: a grid DFS per word, O(words * m * n * 4^L). With a trie, a single DFS over the grid searches for all words simultaneously: at each cell, check whether the current character leads to a valid trie prefix. If not, prune this DFS path immediately; no word can be formed here. If a trie node is a word end, a word has been found. This transforms O(words * 4^L) into O(4^L) per starting cell; the trie amortizes the cost across all words. Pruning is the key: if the current grid path spells "xyz" and no dictionary word starts with "xyz", the check fails at the "x" node and the entire subtree of DFS paths from that cell is skipped. Without the trie, every dictionary word would be checked against every DFS path. Further optimization: after a word is found, delete it from the trie (set is_end = False, clean up childless nodes), which prevents re-finding the same word and prunes additional DFS paths.

How do you implement a case-insensitive trie or one that handles Unicode?

Case-insensitive trie: normalize all inserted strings and all queries to lowercase; comparisons become case-insensitive with no extra trie complexity. If the original case must be preserved for display but search is case-insensitive, store the original string at the leaf node and use lowercase characters as node keys. Unicode trie: the standard 26-slot children array doesn't work for hundreds of thousands of codepoints. Options: (1) HashMap children: use children = {} (a Python dict or Java HashMap) instead of a fixed array; keys are arbitrary characters, space-efficient for sparse alphabets, slightly slower than array indexing but acceptable. (2) Compact trie (Patricia trie / radix tree): merge chains of single-child nodes into edge labels, storing substrings on edges instead of one node per character; reduces space dramatically for sparse keys, used in IP routing tables and file-system paths. (3) DAWG (directed acyclic word graph): compress common suffixes in addition to prefixes; optimal space for static word sets, but complex to build and read-only after construction.

How do you delete a word from a trie efficiently?

Unset the is_end flag at the word's terminal node, then optionally clean up nodes that have no remaining children and are not word ends to reclaim memory. Algorithm: DFS to the end of the word, set is_end = False, and on the way back up delete any node that has no children AND is not a word end, unlinking it from its parent. Stop the cleanup at the first node that either has other children or is a word end; those must be kept. Implementation: have the DFS return True if the current node can be deleted (empty and not a word end), and let the parent use that to unlink the child. Edge cases: deleting a word that is a prefix of another word (e.g., "apple" from {"apple", "apples"}) only unsets is_end and deletes no nodes, since all nodes are shared with "apples". Deleting a word whose prefix is another word (e.g., "apples" from {"app", "apples"}) cleans up nodes from "s" back up to the "l" node and stops at the "p" with is_end = True ("app"). Deletion is O(L); the space freed is O(unique suffix length), the portion of the path not shared with any other word.

What is the XOR trie and how does it solve maximum XOR problems?

An XOR trie (binary trie) stores numbers in binary, one bit per level from MSB to LSB. Insert traces the path from MSB to LSB, creating nodes as needed. To maximize the XOR of a number against the trie, choose greedily at each bit level: we want the XOR bit to be 1, so if the current number's bit is 0, prefer the 1-child; if the bit is 1, prefer the 0-child; if the preferred child doesn't exist, take the available one. The final XOR value is the sum of the chosen bits weighted by position. Each operation is O(32) for 32-bit integers. LC 421 (Maximum XOR of Two Numbers in an Array): insert all numbers, then query each number's maximum XOR pair, O(n * 32) total. LC 1707 (Maximum XOR With an Element From Array): sort queries by limit offline, add numbers to the trie in order, and answer each query with a max-XOR lookup. The binary trie is space-efficient: at most 32 nodes per inserted number, and in practice fewer due to shared prefixes among similar numbers.
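The greedy max-XOR query for LC 421 can be sketched with a nested-dict binary trie (`find_max_xor` is an illustrative name; children are keyed by bit value 0 or 1):

```python
def find_max_xor(nums, bits=32):
    # Phase 1: insert every number, MSB first, one trie level per bit.
    root = {}
    for num in nums:
        node = root
        for i in range(bits - 1, -1, -1):
            node = node.setdefault((num >> i) & 1, {})

    # Phase 2: for each number, greedily walk toward the opposite bit.
    best = 0
    for num in nums:
        node, cur = root, 0
        for i in range(bits - 1, -1, -1):
            b = (num >> i) & 1
            if 1 - b in node:        # opposite bit exists: this XOR bit is 1
                cur |= 1 << i
                node = node[1 - b]
            else:                    # forced to match: this XOR bit is 0
                node = node[b]
        best = max(best, cur)
    return best
```

Both phases are O(n * bits), matching the O(n * 32) bound stated above.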
