Union-Find Core Concept
Union-Find (Disjoint Set Union, DSU) efficiently tracks which elements belong to the same connected component. Two operations: find(x): returns the root (representative) of x’s component. union(x, y): merges the components of x and y. Naive implementation: O(n) per operation. With two optimizations, nearly O(1) amortized: Path compression (in find): after finding the root, make every node on the path point directly to the root. Flattens the tree for future queries. Union by rank (in union): attach the smaller tree under the root of the larger tree. Prevents the tree from becoming a linked list. Together: O(α(n)) per operation where α is the inverse Ackermann function — essentially constant for any practical n.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.components = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry: return False # already connected
if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
self.components -= 1
return True
Number of Connected Components (LC 323, 547)
Initialize n components (one per node). For each edge (u, v): union(u, v). If they were in different components: components -= 1. Return components count at the end. This is the most direct Union-Find application. LC 547 (Friend Circles / Number of Provinces): same pattern on an adjacency matrix. LC 684 (Redundant Connection): the redundant edge is the first edge (u, v) where find(u) == find(v) before union — they were already connected, so this edge creates a cycle.
Number of Islands with Union-Find (LC 200)
Alternative to BFS/DFS. Initialize all ‘1’ cells as separate components. For each ‘1’ cell: union with adjacent ‘1’ cells. Count remaining components after processing all cells. Each union reduces the count by 1. Final count = number of islands. BFS/DFS is usually preferred for this problem (simpler to implement, same O(mn) time). Union-Find shines when the grid is given incrementally (cells added over time, like LC 305 Number of Islands II): process one cell at a time, union with existing adjacent ‘1’ cells, track the component count after each addition.
Kruskal’s MST with Union-Find
Minimum Spanning Tree (LC 1135, 1168, 1584): connect all nodes with minimum total edge weight. Kruskal’s: sort edges by weight. Process edges in order: if the two endpoints are in different components (find(u) != find(v)), add this edge to the MST and union them. Stop when n-1 edges are added (spanning tree complete) or all edges are processed. O(E log E) for sorting. Union-Find makes the connectivity check O(α(n)). LC 1584 (Min Cost to Connect All Points): build edges between all point pairs (cost = Manhattan distance), run Kruskal’s. Note: for dense graphs (all pairs), Prim’s algorithm is O(n^2) which can be better than Kruskal’s O(n^2 log n).
Weighted Union-Find and Advanced Patterns
Weighted Union-Find: each parent edge has a weight. Queries ask about the relationship between two nodes (e.g., ratio between values — LC 399 Evaluate Division). find(x) returns both the root and the accumulated weight along the path. LC 399: equations give ratios (a/b = 2.0). Model as a weighted graph: union(a, b) with weight 2.0. find(x) returns (root, product_of_weights_along_path). To evaluate a/b: find both roots; if same root, return weight(a) / weight(b). LC 1202 (Smallest String with Swaps): union all swappable index pairs. Within each component, sort the characters and place them in sorted order at the component’s positions. The key insight: any permutation of elements within a component is achievable via the allowed swaps.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does path compression improve Union-Find performance?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Without path compression: find(x) traverses the chain from x to the root — O(depth) time. In the worst case (sequential unions without rank optimization), the tree becomes a linked list and find is O(n). With path compression: during find(x), after locating the root, update every node on the path to point directly to the root. This “flattens” the tree. On subsequent find calls for x or any node that was on the path: find returns in O(1). The amortized cost over n operations: O(n * u03b1(n)) total, where u03b1 is the inverse Ackermann function (effectively constant — u03b1(2^65536) = 4). Two forms: full path compression (point all nodes on the path to root) and path halving (point each node to its grandparent). Path halving is equally efficient and easier to implement iteratively. Implementation: in find, while traversing to root, set parent[x] = parent[parent[x]] (skip one level) and advance x. This halves the path length on each traversal without recursion.”
}
},
{
“@type”: “Question”,
“name”: “How do you detect cycles in an undirected graph using Union-Find?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Algorithm: initialize a Union-Find with n nodes. For each edge (u, v): check if find(u) == find(v). If yes: u and v are already in the same component — adding this edge creates a cycle. Return True (cycle detected). If no: union(u, v) and continue. If all edges are processed without finding a cycle: return False. Why it works: in an undirected graph, a cycle exists if and only if an edge connects two nodes already in the same connected component. Adding a new edge between different components just connects them (no cycle). Adding an edge within a component closes a cycle. This is O(E * u03b1(n)) — nearly O(E). Alternative: DFS with parent tracking (O(V + E)). Union-Find is preferred when you process edges one-by-one (online algorithm) or when you need to detect the specific edge that creates the cycle (the first edge where find(u) == find(v) is the redundant edge — LC 684).”
}
},
{
“@type”: “Question”,
“name”: “How does Union-Find solve the number of islands II problem (LC 305)?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “LC 305: given an mu00d7n grid (initially water), add land cells one by one. After each addition, return the number of islands. Naive approach: run BFS/DFS after each addition — O(k * mn) total for k additions. Union-Find approach: maintain a DSU where each cell index = row * n + col. For each land addition at (r, c): initialize the cell as its own component (components += 1). For each of its 4 neighbors that is also land: if find(cell) != find(neighbor): union them (components -= 1). Append components count to result. O(k * u03b1(mn)) total — nearly O(k). The key difference from static islands: with each land cell added, you potentially merge up to 4 existing components into 1. The Union-Find handles this incrementally without re-scanning the grid. This pattern (online connectivity updates) is where Union-Find outperforms BFS/DFS: DSU handles dynamic edge/node additions efficiently; BFS/DFS is better for static graphs queried once.”
}
},
{
“@type”: “Question”,
“name”: “How do you use Union-Find to solve the accounts merge problem (LC 721)?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “LC 721: given a list of accounts (each account is a list of strings where the first is a name and the rest are emails), merge accounts that share at least one email. Algorithm: (1) Map each email to an account index (email_to_id dictionary). (2) For each account: for each email in the account: if the email was seen before (in email_to_id), union the current account with the account that owns that email. Otherwise: add email u2192 current account to the map. (3) Group emails by their root account (find()). (4) For each group: sort the emails, prepend the account name, add to result. The Union-Find handles the transitive closure: if account 1 and account 2 share email A, and account 2 and account 3 share email B, then all three accounts should be merged. find() on any email gives the canonical root account. This is O(n * m * u03b1(n)) where m = average emails per account — nearly linear. DFS/BFS alternative: build a graph of accounts connected by shared emails, find connected components. Both work; Union-Find is more elegant.”
}
},
{
“@type”: “Question”,
“name”: “What is the time complexity of Union-Find with path compression and union by rank?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “With both optimizations together: O(m * u03b1(n)) for m operations on n elements, where u03b1 is the inverse Ackermann function. For all practical values of n (even n = 2^65536), u03b1(n) u2264 4. So the amortized cost per operation is effectively O(1). This is nearly optimal — any data structure supporting union and find must take u03a9(u03b1(n)) amortized per operation. Path compression alone (without rank): O(m * log n) amortized — still very fast but slightly worse. Union by rank alone (without path compression): O(m * log n) in the worst case — same asymptotic as with path compression alone. Neither optimization alone achieves the near-O(1) bound; you need both. In practice: Union-Find is used where BFS/DFS would be O(n^2) due to repeated re-traversal. The near-O(1) amortized cost means Union-Find is competitive with hash sets for connectivity queries and far faster than BFS for dynamic connectivity problems.”
}
}
]
}
Asked at: Meta Interview Guide
Asked at: Apple Interview Guide
Asked at: LinkedIn Interview Guide
Asked at: Atlassian Interview Guide