What Is Union-Find?
Union-Find (also called Disjoint Set Union or DSU) is a data structure that efficiently tracks which elements belong to the same connected component. It supports two operations: find(x) — returns the representative (root) of x’s component, and union(x, y) — merges the components of x and y. The naive implementation runs in O(n) per operation; with path compression and union by rank, both operations run in effectively O(1) amortized (inverse Ackermann function O(α(n)), which is ≤ 4 for any practical n).
Implementation with Path Compression + Union by Rank
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.components = n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
px, py = self.find(x), self.find(y)
if px == py:
return False # already in same component
if self.rank[px] < self.rank[py]:
px, py = py, px
self.parent[py] = px # attach smaller rank tree under larger
if self.rank[px] == self.rank[py]:
self.rank[px] += 1
self.components -= 1
return True
def connected(self, x, y):
return self.find(x) == self.find(y)
Path compression: in find(), after recursively finding the root, set every node on the path to point directly to the root. Subsequent finds on the same path are O(1). Union by rank: always attach the shorter tree under the taller tree. This keeps the tree height O(log n), ensuring find() stays fast even without path compression.
Classic Problems
Number of Connected Components (LC 323): create UF with n nodes. For each edge (u, v), union(u, v). Return uf.components. O(n + E × α(n)).
Redundant Connection (LC 684): find the edge that creates a cycle in an undirected graph. Process edges one by one. For each edge (u, v): if find(u) == find(v), this edge creates a cycle — return it. Otherwise union(u, v). The first edge where both endpoints are already in the same component is the answer.
Number of Islands (LC 200): can be solved with BFS/DFS (more common) or UF. Create UF with rows×cols nodes, treating each cell as a node. For every ‘1’ cell, union it with adjacent ‘1’ cells. Return uf.components minus the count of ‘0’ cells.
Accounts Merge (LC 721): each account has a name and list of emails. Merge accounts that share at least one email. UF approach: create a UF over emails (use a dict mapping email → index). For each account, union all emails with the first email in the account. After all unions, group emails by their root. Map roots back to account names.
Making a Large Island (LC 827): given a grid of 0s and 1s, flip at most one 0 to 1 and find the largest island. Pre-compute island sizes using UF. For each 0 cell, sum up sizes of distinct neighboring components (using their roots to deduplicate). Answer is max of (1 + sum of neighboring component sizes).
Smallest String With Swaps (LC 1202): given pairs of indices that can be freely swapped, find the lexicographically smallest string. Union all index pairs. Group indices by their root (all indices in the same component can be freely rearranged). Sort characters within each component and assign them back in sorted order to their sorted positions.
Weighted Union-Find
For problems involving ratios or relative values between connected elements (e.g., LC 399 Evaluate Division), use a weighted Union-Find where each node stores a weight relative to its parent. find() returns both the root and the accumulated weight product along the path. union() sets the weight of the new edge to maintain consistency.
When to Use Union-Find vs BFS/DFS
Use Union-Find when: edges are added incrementally (online algorithm), you need to check connectivity repeatedly after each add, or the problem involves merging components. Use BFS/DFS when: you need to traverse the graph (finding paths, levels, distances), edges can be deleted, or you need more than just connectivity (shortest path, topological sort). For static graphs (all edges given upfront), BFS/DFS and UF have similar complexity — prefer whichever is easier to implement correctly.
Interview Tips
- Always initialize parent[i] = i and track component count.
- The return value of union() (True if newly merged, False if already connected) is useful for detecting cycles or counting merges.
- For grid problems, convert (row, col) to a linear index: idx = row * cols + col.
- If nodes are strings or arbitrary objects, use a dictionary for parent instead of an array.
- Path compression is a one-liner in Python using recursion; use an iterative version to avoid stack overflow on very deep trees.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is path compression in Union-Find and why does it matter?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Path compression is an optimization in the find() operation that flattens the tree structure after each lookup. When find(x) is called, it recursively finds the root of x’s component. Without path compression, all nodes on the path from x to root remain at their original positions u2014 future finds on the same path traverse the same chain. With path compression, after finding the root, every node visited on the path is updated to point directly to the root: self.parent[x] = self.find(self.parent[x]). This two-pass approach (find root, then flatten) makes subsequent calls on any node in the path O(1). Combined with union by rank (always attach the shallower tree under the deeper tree), the amortized time per operation becomes O(u03b1(n)), the inverse Ackermann function u2014 effectively constant for all practical values of n (u03b1(n) u2264 4 for n < 10^600). Without these optimizations, a chain of union operations creates a linked list with O(n) find time."
}
},
{
"@type": "Question",
"name": "How do you detect a cycle in an undirected graph using Union-Find?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Process edges one by one. For each edge (u, v): find the root of u and the root of v. If they have the same root, u and v are already in the same connected component u2014 adding edge (u, v) creates a cycle. Return True (cycle detected). If they have different roots, they are in different components u2014 merge them with union(u, v). If all edges are processed without finding same-root endpoints, the graph is acyclic. This is used in LC 684 (Redundant Connection) to find the specific edge that creates the cycle u2014 return the first edge where find(u) == find(v). Time O(E u00d7 u03b1(V)), essentially O(E). This approach is more efficient than DFS cycle detection for sparse graphs with many connectivity queries, and is the basis for Kruskal's Minimum Spanning Tree algorithm (which adds edges greedily, skipping any edge that would create a cycle)."
}
},
{
"@type": "Question",
"name": "What is the difference between Union-Find and BFS/DFS for graph connectivity problems?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Union-Find and BFS/DFS both solve graph connectivity but have different strengths. Union-Find excels at: dynamic connectivity (edges are added over time and you need connectivity queries after each addition), multiple connectivity queries on a static graph (O(u03b1(n)) per query after O(E u00d7 u03b1(n)) preprocessing), and problems where you're merging components (accounts merge, minimum spanning tree). BFS/DFS excels at: finding the actual path between two nodes, computing shortest distances, detecting cycles in directed graphs (Union-Find works only for undirected), topological sorting, and bipartite checking. For a one-time "how many connected components" query on a static graph, both approaches are O(V + E) u2014 use whichever is simpler to implement. For "is X connected to Y after adding edge Z?", Union-Find is clearly better: O(u03b1(n)) per query vs O(V + E) per query for BFS/DFS on the updated graph."
}
}
]
}