What Is Union-Find?
Union-Find (Disjoint Set Union, DSU) is a data structure that tracks a collection of elements partitioned into non-overlapping sets. It supports two operations in near-constant amortized time:
- find(x): which set does element x belong to? Returns the set’s representative (root).
- union(x, y): merge the sets containing x and y.
With two optimizations — path compression and union by rank — both operations run in amortized O(α(N)) time, where α is the inverse Ackermann function. In practice, α(N) ≤ 4 for any realistic N. Union-Find is essentially O(1) per operation.
Implementation
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
px, py = self.find(x), self.find(y)
if px == py:
return False # already in same set
# union by rank
if self.rank[px] < self.rank[py]:
px, py = py, px
self.parent[py] = px
if self.rank[px] == self.rank[py]:
self.rank[px] += 1
return True
def connected(self, x, y):
return self.find(x) == self.find(y)
Path Compression
During find(x), make every node on the path from x to the root point directly to the root. This flattens the tree over time, making future finds faster. The compressed tree has at most 2 levels.
Union by Rank
When merging two trees, attach the smaller tree (by rank) under the root of the larger tree. This prevents degenerate chains (which would make find O(N)). Rank is an upper bound on tree height.
Pattern 1: Connected Components
Count connected components in an undirected graph. Initialize each node as its own set. For each edge (u, v), union(u, v). Count distinct roots after all edges.
def count_components(n, edges):
uf = UnionFind(n)
for u, v in edges:
uf.union(u, v)
return len(set(uf.find(i) for i in range(n)))
Equivalent to BFS/DFS but often cleaner when you only need component count, not actual component membership.
Pattern 2: Cycle Detection in Undirected Graphs
Before unioning (u, v): if find(u) == find(v), they’re already in the same set, meaning adding this edge creates a cycle.
def has_cycle(n, edges):
uf = UnionFind(n)
for u, v in edges:
if not uf.union(u, v): # union returns False if already connected
return True
return False
Used in Kruskal’s MST algorithm: process edges by weight, skip edges that would form a cycle.
Pattern 3: Kruskal’s Minimum Spanning Tree
def kruskal(n, edges):
edges.sort(key=lambda e: e[2]) # sort by weight
uf = UnionFind(n)
mst_weight = 0
for u, v, w in edges:
if uf.union(u, v):
mst_weight += w
return mst_weight
Greedy: always add the cheapest edge that doesn’t form a cycle. Union-Find makes cycle detection O(α(N)), making Kruskal’s O(E log E) overall (dominated by sorting).
Pattern 4: Number of Islands (2D Grid)
Treat each cell as a node. Map (row, col) → row * cols + col. Union adjacent land cells (‘1’). Count distinct roots among all land cells.
Union-Find for islands is competitive with BFS/DFS — same O(N*M) time complexity. Advantage: can process islands incrementally (add cells one at a time) — useful for “Number of Islands II” where islands appear dynamically.
Pattern 5: Accounts Merge (Grouping by Shared Element)
Multiple accounts may belong to the same person if they share an email. Union accounts sharing any email. All accounts with the same root form one person’s accounts.
# Map each email to first account that used it
email_to_account = {}
uf = UnionFind(len(accounts))
for i, account in enumerate(accounts):
for email in account[1:]:
if email in email_to_account:
uf.union(i, email_to_account[email])
else:
email_to_account[email] = i
Pattern 6: Redundant Connection
Given a tree with one extra edge, find the edge that creates a cycle. Process edges in order; the first edge where union returns False (nodes already connected) is the redundant edge. Return it.
When to Use Union-Find vs. BFS/DFS
- Use Union-Find: when you need dynamic connectivity (edges added incrementally), cycle detection in undirected graphs, MST (Kruskal’s), grouping by shared property
- Use BFS/DFS: when you need to find actual paths, shortest paths, or process graph level by level
- Union-Find advantage: near-constant time per operation, trivial implementation, handles dynamic edge additions
- BFS/DFS advantage: more general, works on directed graphs, finds actual paths
Interview Tips
- Always implement both path compression and union by rank — without both, the complexity guarantee doesn’t hold.
- For grid problems, flatten 2D coordinates to 1D:
idx = row * cols + col. - The union() return value (True/False indicating whether a merge happened) is critical for cycle detection and Kruskal’s — make sure your implementation returns it.
- Common mistakes: forgetting path compression (making find recursive but not compressing), or forgetting to check if nodes are already connected before counting a component merge.