Union-Find Data Structure
Union-Find (also called Disjoint Set Union, DSU) tracks which elements belong to the same connected component. Supports two operations in nearly O(1) amortized time: find(x): returns the representative (root) of x’s component. Two elements are in the same component if find(x) == find(y). union(x, y): merges the components containing x and y. Applications: detecting cycles in an undirected graph, finding connected components, Kruskal’s MST algorithm, network connectivity, grouping problems (accounts merge, friend circles).
Implementation with Path Compression and Union by Rank
class UnionFind:
def __init__(self, n: int):
self.parent = list(range(n)) # parent[i] = i (self-loop for roots)
self.rank = [0] * n # tree height upper bound
self.components = n # count of distinct components
def find(self, x: int) -> int:
# Path compression: flatten the tree
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # recursive compression
return self.parent[x]
def union(self, x: int, y: int) -> bool:
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False # already in same component (cycle detected!)
# Union by rank: attach smaller tree under larger
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
self.components -= 1
return True # successfully merged
Complexity: with both optimizations, find and union are O(alpha(n)) amortized — alpha is the inverse Ackermann function, effectively O(1) for all practical input sizes (alpha(n) <= 4 for n < 10^600).
Number of Connected Components (LC 323) and Friend Circles (LC 547)
def count_components(n: int, edges: list[list[int]]) -> int:
uf = UnionFind(n)
for u, v in edges:
uf.union(u, v)
return uf.components # starts at n, decremented on each successful union
def find_circle_num(is_connected: list[list[int]]) -> int:
n = len(is_connected)
uf = UnionFind(n)
for i in range(n):
for j in range(i + 1, n):
if is_connected[i][j]:
uf.union(i, j)
return uf.components
Detecting Cycles in an Undirected Graph
def has_cycle(n: int, edges: list[list[int]]) -> bool:
uf = UnionFind(n)
for u, v in edges:
if not uf.union(u, v):
return True # u and v already in same component = cycle
return False
Accounts Merge (LC 721)
Group accounts that share at least one email. Two accounts that share an email belong to the same person. Map each email to an account index. Union accounts that share an email.
def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
uf = UnionFind(len(accounts))
email_to_account = {} # email -> first account index that owns it
for i, account in enumerate(accounts):
for email in account[1:]:
if email in email_to_account:
uf.union(i, email_to_account[email])
else:
email_to_account[email] = i
# Group emails by root account
from collections import defaultdict
root_emails = defaultdict(set)
for email, idx in email_to_account.items():
root_emails[uf.find(idx)].add(email)
result = []
for root, emails in root_emails.items():
result.append([accounts[root][0]] + sorted(emails))
return result
Kruskal’s MST with Union-Find
def kruskal_mst(n: int, edges: list[tuple]) -> int:
# edges: list of (weight, u, v)
edges.sort() # sort by weight
uf = UnionFind(n)
mst_cost = 0
edges_used = 0
for weight, u, v in edges:
if uf.union(u, v): # if not already connected
mst_cost += weight
edges_used += 1
if edges_used == n - 1:
break # MST complete (n-1 edges)
return mst_cost if edges_used == n - 1 else -1 # -1 if not connected
See also: Meta Interview Prep
See also: Atlassian Interview Prep
See also: Databricks Interview Prep