What is path compression in Union-Find and why is it important?

Path compression flattens the tree during find() operations. When find(x) traverses nodes from x up to the root, it makes every visited node point directly to the root. This is done with a one-line recursive implementation: parent[x] = find(parent[x]). After compression, the subtree is almost completely flat — future finds for any node in that subtree go directly to the root in O(1). Without path compression, find() takes O(log N) with union by rank, or O(N) in degenerate cases without any optimization. With path compression alone (no rank): amortized O(log N). With both compression and union by rank: amortized O(α(N)) — essentially constant for any practical input size.

How does Union-Find detect cycles in an undirected graph?

Initialize Union-Find with N nodes (one per vertex). Process each edge (u, v): call find(u) and find(v). If they return the same root, u and v are already in the same connected component — adding this edge would create a cycle. Return True (cycle detected). If different roots, call union(u, v) to merge the two components and continue. If all edges are processed without finding a cycle, return False. This is O(E * α(V)) — essentially O(E). This is exactly how Kruskal's MST algorithm works: process edges by increasing weight, skip edges that would create a cycle (same component), add the rest to the MST.

What interview problems are best solved with Union-Find?

Union-Find excels when: (1) you need to track connected components dynamically as edges are added — Number of Islands II (cells added one at a time), (2) cycle detection in undirected graphs — Redundant Connection, Graph Valid Tree, (3) grouping by shared elements — Accounts Merge (emails connecting accounts), Friend Circles, (4) MST construction — Kruskal's algorithm. Choose Union-Find over BFS/DFS when: edges are added dynamically (BFS/DFS would require restarting from scratch), or when you only need connectivity information (not paths). For static graphs where you need to find actual paths or shortest distances, BFS/DFS is cleaner. The pattern: if the problem says "group", "connect", "merge", or "detect cycle in undirected graph", think Union-Find first.

Union-Find (Disjoint Set Union) Interview Patterns: Connected Components, Kruskal’s MST

⏱ 6 min read

What Is Union-Find?

Union-Find (Disjoint Set Union, DSU) is a data structure that tracks a collection of elements partitioned into non-overlapping sets. It supports two operations in near-constant amortized time:

find(x): which set does element x belong to? Returns the set’s representative (root).
union(x, y): merge the sets containing x and y.

With two optimizations — path compression and union by rank — both operations run in amortized O(α(N)) time, where α is the inverse Ackermann function. In practice, α(N) ≤ 4 for any realistic N. Union-Find is essentially O(1) per operation.

Databricks Interview Guide

Atlassian Interview Guide

Uber Interview Guide

LinkedIn Interview Guide

Apple Interview Guide

Meta Interview Guide

Implementation

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already in same set
        # union by rank
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

Path Compression

During find(x), make every node on the path from x to the root point directly to the root. This flattens the tree over time, making future finds faster. The compressed tree has at most 2 levels.

Union by Rank

When merging two trees, attach the smaller tree (by rank) under the root of the larger tree. This prevents degenerate chains (which would make find O(N)). Rank is an upper bound on tree height.

Pattern 1: Connected Components

Count connected components in an undirected graph. Initialize each node as its own set. For each edge (u, v), union(u, v). Count distinct roots after all edges.

def count_components(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return len(set(uf.find(i) for i in range(n)))

Equivalent to BFS/DFS but often cleaner when you only need component count, not actual component membership.

Pattern 2: Cycle Detection in Undirected Graphs

Before unioning (u, v): if find(u) == find(v), they’re already in the same set, meaning adding this edge creates a cycle.

def has_cycle(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        if not uf.union(u, v):  # union returns False if already connected
            return True
    return False

Used in Kruskal’s MST algorithm: process edges by weight, skip edges that would form a cycle.

Pattern 3: Kruskal’s Minimum Spanning Tree

def kruskal(n, edges):
    edges.sort(key=lambda e: e[2])  # sort by weight
    uf = UnionFind(n)
    mst_weight = 0
    for u, v, w in edges:
        if uf.union(u, v):
            mst_weight += w
    return mst_weight

Greedy: always add the cheapest edge that doesn’t form a cycle. Union-Find makes cycle detection O(α(N)), making Kruskal’s O(E log E) overall (dominated by sorting).

Pattern 4: Number of Islands (2D Grid)

Treat each cell as a node. Map (row, col) → row * cols + col. Union adjacent land cells (‘1’). Count distinct roots among all land cells.

Union-Find for islands is competitive with BFS/DFS — same O(N*M) time complexity. Advantage: can process islands incrementally (add cells one at a time) — useful for “Number of Islands II” where islands appear dynamically.

Pattern 5: Accounts Merge (Grouping by Shared Element)

Multiple accounts may belong to the same person if they share an email. Union accounts sharing any email. All accounts with the same root form one person’s accounts.

# Map each email to first account that used it
email_to_account = {}
uf = UnionFind(len(accounts))
for i, account in enumerate(accounts):
    for email in account[1:]:
        if email in email_to_account:
            uf.union(i, email_to_account[email])
        else:
            email_to_account[email] = i

Pattern 6: Redundant Connection

Given a tree with one extra edge, find the edge that creates a cycle. Process edges in order; the first edge where union returns False (nodes already connected) is the redundant edge. Return it.

When to Use Union-Find vs. BFS/DFS

Use Union-Find: when you need dynamic connectivity (edges added incrementally), cycle detection in undirected graphs, MST (Kruskal’s), grouping by shared property
Use BFS/DFS: when you need to find actual paths, shortest paths, or process graph level by level
Union-Find advantage: near-constant time per operation, trivial implementation, handles dynamic edge additions
BFS/DFS advantage: more general, works on directed graphs, finds actual paths

Interview Tips

Always implement both path compression and union by rank — without both, the complexity guarantee doesn’t hold.
For grid problems, flatten 2D coordinates to 1D: idx = row * cols + col.
The union() return value (True/False indicating whether a merge happened) is critical for cycle detection and Kruskal’s — make sure your implementation returns it.
Common mistakes: forgetting path compression (making find recursive but not compressing), or forgetting to check if nodes are already connected before counting a component merge.