Union-Find (Disjoint Set Union) Interview Patterns

What Is Union-Find?

Union-Find (also called Disjoint Set Union or DSU) is a data structure that efficiently tracks which elements belong to the same connected component. It supports two operations: find(x) — returns the representative (root) of x’s component, and union(x, y) — merges the components of x and y. The naive implementation runs in O(n) per operation; with path compression and union by rank, both operations run in effectively O(1) amortized (inverse Ackermann function O(α(n)), which is ≤ 4 for any practical n).

Implementation with Path Compression + Union by Rank

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already in same component
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px  # attach smaller rank tree under larger
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        self.components -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

Path compression: in find(), after recursively finding the root, set every node on the path to point directly to the root. Subsequent finds on the same path are O(1). Union by rank: always attach the shorter tree under the taller tree. This keeps the tree height O(log n), ensuring find() stays fast even without path compression.

Classic Problems

Number of Connected Components (LC 323): create UF with n nodes. For each edge (u, v), union(u, v). Return uf.components. O(n + E × α(n)).

Redundant Connection (LC 684): find the edge that creates a cycle in an undirected graph. Process edges one by one. For each edge (u, v): if find(u) == find(v), this edge creates a cycle — return it. Otherwise union(u, v). The first edge where both endpoints are already in the same component is the answer.

Number of Islands (LC 200): can be solved with BFS/DFS (more common) or UF. Create UF with rows×cols nodes, treating each cell as a node. For every ‘1’ cell, union it with adjacent ‘1’ cells. Return uf.components minus the count of ‘0’ cells.

Accounts Merge (LC 721): each account has a name and list of emails. Merge accounts that share at least one email. UF approach: create a UF over emails (use a dict mapping email → index). For each account, union all emails with the first email in the account. After all unions, group emails by their root. Map roots back to account names.

Making a Large Island (LC 827): given a grid of 0s and 1s, flip at most one 0 to 1 and find the largest island. Pre-compute island sizes using UF. For each 0 cell, sum up sizes of distinct neighboring components (using their roots to deduplicate). Answer is max of (1 + sum of neighboring component sizes).

Smallest String With Swaps (LC 1202): given pairs of indices that can be freely swapped, find the lexicographically smallest string. Union all index pairs. Group indices by their root (all indices in the same component can be freely rearranged). Sort characters within each component and assign them back in sorted order to their sorted positions.

Weighted Union-Find

For problems involving ratios or relative values between connected elements (e.g., LC 399 Evaluate Division), use a weighted Union-Find where each node stores a weight relative to its parent. find() returns both the root and the accumulated weight product along the path. union() sets the weight of the new edge to maintain consistency.

When to Use Union-Find vs BFS/DFS

Use Union-Find when: edges are added incrementally (online algorithm), you need to check connectivity repeatedly after each add, or the problem involves merging components. Use BFS/DFS when: you need to traverse the graph (finding paths, levels, distances), edges can be deleted, or you need more than just connectivity (shortest path, topological sort). For static graphs (all edges given upfront), BFS/DFS and UF have similar complexity — prefer whichever is easier to implement correctly.

Interview Tips

  • Always initialize parent[i] = i and track component count.
  • The return value of union() (True if newly merged, False if already connected) is useful for detecting cycles or counting merges.
  • For grid problems, convert (row, col) to a linear index: idx = row * cols + col.
  • If nodes are strings or arbitrary objects, use a dictionary for parent instead of an array.
  • Path compression is a one-liner in Python using recursion; use an iterative version to avoid stack overflow on very deep trees.

  • Coinbase Interview Guide
  • Atlassian Interview Guide
  • Uber Interview Guide
  • LinkedIn Interview Guide
  • Apple Interview Guide
  • Companies That Ask This

    Scroll to Top