Question 1

How does path compression improve Union-Find performance?

Accepted Answer

Without path compression, find(x) walks the chain from x to the root, which takes O(depth) time. In the worst case (sequential unions without union by rank or size), the tree degenerates into a linked list and find is O(n).

With path compression, find(x) first locates the root and then repoints every node on the path directly at the root, which flattens the tree. A later find on x, or on any node that was on that path, reaches the root in one step, as long as that root has not since been attached under another root. Combined with union by rank or size, m operations on n elements cost O(m * α(n)) in total, where α is the inverse Ackermann function. α(n) ≤ 4 for every practical n (even n = 2^65536), so it is effectively constant.

Two forms are common: full path compression (point every node on the path at the root) and path halving (point every other node on the path at its grandparent). Path halving has the same asymptotic bound and is easier to implement iteratively: while x is not the root, set parent[x] = parent[parent[x]] (skipping one level), then advance x to its new parent. Each traversal roughly halves the path length, with no recursion.
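The two variants can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names find_full and find_halving and the bare parent list are assumptions made for the example, and union by rank is left out to keep the focus on find.

```python
def find_full(parent, x):
    """Full path compression, written iteratively in two passes."""
    # Pass 1: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: repoint every node on the path directly at the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def find_halving(parent, x):
    """Path halving: every other node is repointed at its grandparent."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # skip one level
        x = parent[x]                  # advance to the new parent
    return x


# A degenerate chain 4 -> 3 -> 2 -> 1 -> 0 (0 is the root).
chain_a = [0, 0, 1, 2, 3]
find_full(chain_a, 4)      # chain_a is now [0, 0, 0, 0, 0]

chain_b = [0, 0, 1, 2, 3]
find_halving(chain_b, 4)   # chain_b is now [0, 0, 0, 2, 2]
```

After one call, full compression leaves every node on the path one hop from the root, while halving leaves a path of about half the original length and needs only a single pass.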

Question 2

How do you detect cycles in an undirected graph using Union-Find?

Accepted Answer

Algorithm: initialize a Union-Find with n nodes. For each edge (u, v), check whether find(u) == find(v). If so, u and v are already in the same component, so adding this edge creates a cycle: return True. Otherwise, union(u, v) and continue. If every edge is processed without a hit, return False.

Why it works: as edges are added one at a time, a new edge closes a cycle if and only if its two endpoints are already connected. An edge between different components just joins them (no cycle). An edge within a component adds a second path between its endpoints, which closes a cycle.

With path compression and union by rank this takes O(E * α(n)) time, which is nearly O(E). The alternative is DFS with parent tracking, O(V + E). Union-Find is preferred when edges arrive one at a time (an online algorithm) or when you need the specific edge that creates the cycle: the first edge where find(u) == find(v) is the redundant edge (LC 684).
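A minimal Python sketch of the edge-by-edge check, under a few assumptions: nodes are numbered 0..n-1 (LC 684 itself numbers nodes from 1), the helper names first_cycle_edge and has_cycle are invented for this example, and union by rank is omitted for brevity, so only path halving keeps the trees shallow.

```python
def find(parent, x):
    # Path halving: repoint nodes at their grandparent while walking up.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def first_cycle_edge(n, edges):
    """Return the first edge that closes a cycle, or None if there is none."""
    parent = list(range(n))
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            return (u, v)   # endpoints already connected: this edge is redundant
        parent[ru] = rv     # otherwise merge the two components
    return None


def has_cycle(n, edges):
    return first_cycle_edge(n, edges) is not None


has_cycle(3, [(0, 1), (1, 2)])                           # False: a simple path
has_cycle(3, [(0, 1), (1, 2), (2, 0)])                   # True: a triangle
first_cycle_edge(4, [(0, 1), (1, 2), (0, 2), (2, 3)])    # (0, 2)
```

Returning the edge rather than a boolean covers both uses described above: plain cycle detection and finding the redundant connection.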

Question 3

How does Union-Find solve the number of islands II problem (LC 305)?

Accepted Answer

LC 305: given an m×n grid that starts as all water, add land cells one at a time and return the number of islands after each addition. The naive approach reruns BFS/DFS after every addition, which is O(k * mn) total for k additions.

Union-Find approach: maintain a DSU in which each cell has index row * n + col, plus a running components counter. For each land addition at (r, c): if the cell is already land (a repeated position), append the current count unchanged. Otherwise, initialize the cell as its own component (components += 1). For each of its 4 neighbors that is also land, if find(cell) != find(neighbor), union them (components -= 1). Append the components count to the result. With path compression and union by rank the additions cost O(k * α(mn)) in total, which is nearly O(k), plus the cost of allocating the DSU storage.

The key difference from static islands: each added land cell can merge up to 4 existing components into 1, and Union-Find handles this incrementally without rescanning the grid. This pattern (online connectivity updates) is where Union-Find outperforms BFS/DFS: a DSU handles node and edge additions efficiently, while BFS/DFS is better suited to a static graph that is traversed once.
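The incremental procedure could look like the following Python sketch. The function name num_islands2 and the choice of a dictionary for parent (holding land cells only, so no m*n array is allocated up front) are assumptions of this example. Union by rank is omitted for brevity.

```python
def num_islands2(m, n, positions):
    """Return the island count after each land addition on an m x n grid."""
    parent = {}  # land cells only; key and initial value are r * n + c

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = 0
    result = []
    for r, c in positions:
        cell = r * n + c
        if cell not in parent:          # skip positions that are already land
            parent[cell] = cell
            components += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < m and 0 <= nc < n and (nr * n + nc) in parent:
                    ra, rb = find(cell), find(nr * n + nc)
                    if ra != rb:        # two distinct islands touch: merge them
                        parent[ra] = rb
                        components -= 1
        result.append(components)
    return result


num_islands2(3, 3, [(0, 0), (0, 1), (1, 2), (2, 1)])   # [1, 1, 2, 3]
num_islands2(1, 3, [(0, 0), (0, 2), (0, 1)])           # [1, 2, 1]
```

In the second call the last cell bridges two separate islands, so the count drops from 2 to 1 even though a cell was added.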

Question 4

How do you use Union-Find to solve the accounts merge problem (LC 721)?

Accepted Answer

LC 721: given a list of accounts (each account is a list of strings in which the first is a name and the rest are emails), merge the accounts that share at least one email.

Algorithm: (1) Keep an email_to_id dictionary that maps each email to the index of the first account it appeared in. (2) For each account and each email in it: if the email is already in email_to_id, union the current account with the account that owns that email; otherwise add email → current account index to the map. (3) Group emails by the root of their account, find(email_to_id[email]). (4) For each group, sort the emails, prepend the account name, and add the list to the result.

Union-Find handles the transitive closure: if accounts 1 and 2 share email A, and accounts 2 and 3 share email B, then all three accounts are merged. Calling find() on the account index that an email maps to gives the canonical root account. The union and find work is O(n * m * α(n)) for n accounts with m emails each on average, which is nearly linear. Sorting the emails in step (4) adds up to O(nm log(nm)) and dominates the total. DFS/BFS alternative: build a graph of accounts connected by shared emails and find its connected components. Both work; the Union-Find version avoids building the graph explicitly.
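Steps (1) to (4) can be sketched in Python as below. The function name accounts_merge is an assumption of this example, the DSU is a bare list with path halving and no union by rank, and the sketch relies on every account having at least one email (an account with none would not appear in the output).

```python
from collections import defaultdict


def accounts_merge(accounts):
    parent = list(range(len(accounts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Steps 1 and 2: map each email to its first account, union on repeats.
    email_to_id = {}
    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_to_id:
                parent[find(i)] = find(email_to_id[email])
            else:
                email_to_id[email] = i

    # Step 3: group emails by the root of the account that owns them.
    groups = defaultdict(list)
    for email, i in email_to_id.items():
        groups[find(i)].append(email)

    # Step 4: sorted emails, prefixed by the account name.
    return [[accounts[root][0]] + sorted(emails) for root, emails in groups.items()]


merged = accounts_merge([
    ["A", "a1", "a2"],
    ["A", "a3", "a4"],
    ["A", "a2", "a3"],   # links the first two accounts transitively
    ["B", "b1"],
])
# sorted(merged) == [["A", "a1", "a2", "a3", "a4"], ["B", "b1"]]
```

The third account shares one email with each of the first two, which is the transitive case described above: all three end up under one root.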

Question 5

What is the time complexity of Union-Find with path compression and union by rank?

Accepted Answer

With both optimizations together: O(m * α(n)) for m operations on n elements, where α is the inverse Ackermann function. For all practical values of n (even n = 2^65536), α(n) ≤ 4, so the amortized cost per operation is effectively O(1). This matches the known lower bound: any data structure supporting union and find must take Ω(α(n)) amortized time per operation.

Path compression alone (without rank): O(m * log n) amortized, which is still very fast but slightly worse. Union by rank alone (without path compression): tree height stays at most log n, so each find is O(log n) in the worst case and the total is O(m * log n), the same asymptotic bound as path compression alone. Neither optimization alone achieves the near-O(1) bound; you need both.

In practice, Union-Find is used where BFS/DFS would be O(n^2) due to repeated re-traversal. The near-O(1) amortized cost makes a connectivity query about as cheap as a hash-set lookup, and far faster than rerunning BFS for connectivity problems in which edges are only ever added.
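For reference, a DSU that combines both optimizations might be sketched in Python as follows. The class name DSU, the components counter, and the boolean return value of union are conveniences assumed for this example rather than part of any standard API.

```python
class DSU:
    """Disjoint set union with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n      # upper bound on the height of each root's tree
        self.components = n

    def find(self, x):
        # Pass 1: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: full path compression.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra              # make ra the root of higher rank
        self.parent[rb] = ra             # attach the shallower tree under it
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1           # height grows only when ranks tie
        self.components -= 1
        return True


dsu = DSU(5)
dsu.union(0, 1)      # True
dsu.union(1, 2)      # True
dsu.union(0, 2)      # False: 0 and 2 are already connected
dsu.components       # 3, i.e. {0, 1, 2}, {3}, {4}
```

Union by rank keeps trees shallow before any find runs, and path compression flattens whatever depth remains; the O(m * α(n)) bound depends on having both.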

Union-Find (Disjoint Set Union) Interview Patterns: Path Compression, Connectivity, and Advanced Problems (2025)

Union-Find Core Concept

Number of Connected Components (LC 323, 547)

Number of Islands with Union-Find (LC 200)

Kruskal’s MST with Union-Find

Weighted Union-Find and Advanced Patterns