Q: How does Union-Find detect cycles in an undirected graph?

Process the edges (u, v) one at a time and compare find(u) with find(v). If the roots are equal, u and v are already in the same connected component, so this edge closes a cycle and the algorithm returns True immediately. If the roots differ, union the two components and continue. If every edge is processed without a collision, the graph is a forest and the algorithm returns False. This works because the accepted edges always form a spanning forest, and an undirected graph contains a cycle if and only if some edge joins two nodes that the forest already connects. A self-loop or a repeated edge is reported as a cycle under this test. Note: the technique applies only to undirected graphs. For directed graphs, use DFS with 3-state coloring (unvisited, in-progress, done).
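The edge-by-edge check can be sketched in a few lines. This is a minimal illustration, assuming nodes are labeled 0..n-1 and edges arrive as (u, v) pairs; the function name has_cycle is hypothetical, and union by rank is omitted for brevity.

```python
def has_cycle(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 contains a cycle."""
    parent = list(range(n))  # every node starts as its own root

    def find(x):
        # Walk up to the root, pointing each visited node at its grandparent
        # (path halving, a lightweight form of path compression).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True          # u and v already connected: this edge closes a cycle
        parent[root_u] = root_v  # otherwise merge the two components
    return False


print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))  # False: a simple path
print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True: a triangle
```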

Q: What is Kruskal's algorithm and why does it produce a minimum spanning tree?

Kruskal's algorithm: sort all edges by weight and process them in ascending order. For each edge (u, v, w): if u and v are in different components (union returns True), add the edge to the MST and merge the components; otherwise skip it, because it would close a cycle. Stop once n-1 edges have been added (the MST is complete); if the graph is disconnected, the edges run out first and the result is a minimum spanning forest. Correctness (cut property): for any partition of the vertices into S and V-S, a minimum-weight edge crossing the cut belongs to some MST, and to every MST when that minimum is unique. When Kruskal's accepts (u, v), let S be the component containing u. No lighter edge can cross the cut (S, V-S): it was processed earlier, so its endpoints would already be in the same component. Hence (u, v) is a minimum-weight edge crossing that cut, and by the cut property it is safe to add. Union-Find makes the "same component" check and the merge O(alpha(n)) amortized, giving O(E log E) time overall (dominated by the sort).
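A compact sketch of these steps follows. It assumes nodes are labeled 0..n-1 and edges are (u, v, w) tuples; the function name kruskal and the (total, chosen) return shape are illustrative choices, not a fixed API. Union by size is used here, though union by rank would work equally well.

```python
def kruskal(n, edges):
    """Return (total_weight, chosen_edges) for a minimum spanning tree/forest."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False               # same component: edge would close a cycle
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a        # attach the smaller tree under the larger
        size[root_a] += size[root_b]
        return True

    total, chosen = 0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # ascending weight
        if union(u, v):
            total += w
            chosen.append((u, v, w))
            if len(chosen) == n - 1:   # spanning tree is complete
                break
    return total, chosen


edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
print(kruskal(4, edges))  # (7, [(0, 1, 1), (1, 2, 2), (2, 3, 4)])
```

On a disconnected input the loop simply runs out of edges, and the returned list has fewer than n-1 entries.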

Q: How do you handle the Accounts Merge problem and what is the mapping strategy?

Accounts Merge: each account has a name and a list of emails. Two accounts belong to the same person if they share at least one email, and the relation is transitive. Matching names alone never justify a merge, since different people can share a name. Strategy: create a Union-Find with n nodes (one per account) and a map email_to_account that records which account first claimed each email. When processing account i's emails: if the email is already in email_to_account, union account i with email_to_account[email] (they are the same person). Otherwise, record email_to_account[email] = i. After processing all accounts, group the emails by root: for each email, call find() on its recorded account and append the email to that root's list. For each root, sort its emails and prepend the account name. The sort is needed because the expected output lists each person's emails in sorted order; it costs O(E log E), where E is the total number of emails, and dominates the near-linear Union-Find work.
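The mapping strategy might look like the sketch below. It assumes the LeetCode-style input shape, where each account is a list whose first element is the name and the rest are emails, and that every account has at least one email; the function name accounts_merge is illustrative.

```python
from collections import defaultdict


def accounts_merge(accounts):
    """Merge accounts that share an email; each result is [name, sorted emails...]."""
    parent = list(range(len(accounts)))  # one Union-Find node per account

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    email_to_account = {}  # email -> index of the first account that listed it
    for i, (_name, *emails) in enumerate(accounts):
        for email in emails:
            if email in email_to_account:
                # Shared email: account i and the earlier owner are the same person.
                parent[find(i)] = find(email_to_account[email])
            else:
                email_to_account[email] = i

    groups = defaultdict(list)  # root account index -> emails of that person
    for email, i in email_to_account.items():
        groups[find(i)].append(email)

    return [[accounts[root][0]] + sorted(emails) for root, emails in groups.items()]


accounts = [
    ["John", "a@x.com", "b@x.com"],
    ["John", "c@x.com"],
    ["John", "b@x.com", "d@x.com"],
    ["Mary", "m@x.com"],
]
for merged in accounts_merge(accounts):
    print(merged)
```

In this sample the first and third John accounts share b@x.com and merge, while the second John stays separate despite the matching name.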

Q: What is the difference between union by rank and union by size, and which is preferred?

Union by rank: attach the root with the smaller rank (an upper bound on tree height) under the root with the larger rank. If the ranks are equal, choose either root and increment its rank. Union by size: attach the root of the tree with fewer nodes under the root of the tree with more nodes, then set the new root's size to size_a + size_b. Both keep tree height at O(log n) without path compression and give O(alpha(n)) amortized time per operation with path compression. Union by size is slightly easier to reason about: size is exact, whereas rank is only an upper bound on height, because path compression shortens trees without updating ranks. Size also gives you component sizes for free when a problem asks for them. Rank values never exceed log2(n), so they fit in a smaller integer type if memory matters; otherwise each scheme needs one extra array of n entries. In practice, both work equally well for interviews, so pick one and be consistent.
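To make the difference concrete, here are the two union rules side by side. This is a sketch using plain lists; the function names and the list-based signatures are illustrative assumptions rather than a standard interface.

```python
def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union_by_rank(parent, rank, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return False
    if rank[root_a] < rank[root_b]:
        root_a, root_b = root_b, root_a   # make root_a the higher-rank root
    parent[root_b] = root_a
    if rank[root_a] == rank[root_b]:
        rank[root_a] += 1                 # rank grows only when ranks tie
    return True


def union_by_size(parent, size, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return False
    if size[root_a] < size[root_b]:
        root_a, root_b = root_b, root_a   # make root_a the larger tree
    parent[root_b] = root_a
    size[root_a] += size[root_b]          # size stays exact after every merge
    return True


parent, size = list(range(4)), [1] * 4
for a, b in [(0, 1), (2, 3), (1, 3)]:
    union_by_size(parent, size, a, b)
print(size[find(parent, 3)])  # 4: the size array doubles as a component-size lookup
```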

Question 1

How does path compression make Union-Find nearly O(1)?

Accepted Answer

Without path compression or a balanced union, find(x) walks the parent chain to the root, which is O(n) in the worst case (a skewed tree). Path compression flattens the tree: after finding the root r, set parent[y] = r for every node y on the path from x to r. Later find() calls on those nodes reach the root in one step, until a subsequent union places that root under another one. With union by rank (attach the shorter tree under the taller), tree height is bounded by O(log n) even before path compression. Combined, the amortized cost per operation is O(alpha(n)), where alpha is the inverse Ackermann function. For all practical purposes (n < 10^600), alpha(n) <= 4, making Union-Find effectively O(1) amortized per operation.
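The flattening effect is easy to see on a hand-built chain. The class below is one possible sketch combining full path compression (two passes) with union by rank; the class and attribute names are illustrative, and the chain is written directly into parent only to demonstrate the worst case.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Pass 1: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True


uf = UnionFind(5)
uf.parent = [0, 0, 1, 2, 3]  # skewed chain 4 -> 3 -> 2 -> 1 -> 0, built by hand
print(uf.find(4))            # 0
print(uf.parent)             # [0, 0, 0, 0, 0]: every node now points at the root
```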

Union-Find (Disjoint Set Union) Interview Patterns: Connected Components and Cycle Detection (2025)

Union-Find Data Structure

Implementation with Path Compression and Union by Rank

Number of Connected Components (LC 323) and Friend Circles (LC 547)

Detecting Cycles in an Undirected Graph

Accounts Merge (LC 721)

Kruskal’s MST with Union-Find