Union-Find (Disjoint Set Union) Interview Patterns: Connected Components and Cycle Detection (2025)

Union-Find Data Structure

Union-Find (also called Disjoint Set Union, DSU) tracks which elements belong to the same connected component. Supports two operations in nearly O(1) amortized time: find(x): returns the representative (root) of x’s component. Two elements are in the same component if find(x) == find(y). union(x, y): merges the components containing x and y. Applications: detecting cycles in an undirected graph, finding connected components, Kruskal’s MST algorithm, network connectivity, grouping problems (accounts merge, friend circles).

Implementation with Path Compression and Union by Rank

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))  # parent[i] = i (self-loop for roots)
        self.rank = [0] * n           # tree height upper bound
        self.components = n           # count of distinct components

    def find(self, x: int) -> int:
        # Path compression: flatten the tree
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # recursive compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in same component (cycle detected!)
        # Union by rank: attach smaller tree under larger
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.components -= 1
        return True  # successfully merged

Complexity: with both optimizations, find and union are O(alpha(n)) amortized — alpha is the inverse Ackermann function, effectively O(1) for all practical input sizes (alpha(n) <= 4 for n < 10^600).

Number of Connected Components (LC 323) and Friend Circles (LC 547)

def count_components(n: int, edges: list[list[int]]) -> int:
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.components  # starts at n, decremented on each successful union

def find_circle_num(is_connected: list[list[int]]) -> int:
    n = len(is_connected)
    uf = UnionFind(n)
    for i in range(n):
        for j in range(i + 1, n):
            if is_connected[i][j]:
                uf.union(i, j)
    return uf.components

Detecting Cycles in an Undirected Graph

def has_cycle(n: int, edges: list[list[int]]) -> bool:
    uf = UnionFind(n)
    for u, v in edges:
        if not uf.union(u, v):
            return True  # u and v already in same component = cycle
    return False

Accounts Merge (LC 721)

Group accounts that share at least one email. Two accounts that share an email belong to the same person. Map each email to an account index. Union accounts that share an email.

def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    uf = UnionFind(len(accounts))
    email_to_account = {}  # email -> first account index that owns it

    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_to_account:
                uf.union(i, email_to_account[email])
            else:
                email_to_account[email] = i

    # Group emails by root account
    from collections import defaultdict
    root_emails = defaultdict(set)
    for email, idx in email_to_account.items():
        root_emails[uf.find(idx)].add(email)

    result = []
    for root, emails in root_emails.items():
        result.append([accounts[root][0]] + sorted(emails))
    return result

Kruskal’s MST with Union-Find

def kruskal_mst(n: int, edges: list[tuple]) -> int:
    # edges: list of (weight, u, v)
    edges.sort()  # sort by weight
    uf = UnionFind(n)
    mst_cost = 0
    edges_used = 0
    for weight, u, v in edges:
        if uf.union(u, v):  # if not already connected
            mst_cost += weight
            edges_used += 1
            if edges_used == n - 1:
                break  # MST complete (n-1 edges)
    return mst_cost if edges_used == n - 1 else -1  # -1 if not connected


{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How does path compression make Union-Find nearly O(1)?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Without path compression, find(x) traverses up the parent chain to the root, which can be O(n) in the worst case (a skewed tree). Path compression flattens the tree: after finding the root r, set parent[x] = r for every node on the path from x to r. Future find() calls for those nodes go directly to the root in O(1). With union by rank (attach shorter tree under taller), the tree height is bounded by O(log n) even before path compression. Combined: the amortized cost per operation is O(alpha(n)), where alpha is the inverse Ackermann function. For all practical purposes (n < 10^600), alpha(n) <= 4, making Union-Find effectively O(1) per operation amortized.”}},{“@type”:”Question”,”name”:”How does Union-Find detect cycles in an undirected graph?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Process each edge (u, v) of the graph: find the roots of u and v. If find(u) == find(v): u and v are already in the same connected component, so adding this edge creates a cycle. If find(u) != find(v): union them (merge the components). The algorithm returns True (cycle found) at the first edge that connects two already-connected nodes. This works because in an undirected graph, a cycle exists if and only if we encounter an edge between two nodes in the same component while building a spanning forest. Note: this only works for undirected graphs. For directed graphs, use DFS with 3-state coloring (unvisited, in-progress, done).”}},{“@type”:”Question”,”name”:”What is Kruskal's algorithm and why does it produce a minimum spanning tree?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Kruskal's algorithm: sort all edges by weight. Process edges in ascending weight order. For each edge (u, v, w): if u and v are in different components (union returns True), add this edge to the MST and merge the components. Stop when n-1 edges are added (MST is complete). Correctness proof (cut property): for any partition of the graph into two sets S and V-S, the minimum weight edge crossing the cut is in every MST. Kruskal's processes edges in weight order; the first edge connecting two components is always the minimum edge crossing the cut between them. By the cut property, it belongs in the MST. Union-Find makes the "same component" check and merge O(alpha(n)), giving overall O(E log E) time (dominated by the sort).”}},{“@type”:”Question”,”name”:”How do you handle the Accounts Merge problem and what is the mapping strategy?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Accounts Merge: each account has a name and a list of emails. Two accounts belong to the same person if they share at least one email. Strategy: create a Union-Find with n nodes (one per account). For each email, track which account first claimed it (email_to_account map). When processing account i's emails: if the email is in email_to_account, union account i with email_to_account[email] (they are the same person). Otherwise, record email_to_account[email] = i. After processing all accounts: group emails by their root account (find() the root for each email's account). For each root, collect all emails in that group, sort them, and prepend the account name. The sort is required to produce canonical output — O(E log E) where E is the total number of emails.”}},{“@type”:”Question”,”name”:”What is the difference between union by rank and union by size, and which is preferred?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Union by rank: attach the tree with smaller rank (upper bound on height) under the tree with larger rank. If ranks are equal, arbitrarily choose one as root and increment its rank. Union by size: attach the tree with fewer nodes under the tree with more nodes. Update parent's size = size_a + size_b. Both achieve O(log n) tree height without path compression, and O(alpha(n)) amortized with path compression. Union by size is slightly easier to reason about (size is intuitive; rank is an approximation that does not change with path compression). Union by rank uses slightly less memory (no need to track sizes separately if you do not need sizes for other purposes). In practice, both work equally well for interviews — pick one and be consistent.”}}]}

See also: Meta Interview Prep

See also: Atlassian Interview Prep

See also: Databricks Interview Prep

Scroll to Top