Union Find

Abstract

Data Structure that lets you manage a collection of Disjoin Set also known as Dynamic Relation efficiently
Used in graph algorithms like Kruskal’s Algorithm for Minimum Spanning Tree, cycle detection, and connectivity problems

Dynamic Relation

A Dynamic Relation is a Relation that has a different set of elements inside the relation depending on the given input. In Union-Find, the operation union(x, y) gives the dynamic property, combining 2 relations into one relation.

All relations in Union-Find are Equivalence Relation!

2 main operations

find(x)

Determines the representative element of the disjoint set that contains element $x$

The representative acts as the unique identifier for the whole disjoint set

union(x, y)

Combines the two disjoint sets containing elements $x$ and $y$ respectively into a single disjoint set

What does it mean 'efficiently'?

find(x)

Time complexity is $O (1)$ using Path Compression

union(x, y)

Time complexity is $O (1)$

We use find(x) and find(y) to obtain the representative element of both disjoint sets in $O (1)$

Then attach the representative element of one disjoint set to another elements on the path in $O (1)$

Overall complexity is $O (1)$

Solve Grouping Problems

Union-Find can tell us if two elements are connected by a certain relationship in $O (1)$

Key Optimisations

Path Compression

When finding an element’s root, update all nodes along the path to point directly to the root
Flattens the tree structure for future operations
Reduces time complexity to nearly constant time
Mechanism: During find(x), directly link each node on the path to the root node
This flattens the trees, making subsequent find(x) operations much faster

Weighted Union

When merging sets, attach the smaller tree to the larger tree’s root
Maintains balance and minimizes tree height
Prevents degenerate tree structures

Rank vs Height

The rank in Union-Find is an upper bound on the height, not the exact height. This distinction is crucial:

Rank vs Height:

Height is the actual length of the longest path from root to leaf

Rank is a property that’s only incremented when merging trees of equal rank

Due to path compression, the actual height can be much smaller than the rank

Why use rank instead of height?:

Tracking exact height would be expensive to maintain after path compression

Rank is easier to maintain and still provides the balancing we need

The rank property is sufficient to guarantee the O(α(n)) time complexity

Rank Updates During Union:

When attaching smaller rank tree to larger rank tree: rank stays the same

When attaching trees of equal rank: rank increases by 1

Actual height might increase temporarily, but path compression will flatten it in future operations

Time Complexity

With both optimizations: O(α(n)) where α is the inverse Ackermann function

Without optimizations: O(n) for worst-case operations

α(n) grows extremely slowly, making operations effectively constant time

Core Operations

Find Operation

Returns the root/representative of the set containing element x
Implements path compression to optimize future operations
Key insight: All nodes along the path to root are updated to point directly to root

Union Operation

Merges two sets by connecting their roots
Uses weighted union to maintain balanced trees
Key insight: Smaller tree is always attached to larger tree’s root

Python Implementation

Union-Find Class with Optimizations

class UnionFind:
    def __init__(self, size):
        self.parent = list(range(size))  # Each element starts as its own parent
        self.rank = [0] * size  # Used for weighted union
    
    def find(self, x):
        # Path compression: make each node point directly to root
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Recursive path compression
        return self.parent[x]
    
    def union(self, x, y):
        # Find roots of x and y
        root_x = self.find(x)
        root_y = self.find(y)
        
        # If already in same set, do nothing
        if root_x == root_y:
            return
            
        # Weighted union: attach smaller rank tree to larger rank tree
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            # If ranks are equal, choose one as root and increment its rank
            self.parent[root_y] = root_x
            self.rank[root_x] += 1

Example Usage

# Create Union-Find with 5 elements
uf = UnionFind(5)
 
# Initially, each element is in its own set
print(uf.find(0))  # 0
print(uf.find(1))  # 1
 
# Union some elements
uf.union(0, 1)
uf.union(2, 3)
 
# Check if elements are in same set
print(uf.find(0) == uf.find(1))  # True
print(uf.find(2) == uf.find(3))  # True
print(uf.find(0) == uf.find(2))  # False

Applications

Graph Algorithms
- Kruskal’s Algorithm for Minimum Spanning Tree
- Cycle detection in undirected graphs
- Connected components in graphs
Dynamic Connectivity
- Network connectivity problems
- Image processing (pixel connectivity)
- Social network analysis
Percolation Problems
- Modeling physical systems
- Grid-based simulations
- Phase transitions

Common Mistakes

Common Pitfalls

Forgetting Path Compression

Without path compression, operations can degrade to O(n)

Always implement recursive path compression in find operation

Ignoring Weighted Union

Can lead to unbalanced trees

Degrades performance to O(n) in worst case

Incorrect Initialization

Each element must start as its own parent

Rank array must be initialized to zeros

CS Notes

Recent writing

VLAN

End-to-End Service Access Troubleshooting

File System Link

Explorer

Union Find

Abstract

Key Optimisations

Path Compression

Weighted Union

Core Operations

Find Operation

Union Operation

Python Implementation

Applications

Common Mistakes

References

Table of Contents

Mentioned by

Graph View