Union Find (DSU)
What Is Union-Find (DSU)?
Union-Find (also called Disjoint Set Union or DSU) is a data structure that efficiently tracks which elements belong to the same group (component), and allows merging groups together.
OPERATIONS:
find(x) → Returns the REPRESENTATIVE (root) of x's component.
All elements in the same component have the same root.
union(x,y) → Merges the components containing x and y.
connected(x,y) → Returns true if find(x) == find(y).
EXAMPLE: 6 elements, initially all separate:
Elements: {0, 1, 2, 3, 4, 5}
Components: {0}, {1}, {2}, {3}, {4}, {5}
union(0,1): {0,1}, {2}, {3}, {4}, {5}
union(2,3): {0,1}, {2,3}, {4}, {5}
union(0,2): {0,1,2,3}, {4}, {5}
find(1) = find(3)? YES — both in {0,1,2,3}
find(1) = find(4)? NO — 4 is in {4}
USE CASES:
✓ Connected components in a graph
✓ Cycle detection (Kruskal's MST)
✓ Dynamic connectivity queries
✓ Percolation problems
✓ Number of islands (dynamic)
✓ Redundant connections
The Internal Representation
PARENT ARRAY:
parent[i] = i → i is a root (its own representative)
parent[i] = j (j ≠ i) → i's parent is j (follow to find root)
Initially: parent = [0, 1, 2, 3, 4, 5]
(every element is its own root)
After union(0,1): parent = [0, 0, 2, 3, 4, 5]
↑
1's parent is now 0 (0 is the root)
After union(2,3): parent = [0, 0, 2, 2, 4, 5]
After union(0,2): parent = [0, 0, 0, 2, 4, 5]
↑
2's parent is 0 (component root)
find(3): parent[3]=2, parent[2]=0, parent[0]=0 → root = 0 ✓
find(1): parent[1]=0, parent[0]=0 → root = 0 ✓
So find(3) == find(1) == 0 → same component ✓
Building the DSU Step by Step
Step 1: Naive Implementation (no optimisations)
find(x): follow parent chain to root → O(n) worst case union(x,y): attach root of one to root of other → O(n) worst case Worst case: union(0,1), union(1,2), union(2,3), ... Creates a chain: 0→1→2→3→4...n find(n) must traverse entire chain = O(n).
Step 2: Path Compression
During find(x), make every visited node point directly to root. Before path compression: 3 → 2 → 0 → 0 (root) After find(3) with path compression: 3 → 0 (root) 2 → 0 (root) (both updated in one traversal) Result: future find(3) and find(2) are O(1).
Step 3: Union by Rank (or Size)
Attach SMALLER tree under LARGER tree root. Without union by rank: union(0,1): root=0, attach 1 under 0. rank[0]=1. union(0,2): root=0, attach 2 under 0. rank[0]=1. union(0,3): root=0, attach 3 under 0. rank[0]=1. Tree stays flat — height 1. union(a,b) where both have rank 1: Merge: height becomes 2. rank[root]=2. The merged rank increases ONLY when two equal-rank trees merge. Height is always O(log n). Combined — Path Compression + Union by Rank: Amortised O(α(n)) per operation ≈ O(1).
Complete Implementation
1public class UnionFind {
2
3 private int[] parent;
4 private int[] rank; // Upper bound on tree height
5 private int[] size; // Actual component size
6 private int components; // Number of distinct components
7
8 public UnionFind(int n) {
9 parent = new int[n];
10 rank = new int[n];
11 size = new int[n];
12 components = n;
13
14 for (int i = 0; i < n; i++) {
15 parent[i] = i; // Each element is its own root
16 size[i] = 1; // Each component starts with size 1
17 rank[i] = 0;
18 }
19 }
20
21 // FIND with path compression — O(α(n)) amortised
22 public int find(int x) {
23 if (parent[x] != x) {
24 parent[x] = find(parent[x]); // Recursive path compression
25 }
26 return parent[x];
27 }
28
29 // Iterative path compression (avoids stack overflow for large n)
30 public int findIterative(int x) {
31 int root = x;
32 while (parent[root] != root) root = parent[root]; // Find root
33
34 // Path compression — point all visited nodes to root
35 while (parent[x] != root) {
36 int next = parent[x];
37 parent[x] = root;
38 x = next;
39 }
40 return root;
41 }
42
43 // UNION by rank — O(α(n)) amortised
44 public boolean union(int x, int y) {
45 int px = find(x), py = find(y);
46 if (px == py) return false; // Already same component
47
48 // Attach smaller rank under larger rank
49 if (rank[px] < rank[py]) { int t = px; px = py; py = t; }
50 parent[py] = px;
51 size[px] += size[py]; // Update component size
52
53 if (rank[px] == rank[py]) rank[px]++; // Rank increases only on tie
54
55 components--;
56 return true; // Successfully merged two different components
57 }
58
59 // UNION by size (alternative to rank)
60 public boolean unionBySize(int x, int y) {
61 int px = find(x), py = find(y);
62 if (px == py) return false;
63
64 // Attach smaller size under larger size
65 if (size[px] < size[py]) { int t = px; px = py; py = t; }
66 parent[py] = px;
67 size[px] += size[py];
68 components--;
69 return true;
70 }
71
72 // Check if two elements are in the same component
73 public boolean connected(int x, int y) {
74 return find(x) == find(y);
75 }
76
77 // Get size of x's component
78 public int getSize(int x) { return size[find(x)]; }
79
80 // Get number of distinct components
81 public int countComponents() { return components; }
82
83 public static void main(String[] args) {
84 UnionFind uf = new UnionFind(6);
85
86 System.out.println("Components: " + uf.countComponents()); // 6
87
88 uf.union(0, 1);
89 uf.union(2, 3);
90 System.out.println("Components: " + uf.countComponents()); // 4
91
92 uf.union(0, 2);
93 System.out.println("Components: " + uf.countComponents()); // 3
94
95 System.out.println("Connected(1,3): " + uf.connected(1, 3)); // true
96 System.out.println("Connected(1,4): " + uf.connected(1, 4)); // false
97
98 System.out.println("Size of 1's component: " + uf.getSize(1)); // 4
99
100 uf.union(4, 5);
101 System.out.println("Components: " + uf.countComponents()); // 2
102 }
103}Output:
Components: 6
Components: 4
Components: 3
Connected(1,3): true
Connected(1,4): false
Size(1): 4
Components: 2
Path Compression Visualised
BEFORE PATH COMPRESSION:
find(6):
6 → 5 → 3 → 1 → 0 (root)
Tree structure:
0
│
1
│
3
│
5
│
6
AFTER find(6) with path compression:
All nodes on path now point directly to root (0):
0
/ │ \ \
1 3 5 6
(1,3,5,6 all point to 0)
Future find(6): 6→0 directly — O(1)
Future find(5): 5→0 directly — O(1)
Future find(3): 3→0 directly — O(1)
HALVING COMPRESSION (alternative — faster in practice):
parent[x] = parent[parent[x]] (skip one level each call)
Less aggressive, also achieves O(α(n)) amortised.
Complexity Table: Four Variants
VARIANT find() union() Space ────────────────────────────────────────────────────────────────── Naive (no optimisations) O(n) O(n) O(n) Path compression only O(log n) amort O(log n) O(n) Union by rank only O(log n) O(log n) O(n) Path compression + union by rank O(α(n)) O(α(n)) O(n) α(n) = inverse Ackermann ≈ O(1) for all practical inputs. Combined: effectively O(1) per operation. Total for m operations on n elements: O(m × α(n)) ≈ O(m).
Application 1: Cycle Detection in Undirected Graph
ALGORITHM:
For each edge (u, v):
If find(u) == find(v): CYCLE (both already connected)
Else: union(u, v) — merge components
EXAMPLE:
V=3, Edges: (0,1), (1,2), (0,2)
Process (0,1): find(0)=0, find(1)=1 → different → union(0,1)
parent=[0,0,2] (1's parent = 0)
Process (1,2): find(1)=0, find(2)=2 → different → union(0,2)
parent=[0,0,0] (all in same component)
Process (0,2): find(0)=0, find(2)=0 → SAME → CYCLE DETECTED! ✓
1import java.util.*;
2
3public class CycleDetectionDSU {
4
5 static int[] parent, rank;
6
7 static void init(int n) {
8 parent = new int[n]; rank = new int[n];
9 for (int i = 0; i < n; i++) parent[i] = i;
10 }
11
12 static int find(int x) {
13 if (parent[x] != x) parent[x] = find(parent[x]);
14 return parent[x];
15 }
16
17 static boolean union(int x, int y) {
18 int px = find(x), py = find(y);
19 if (px == py) return false; // Cycle!
20
21 if (rank[px] < rank[py]) { int t=px; px=py; py=t; }
22 parent[py] = px;
23 if (rank[px]==rank[py]) rank[px]++;
24 return true;
25 }
26
27 public static boolean hasCycle(int V, int[][] edges) {
28 init(V);
29 for (int[] edge : edges) {
30 if (!union(edge[0], edge[1])) return true; // union returns false = cycle
31 }
32 return false;
33 }
34
35 public static void main(String[] args) {
36 // Cyclic: 0─1─2─0 (triangle)
37 int[][] cyclic = {{0,1},{1,2},{0,2}};
38 System.out.println("Has cycle: " + hasCycle(3, cyclic)); // true
39
40 // Acyclic: 0─1─2 (path)
41 int[][] acyclic = {{0,1},{1,2}};
42 System.out.println("Has cycle: " + hasCycle(3, acyclic)); // false
43
44 // 4 vertices, 4 edges with cycle
45 int[][] g2 = {{0,1},{1,2},{2,3},{3,1}};
46 System.out.println("Has cycle: " + hasCycle(4, g2)); // true
47 }
48}Has cycle: true Has cycle: false Has cycle: true
Application 2: Number of Islands (Dynamic)
Count connected land cells, handling cells revealed one by one.
PROBLEM: given a grid, cells added one at a time. After each add,
report the number of connected islands.
APPROACH:
Map each cell (r,c) to index = r*cols + c.
When cell (r,c) is added:
components++ (new island)
For each adjacent cell already added:
If not connected: union(cell, adjacent) → components--
GRID EXAMPLE (4 adds):
┌───┬───┬───┐ After add (0,0): 1 island
│ 0 │ │ │ After add (0,2): 2 islands
├───┼───┼───┤ After add (1,1): 3 islands (diagonal, not connected)
│ │ │ │ After add (0,1): 2 islands (joins (0,0),(0,1),(0,2))
└───┴───┴───┘
1import java.util.*;
2
3public class NumberOfIslands {
4
5 int[] parent, size;
6 int islands;
7
8 public NumberOfIslands(int rows, int cols) {
9 int n = rows * cols;
10 parent = new int[n];
11 size = new int[n];
12 Arrays.fill(parent, -1); // -1 = water (not added)
13 islands = 0;
14 }
15
16 int find(int x) {
17 if (parent[x] != x) parent[x] = find(parent[x]);
18 return parent[x];
19 }
20
21 void addLand(int r, int c, int cols, int[][] dirs) {
22 int id = r * cols + c;
23 if (parent[id] != -1) return; // Already land
24
25 parent[id] = id;
26 size[id] = 1;
27 islands++; // New isolated island
28
29 for (int[] d : dirs) {
30 int nr = r + d[0], nc = c + d[1];
31 if (nr < 0 || nc < 0 || nc >= cols) continue;
32 int nid = nr * cols + nc;
33 if (parent[nid] == -1) continue; // Neighbour is water
34
35 // Neighbour is land — merge if different components
36 int pr = find(id), pn = find(nid);
37 if (pr != pn) {
38 if (size[pr] < size[pn]) { int t=pr; pr=pn; pn=t; }
39 parent[pn] = pr;
40 size[pr] += size[pn];
41 islands--;
42 }
43 }
44 }
45
46 public static List<Integer> numIslands2(int rows, int cols, int[][] positions) {
47 NumberOfIslands noi = new NumberOfIslands(rows, cols);
48 int[][] dirs = {{0,1},{0,-1},{1,0},{-1,0}};
49 List<Integer> result = new ArrayList<>();
50
51 for (int[] pos : positions) {
52 noi.addLand(pos[0], pos[1], cols, dirs);
53 result.add(noi.islands);
54 }
55 return result;
56 }
57
58 public static void main(String[] args) {
59 int[][] positions = {{0,0},{0,2},{1,1},{0,1}};
60 System.out.println(numIslands2(2, 3, positions));
61 // [1, 2, 3, 2]
62 }
63}Output:
[1, 2, 3, 2]
Explanation:
After (0,0): one island {(0,0)} → 1
After (0,2): two islands {(0,0)}, {(0,2)} → 2
After (1,1): three islands (1,1 not adjacent to either) → 3
After (0,1): (0,1) connects (0,0) and (0,2) → merges to 1 island → 2
DSU with Rollback (Persistent DSU)
For problems that need UNDO of union operations (e.g., offline dynamic connectivity with deletions).
KEY CHANGE: Use union by RANK ONLY (no path compression).
Path compression modifies multiple parents — hard to undo.
Union by rank modifies exactly TWO parent pointers per union.
→ Store (old_parent, old_rank) on a stack for undo.
UNDO STACK: [(node, old_parent, old_rank), ...]
union(x, y):
px, py = find(x), find(y)
if px == py: push_noop to stack
else:
if rank[px] < rank[py]: swap
stack.push((py, parent[py], rank[px])) ← record BEFORE change
parent[py] = px
if rank[px] == rank[py]: rank[px]++
rollback():
py, old_parent, old_rank = stack.pop()
parent[py] = old_parent
rank[find(old_parent)] = old_rank ← restore root's rank
find() WITHOUT path compression: O(log n) per call.
But rollback is O(1) — just pop stack and restore.
Complexity Summary
| Operation | Naive | Path Compress | Union by Rank | Both |
|---|---|---|---|---|
| find() | O(n) | O(log n) amort | O(log n) | O(α(n)) |
| union() | O(n) | O(log n) amort | O(log n) | O(α(n)) |
| Total m ops | O(mn) | O(m log n) | O(m log n) | O(mα(n)) |
α(n) ≤ 4 for n ≤ 2^(2^(2^(2^2))) — effectively O(1).
Common Mistakes
Not initialising parent[i] = i. If parent[] is initialised to 0 or -1, find(0) returns 0 (correct) but find(1) follows parent[1]=0 → returns 0, making 1 appear to already be in 0's component. Always initialise parent[i] = i for each i.
Calling union(x, y) when find(x) == find(y) and not returning false. When both elements are already in the same component, union should do nothing and return false. Without this check, you may corrupt rank[] by incrementing the root's rank unnecessarily.
Using path compression with rollback DSU. Path compression modifies multiple parent pointers during find(). Undoing this requires restoring every modified pointer, making rollback O(path length). Use union by rank only when rollback is required.
Incorrect path compression — updating parent BEFORE finding root. In iterative path compression, find the root FIRST in a separate loop, then update all nodes to point to the root. Mixing the two loops corrupts the structure.
Forgetting to update component count in union(). If you track components-- in union(), ensure it only decrements when a REAL merge happens (find(x) ≠ find(y)). The early return if px == py: return false must come before the decrement.
Interview Questions
Q: Why does path compression alone give O(log n) amortised rather than O(α(n))?
Path compression makes find() fast for subsequent calls by flattening the tree. However, without union by rank, the tree can grow to height O(log n) but not beyond (trees without rank control can still be unbalanced). Union by rank ensures the tree height is always O(log n) after any sequence of unions. Combined, path compression flattens the tree and union by rank keeps it from growing deep in the first place — together achieving O(α(n)).
Q: What is the difference between union by rank and union by size?
Union by rank tracks an upper bound on tree height. Union by size tracks the actual number of elements in the component. Both guarantee O(log n) tree height. The difference: size gives you accurate component counts (useful for queries like "how many elements in this component?"), while rank only tells you which root to attach under. For pure connectivity, both work equally well. For problems requiring size queries, use size.
Q: How would you use DSU to find the number of connected components in a graph?
Initialise DSU with n elements (n = number of vertices). For each edge (u,v), call union(u,v). Each successful union (that returns true) reduces the component count by 1. The component count starts at n (each vertex isolated) and decreases with each successful merge. Alternatively, count the number of unique roots after all unions: the number of distinct find(i) values for i in 0..n-1.
Summary
Union-Find (DSU) tracks components and answers connectivity queries in near-constant time.
Two core operations:
- ›
find(x)— returns the root (representative) of x's component - ›
union(x,y)— merges x's component with y's component
Two key optimisations (always use both):
- ›Path compression — during find, make every node point directly to root → future finds O(1)
- ›Union by rank/size — attach smaller tree under larger → tree height O(log n)
Combined: O(α(n)) ≈ O(1) per operation amortised.
Three classic applications:
| Application | Mechanism |
|---|---|
| Cycle detection | If find(u) == find(v) before adding edge → cycle |
| Kruskal's MST | Sort edges; add if find(u) ≠ find(v) |
| Number of islands | Add land cell; union with adjacent land cells; track component count |
Rollback DSU: union by rank only (no path compression); store undo stack; O(log n) per find, O(1) per undo.
In the next topic you will explore Strongly Connected Components — Kosaraju's and Tarjan's algorithms for finding maximal strongly connected subgraphs in directed graphs.
Path compression in Union-Find modifies parent[] during a find() call. What does it do and why?