DSA Tutorial
🔍

Union Find (DSU)

What Is Union-Find (DSU)?

Union-Find (also called Disjoint Set Union or DSU) is a data structure that efficiently tracks which elements belong to the same group (component), and allows merging groups together.

OPERATIONS:
  find(x)    → Returns the REPRESENTATIVE (root) of x's component.
               All elements in the same component have the same root.
  union(x,y) → Merges the components containing x and y.
  connected(x,y) → Returns true if find(x) == find(y).

EXAMPLE: 6 elements, initially all separate:
  Elements: {0, 1, 2, 3, 4, 5}
  Components: {0}, {1}, {2}, {3}, {4}, {5}

  union(0,1): {0,1}, {2}, {3}, {4}, {5}
  union(2,3): {0,1}, {2,3}, {4}, {5}
  union(0,2): {0,1,2,3}, {4}, {5}

  find(1) = find(3)?   YES — both in {0,1,2,3}
  find(1) = find(4)?   NO  — 4 is in {4}

USE CASES:
  ✓ Connected components in a graph
  ✓ Cycle detection (Kruskal's MST)
  ✓ Dynamic connectivity queries
  ✓ Percolation problems
  ✓ Number of islands (dynamic)
  ✓ Redundant connections

The Internal Representation

PARENT ARRAY:
  parent[i] = i          → i is a root (its own representative)
  parent[i] = j (j ≠ i) → i's parent is j (follow to find root)

Initially:  parent = [0, 1, 2, 3, 4, 5]
             (every element is its own root)

After union(0,1): parent = [0, 0, 2, 3, 4, 5]
                              ↑
                           1's parent is now 0 (0 is the root)

After union(2,3): parent = [0, 0, 2, 2, 4, 5]
After union(0,2): parent = [0, 0, 0, 2, 4, 5]
                                     ↑
                                  2's parent is 0 (component root)

find(3): parent[3]=2, parent[2]=0, parent[0]=0 → root = 0 ✓
find(1): parent[1]=0, parent[0]=0 → root = 0 ✓

So find(3) == find(1) == 0 → same component ✓

Building the DSU Step by Step

Step 1: Naive Implementation (no optimisations)

find(x): follow parent chain to root → O(n) worst case
union(x,y): attach root of one to root of other → O(n) worst case

Worst case: union(0,1), union(1,2), union(2,3), ...
Creates a chain: 0→1→2→3→4...n
find(n) must traverse entire chain = O(n).

Step 2: Path Compression

During find(x), make every visited node point directly to root.

Before path compression:
  3 → 2 → 0 → 0 (root)

After find(3) with path compression:
  3 → 0 (root)
  2 → 0 (root)
  (both updated in one traversal)

Result: future find(3) and find(2) are O(1).

Step 3: Union by Rank (or Size)

Attach SMALLER tree under LARGER tree root.

Without union by rank:
  union(0,1): root=0, attach 1 under 0. rank[0]=1.
  union(0,2): root=0, attach 2 under 0. rank[0]=1.
  union(0,3): root=0, attach 3 under 0. rank[0]=1.
  Tree stays flat — height 1.

  union(a,b) where both have rank 1:
  Merge: height becomes 2. rank[root]=2.

  The merged rank increases ONLY when two equal-rank trees merge.
  Height is always O(log n).

Combined — Path Compression + Union by Rank:
  Amortised O(α(n)) per operation ≈ O(1).

Complete Implementation

Java
1public class UnionFind { 2 3 private int[] parent; 4 private int[] rank; // Upper bound on tree height 5 private int[] size; // Actual component size 6 private int components; // Number of distinct components 7 8 public UnionFind(int n) { 9 parent = new int[n]; 10 rank = new int[n]; 11 size = new int[n]; 12 components = n; 13 14 for (int i = 0; i < n; i++) { 15 parent[i] = i; // Each element is its own root 16 size[i] = 1; // Each component starts with size 1 17 rank[i] = 0; 18 } 19 } 20 21 // FIND with path compression — O(α(n)) amortised 22 public int find(int x) { 23 if (parent[x] != x) { 24 parent[x] = find(parent[x]); // Recursive path compression 25 } 26 return parent[x]; 27 } 28 29 // Iterative path compression (avoids stack overflow for large n) 30 public int findIterative(int x) { 31 int root = x; 32 while (parent[root] != root) root = parent[root]; // Find root 33 34 // Path compression — point all visited nodes to root 35 while (parent[x] != root) { 36 int next = parent[x]; 37 parent[x] = root; 38 x = next; 39 } 40 return root; 41 } 42 43 // UNION by rank — O(α(n)) amortised 44 public boolean union(int x, int y) { 45 int px = find(x), py = find(y); 46 if (px == py) return false; // Already same component 47 48 // Attach smaller rank under larger rank 49 if (rank[px] < rank[py]) { int t = px; px = py; py = t; } 50 parent[py] = px; 51 size[px] += size[py]; // Update component size 52 53 if (rank[px] == rank[py]) rank[px]++; // Rank increases only on tie 54 55 components--; 56 return true; // Successfully merged two different components 57 } 58 59 // UNION by size (alternative to rank) 60 public boolean unionBySize(int x, int y) { 61 int px = find(x), py = find(y); 62 if (px == py) return false; 63 64 // Attach smaller size under larger size 65 if (size[px] < size[py]) { int t = px; px = py; py = t; } 66 parent[py] = px; 67 size[px] += size[py]; 68 components--; 69 return true; 70 } 71 72 // Check if two elements are in the same component 73 public boolean connected(int x, int y) { 74 return find(x) == find(y); 75 } 76 77 // Get size of x's component 78 public int getSize(int x) { return size[find(x)]; } 79 80 // Get number of distinct components 81 public int countComponents() { return components; } 82 83 public static void main(String[] args) { 84 UnionFind uf = new UnionFind(6); 85 86 System.out.println("Components: " + uf.countComponents()); // 6 87 88 uf.union(0, 1); 89 uf.union(2, 3); 90 System.out.println("Components: " + uf.countComponents()); // 4 91 92 uf.union(0, 2); 93 System.out.println("Components: " + uf.countComponents()); // 3 94 95 System.out.println("Connected(1,3): " + uf.connected(1, 3)); // true 96 System.out.println("Connected(1,4): " + uf.connected(1, 4)); // false 97 98 System.out.println("Size of 1's component: " + uf.getSize(1)); // 4 99 100 uf.union(4, 5); 101 System.out.println("Components: " + uf.countComponents()); // 2 102 } 103}
Output:
Components: 6
Components: 4
Components: 3
Connected(1,3): true
Connected(1,4): false
Size(1): 4
Components: 2

Path Compression Visualised

BEFORE PATH COMPRESSION:

  find(6):
    6 → 5 → 3 → 1 → 0 (root)

  Tree structure:
         0
         │
         1
         │
         3
         │
         5
         │
         6

AFTER find(6) with path compression:

  All nodes on path now point directly to root (0):
         0
       / │ \ \
      1  3  5  6
  (1,3,5,6 all point to 0)

  Future find(6): 6→0 directly — O(1)
  Future find(5): 5→0 directly — O(1)
  Future find(3): 3→0 directly — O(1)

HALVING COMPRESSION (alternative — faster in practice):
  parent[x] = parent[parent[x]]  (skip one level each call)
  Less aggressive, also achieves O(α(n)) amortised.

Complexity Table: Four Variants

VARIANT                          find()        union()      Space
──────────────────────────────────────────────────────────────────
Naive (no optimisations)         O(n)          O(n)         O(n)
Path compression only            O(log n) amort O(log n)    O(n)
Union by rank only               O(log n)      O(log n)     O(n)
Path compression + union by rank O(α(n))       O(α(n))      O(n)

α(n) = inverse Ackermann ≈ O(1) for all practical inputs.
Combined: effectively O(1) per operation.
Total for m operations on n elements: O(m × α(n)) ≈ O(m).

Application 1: Cycle Detection in Undirected Graph

ALGORITHM:
  For each edge (u, v):
    If find(u) == find(v): CYCLE (both already connected)
    Else: union(u, v) — merge components

EXAMPLE:
  V=3, Edges: (0,1), (1,2), (0,2)

  Process (0,1): find(0)=0, find(1)=1 → different → union(0,1)
    parent=[0,0,2] (1's parent = 0)

  Process (1,2): find(1)=0, find(2)=2 → different → union(0,2)
    parent=[0,0,0] (all in same component)

  Process (0,2): find(0)=0, find(2)=0 → SAME → CYCLE DETECTED! ✓
Java
1import java.util.*; 2 3public class CycleDetectionDSU { 4 5 static int[] parent, rank; 6 7 static void init(int n) { 8 parent = new int[n]; rank = new int[n]; 9 for (int i = 0; i < n; i++) parent[i] = i; 10 } 11 12 static int find(int x) { 13 if (parent[x] != x) parent[x] = find(parent[x]); 14 return parent[x]; 15 } 16 17 static boolean union(int x, int y) { 18 int px = find(x), py = find(y); 19 if (px == py) return false; // Cycle! 20 21 if (rank[px] < rank[py]) { int t=px; px=py; py=t; } 22 parent[py] = px; 23 if (rank[px]==rank[py]) rank[px]++; 24 return true; 25 } 26 27 public static boolean hasCycle(int V, int[][] edges) { 28 init(V); 29 for (int[] edge : edges) { 30 if (!union(edge[0], edge[1])) return true; // union returns false = cycle 31 } 32 return false; 33 } 34 35 public static void main(String[] args) { 36 // Cyclic: 0─1─2─0 (triangle) 37 int[][] cyclic = {{0,1},{1,2},{0,2}}; 38 System.out.println("Has cycle: " + hasCycle(3, cyclic)); // true 39 40 // Acyclic: 0─1─2 (path) 41 int[][] acyclic = {{0,1},{1,2}}; 42 System.out.println("Has cycle: " + hasCycle(3, acyclic)); // false 43 44 // 4 vertices, 4 edges with cycle 45 int[][] g2 = {{0,1},{1,2},{2,3},{3,1}}; 46 System.out.println("Has cycle: " + hasCycle(4, g2)); // true 47 } 48}
Has cycle: true
Has cycle: false
Has cycle: true

Application 2: Number of Islands (Dynamic)

Count connected land cells, handling cells revealed one by one.

PROBLEM: given a grid, cells added one at a time. After each add,
report the number of connected islands.

APPROACH:
  Map each cell (r,c) to index = r*cols + c.
  When cell (r,c) is added:
    components++  (new island)
    For each adjacent cell already added:
      If not connected: union(cell, adjacent) → components--

GRID EXAMPLE (4 adds):
  ┌───┬───┬───┐       After add (0,0): 1 island
  │ 0 │   │   │       After add (0,2): 2 islands
  ├───┼───┼───┤       After add (1,1): 3 islands (diagonal, not connected)
  │   │   │   │       After add (0,1): 2 islands (joins (0,0),(0,1),(0,2))
  └───┴───┴───┘
Java
1import java.util.*; 2 3public class NumberOfIslands { 4 5 int[] parent, size; 6 int islands; 7 8 public NumberOfIslands(int rows, int cols) { 9 int n = rows * cols; 10 parent = new int[n]; 11 size = new int[n]; 12 Arrays.fill(parent, -1); // -1 = water (not added) 13 islands = 0; 14 } 15 16 int find(int x) { 17 if (parent[x] != x) parent[x] = find(parent[x]); 18 return parent[x]; 19 } 20 21 void addLand(int r, int c, int cols, int[][] dirs) { 22 int id = r * cols + c; 23 if (parent[id] != -1) return; // Already land 24 25 parent[id] = id; 26 size[id] = 1; 27 islands++; // New isolated island 28 29 for (int[] d : dirs) { 30 int nr = r + d[0], nc = c + d[1]; 31 if (nr < 0 || nc < 0 || nc >= cols) continue; 32 int nid = nr * cols + nc; 33 if (parent[nid] == -1) continue; // Neighbour is water 34 35 // Neighbour is land — merge if different components 36 int pr = find(id), pn = find(nid); 37 if (pr != pn) { 38 if (size[pr] < size[pn]) { int t=pr; pr=pn; pn=t; } 39 parent[pn] = pr; 40 size[pr] += size[pn]; 41 islands--; 42 } 43 } 44 } 45 46 public static List<Integer> numIslands2(int rows, int cols, int[][] positions) { 47 NumberOfIslands noi = new NumberOfIslands(rows, cols); 48 int[][] dirs = {{0,1},{0,-1},{1,0},{-1,0}}; 49 List<Integer> result = new ArrayList<>(); 50 51 for (int[] pos : positions) { 52 noi.addLand(pos[0], pos[1], cols, dirs); 53 result.add(noi.islands); 54 } 55 return result; 56 } 57 58 public static void main(String[] args) { 59 int[][] positions = {{0,0},{0,2},{1,1},{0,1}}; 60 System.out.println(numIslands2(2, 3, positions)); 61 // [1, 2, 3, 2] 62 } 63}
Output:
[1, 2, 3, 2]
Explanation:
  After (0,0): one island {(0,0)}         → 1
  After (0,2): two islands {(0,0)}, {(0,2)} → 2
  After (1,1): three islands (1,1 not adjacent to either) → 3
  After (0,1): (0,1) connects (0,0) and (0,2) → merges to 1 island → 2

DSU with Rollback (Persistent DSU)

For problems that need UNDO of union operations (e.g., offline dynamic connectivity with deletions).

KEY CHANGE: Use union by RANK ONLY (no path compression).
  Path compression modifies multiple parents — hard to undo.
  Union by rank modifies exactly TWO parent pointers per union.
  → Store (old_parent, old_rank) on a stack for undo.

UNDO STACK: [(node, old_parent, old_rank), ...]

union(x, y):
  px, py = find(x), find(y)
  if px == py: push_noop to stack
  else:
    if rank[px] < rank[py]: swap
    stack.push((py, parent[py], rank[px]))   ← record BEFORE change
    parent[py] = px
    if rank[px] == rank[py]: rank[px]++

rollback():
  py, old_parent, old_rank = stack.pop()
  parent[py] = old_parent
  rank[find(old_parent)] = old_rank   ← restore root's rank

find() WITHOUT path compression: O(log n) per call.
But rollback is O(1) — just pop stack and restore.

Complexity Summary

OperationNaivePath CompressUnion by RankBoth
find()O(n)O(log n) amortO(log n)O(α(n))
union()O(n)O(log n) amortO(log n)O(α(n))
Total m opsO(mn)O(m log n)O(m log n)O(mα(n))

α(n) ≤ 4 for n ≤ 2^(2^(2^(2^2))) — effectively O(1).

Common Mistakes

Not initialising parent[i] = i. If parent[] is initialised to 0 or -1, find(0) returns 0 (correct) but find(1) follows parent[1]=0 → returns 0, making 1 appear to already be in 0's component. Always initialise parent[i] = i for each i.

Calling union(x, y) when find(x) == find(y) and not returning false. When both elements are already in the same component, union should do nothing and return false. Without this check, you may corrupt rank[] by incrementing the root's rank unnecessarily.

Using path compression with rollback DSU. Path compression modifies multiple parent pointers during find(). Undoing this requires restoring every modified pointer, making rollback O(path length). Use union by rank only when rollback is required.

Incorrect path compression — updating parent BEFORE finding root. In iterative path compression, find the root FIRST in a separate loop, then update all nodes to point to the root. Mixing the two loops corrupts the structure.

Forgetting to update component count in union(). If you track components-- in union(), ensure it only decrements when a REAL merge happens (find(x) ≠ find(y)). The early return if px == py: return false must come before the decrement.

Interview Questions

Q: Why does path compression alone give O(log n) amortised rather than O(α(n))?

Path compression makes find() fast for subsequent calls by flattening the tree. However, without union by rank, the tree can grow to height O(log n) but not beyond (trees without rank control can still be unbalanced). Union by rank ensures the tree height is always O(log n) after any sequence of unions. Combined, path compression flattens the tree and union by rank keeps it from growing deep in the first place — together achieving O(α(n)).

Q: What is the difference between union by rank and union by size?

Union by rank tracks an upper bound on tree height. Union by size tracks the actual number of elements in the component. Both guarantee O(log n) tree height. The difference: size gives you accurate component counts (useful for queries like "how many elements in this component?"), while rank only tells you which root to attach under. For pure connectivity, both work equally well. For problems requiring size queries, use size.

Q: How would you use DSU to find the number of connected components in a graph?

Initialise DSU with n elements (n = number of vertices). For each edge (u,v), call union(u,v). Each successful union (that returns true) reduces the component count by 1. The component count starts at n (each vertex isolated) and decreases with each successful merge. Alternatively, count the number of unique roots after all unions: the number of distinct find(i) values for i in 0..n-1.

Summary

Union-Find (DSU) tracks components and answers connectivity queries in near-constant time.

Two core operations:

  • find(x) — returns the root (representative) of x's component
  • union(x,y) — merges x's component with y's component

Two key optimisations (always use both):

  • Path compression — during find, make every node point directly to root → future finds O(1)
  • Union by rank/size — attach smaller tree under larger → tree height O(log n)

Combined: O(α(n)) ≈ O(1) per operation amortised.

Three classic applications:

ApplicationMechanism
Cycle detectionIf find(u) == find(v) before adding edge → cycle
Kruskal's MSTSort edges; add if find(u) ≠ find(v)
Number of islandsAdd land cell; union with adjacent land cells; track component count

Rollback DSU: union by rank only (no path compression); store undo stack; O(log n) per find, O(1) per undo.

In the next topic you will explore Strongly Connected Components — Kosaraju's and Tarjan's algorithms for finding maximal strongly connected subgraphs in directed graphs.

Suggested Quiz

Path compression in Union-Find modifies parent[] during a find() call. What does it do and why?

1/6
Union Find (DSU)