Richard Layte

Disjoint Sets

A good example to consider for disjoint–sets is a friendship group. A circle of friends is a loosely connected set of people who share some connection to each other, but not necessarily every member in the set. For example, Bill and Ted are friends and Bill knows Rufus. Therefore, Bill, Ted and Rufus can all said to be connected. Another set of friends could be Socrates, Plato and Aristotle, but this set is completely unconnected from the first. However, if by some weird time-traveling trick Bill or Ted were to meet one of these historical figures then the two sets would be merged.

The basic principal of a disjoint-set is to use one member of each subset as the representative for that set and then link all members to that item. So in our case Bill (although any member could be chosen) is the representative of the first set of friends. It’s then possible to find out if Ted and Rufus are friends by checking that they’re represented by the same member — in this case Bill. To add new members to the set, e.g. when Bill and Ted become friends with Socrates, all we need to do is update all members to be represented by the same item.

Implementation

The disjoint–set data structure supports three operations:

  • MakeSet — creates a new set with a single item, which represents itself.
  • Find — returns set that the member belongs to (or the representative of that set).
  • Union — merges two subsets.

One simple way of constructing disjoint-sets is to use a linked list and using the head of the list as the representative. Union simply join one list to the other and Find traverses up the list until it reaches the head, which is the set representative. However, this makes Find an O(n) operation.

An alternative approach is to use a tree structure to represent each set where the representative is the root and all members have a direct pointer to that node. Union attaches one root to the other, which is again a constant-time operation. Find traverses up the tree to the set representative, which is an O(h) operation where h is the height of the tree.

I’ll use the tree based approach for this example.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
package disjoint

type DisjointSet struct {
	data   interface{}
	parent *DisjointSet
}

func MakeSet(item interface{}) *DisjointSet {
	return &DisjointSet{data: item}
}

func (s *DisjointSet) Union(s2 *DisjointSet) {
	s.Find().parent = s2.Find()
}

func (s *DisjointSet) Find() *DisjointSet {
	if s.parent != nil {
		return s.parent.Find()
	} else {
		return s
	}
}

First, we create a DisjointSet type that holds some kind of data and a pointer to it’s parent, which is the root of the tree and the current representative of the set. MakeSet is essentially a constructor function that returns a new DisjointSet with a null parent pointer, which signifies this node must be the root of the tree.

Find traverses up the tree until it reaches the root, which is defined by not having a parent pointer. It then returns the root node because this acts as the representative of the set the target member belongs to.

Union is a method on the DisjoinSet type that takes another set and makes both sets share the same representative. It does this by calling Find on both sets to get each root and making the parent of one the other root.

Optimizations

The implementation above uses trees to represent the sets in order to avoid the linear time required to perform Find on a linked list structure. However, if the tree becomes unbalanced O(h) can begin to approach O(n).

However, there are two optimizations we can make to Find and Union, which will reduce both operations to an amortized running time of O(1).

This first optimization we can make is called path compression. This flattens the tree structure over time by attaching each node Find traverses directly to the root node as part of its normal operations. To achieve this we can modify our Find method to:

1
2
3
4
5
6
7
8
func (s *DisjointSet) Find() *DisjointSet {
	if s.parent != nil {
		s.parent = s.parent.Find()
		return s.parent
	} else {
		return s
	}
}

The second optimization we can make is union by rank, which simply means attaching the smaller tree to the larger one. This helps us because the running time depends on the depth of the tree so by attaching them in this order we maintain the same depth as the larger tree unless they are equally big.

To achieve this we need to keep track of a trees rank as we merge sets in order to know which is the shallower tree.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
func (s *DisjointSet) Union(s2 *DisjointSet) {
	r1 := s.Find()
	r2 := s2.Find()

	if r1 == r2 {
		return
	}

	if r1.rank > r2.rank {
		r2.parent = r1
	} else if r2.rank > r1.rank {
		r1.parent = r2
	} else {
		r2.parent = r1
		r1.rank += 1
	}
}

Application

If we go back to our original problem of modeling the relationship between friends, how can we use DisjointSet to represent this? First, we need to use MakeSet to create a single set with one person who initial has no friends. We can then form friendship groups by merging these individuals. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
bill := MakeSet("Bill")
ted := MakeSet("Ted")
rufus := MakeSet("Rufus")

bill.Union(ted)
rufus.Union(ted)

if bill.Find() != ted.Find() {
  t.Error("Single sets should belong to themselves")
}

if rufus.Find() != bill.Find() {
  t.Error("Single sets should belong to themselves")
}

Now that Bill, Ted, Rufus are part of the same friendship group we can combine other groups into this by calling Union on any two members from each of the sets. In the example below, even though Death and Plato aren’t connected directly they are part of the same set.

1
2
3
4
5
6
7
8
9
10
11
socrates := MakeSet("Socrates")
plato := MakeSet("Plato")
death := MakeSet("Death")

socrates.Union(plato)
bill.Union(socrates)
bill.Union(death)

if plato.Find() != death.Find() {
  t.Error("Single sets should belong to themselves")
}