A good example to consider for disjoint–sets is a friendship group. A circle of friends is a loosely connected set of people who share some connection to each other, but not necessarily every member in the set. For example, Bill and Ted are friends and Bill knows Rufus. Therefore, Bill, Ted and Rufus can all said to be connected. Another set of friends could be Socrates, Plato and Aristotle, but this set is completely unconnected from the first. However, if by some weird time-traveling trick Bill or Ted were to meet one of these historical figures then the two sets would be merged.
The basic principal of a disjoint-set is to use one member of each subset as the representative for that set and then link all members to that item. So in our case Bill (although any member could be chosen) is the representative of the first set of friends. It’s then possible to find out if Ted and Rufus are friends by checking that they’re represented by the same member — in this case Bill. To add new members to the set, e.g. when Bill and Ted become friends with Socrates, all we need to do is update all members to be represented by the same item.
Implementation
The disjoint–set data structure supports three operations:
MakeSet
— creates a new set with a single item, which represents itself.Find
— returns set that the member belongs to (or the representative of that set).Union
— merges two subsets.
One simple way of constructing disjoint-sets is to use a linked list and using the head of the list as the representative. Union
simply join one list to the other and Find
traverses up the list until it reaches the head, which is the set representative. However, this makes Find
an O(n) operation.
An alternative approach is to use a tree structure to represent each set where the representative is the root and all members have a direct pointer to that node. Union
attaches one root to the other, which is again a constant-time operation. Find
traverses up the tree to the set representative, which is an O(h) operation where h is the height of the tree.
I’ll use the tree based approach for this example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
First, we create a DisjointSet
type that holds some kind of data and a pointer to it’s parent, which is the root of the tree and the current representative of the set. MakeSet
is essentially a constructor function that returns a new DisjointSet
with a null parent pointer, which signifies this node must be the root of the tree.
Find
traverses up the tree until it reaches the root, which is defined by not having a parent
pointer. It then returns the root node because this acts as the representative of the set the target member belongs to.
Union
is a method on the DisjoinSet
type that takes another set and makes both sets share the same representative. It does this by calling Find
on both sets to get each root and making the parent of one the other root.
Optimizations
The implementation above uses trees to represent the sets in order to avoid the linear time required to perform Find
on a linked list structure. However, if the tree becomes unbalanced O(h) can begin to approach O(n).
However, there are two optimizations we can make to Find
and Union
, which will reduce both operations to an amortized running time of O(1).
This first optimization we can make is called path compression. This flattens the tree structure over time by attaching each node Find
traverses directly to the root node as part of its normal operations. To achieve this we can modify our Find
method to:
1 2 3 4 5 6 7 8 |
|
The second optimization we can make is union by rank, which simply means attaching the smaller tree to the larger one. This helps us because the running time depends on the depth of the tree so by attaching them in this order we maintain the same depth as the larger tree unless they are equally big.
To achieve this we need to keep track of a trees rank as we merge sets in order to know which is the shallower tree.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Application
If we go back to our original problem of modeling the relationship between friends, how can we use DisjointSet
to represent this? First, we need to use MakeSet
to create a single set with one person who initial has no friends. We can then form friendship groups by merging these individuals. For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Now that Bill, Ted, Rufus are part of the same friendship group we can combine other groups into this by calling Union
on any two members from each of the sets. In the example below, even though Death and Plato aren’t connected directly they are part of the same set.
1 2 3 4 5 6 7 8 9 10 11 |
|