DSU is a Data-Structure to model a collection of disjoint sets, which has the ability to efficiently perform two kinds of operations:
- 'Find' if two items belong to the same set? (which item belongs to which set?)
- 'Union' two disjoint sets into a bigger set.
These two operations are useful for algorithms like Kruskal's Algorithm or problems that involve partitioning like keeping track of connected components in an undirected graph.
Algorithm Edit
The key idea behind this data structure is that we keep track of a representative ('parent') of each set. This information is stored in vector<int> pset, where pset[i] tells the representative item of the set that contains the item i.
When we want to merge two sets, we call unionSet(i, j) which causes both items 'i' and 'j' to have same representative items -- directly or indirectly. This is done by calling findSet(j) -- which is the representative of j, and assigning this value to pset[findSet(i)] -- update the parent of representative item of item 'i'. The function recursively finds if pset[i] is equal to 'i'. Then once it finds the representative element (say x) then for that set, it compresses the path by making pset[i] = x. The subsequent calls of findSet[i] will be in O(1). This simple heuristic strategy is aptly named as 'Path compression'.
Here's is the code:
vector<int> pset(1000); void initSet(int _size) { pset.resize(_size); for(int i=0;i<_size;i++) { pset[i]=i; } } int findSet(int i) { return (pset[i]==i) ? i : pset[i]=findSet(pset[i]); } void unionSet(int i, int j) { pset[findSet(i)] = findSet(j); } bool isSameSet(int i, int j) { return findSet(i) == findSet(j); }
There are two more queries commonly performed on DSU, apart from the ones listed above:
- int numberOfSets( ): which returns the number of disjoint sets currently in the structure.
- int sizeOfSet( int i ): which returns the size of set that contains set that contains item i.
Union by Rank Edit
The 'Union by Rank' Heuristic speeds up this data structure in specific cases. Here, we always always attach the root of the smaller tree to the larger tree.
struct node { int rank; int p; } vector<node> dset(1000); void initSet(int _size) { dset.resize(_size); for(int i=0;i<_size;i++) { dset[i].p=i; dset[i].rank=0; } } int findSet(int i) { return (dset[i]==i) ? i : dset[i]=findSet(dset[i]); } void unionSet(int i, int j) { int iroot = findSet(i); int jroot = findSet(j); // i and j are the same set (have same representative element) if(iroot == jroot) return; // i and j are not the same set, so we need to merge them if (iroot.rank < jroot.rank) iroot.p = jroot; else if(jroot.rank < iroot.rank) jroot.p = iroot; else { jroot.p = iroot; iroot.rank = iroot.rank + 1; } } bool isSameSet(int i, int j) { return findSet(i) == findSet(j); }