Find all subsets of a given set - Optimization?

Question

mrboolf 122 Junior Poster

16 Years Ago

Hi all there.
I was given the task to implement a Subset class with a few function.
It should be structured so to contain the original set and make it possible to generate all subsets.
Right now I'm focusing on the "generate all subsets" part.
I've done a little searching and I thought to implement it as an array (the original set) and a bitmask to indicate what elements are present in the current subset. That idea came to me from a Java forum (I hope I understood it correctly since I'm not into Java at all).
As of now it works fine, only it seems to me a little slow... any suggestion on how to optimize it? I'm open to different approaches as well.

Here's the (relevant) code:

// subset.h
#include <iostream>

#ifndef H_SUBSET
#define H_SUBSET

template <class T>
class Subset {
	private:
		int Dimension;
		T *Set;
		bool *Bitmask;
	public:
		Subset(T *set, int dim);
		~Subset(void);
		void Show(void);
		void NextSubset(void);
		void EmptySet(void);
		void FullSet(void);
		int Cardinality(void);
		T operator[](int index);
};		

// ... other (not relevant) functions

template <class T>
void Subset<T>::NextSubset(void) {
	int i = Dimension - 1;
	while(!Bitmask[i]&&i>=0) {
		i--;
	}
	if(i>=0) {
		Bitmask[i] = !Bitmask[i];
	}
	for(int j = i+1; j < Dimension; j++) {
		Bitmask[j] = !Bitmask[j];
	}
	return;
}

#endif // H_SUBSET

// main.cpp
#include "subset.h"

#define N 5

using std::cout;
using std::endl;

int main(void) {
	char a[N];
	for(int i = 0; i < N; i++) {
		a[i] = 'a' + i;
	}
	Subset<char> mySubset(a, N);
	for(int i = 0; i < (1 << N); i++) {
		cout << endl;
		mySubset.Show();
		mySubset.NextSubset();
	}
	return 0;
}

Some timing:

N = 5   -->  0m0.004s
N = 10  -->  0m0.036s
N = 15  -->  0m3.570s
N = 20  -->  5m16s

Thanks for your time :)

c++

2 Contributors
7 Replies
743 Views
1 Day Discussion Span
Latest Post 16 Years Ago Latest Post by mrboolf

ArkM 1,090 Postaholic

16 Years Ago

1. Try to profile subset generator without any printing. There are ~1 million subsets (2^20) for N==20. To print 1 million lines... The program PRINTS all the time.
2. NextSubset code is obviously wrong: while(!Bitmask[i]&&i>=0) refers to Bitmask[-1] at the last loop. Present all class Subset definition.

mrboolf commented: thanks again: fast, precise and helpful +1

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

mrboolf 122 Junior Poster · Answer 1 · 2008-11-28T22:19:40+00:00

Thanks for your reply.

1. You're right, absolutely. Even with N = 30 it takes just ~36 seconds to complete the loop without any printing. :) I didn't know printing calls were so expensive...
2. I apologize. Here's the full code (so far, some functions are still missing) corrected.

#include <iostream>

#ifndef H_SUBSET
#define H_SUBSET

template <class T>
class Subset {
	private:
		int Dimension;
		T *Set;
		bool *Bitmask;
	public:
		Subset(T *set, int dim);
		~Subset(void);
		void Show(void);
		void NextSubset(void);
		void EmptySet(void);
		void FullSet(void);
		int SubsetCardinality(void);
		int SetCardinality(void);
		T operator[](int index);
};		

template <class T>
int Subset<T>::SetCardinality(void) {
	return Dimension;
}

template <class T>
int Subset<T>::SubsetCardinality(void) {
	int dim = 0;
	for(int i = 0; i<Dimension; i++) {
		if(Bitmask[i]) {
			dim++;
		}
	}
	return dim;
}

template <class T>
void Subset<T>::EmptySet(void) {
	for(int i = 0; i<Dimension; i++) {
		Bitmask[i] = 0;
	}
	return;
}

template <class T>
void Subset<T>::FullSet(void) {
	for(int i = 0; i<Dimension; i++) {
		Bitmask[i] = 1;
	}
	return;
}

template <class T>
void Subset<T>::NextSubset(void) {
	int i = Dimension - 1;
	while(!Bitmask[i]&&i>=0) {
		i--;
		if(i<0) {
			break;
		}
	}
	if(i>=0) {
		Bitmask[i] = !Bitmask[i];
	}
	for(int j = i+1; j < Dimension; j++) {
		Bitmask[j] = !Bitmask[j];
	}
	return;
}

template <class T>
void Subset<T>::Show(void) {
	std::cout << "{ ";
	for(int i=0; i<Dimension; i++) {
		if(Bitmask[i]) {
			std::cout << Set[i] << ", ";
		}
	}
	std::cout << "}";
	return;
}

template <class T>
Subset<T>::Subset(T *set, int dim) {
	Set = new T[dim];
	Bitmask = new bool[dim];
	Dimension = dim;
	for(int i=0; i<dim; i++) {
		Set[i] = set[i];
		Bitmask[i] = true;
	}
}

template <class T>
Subset<T>::~Subset(void) {
	delete [] Set;
	delete [] Bitmask;
}

#endif // H_SUBSET

I think I can mark this as solved, assuming from your reply that you don't suggest a better approach and given that my doubt was clarified. Anyway, feel free to add :)

ArkM 1,090 Postaholic · Answer 2 · 2008-11-28T22:50:03+00:00

Keep it simpler, swap subexpressions:

while (i >= 0 && !Bitmask[i]) // that's all...

I'll see your code later ;)
Good luck!

ArkM 1,090 Postaholic · Answer 3 · 2008-11-28T23:03:04+00:00

Apropos, if you want to get ALL subsets, use one long long int 64-bit word. That's ALL subsets generator ;) :

long long unsigned n = (N < 1?0:(1 << N));
for (long long unsigned bit = 0; bit < n; ++bit) {
    // Now bits in bit are the same as your Bitmask
    // Use >> and & operators to select elements 
}  // long unsigned is a good type too (2^32 subsets)

Supersonic code ;)...

mrboolf 122 Junior Poster · Answer 4 · 2008-11-29T16:27:05+00:00

Thanks again!

Is you supersonic code meant to become something like this? (I am not practical with bitwise operators so it took me a while to figure this out)

long long unsigned int n = (N<1)?0:(1<<N);
for(long long unsigned int bit = 0; bit < n; ++bit) {
	cout << "{";
	for(int i = N; i>0; i--) {
		if((bit >> (i-1)) & 1) {
			cout << mySubset[N-i] << ", ";
		}
	}
	cout << " };" << endl;
}
return 0;

And now a perhaps stupid question: even with long long unsigned I'm still bound to 2^32 (if I enter a greater N the compiler issues the following warning: test.cpp:31: warning: left shift count >= width of type ).
2^N grows fast indeed, so how could I make this work with arbitrary size sets? I don't care much about speed in this case.

A first idea that came to me was, given that my first bitmask-array method generated the subsets from { Whole Set } to { }, to use it in combination with a bool IsEmptySubset(void) function. Something like:

do {
     mySubset.NextSubset();
} while (!mySubset.IsEmptySet());

is this that much stupid? :P

ArkM 1,090 Postaholic · Answer 5 · 2008-11-29T17:33:56+00:00

Something of the sort ;).
Slightly optimized version:

typedef long long unsigned int ULong;
    unsigned n;
    unsigned j;
    ULong i, bit, bits;
    for (i = 0, bit = 0; i < n; ++i, ++bit) {
        for (j = 0, bits = bit; bits; bits >>= 1, ++j)
            if ((bits & 1) != 0)
                cout << mySet[j];
        cout << '\n';
    }

With VC++ 2008 64-bit ULong type this code will work some tens of million years. That's the answer to your doubts: you may enumerate all subsets for cardinality > 32 but you can't ;) do that...

If your compiler supported "too short" long long (32 bits, for example), there is a very simple solution: emulate 64-bit (of 32*n) bit arithmetics, add yet another outer loop. I don't want to write that code (do it yourself if you wish): see remarks above...

However if you need to fix/process large set subsets (not for absurd enumeration, of course), arbitrary size bitvectors are useful artifacts (sometimes list of subset member indicies are much more useful)...

mrboolf 122 Junior Poster · Answer 6 · 2008-11-29T17:57:27+00:00

My ideas are bit(s) clearer now :P
Now I should be able to work out the rest of the assignment.

Thanks again for your time!