| | |
Find the duplicate numbers in a set of ints
Please support our C++ advertiser: Intel Parallel Studio Home
![]() |
•
•
Join Date: Jun 2007
Posts: 16
Reputation:
Solved Threads: 0
Hi, everybody.
The question is :
if the user input a set of numbers, you don't know how long the
set is, like 1, 2, 3, 4, 5, 5, 5, 6, 1, 3, -1, -12141, 123, -1...., you are
asked to find the dumplicate numbers in this set. I mean 1, 5, -1.
Acturally, I just can find an o(n ^ 2) algorithm by scaning the
set for n times.
I think there must be a better way, maybe O(n) or less
Just for fun
The question is :
if the user input a set of numbers, you don't know how long the
set is, like 1, 2, 3, 4, 5, 5, 5, 6, 1, 3, -1, -12141, 123, -1...., you are
asked to find the dumplicate numbers in this set. I mean 1, 5, -1.
Acturally, I just can find an o(n ^ 2) algorithm by scaning the
set for n times.
I think there must be a better way, maybe O(n) or less

Just for fun
Last edited by kinggarden; Jun 19th, 2007 at 6:31 am.
Rock Mu
I will keep walking!
Shanghai, China
I will keep walking!
Shanghai, China
•
•
Join Date: Dec 2006
Posts: 1,089
Reputation:
Solved Threads: 164
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.
C++ Syntax (Toggle Plain Text)
#include <vector> #include <set> #include <iostream> #include <string> using namespace std; template< typename CNTR > void print_duplicates( const CNTR& cntr ) { set< typename CNTR::value_type > unique ; for( size_t i=0 ; i<cntr.size() ; ++i ) if( !unique.insert( cntr[i] ).second ) cout << "duplicate " << cntr[i] << '\n' ; } int main() { string str ; cin >> str ; vector<int> v( str.begin(), str.end() ) ; print_duplicates(v) ; }
>sorting the list should help.
Sorting the list with a general algorithm gives you (at best) O(NlogN) performance just for the sort. Then you have to determine the duplicates. It also means that you have to store the list somewhere first, which means your storage cost is O(N). If there are a lot of duplicates, you're wasting storage for this problem. See below.
>this would take O( N log N ).
That's a curious solution. Why didn't you go with a map instead?
Not only does this give you a unique list, it also marks each of the duplicates for later use. That's a great deal more flexible than what you have, which can only mark duplicates once.
Of course, it still has a storage problem because of the counter for each item. Even if you use the map as your primary data structure rather than the vector you're using now, it's still effectively a 2N storage cost for integers. Since the OP only seems to want the duplicates, you can remove all but the duplicates with fair ease and limit the storage penalty to when nearly every number has at least one duplicate:
Sorting the list with a general algorithm gives you (at best) O(NlogN) performance just for the sort. Then you have to determine the duplicates. It also means that you have to store the list somewhere first, which means your storage cost is O(N). If there are a lot of duplicates, you're wasting storage for this problem. See below.
>this would take O( N log N ).
That's a curious solution. Why didn't you go with a map instead?
C++ Syntax (Toggle Plain Text)
#include <cstdlib> #include <iostream> #include <map> int main() { std::map<int, int> freq; for ( int i = 0; i < 30; i++ ) ++freq[std::rand() % 20]; std::cout<<"Duplicates:\n"; std::map<int, int>::iterator it = freq.begin(); while ( it != freq.end() ) { if ( it->second > 1 ) std::cout<< it->first <<" -- "<< it->second <<" times\n"; ++it; } }
Of course, it still has a storage problem because of the counter for each item. Even if you use the map as your primary data structure rather than the vector you're using now, it's still effectively a 2N storage cost for integers. Since the OP only seems to want the duplicates, you can remove all but the duplicates with fair ease and limit the storage penalty to when nearly every number has at least one duplicate:
C++ Syntax (Toggle Plain Text)
#include <cstdlib> #include <iostream> #include <map> void mark_sweep ( std::map<int, int>& mark ) { std::map<int, int>::iterator it = mark.begin(); std::map<int, int> sweep; while ( it != mark.end() ) { if ( it->second > 1 ) sweep.insert ( *it ); ++it; } mark.swap ( sweep ); } int main() { std::map<int, int> freq; for ( int i = 0; i < 30; i++ ) ++freq[std::rand() % 20]; mark_sweep ( freq ); std::cout<<"Duplicates:\n"; std::map<int, int>::iterator it = freq.begin(); while ( it != freq.end() ) { std::cout<< it->first <<" -- "<< it->second <<" times\n"; ++it; } }
Last edited by Narue; Jun 19th, 2007 at 3:04 pm.
I'm here to prove you wrong.
•
•
•
•
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.C++ Syntax (Toggle Plain Text)
#include <vector> #include <set> #include <iostream> #include <string> using namespace std; template< typename CNTR > void print_duplicates( const CNTR& cntr ) { set< typename CNTR::value_type > unique ; for( size_t i=0 ; i<cntr.size() ; ++i ) if( !unique.insert( cntr[i] ).second ) cout << "duplicate " << cntr[i] << '\n' ; } int main() { string str ; cin >> str ; vector<int> v( str.begin(), str.end() ) ; print_duplicates(v) ; }
*Voted best profile in the world*
>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
I'm here to prove you wrong.
•
•
•
•
>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
*Voted best profile in the world*
•
•
Join Date: Dec 2006
Posts: 1,089
Reputation:
Solved Threads: 164
> why didn't you go with a map instead
because i was working under the premise:
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
> ...it also marks each of the duplicates for later use. That's a great deal
> more flexible than what you have ...
true.
here is an extract from the python cookbook (this is to remove, not just identify duplicates) [Raymond Hettinger, 2002]
because i was working under the premise:
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
> ...it also marks each of the duplicates for later use. That's a great deal
> more flexible than what you have ...
true.
here is an extract from the python cookbook (this is to remove, not just identify duplicates) [Raymond Hettinger, 2002]
python Syntax (Toggle Plain Text)
def uniq(alist) # Fastest order preserving set = {} return [set.setdefault(e,e) for e in alist if e not in set] def uniq(alist) # Fastest without order preserving set = {} map(set.__setitem__, alist, []) return set.keys()
![]() |
Other Threads in the C++ Forum
- Previous Thread: Difference between DLL & Process ???
- Next Thread: GUIs with c++
| Thread Tools | Search this Thread |
Tag cloud for C++
6 api array arrays based beginner binary bmp c++ c/c++ calculator char class classes code compile compiler console conversion convert count data delete deploy desktop directshow dll download dynamic encryption error file forms fstream function functions game givemetehcodez google graph gui homeworkhelp iamthwee ifstream input int java lib library linkedlist linker list loop looping loops map math matrix memory microsoft newbie news number output pointer problem program programming project python random read recursion recursive reference simple string strings studio system temperature template templates test text text-file tree unix url variable vector video visual visualstudio void win32 windows winsock wordfrequency wxwidgets







