Find the duplicate numbers in a set of ints

Please support our C++ advertiser: Intel Parallel Studio Home
Reply

Join Date: Jun 2007
Posts: 16
Reputation: kinggarden is an unknown quantity at this point 
Solved Threads: 0
kinggarden kinggarden is offline Offline
Newbie Poster

Find the duplicate numbers in a set of ints

 
0
  #1
Jun 19th, 2007
Hi, everybody.

The question is :

if the user input a set of numbers, you don't know how long the

set is, like 1, 2, 3, 4, 5, 5, 5, 6, 1, 3, -1, -12141, 123, -1...., you are

asked to find the dumplicate numbers in this set. I mean 1, 5, -1.

Acturally, I just can find an o(n ^ 2) algorithm by scaning the

set for n times.

I think there must be a better way, maybe O(n) or less

Just for fun
Last edited by kinggarden; Jun 19th, 2007 at 6:31 am.
Rock Mu

I will keep walking!

Shanghai, China
Reply With Quote Quick reply to this message  
Join Date: Jun 2007
Posts: 16
Reputation: kinggarden is an unknown quantity at this point 
Solved Threads: 0
kinggarden kinggarden is offline Offline
Newbie Poster

Re: Find the dumplicate numbers in a set of ints

 
0
  #2
Jun 19th, 2007
Sorry, dumplicate -> duplicate

I don't know how to edit the title
Rock Mu

I will keep walking!

Shanghai, China
Reply With Quote Quick reply to this message  
Join Date: Feb 2007
Posts: 539
Reputation: thekashyap will become famous soon enough thekashyap will become famous soon enough 
Solved Threads: 50
thekashyap's Avatar
thekashyap thekashyap is offline Offline
Posting Pro

Re: Find the duplicate numbers in a set of ints

 
0
  #3
Jun 19th, 2007
sorting the list should help.
Are you Agile.. ?
Reply With Quote Quick reply to this message  
Join Date: Dec 2006
Posts: 1,089
Reputation: vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all 
Solved Threads: 164
vijayan121 vijayan121 is offline Offline
Veteran Poster

Re: Find the duplicate numbers in a set of ints

 
0
  #4
Jun 19th, 2007
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
  1. #include <vector>
  2. #include <set>
  3. #include <iostream>
  4. #include <string>
  5. using namespace std;
  6.  
  7. template< typename CNTR >
  8. void print_duplicates( const CNTR& cntr )
  9. {
  10. set< typename CNTR::value_type > unique ;
  11. for( size_t i=0 ; i<cntr.size() ; ++i )
  12. if( !unique.insert( cntr[i] ).second )
  13. cout << "duplicate " << cntr[i] << '\n' ;
  14. }
  15.  
  16. int main()
  17. {
  18. string str ; cin >> str ;
  19. vector<int> v( str.begin(), str.end() ) ;
  20. print_duplicates(v) ;
  21. }
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.
Reply With Quote Quick reply to this message  
Join Date: Sep 2004
Posts: 7,802
Reputation: Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute 
Solved Threads: 747
Team Colleague
Narue's Avatar
Narue Narue is offline Offline
Code Goddess

Re: Find the duplicate numbers in a set of ints

 
1
  #5
Jun 19th, 2007
>sorting the list should help.
Sorting the list with a general algorithm gives you (at best) O(NlogN) performance just for the sort. Then you have to determine the duplicates. It also means that you have to store the list somewhere first, which means your storage cost is O(N). If there are a lot of duplicates, you're wasting storage for this problem. See below.

>this would take O( N log N ).
That's a curious solution. Why didn't you go with a map instead?
  1. #include <cstdlib>
  2. #include <iostream>
  3. #include <map>
  4.  
  5. int main()
  6. {
  7. std::map<int, int> freq;
  8.  
  9. for ( int i = 0; i < 30; i++ )
  10. ++freq[std::rand() % 20];
  11.  
  12. std::cout<<"Duplicates:\n";
  13. std::map<int, int>::iterator it = freq.begin();
  14.  
  15. while ( it != freq.end() ) {
  16. if ( it->second > 1 )
  17. std::cout<< it->first <<" -- "<< it->second <<" times\n";
  18.  
  19. ++it;
  20. }
  21. }
Not only does this give you a unique list, it also marks each of the duplicates for later use. That's a great deal more flexible than what you have, which can only mark duplicates once.

Of course, it still has a storage problem because of the counter for each item. Even if you use the map as your primary data structure rather than the vector you're using now, it's still effectively a 2N storage cost for integers. Since the OP only seems to want the duplicates, you can remove all but the duplicates with fair ease and limit the storage penalty to when nearly every number has at least one duplicate:
  1. #include <cstdlib>
  2. #include <iostream>
  3. #include <map>
  4.  
  5. void mark_sweep ( std::map<int, int>& mark )
  6. {
  7. std::map<int, int>::iterator it = mark.begin();
  8. std::map<int, int> sweep;
  9.  
  10. while ( it != mark.end() ) {
  11. if ( it->second > 1 )
  12. sweep.insert ( *it );
  13. ++it;
  14. }
  15.  
  16. mark.swap ( sweep );
  17. }
  18.  
  19. int main()
  20. {
  21. std::map<int, int> freq;
  22.  
  23. for ( int i = 0; i < 30; i++ )
  24. ++freq[std::rand() % 20];
  25. mark_sweep ( freq );
  26.  
  27. std::cout<<"Duplicates:\n";
  28. std::map<int, int>::iterator it = freq.begin();
  29.  
  30. while ( it != freq.end() ) {
  31. std::cout<< it->first <<" -- "<< it->second <<" times\n";
  32. ++it;
  33. }
  34. }
Last edited by Narue; Jun 19th, 2007 at 3:04 pm.
I'm here to prove you wrong.
Reply With Quote Quick reply to this message  
Join Date: Aug 2005
Posts: 5,273
Reputation: iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold 
Solved Threads: 378
Featured Poster
iamthwee's Avatar
iamthwee iamthwee is offline Offline
Posting Expert

Re: Find the duplicate numbers in a set of ints

 
0
  #6
Jun 19th, 2007
Originally Posted by vijayan121 View Post
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
  1. #include <vector>
  2. #include <set>
  3. #include <iostream>
  4. #include <string>
  5. using namespace std;
  6.  
  7. template< typename CNTR >
  8. void print_duplicates( const CNTR& cntr )
  9. {
  10. set< typename CNTR::value_type > unique ;
  11. for( size_t i=0 ; i<cntr.size() ; ++i )
  12. if( !unique.insert( cntr[i] ).second )
  13. cout << "duplicate " << cntr[i] << '\n' ;
  14. }
  15.  
  16. int main()
  17. {
  18. string str ; cin >> str ;
  19. vector<int> v( str.begin(), str.end() ) ;
  20. print_duplicates(v) ;
  21. }
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.
What input are you using, I don't get anything intelligible for the output?
*Voted best profile in the world*
Reply With Quote Quick reply to this message  
Join Date: Sep 2004
Posts: 7,802
Reputation: Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute Narue has a reputation beyond repute 
Solved Threads: 747
Team Colleague
Narue's Avatar
Narue Narue is offline Offline
Code Goddess

Re: Find the duplicate numbers in a set of ints

 
0
  #7
Jun 19th, 2007
>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
I'm here to prove you wrong.
Reply With Quote Quick reply to this message  
Join Date: Aug 2005
Posts: 5,273
Reputation: iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold 
Solved Threads: 378
Featured Poster
iamthwee's Avatar
iamthwee iamthwee is offline Offline
Posting Expert

Re: Find the duplicate numbers in a set of ints

 
0
  #8
Jun 19th, 2007
Originally Posted by Narue View Post
>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
Ah...
*Voted best profile in the world*
Reply With Quote Quick reply to this message  
Join Date: Dec 2006
Posts: 1,089
Reputation: vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all vijayan121 is a name known to all 
Solved Threads: 164
vijayan121 vijayan121 is offline Offline
Veteran Poster

Re: Find the duplicate numbers in a set of ints

 
0
  #9
Jun 19th, 2007
> why didn't you go with a map instead
because i was working under the premise:
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage

> ...it also marks each of the duplicates for later use. That's a great deal
> more flexible than what you have ...
true.

here is an extract from the python cookbook (this is to remove, not just identify duplicates) [Raymond Hettinger, 2002]
  1. def uniq(alist) # Fastest order preserving
  2. set = {}
  3. return [set.setdefault(e,e) for e in alist if e not in set]
  4.  
  5. def uniq(alist) # Fastest without order preserving
  6. set = {}
  7. map(set.__setitem__, alist, [])
  8. return set.keys()
Reply With Quote Quick reply to this message  
Join Date: Aug 2005
Posts: 5,273
Reputation: iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold iamthwee is a splendid one to behold 
Solved Threads: 378
Featured Poster
iamthwee's Avatar
iamthwee iamthwee is offline Offline
Posting Expert

Re: Find the duplicate numbers in a set of ints

 
0
  #10
Jun 19th, 2007
Is that map solution something like the first page of this narue.
*Voted best profile in the world*
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:



Other Threads in the C++ Forum
Thread Tools Search this Thread



Tag cloud for C++
About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC