944,051 Members | Top Members by Rank

Ad:
  • C++ Discussion Thread
  • Unsolved
  • Views: 11559
  • C++ RSS
You are currently viewing page 1 of this multi-page discussion thread
Jun 19th, 2007
0

Find the duplicate numbers in a set of ints

Expand Post »
Hi, everybody.

The question is :

if the user input a set of numbers, you don't know how long the

set is, like 1, 2, 3, 4, 5, 5, 5, 6, 1, 3, -1, -12141, 123, -1...., you are

asked to find the dumplicate numbers in this set. I mean 1, 5, -1.

Acturally, I just can find an o(n ^ 2) algorithm by scaning the

set for n times.

I think there must be a better way, maybe O(n) or less

Just for fun
Last edited by kinggarden; Jun 19th, 2007 at 6:31 am.
Reputation Points: 10
Solved Threads: 0
Newbie Poster
kinggarden is offline Offline
16 posts
since Jun 2007
Jun 19th, 2007
0

Re: Find the dumplicate numbers in a set of ints

Sorry, dumplicate -> duplicate

I don't know how to edit the title
Reputation Points: 10
Solved Threads: 0
Newbie Poster
kinggarden is offline Offline
16 posts
since Jun 2007
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

sorting the list should help.
Reputation Points: 254
Solved Threads: 74
Practically a Posting Shark
thekashyap is offline Offline
804 posts
since Feb 2007
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
C++ Syntax (Toggle Plain Text)
  1. #include <vector>
  2. #include <set>
  3. #include <iostream>
  4. #include <string>
  5. using namespace std;
  6.  
  7. template< typename CNTR >
  8. void print_duplicates( const CNTR& cntr )
  9. {
  10. set< typename CNTR::value_type > unique ;
  11. for( size_t i=0 ; i<cntr.size() ; ++i )
  12. if( !unique.insert( cntr[i] ).second )
  13. cout << "duplicate " << cntr[i] << '\n' ;
  14. }
  15.  
  16. int main()
  17. {
  18. string str ; cin >> str ;
  19. vector<int> v( str.begin(), str.end() ) ;
  20. print_duplicates(v) ;
  21. }
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.
Reputation Points: 1159
Solved Threads: 285
Posting Virtuoso
vijayan121 is offline Offline
1,606 posts
since Dec 2006
Jun 19th, 2007
1

Re: Find the duplicate numbers in a set of ints

>sorting the list should help.
Sorting the list with a general algorithm gives you (at best) O(NlogN) performance just for the sort. Then you have to determine the duplicates. It also means that you have to store the list somewhere first, which means your storage cost is O(N). If there are a lot of duplicates, you're wasting storage for this problem. See below.

>this would take O( N log N ).
That's a curious solution. Why didn't you go with a map instead?
C++ Syntax (Toggle Plain Text)
  1. #include <cstdlib>
  2. #include <iostream>
  3. #include <map>
  4.  
  5. int main()
  6. {
  7. std::map<int, int> freq;
  8.  
  9. for ( int i = 0; i < 30; i++ )
  10. ++freq[std::rand() % 20];
  11.  
  12. std::cout<<"Duplicates:\n";
  13. std::map<int, int>::iterator it = freq.begin();
  14.  
  15. while ( it != freq.end() ) {
  16. if ( it->second > 1 )
  17. std::cout<< it->first <<" -- "<< it->second <<" times\n";
  18.  
  19. ++it;
  20. }
  21. }
Not only does this give you a unique list, it also marks each of the duplicates for later use. That's a great deal more flexible than what you have, which can only mark duplicates once.

Of course, it still has a storage problem because of the counter for each item. Even if you use the map as your primary data structure rather than the vector you're using now, it's still effectively a 2N storage cost for integers. Since the OP only seems to want the duplicates, you can remove all but the duplicates with fair ease and limit the storage penalty to when nearly every number has at least one duplicate:
C++ Syntax (Toggle Plain Text)
  1. #include <cstdlib>
  2. #include <iostream>
  3. #include <map>
  4.  
  5. void mark_sweep ( std::map<int, int>& mark )
  6. {
  7. std::map<int, int>::iterator it = mark.begin();
  8. std::map<int, int> sweep;
  9.  
  10. while ( it != mark.end() ) {
  11. if ( it->second > 1 )
  12. sweep.insert ( *it );
  13. ++it;
  14. }
  15.  
  16. mark.swap ( sweep );
  17. }
  18.  
  19. int main()
  20. {
  21. std::map<int, int> freq;
  22.  
  23. for ( int i = 0; i < 30; i++ )
  24. ++freq[std::rand() % 20];
  25. mark_sweep ( freq );
  26.  
  27. std::cout<<"Duplicates:\n";
  28. std::map<int, int>::iterator it = freq.begin();
  29.  
  30. while ( it != freq.end() ) {
  31. std::cout<< it->first <<" -- "<< it->second <<" times\n";
  32. ++it;
  33. }
  34. }
Last edited by Narue; Jun 19th, 2007 at 3:04 pm.
Administrator
Reputation Points: 6442
Solved Threads: 1393
Bad Cop
Narue is offline Offline
11,807 posts
since Sep 2004
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

Click to Expand / Collapse  Quote originally posted by vijayan121 ...
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage
C++ Syntax (Toggle Plain Text)
  1. #include <vector>
  2. #include <set>
  3. #include <iostream>
  4. #include <string>
  5. using namespace std;
  6.  
  7. template< typename CNTR >
  8. void print_duplicates( const CNTR& cntr )
  9. {
  10. set< typename CNTR::value_type > unique ;
  11. for( size_t i=0 ; i<cntr.size() ; ++i )
  12. if( !unique.insert( cntr[i] ).second )
  13. cout << "duplicate " << cntr[i] << '\n' ;
  14. }
  15.  
  16. int main()
  17. {
  18. string str ; cin >> str ;
  19. vector<int> v( str.begin(), str.end() ) ;
  20. print_duplicates(v) ;
  21. }
this would take O( N log N ). instead of a set, if you use a hashtable with a good hash function, you could do this in linear time.
What input are you using, I don't get anything intelligible for the output?
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
Administrator
Reputation Points: 6442
Solved Threads: 1393
Bad Cop
Narue is offline Offline
11,807 posts
since Sep 2004
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

Click to Expand / Collapse  Quote originally posted by Narue ...
>What input are you using, I don't get anything intelligible for the output?
It's kind of goofy. He uses the characters in a string to initialize the vector of integers. You have to input a string with no spaces, and the output will be the integer code for each of the characters in the string. Try "012345123" as input. Assuming ASCII, you'll get 49, 50, and 51 as the duplicates.
Ah...
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

> why didn't you go with a map instead
because i was working under the premise:
if you are not alowed to reorder the original sequence, but are allowed to use auxiliary storage

> ...it also marks each of the duplicates for later use. That's a great deal
> more flexible than what you have ...
true.

here is an extract from the python cookbook (this is to remove, not just identify duplicates) [Raymond Hettinger, 2002]
python Syntax (Toggle Plain Text)
  1. def uniq(alist) # Fastest order preserving
  2. set = {}
  3. return [set.setdefault(e,e) for e in alist if e not in set]
  4.  
  5. def uniq(alist) # Fastest without order preserving
  6. set = {}
  7. map(set.__setitem__, alist, [])
  8. return set.keys()
Reputation Points: 1159
Solved Threads: 285
Posting Virtuoso
vijayan121 is offline Offline
1,606 posts
since Dec 2006
Jun 19th, 2007
0

Re: Find the duplicate numbers in a set of ints

Is that map solution something like the first page of this narue.
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in C++ Forum Timeline: Difference between DLL & Process ???
Next Thread in C++ Forum Timeline: GUIs with c++





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC