Hello Everyone. Could anyone please help me solve this problem. I am actually new to using STL. I have been given a task of searching a file of over 60000 records. The data extracted from the file is stored in struct and that struct is in turn stored in vectors i-e a vector of struct having over 60000 struct elements. I have to look for duplicates in that vector of structs and storing those duplicates in another container so that the original vector is left with only those elements which have no duplicates. For example:

OrgVector = 1 2 3 4 1 5 6 2 4 7 9 10 1
DupVectpr = 1 1 1 2 2 4 4
and the original vector is now left with
OrgVector = 3 5 6 7 9 10

the struct is something like this

struct MyPred
{
int x, y, z;
};

then I have to find the difference between x and z for each struct element stored in the DupVector and will push_back the vector again to the original vector (from the DupVector) where the difference between x and z of its struct element would be the minimum. I have done the same thing with simple ints and even string type of vectors but am unable to do the same thing for vector of structs. And is there any fast way of searching 60000 records for duplicates because simple find/find_if would be too slow since they perform linear search. I tried multimaps where it returned all values against a single key and it was a really very quick search i-e I stored same numbers in multimap both as its key and value and the equal_range returned all values against a single key. E-g considering the above vector

Multimap Keys = 1 2 3 4 1 5 6 2 4 7 9 10 1
Multimap Values = 1 2 3 4 1 5 6 2 4 7 9 10 1

Values against key 1: 1 1 1
Values against key 2: 2 2
Values against key 4: 4 4

But again I could not do it with vectors of structs. Could anyone please help do this assignment. Any help along with code would be greatly appreciated because it is basically the syntax that I am stuck in.

Recommended Answers

All 15 Replies

Write comparison operators for operator< and operator== for the struct. eg.

struct A
{
    int x, y, z ;

    inline bool operator< ( const A& that ) const ;

    inline bool operator== ( const A& that ) const
    { return (x==that.x) && (y==that.y) && (z==that.z) ; }
};

bool A::operator< ( const A& that ) const
{
    if( x < that.x ) return true ;
    else
    {
        if( x > that.x ) return false ;
        if( y < that.y ) return true ;
        else
        {
            if( y > that.y ) return false ;
            return z < that.z ;
        }
    }
}

Use std::sort to sort the vector and then std::unique to move duplicates to the end of the sequence. Finally, use std::vector<>::erase to remove the duplicate elements from the vector.

void remove_dups( std::vector<A>& seq )
{
    std::sort( seq.begin(), seq.end() ) ;
    seq.erase( std::unique( seq.begin(), seq.end() ), seq.end() ) ;
}

Thank you so much for your reply and your help. The struct actually contains string type of variables. Sorry for not mentioning it. There are 16 such string type struct members? How can I sort string variables? What does unique actually do? Does it remove duplicates or just move duplicates along with the actual variable to the end of the sequence and I dont want to remove the duplicates but to store them in another vector along with the actual variable. And will this method be fast enough for using with over 60000 records???

I must mention that the way you have written the overloaded operator== has helped me A LOT. I was always confused about writting it for struct type of objects.

> There are 16 such string type struct members? How can I sort string variables?
std::string has overloaded comparison operators. So you could sort them in the same way as ints.

> What does unique actually do? Does it remove duplicates
> or just move duplicates along with the actual variable to the end of the sequence
Just move the duplicate elements to the end of the sequence.
http://stdcxx.apache.org/doc/stdlibug/13-5.html

> I dont want to remove the duplicates but to store them in another vector
Something like this would do the trick.

void copy_duplicates( std::vector<A>& seq, std::vector<A>& duplicates  )
{
    std::sort( seq.begin(), seq.end() ) ;
    std::copy( std::unique( seq.begin(), seq.end() ), seq.end(),
               std::back_inserter(duplicates) ) ;
}

> And will this method be fast enough for using with over 60000 records???
The time taken would be O( N log N ). If 16 strings are being compared, it would take a few seconds.

Maybe I'm looking at this all wrong but won't you just copy the multimap to a map container of the same type? Ooops didn't read the entire post...please ignore.

oh thank you so soo soo very much for taking your time by trying to help me understand through bits of codes and not just saying it :). That is honestly very helpful. Is sorting necessary? and vector does not contain simple int variables but struct elements. Is this how I can sort it???

bool A::operator< ( const A& that ) const ???

The string members inside the struct are some thing like this:

string trID, ordID ... dateTime.

trID could be anything like TRDS14KT5, same is the case with ordIDs which could contain anything i-e a combination of both string+ints. How can I sort this type of struct?

Pleaseeeeeee dont get irritated with my questions :p. Pleaseee. "Sorry" in advance

> Is sorting necessary?
Yes, if you want to use std::unique

> How can I sort this type of struct?
By writing an overloaded operator bool A::operator< ( const A& that ) const and then using std::sort

;'( didnt get you ;'(. Elaborating my point:

struct contains string type of variables a, x, y, z and if string x, y, z (i-e excluding variable a) are all equal to each other then convert a and z to int (which I am sure there would be some function in C++ for this type of conversion) and fnd out the the difference between a and z of each such duplicate and then push_back the vector with minimum a-z difference to the original vector. How can I sort a struct of strings of this type? ;'(

;'( didnt get you ;'(. Elaborating my point:

struct contains string type of variables a, x, y, z and if string x, y, z (i-e excluding variable a) are all equal to each other then convert a and z to int (which I am sure there would be some function in C++ for this type of conversion) and fnd out the the difference between a and z of each such duplicate and then push_back the vector with minimum a-z difference to the original vector. How can I sort a struct of strings of this type? ;'(

I think you'll have to go through your vector twice.
First add a variable "int difference" to your struct.
Then go through your vector (may have to #include math.h and whichever lib has atoi)

for (int i = 0; i < vec.size(); ++i){
  if (x == y && y == z){
    difference = abs( atoi(a.c_str()) - atoi(z.c_str()) );//always >= 0
  else
    difference = -1;
  }
}

Next write the compare function like Vilkjfeioa said, except have it operate on difference. Put everything with default -1 at the end.

bool A::operator< ( const A& that ) const {
   if (difference == -1) return false;
   return (difference < that.difference);
}

So basically if difference == -1 (no match) it's always considered greater. I'm not entirely comfortable with how sort will deal with that.. you'll just have to try it.

-Greywolf

thanks for your reply. Let me try to explain the whole scenario here.

I am totally new to using STL.

I have a vector which contains struct type of elements and that struct contains 16 string type of members. There are over 60000 elements in that vector. The vector contains duplicate elements (of struct). I have to remove all the duplicates along with the actual element (having duplicates) and store them in another vector or any container. I copied the whole vector into a multimap. Key and values both contain the same vector e-g to simplify it, in case of simple integers it would look something like this:

k v
1 1
2 2
3 3
4 4
1 1
3 3
5 5
5 5

and I want it to return:

p k v
1 1 1 1 (there are two 1s, one at pos 1 and the other one at pos 5)
2 2 2
3 3 3 3 3 (there are three 3s, at pos 3, pos 6 and pos 9)
4 4 4
5 1 1 1
6 3 3 3 3
7 5 5 5
8 5 5 5
9 3 3 3 3

it works fine with simple ints or string type of vectors but it doesnt work with struct type of elements and instead returns:

k v
1 1
2 2
3 3
4 4
1 1
3 3
5 5
5 5

that is NO difference.

Afterwards I want this multimap to contain only the duplicate elements i-e

p k v
1 1 1 1 (there are two 1s, one at pos 1 and the other one at pos 5)
3 3 3 3 3 (there are three 3s, at pos 3, pos 6 and pos 9)
7 5 5 5

and the actual vector to contain only this elements which have no duplicates i-e element 2 and 4 because they do not have any duplicates.

struct contains string type of variables e-g a, x, y, z and if string x, y, z (i-e excluding variable a) of one element are equal to the x, y, z of any other element (duplicate found) of the same vector then convert a and z to int (which I am sure there would be some function in C++ for this type of conversion) and fnd out the the difference between a and z of each such duplicate and then push_back the vector with minimum a-z difference to the original vector. could you please help me do this? Any container that you think would be the best in this situation.

Actually try this instead.. ignore the previous post about vectors. I just woke up and didn't fully read the problem.

bool A::operator< ( const A& that ) const {
   float this_difference;
   float that_difference;

   if (x == y && y == z)
     this_difference == abs( atoi(a.c_str()) - atoi(z.c_str() );
   else this_difference = -1;

   if (that->x == that->y && that.y == that.z)
     that_difference == abs( atoi( (that->a).c_str() ) - atoi( (that->z).c_str() );
   else that_difference = -1;

   if (this_difference == -1) return false;
   if (that_difference == -1) return true;

   return (this_difference < that_difference);

-Greywolf

I am sorry I didnt explain it correctly. Could you please go through my last reply.

I am sorry I didnt explain it correctly. Could you please go through my last reply.

I think I get it.

You could try it like this:

Make the keys be strings before you insert into the multimap by making a big string out of all the little strings in your struct.

i.e. string key = b + c + d + e + f

Then, setup your multimap like: multimap<string, struct_type> mm

Now you can see if there are duplicates by using mm.count(key_string);

And then you can copy the elements that are duplicates into a vector, and erase them from the multimap afterward.
-Greywolf

Thanks a TON :D. Good idea. Could you please write a small sample simplest code for me so that it becomes more understandable. THANX AGAIN

Thanks a TON :D. Good idea. Could you please write a small sample simplest code for me so that it becomes more understandable. THANX AGAIN

So if you have a vector of structs, where each struct has many strings, make the key for each struct like so and then insert it into your multimap. I'm going to store the list of keys for later use in another container. Note that this solution is not optimal. I don't use maps that much and don't remember how to get the keys out easily. Also I'm in a hurry so this isn't proofread.

string superkey = "";
multimap<string, your_struct_type> mm;
vector<your_struct_type> other_vec;
vector<string> key_store;

for (int i = 0; i < vec.size(); ++i){
  superkey += vec[i].b;
  superkey += vec[i].c;
  superkey += vec[i].d;
  //... and so on

  mm[superkey] = vec[i];
  key_store.push_back(superkey);
}

Next check for duplicates in your mm and remove them and put them somewhere else.

for (int i = 0; i < key_store.size(); ++i){
  while (mm.count(key_store[i]) >= 1){
    other_vec.push_back(*(mm.find(key_store[i]));
    mm.erase(key_store[i]);}
}

-Greywolf

oh Thank you so much :) for your help.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.