Hello,

I need to find repeated sequences within a vector.

I have been thinking about using search, as is described here. But this uses an array to search, I don't need to use extra memory, as I am seaching for sequences within the same vector.

Any suggestions on how to do this? Perhaps there is an easier methode than search?

Thanks

Recommended Answers

All 14 Replies

Search sounds like a good way to go. I don't know what you mean by it uses "an array to search", but this shouldn't use any extra memory, it looks you just specify a range of indices which is your search patter, and a range of indices in which to search.

You could sort the vector then repeated entries will be easy to find with minimum effort and time. Use an interator if you want to erase duplicates because erase() return the iterator to the next item following the one it erased.

Thanks Ancient Dragon,

But I am trying to find sequences of 2, 3, 4, etc subsequent elements.

@daviddoria:
I was referring to the match1 array that is a subvector of myvector in the page I referred to

Ok,

I implemented the solution from the forum, using the search function.

Thing is now, since I am looking for a REPEATED sequence, search always finds at least one occurance. How can I see if there are two occurences?

Now, the check only does:

if (it!= myvector.end())

but since it always finds at least one occurance, this is not a valid check. Should I remove the search term from the searched vector and insert it again later? That seems a little over the top... Any ideas?

I still think I might be doing a few redundant things in copying vectors over and forth. But here is my solution, in case people want to learn from it:

//for each length of sequence

     for(i=mlength/2; i>= 2; i--){

     //for(i=3; i>= 2; i--){
         //for each position of the sequence within searchgroup

        for (vi = (notegroup.begin()); vi != (notegroup.end()); vi++){

             searchgroup = notegroup;

             //populate sequence
             sequence.clear();
             sequence.insert(sequence.begin(), vi, vi+i);

            

             //for each sequence
             //create a searchgroup
             //search for sequence

           
             //temp remove first sequence from searchgroup
             xi = search(searchgroup.begin(), searchgroup.end(), sequence.begin(), sequence.end());


             if (xi!=searchgroup.end()){
                 searchgroup.erase(xi, xi+i);
             }
           

             //search for other occurences
             xi = search(searchgroup.begin(), searchgroup.end(), sequence.begin(), sequence.end());

             //search for repeated sequence
             if (xi!=searchgroup.end()){
             

                 //leave sequence deleted from vector
                 notegroup = searchgroup;

             }



         }

     }

Oeps... I just thought of something. Perhaps by erasing a few elements, I am creating a new sequence. Looks like this approach is falted...

Mmmm... I am starting to think about writing a manual search function.

No way to count how many occurances the search function finds? Perhaps with the predicate or so?

You dont have to delete or erase, run it in a while loop, every time you find a occurence, start searching after the found position of the sequence

here is a sample, modify it for your need:

int main () {
  vector<int> myvector;
  vector<int>::iterator it;

  // set some values:        myvector: 10 20 30 40 50 60 70 80 90
  for (int i=1; i<10; i++) myvector.push_back(i*10);

  //INSERTING TWICE TO MAKE TWO SEQUESNCE
  for (int i=1; i<10; i++) myvector.push_back(i*10); myvector: 10 20 30 40 50 60 70 80 90 10 20 30 40 50 60 70 80 90

  // using default comparison:
  int match1[] = {40,50,60,70};
  it = search (myvector.begin(), myvector.end(), match1, match1+4);

  if (it!=myvector.end())
    cout << "match1 found at position " << int(it-myvector.begin()) << endl;
  else
    cout << "match1 not found" << endl;

  //NOTICE I THE FIRST PARAMETER, We ADVANCED IT TO 4 AS THERE ARE 4 INTEGER IN SEQUENCE

  it = search (it+4, myvector.end(), match1, match1+4);

  if (it!=myvector.end())
    cout << "match1 found at position " << int(it-myvector.begin()) << endl;
  else
    cout << "match1 not found" << endl;

  return 0;
}

I think it would be a good idea to define the problem precisely. Are you looking for every sequence that repeats? Including sequences of a single element? Must the repetitions be consecutive, or can other elements separate them? Can the repetition overlap the original? Are you looking for every sequence that repeats, even if two such repetitions overlap each other?


Until the problem is defined precisely enough that these questions and others like them are easily answered, it is difficult to proceed.

Let me try to define it better:

I want to count the vector elements that are part of a repeated sequence.

A repeated sequence, is minimum 2 (until max vectorlength/2) elements. It can overlap, but the overlapping notes should not be counted. So, the largest repetition should be counted (if a sequence of 3 is repeated, count should be 3, not 2x2).

I hope it is a bit clearer like this.

Sorry, it's not clearer. In particular, I do not know what a "repeated sequence" is.

For example, how many repeated sequences are there in

1, 1, 1, 1, 1, 1, 1, 1, 1

? How does your answer follow from your definition?

Hi,

Since I want to count how many elements are part of a repeated sequence, the answer in the case of 1, 1, 1, 1, 1, 1, 1, 1, 1 is 9.

in case of 1 2 3 4 5 6 1 2, it would be 4
in case of 1 2 3 4 5 1 2 3 it would be 6

I realise this problem can become very complex.

My idea to solve it is, first check if there are any repeated sequences of 2. If there are none, no need to search further, next search those of three. Again if there are none, no need to look for bigger sequences, etc.

I think I am going to defer from the search method an write a count method myself, think that might be easier.

its a very easy program, simply sort the vector and then do a linear count.

sort(myvector.begin(), myvector.end());

iter1 = myvector.begin();

while(iter1 is not up to the end)
{
    while(iter1 is not last eliment AND iter1 and iter1+1 same element)
     { 
             iter1++;
             //calculate your count. simple count++ wont work ;) you have to think of this two situate, 112 and 11112 .. notice in first sequence one two 1 and in second sequence 4 1.
     } 
}

Think about it and let me know if you have problem of understanding it.

That wont work, as I loose the sequences 1-2 1-2 for example. It is not just repeated elements.

My solution:

//--------------------------------------------------------------------------------------------------
//subscore 19: no repetition of sequences
//--------------------------------------------------------------------------------------------------

//for each start element
     for (vi = (music.begin()); vi != (music.end()-1); vi++){


         //check if equal to any other element

         for (xi = (vi+1) ; xi != music.end(); xi++){

             j= 0;

             //check all the following elements

             for (yi = xi; yi != music.end(); yi++){

                 if (*(vi+j) == *(yi)){

                     if (j>0){

                         repeated.push_back(vi+j);
                         repeated.push_back(yi);

                         if (j==1){
                             //also save the first element
                             repeated.push_back(vi);
                             repeated.push_back(yi-j);

                         }
                     }
                 }
                 else
                 {
                     break;
                 }
                 j++;
             }
         }
     }



     //count the (unique) notes part of a repeated sequence

     sort(repeated.begin(), repeated.end());
     repeated.erase(std::unique(repeated.begin(), repeated.end()), repeated.end());

     //cout << "\nrepeated: " << repeated.size();
     subscore.push_back((double)(repeated.size())/mlength);

@alwayslearning0: you are absolutely correct, how could I have overlooked that parameter... lol.

Anyway, I seem to have found another solution.

Thanks everybody.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.