Find repeated sequence within a vector

Question

dorien 0 Light Poster

14 Years Ago

Hello,

I need to find repeated sequences within a vector.

I have been thinking about using search, as is described here. But this uses an array to search, I don't need to use extra memory, as I am seaching for sequences within the same vector.

Any suggestions on how to do this? Perhaps there is an easier methode than search?

Thanks

c++

5 Contributors
14 Replies
2K Views
6 Days Discussion Span
Latest Post 14 Years Ago Latest Post by dorien

daviddoria 334 Posting Virtuoso

14 Years Ago

Search sounds like a good way to go. I don't know what you mean by it uses "an array to search", but this shouldn't use any extra memory, it looks you just specify a range of indices which is your search patter, and a range of indices in which to search.

Ancient Dragon 5,243 Achieved Level 70

14 Years Ago

You could sort the vector then repeated entries will be easy to find with minimum effort and time. Use an interator if you want to erase duplicates because erase() return the iterator to the next item following the one it erased.

Edited 14 Years Ago by Ancient Dragon because: n/a

alwaysLearning0 39 Junior Poster

14 Years Ago

You dont have to delete or erase, run it in a while loop, every time you find a occurence, start searching after the found position of the sequence

here is a sample, modify it for your need:

int main () {
  vector<int> myvector;
  vector<int>::iterator it;

  // set some values:        myvector: 10 20 30 40 50 60 70 80 90
  for (int i=1; i<10; i++) myvector.push_back(i*10);

  //INSERTING TWICE TO MAKE TWO SEQUESNCE
  for (int i=1; i<10; i++) myvector.push_back(i*10); myvector: 10 20 30 40 50 60 70 80 90 10 20 30 40 50 60 70 80 90

  // using default comparison:
  int match1[] = {40,50,60,70};
  it = search (myvector.begin(), myvector.end(), match1, match1+4);

  if (it!=myvector.end())
    cout << "match1 found at position " << int(it-myvector.begin()) << endl;
  else
    cout << "match1 not found" << endl;

  //NOTICE I THE FIRST PARAMETER, We ADVANCED IT TO 4 AS THERE ARE 4 INTEGER IN SEQUENCE

  it = search (it+4, myvector.end(), match1, match1+4);

  if (it!=myvector.end())
    cout << "match1 found at position " << int(it-myvector.begin()) << endl;
  else
    cout << "match1 not found" << endl;

  return 0;
}

arkoenig 340 Practically a Master Poster

14 Years Ago

I think it would be a good idea to define the problem precisely. Are you looking for every sequence that repeats? Including sequences of a single element? Must the repetitions be consecutive, or can other elements separate them? Can the repetition overlap the original? Are you looking for every sequence that repeats, even if two such repetitions overlap each other?

Until the problem is defined precisely enough that these questions and others like them are easily answered, it is difficult to proceed.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

dorien 0 Light Poster · Answer 1 · 2010-10-29T23:01:21+00:00

Thanks Ancient Dragon,

But I am trying to find sequences of 2, 3, 4, etc subsequent elements.

@daviddoria:
I was referring to the match1 array that is a subvector of myvector in the page I referred to

dorien 0 Light Poster · Answer 2 · 2010-11-03T16:58:23+00:00

Ok,

I implemented the solution from the forum, using the search function.

Thing is now, since I am looking for a REPEATED sequence, search always finds at least one occurance. How can I see if there are two occurences?

Now, the check only does:

if (it!= myvector.end())

but since it always finds at least one occurance, this is not a valid check. Should I remove the search term from the searched vector and insert it again later? That seems a little over the top... Any ideas?

dorien 0 Light Poster · Answer 3 · 2010-11-03T20:53:03+00:00

I still think I might be doing a few redundant things in copying vectors over and forth. But here is my solution, in case people want to learn from it:

//for each length of sequence

     for(i=mlength/2; i>= 2; i--){

     //for(i=3; i>= 2; i--){
         //for each position of the sequence within searchgroup

        for (vi = (notegroup.begin()); vi != (notegroup.end()); vi++){

             searchgroup = notegroup;

             //populate sequence
             sequence.clear();
             sequence.insert(sequence.begin(), vi, vi+i);

            

             //for each sequence
             //create a searchgroup
             //search for sequence

           
             //temp remove first sequence from searchgroup
             xi = search(searchgroup.begin(), searchgroup.end(), sequence.begin(), sequence.end());


             if (xi!=searchgroup.end()){
                 searchgroup.erase(xi, xi+i);
             }
           

             //search for other occurences
             xi = search(searchgroup.begin(), searchgroup.end(), sequence.begin(), sequence.end());

             //search for repeated sequence
             if (xi!=searchgroup.end()){
             

                 //leave sequence deleted from vector
                 notegroup = searchgroup;

             }



         }

     }

dorien 0 Light Poster · Answer 4 · 2010-11-03T22:21:19+00:00

Oeps... I just thought of something. Perhaps by erasing a few elements, I am creating a new sequence. Looks like this approach is falted...

Mmmm... I am starting to think about writing a manual search function.

No way to count how many occurances the search function finds? Perhaps with the predicate or so?

dorien 0 Light Poster · Answer 5 · 2010-11-04T00:16:15+00:00

Let me try to define it better:

I want to count the vector elements that are part of a repeated sequence.

A repeated sequence, is minimum 2 (until max vectorlength/2) elements. It can overlap, but the overlapping notes should not be counted. So, the largest repetition should be counted (if a sequence of 3 is repeated, count should be 3, not 2x2).

I hope it is a bit clearer like this.

arkoenig 340 Practically a Master Poster · Answer 6 · 2010-11-04T00:23:13+00:00

Sorry, it's not clearer. In particular, I do not know what a "repeated sequence" is.

For example, how many repeated sequences are there in

1, 1, 1, 1, 1, 1, 1, 1, 1

? How does your answer follow from your definition?

dorien 0 Light Poster · Answer 7 · 2010-11-04T14:35:41+00:00

Hi,

Since I want to count how many elements are part of a repeated sequence, the answer in the case of 1, 1, 1, 1, 1, 1, 1, 1, 1 is 9.

in case of 1 2 3 4 5 6 1 2, it would be 4
in case of 1 2 3 4 5 1 2 3 it would be 6

I realise this problem can become very complex.

My idea to solve it is, first check if there are any repeated sequences of 2. If there are none, no need to search further, next search those of three. Again if there are none, no need to look for bigger sequences, etc.

I think I am going to defer from the search method an write a count method myself, think that might be easier.

alwaysLearning0 39 Junior Poster · Answer 8 · 2010-11-04T16:14:51+00:00

its a very easy program, simply sort the vector and then do a linear count.

sort(myvector.begin(), myvector.end());

iter1 = myvector.begin();

while(iter1 is not up to the end)
{
    while(iter1 is not last eliment AND iter1 and iter1+1 same element)
     { 
             iter1++;
             //calculate your count. simple count++ wont work ;) you have to think of this two situate, 112 and 11112 .. notice in first sequence one two 1 and in second sequence 4 1.
     } 
}

Think about it and let me know if you have problem of understanding it.

dorien 0 Light Poster · Answer 9 · 2010-11-04T16:33:36+00:00

That wont work, as I loose the sequences 1-2 1-2 for example. It is not just repeated elements.

My solution:

//--------------------------------------------------------------------------------------------------
//subscore 19: no repetition of sequences
//--------------------------------------------------------------------------------------------------

//for each start element
     for (vi = (music.begin()); vi != (music.end()-1); vi++){


         //check if equal to any other element

         for (xi = (vi+1) ; xi != music.end(); xi++){

             j= 0;

             //check all the following elements

             for (yi = xi; yi != music.end(); yi++){

                 if (*(vi+j) == *(yi)){

                     if (j>0){

                         repeated.push_back(vi+j);
                         repeated.push_back(yi);

                         if (j==1){
                             //also save the first element
                             repeated.push_back(vi);
                             repeated.push_back(yi-j);

                         }
                     }
                 }
                 else
                 {
                     break;
                 }
                 j++;
             }
         }
     }



     //count the (unique) notes part of a repeated sequence

     sort(repeated.begin(), repeated.end());
     repeated.erase(std::unique(repeated.begin(), repeated.end()), repeated.end());

     //cout << "\nrepeated: " << repeated.size();
     subscore.push_back((double)(repeated.size())/mlength);

dorien 0 Light Poster · Answer 10 · 2010-11-04T16:35:33+00:00

@alwayslearning0: you are absolutely correct, how could I have overlooked that parameter... lol.

Anyway, I seem to have found another solution.

Thanks everybody.