Hey everyone,

(Hopefully) a rather quick question. I'm trying to use regex to find the position of the longest run of a single repeated character in a multiline string. It doesn't matter which character it is: whichever character has the highest number of consecutive repeats, I want the position in the string where that run starts.

So as a brief example, if I had the string "cataaaaac" I would want the match's start() to return 3.

I've been using '+' but it keeps giving me the first instance of a match rather than the longest instance.

Any advice? Thanks.

Edited 6 Years Ago by sphynx_25: n/a
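
For context on the behaviour described in the question: re.search returns the leftmost match, and '+' is greedy only from that starting point, so a short run early in the string wins over a longer run later on. A minimal sketch, assuming the pattern tried was something like 'a+' (the question does not show the actual pattern):

```python
import re

text = 'cataaaaac'

# re.search stops at the leftmost position where the pattern can match,
# so the single 'a' at index 1 wins over the longer run at index 3.
first = re.search(r'a+', text)
print(first.start(), first.group())

# Collecting every match and comparing lengths picks out the longest one.
longest = max(re.finditer(r'a+', text), key=lambda m: len(m.group()))
print(longest.start(), longest.group())
```

This only covers one known character; handling "any character" needs either a loop over the characters or a backreference.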

It does the work, but it's not regex.

words = ['cataaaaac', 'poolooo']

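for word in words:
    rep = ''      # character of the run currently being scanned
    count = 0     # length of the current run
    most = ''     # character of the longest run seen so far
    great = 0     # length of the longest run seen so far
    index = 0     # start index of the longest run
    start = 0     # start index of the current run
    for i, char in enumerate(word):
        if char == rep:
            count += 1
        else:
            # a new run begins here
            start = i
            rep = char
            count = 1
        # checked for every character, so runs of length 1 count too
        if count > great:
            most = rep
            great = count
            index = start
    print('Word: %s' % word)
    print('Most: %s with %d starting at index %d' % (most, great, index))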

Cheers and Happy coding

Thanks for the speedy reply Beat_slayer. I'm trying to learn more about libraries and regex looks pretty snazzy, I just want to learn more about its proper usage. I have to say that your solution is very neat though.

Thanks again! I really appreciate your help.

import re

words = ['cataaaaac', 'poolooo']

for word in words:
    longest = 0
    letterlist = set(word)
    for char in letterlist:
        # re.escape keeps characters such as '.' or '+' literal in the pattern
        for item in re.findall('%s+' % re.escape(char), word):
            length = len(item)
            if length > longest:
                longest = length
                sequence = item
    print('Word: %s' % word)
    print('Most: %s with %d starting at index %d' % (sequence[0], len(sequence), word.index(sequence)))

Cheers and Happy coding

Edited 6 Years Ago by Beat_Slayer: n/a

It works now, but I'm not a regex expert; maybe the experts know another way.

The only way I can see is to search with regex, keep the longest match, then search for that match again with regex and ask for its start.

import re

words = ['cataaaaac', 'pooooloo']

for word in words:
    longest = 0
    letterlist = set(word)
    for char in letterlist:
        # runs of one character sort by length, so the last one is the longest
        seq = sorted(re.findall('%s+' % re.escape(char), word))[-1:]
        sequence = ''.join(seq)
        length = len(sequence)
        if length > longest:
            longest = length
            letters = sequence
    # search again for the winning run to get its start position
    m = re.search(re.escape(letters), word)
    print('Word: %s' % word)
    print('Most: %s with %d starting at index %d' % (letters[0], len(letters), m.start()))

Cheers and Happy coding

Edited 6 Years Ago by Beat_Slayer: n/a
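
The two-step idea above (find the longest run, then search again for its start) can probably be collapsed into a single pass with a backreference. The following is only a sketch: the names RUN and longest_run are made up for illustration, and it assumes that a tie should go to the earliest run, which the thread does not specify.

```python
import re

# (.) captures any one character (DOTALL lets it match a newline too);
# \1* then extends the match over every immediate repeat of that character.
RUN = re.compile(r'(.)\1*', re.DOTALL)

def longest_run(text):
    """Hypothetical helper: (start, length, char) of the longest run, or None if text is empty."""
    best = None
    for m in RUN.finditer(text):
        length = m.end() - m.start()
        # strict '>' keeps the earliest run when two runs tie (an assumption)
        if best is None or length > best[1]:
            best = (m.start(), length, m.group(1))
    return best

for word in ['cataaaaac', 'pooooloo']:
    print(longest_run(word))
```

Because finditer yields match objects, the start position comes from the same match that gave the length, so no second search is needed.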

I wouldn't use regex either, but I often find itertools.groupby useful:

import itertools as it
words = ['cataaaaac', 'poolooo']
for test in words:
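    # one (run length, character) pair per run of equal characters
    groups = ((len(list(letters)), group)
              for group, letters in it.groupby(test))
    # max compares run length first; ties go to the greater character
    maxnum, letter = max(groups)
    print('%s %d * %s index: %d' % (test, maxnum, letter, test.find(maxnum * letter)))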

Edited 6 Years Ago by pyTony: n/a

As you have probably guessed by now, regular expressions are considered bad style by some, so learning them is possibly a complete waste of time. In case you aren't aware of it, here is Jamie Zawinski's (in)famous quote:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
http://regex.info/blog/2006-09-15/247

Edited 6 Years Ago by woooee: n/a

There is a related discussion on Stack Overflow about when to use regex.

Top answer at the moment says:

No, don't avoid regular expressions. They're actually quite a nifty little tool and will save you a lot of work if you use them wisely.

What you do need to avoid is trying to use it for everything, a malaise that appears to strike those new to regular expressions before they become a little more tempered and a little less enamoured :-)

Edited 6 Years Ago by pyTony: n/a

This version eliminates the second search with find by keeping the enumerate result. At the same time it drops the string method, so it works on any sequence (unlike regexp):

from __future__ import print_function
import itertools as it
words = ['cataaaaac', 'poolooo vaudeeee', [1, 3,4,5,5,6,6,6,7,9,9,9,9,9,4,3,2,1]]
for test in words:
    groups = ((list(value), group)
              for group, value in it.groupby(enumerate(test), lambda x: x[1])
              )
    value, letter = max(groups, key = lambda x: len(x[0]))
    index = value[0][0]
    print('In', repr(test),'longest sequence value: ',  letter ,  'index: ', index)
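
If the same idea is needed in more than one place, it could be wrapped in a small function. This is a rough sketch rather than a drop-in replacement: first_longest_run is a hypothetical name, and it counts positions itself instead of using enumerate, so it should also accept one-shot iterators and avoids building a list per run.

```python
from itertools import groupby

def first_longest_run(items):
    """Hypothetical helper: (start, length, value) of the longest run in any iterable."""
    best = None
    pos = 0  # index at which the current run starts
    for value, group in groupby(items):
        length = sum(1 for _ in group)  # consume the run without storing it
        # strict '>' keeps the earliest run on a tie (an assumption)
        if best is None or length > best[1]:
            best = (pos, length, value)
        pos += length
    return best

print(first_longest_run('poolooo vaudeeee'))
print(first_longest_run([1, 3, 4, 5, 5, 6, 6, 6, 7, 9, 9, 9, 9, 9, 4, 3, 2, 1]))
```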

Thanks for all your replies everyone! Unfortunately I've got a bunch of deadlines all at once so I haven't been able to do any experimentation with your fantastic feedback but definitely will do.

Once again, thanks so much. This is incredibly helpful =)
