I have a string that looks like:
IiiiiiiiiiiHHHHHHHHHHHHHHHHHHHHHHHHHooooooooooooooooHHHHHHHHHHHHHHHHHHHHiiiiiiiiiiiiiiiiiiHHHHHHHHHHHHHHHHHHHHHHHHHooooooooooooooooHHHHHHHHHHHHHHHHHHHHHHHHHiiiiiiiiiiiiHHHHHHHHHHHHHHHHHHHHoooooooooooooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOooooHHHHHHHHHHHHHHHHHHHHHHHHHiiiiiiiiiiiiiiiiiiHHHHHHHHHHHHHHHHHHHHooooooooooooooooooooooHHHHHHHHHHHHHHHHHHHHHHHHHiiiiiiiiiiiiHHHHHHHHHHHHHHHHHHHHHHHHHooooooooooooooooooooooHHHHHHHHHHHHHHHHHHHHi

I want to capture all the segments that contain H's and return their respective start and stop positions in the string.

tms = re.compile("H+")
print(tms.findall(string))

This finds all the runs of H's, but I can't get their positions in the string.

tms = re.compile("(H+)+")
print(tms.search(string).group())

I can get the start and end from the match object that the search() method returns, but that only gives me ONE match object. I need them all.

How do I do this?

All 3 Replies

re.finditer() returns an iterator of match objects, one per match, so you can call start(), end(), or span() on each of them.
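A minimal sketch of that suggestion, assuming a short made-up sample string in the same style as the one in the question (the names text, pattern and spans are only for illustration):

```python
import re

# Hypothetical short sample in the same style as the question's string.
text = "iiHHHooHHHHi"

pattern = re.compile("H+")

# finditer yields one match object per non-overlapping run of H's,
# in the order they occur in the string.
spans = []
for match in pattern.finditer(text):
    # start() is inclusive and end() is exclusive, just like slicing.
    spans.append((match.start(), match.end()))
    print(match.group(), match.start(), match.end())
```

Because end() is exclusive, text[start:end] gives back exactly the matched run.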

Just a hint:

# use module re to find index values

import re

s = "HHHHHHooooooooooooooooHHHHHHHH"

rcp = re.compile("H+")

print(s)
# index help line
print('0123456789'*3)
print('-'*30)

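for item in rcp.findall(s):
    print(item)
    # note: search() returns the FIRST place this text occurs in s,
    # so a run that also fits inside an earlier run gets the wrong span
    found = re.search(item, s)
    # span() returns (start, end) end is exclusive
    print(found.span())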

'''check the hint

HHHHHHooooooooooooooooHHHHHHHH
012345678901234567890123456789
------------------------------
HHHHHH
(0, 6)
HHHHHHHH
(22, 30)

'''

Improvement (necessary):

# use module re to find index values

import re

s = "HHHHHHoooooooHHHoooooooooHHHHHHHH"

rcp = re.compile("H+")

print(s)
# index help line
print('0123456789'*4)
print('-'*40)

find = rcp.findall(s)
new = s
for item in find:
    print(item)
    rcp2 = re.compile(item)
    # search the masked copy so earlier runs cannot match again
    found = rcp2.search(new)
    # span() returns (start, end) end is exclusive
    print(found.span())
    # sub item with string of * (same length, so indexes stay valid)
    replace = '*'*len(item)
    new = rcp2.subn(replace, new, 1)[0]
    #print(new, replace)  # test

'''check the hint

HHHHHHoooooooHHHoooooooooHHHHHHHH
0123456789012345678901234567890123456789
----------------------------------------
HHHHHH
(0, 6)
HHH
(13, 16)
HHHHHHHH
(25, 33)

'''
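
For comparison, here is a sketch of how the same spans could be collected in a single pass with finditer, without masking the string. The helper name h_spans is made up for this example:

```python
import re

def h_spans(s):
    """Return a (start, end) tuple for every run of 'H' in s; end is exclusive."""
    return [m.span() for m in re.finditer("H+", s)]

s = "HHHHHHoooooooHHHoooooooooHHHHHHHH"
result = h_spans(s)
print(result)
```

Each match object already carries its own position, so runs of equal length are not a problem here.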