Hello everyone, new programmer here. In my class we have an opportunity to make up credit on a previous homework assignment by modifying the program if we got any of it wrong. The problem for the program was:

Problem #4. Write a program to solve the following problem. Find a word in English that contains 3 consecutive pairs of identical letters. For example, committee does not quite work because there is an "i" between the "mm" and "ttee". Mississippi also does not work because of the extra i's between the pairs of "ss", "ss", and "pp". You should use the text file words.txt which is on the COS 125 website.

The answer is here:

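def main():
    # "with" closes the file automatically when the loop finishes
    with open("words.txt", 'r') as infile:
        for w in infile:
            w = w.strip()   # drop the trailing newline from each line
            if td(w):
                print(w)
    print("Done!")

def td(w):
    # Check every 6-letter window w[i]..w[i+5]; the last valid start is len(w)-6
    for i in range(len(w)-5):
        if w[i] == w[i+1] and \
           (w[i+2] == w[i+3]) and \
           (w[i+4] == w[i+5]):
            return True
    return False

main()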
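Since words.txt isn't reproduced here, one way to sanity-check the td() logic is to run it on the examples from the problem statement. This is a minimal sketch assuming Python 3; td() copies the posted logic, and "bookkeeper" is my own addition as a word that does contain three back-to-back pairs (oo, kk, ee):

```python
def td(w):
    """Return True if w contains three back-to-back pairs of identical letters."""
    # Same logic as the posted solution: test every 6-letter window w[i]..w[i+5].
    for i in range(len(w) - 5):
        if w[i] == w[i + 1] and w[i + 2] == w[i + 3] and w[i + 4] == w[i + 5]:
            return True
    return False


# Two words from the problem statement, plus "bookkeeper" (added for illustration).
for word in ["committee", "Mississippi", "bookkeeper"]:
    print(word, td(word))
# committee False
# Mississippi False
# bookkeeper True
```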


I now need to modify this to get more credit, but I don't really understand what is going on in the td(w) function. How does range(len(w)-5) work? I understand the w[i] == w[i+1] comparisons; it's just the range part that is confusing me.

Here are some notes he gave us:

* A string like s = ...xxyyzz... looks like the following:
* s[i] = s[i+1] = x
* s[i+2] = s[i+3] = y
* s[i+4] = s[i+5] = z
* You must have i+5 <= len(s) - 1
* Thus i <= len(s) - 6
* i is in range(len(s) - 5)
* What if len(s) < 6?
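On the last note: when len(s) < 6, len(s) - 5 is zero or negative, and range() of a zero or negative number is empty, so the loop body never runs and td() falls through to return False without reading any index. The sketch below (Python 3; window_starts is a hypothetical helper name, not from the course) lists the starting offsets the loop tries for a few lengths and the largest index each string's last window reads:

```python
def window_starts(s):
    """Hypothetical helper: the starting offsets i that range(len(s) - 5) yields."""
    return list(range(len(s) - 5))


for s in ["", "abc", "aabbc", "aabbcc", "xaabbcc", "aabbccdd"]:
    starts = window_starts(s)
    # The last window is s[i]..s[i+5], so the largest index read is starts[-1] + 5.
    largest = starts[-1] + 5 if starts else None
    print(repr(s), len(s), starts, largest)
# '' 0 [] None
# 'abc' 3 [] None
# 'aabbc' 5 [] None
# 'aabbcc' 6 [0] 5
# 'xaabbcc' 7 [0, 1] 6
# 'aabbccdd' 8 [0, 1, 2] 7
```

In every non-empty case the largest index read is exactly len(s) - 1, which is the bound i+5 <= len(s) - 1 from the notes.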

I feel like I should be able to get this from those notes, but I'm confused. Any clarification is much appreciated.

In the if statement you access w[i+5], five positions past i.

This means you need to make sure that the largest value of i, plus 5, is still a valid index into the string. Let me demonstrate:

>>> w = 'aabbccdd'
>>> print len(w), len(w) - 5
8 3
>>> range(3)
[0, 1, 2]
>>> print w[0], w[1], w[2], w[3]
a a b b
>>> print w[2+2], w[2+3], w[2+4], w[2+5]
c c d d
>>> print w[3+5]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
IndexError: string index out of range

As you can see, the indices of this string only go up to 7: the length minus 1, because the first index is 0.

If we try to read index 8, we get an IndexError. So range(len(w)-5) simply makes sure the loop never reads past the end of the string.
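To see what the -5 protects against, here is a sketch (Python 3) comparing the posted loop with a hypothetical off-by-one version, td_off_by_one, that uses range(len(w) - 4). That lets i reach len(w) - 5, so w[i + 5] can point one past the last letter:

```python
def td_off_by_one(w):
    # Hypothetical broken version: range(len(w) - 4) lets i reach len(w) - 5,
    # so w[i + 5] can be one past the last valid index.
    for i in range(len(w) - 4):
        if w[i] == w[i + 1] and w[i + 2] == w[i + 3] and w[i + 4] == w[i + 5]:
            return True
    return False


def td(w):
    # The posted version: i stops at len(w) - 6, so w[i + 5] is at most w[len(w) - 1].
    for i in range(len(w) - 5):
        if w[i] == w[i + 1] and w[i + 2] == w[i + 3] and w[i + 4] == w[i + 5]:
            return True
    return False


print(td("xaabbc"))  # False
try:
    td_off_by_one("xaabbc")
except IndexError as err:
    print("IndexError:", err)  # IndexError: string index out of range
```

Note that "and" stops at the first False comparison, so the broken version only crashes when the first two pairs happen to match near the end of the word ("aabb" followed by one letter). That is why an off-by-one range like this can pass many inputs before it fails.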



Since you are testing 6 letters in total, you use len(w)-5 so you won't run out of letters. Take "aabbcc", the simplest example: you only want the for loop to start at the first letter, where it tests a == a, b == b, and c == c. If you start any later in this example, you run out of letters before the test is finished, so the word can't satisfy the condition from there anyway. The length of "aabbcc" is 6 and the only offset we want to test is 0, so the last offset is 6 - 6 = 0. With seven letters, "xaabbcc", the last offset is 7 - 6 = 1, so we test starting at offsets 0 and 1, each together with the five letters after it, without running out of letters. The for loop goes up to, but does not include, its upper number, so the call is range(len(w)-5), which produces offsets 0 through len(w)-6.

Adding print statements to td() shows exactly which letters each pass of the loop compares:

def td(w):
    # Debugging version: print the three pairs in each 6-letter window before testing it
    for num in range(0, len(w)-5):
        print("first test = %s %s" % (w[num], w[num+1]))
        print("2nd test  = %s %s" % (w[num+2], w[num+3]))
        print("3rd test  = %s %s" % (w[num+4], w[num+5]))
        if w[num] == w[num+1] and \
           (w[num+2] == w[num+3]) and \
           (w[num+4] == w[num+5]):
            return True
    return False
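Another way to picture the offsets is to list each 6-letter window the loop examines, using slicing. This is a sketch in Python 3; windows() is a hypothetical helper name used only for illustration:

```python
def windows(w):
    """Hypothetical helper: every 6-letter slice td() would examine, in order."""
    return [w[i:i + 6] for i in range(len(w) - 5)]


for w in ["aabbcc", "xaabbcc"]:
    print(w, windows(w))
# aabbcc ['aabbcc']
# xaabbcc ['xaabbc', 'aabbcc']
```

One design note: slicing never raises IndexError, it just returns a shorter string when it runs off the end. Indexing with w[i+5] does raise, which is why the explicit range bound matters in the original td().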