Suppose I have this textfile:

Generation #1:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_akjsdlkjasdasd
Game_List:
game = 111
game = 222
game = 333

Generation #2:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_slajdlaskjdlas
Game_List:
game = 119
game = 262
game = 323

...
...

Generation #500:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_jkansdklnalsda
Game_List:
game = 323
game = 213
game = 211

I was wondering if there is a simple way to extract only the Generation # and the corresponding list of games under Game_List, that is, ignoring all the trash lines, of which there are about 50,000.

Thank you so much in advance.

Check the start of each line with the startswith method. It can take a tuple of prefixes, not only a single string.
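As a small illustration of the tuple form, here is a sketch on a few made-up lines modelled on the file in the question (the names lines, wanted and kept are only for this example):

```python
# Sample lines modelled on the file in the question (illustrative only).
lines = [
    "Generation #1:",
    "trash_jsdbjsabnf",
    "Game_List:",
    "game = 111",
]

# str.startswith accepts a tuple of prefixes and is True if any of them matches.
wanted = ("Generation #", "game =")
kept = [line for line in lines if line.startswith(wanted)]
print(kept)
```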

try something like this:

data = open("file.txt", 'r')
while True:
    line = data.readline()
    if line.startswith("Generation"):
        while True:
            list = []
            line2 = data.readline()
            if line2.startswith("game"):
                dict[line1] = list.append(line2)
            elif line.startswith("generation"):
                break
            else: break

So my version explicitly would be:

with open("file.txt") as infile:
    interesting = [line for line in infile if line.lower().startswith(('game =', 'generation #'))]
print(interesting)


Thank you for all of the prompt responses. I have tried both methods (and several more I found) but none worked. Since I'm a one-week-old Pythoner, I'm not sure what the mistakes are. Dilbert's code didn't run although it makes near perfect logical sense to me; and when I removed the last "else:break" part, it reported "TypeError: 'type' object does not support item assignment." And Tony, this might be some stupid mistake that I couldn't see, but the command "print" on the 3rd line reported a syntax error... Any ideas?

ok, thanks Tony, i got it finally! I took your pseudocode too literally. This is cool stuff. Thanks all for the support!

Mine was working code, I would guess (without trying it out), but this line

dict[line1] = list.append(line2)

was a small glitch from dilbert: dict is a type, and he must have meant an instance of it, which must be initialized before the loop. Note also that list.append returns None, so its result should not be assigned. Something like

d = dict()
#.... inside loop
d.setdefault(line1, []).append(line2)
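Putting those pieces together, a minimal sketch of the dictionary approach could look like the following. The function name games_by_generation and the sample text are made up for illustration, and it assumes every header line starts with "Generation #" and every game line with "game =", as in the file shown in the question.

```python
def games_by_generation(lines):
    """Map each 'Generation #N:' header to the list of its 'game = ...' lines."""
    d = dict()        # created once, before the loop
    current = None    # header of the generation currently being read
    for line in lines:
        line = line.strip()
        if line.startswith("Generation #"):
            current = line
            d.setdefault(current, [])
        elif line.startswith("game =") and current is not None:
            d.setdefault(current, []).append(line)
        # every other line (trash lines, 'Game_List:') is ignored
    return d

# Made-up sample in the same layout as the question's file.
sample = """Generation #1:
trash_jsdbjsabnf
Game_List:
game = 111
game = 222
Generation #2:
trash_jlsndfknsf
Game_List:
game = 119
""".splitlines()

result = games_by_generation(sample)
for header in result:
    print("%s %s" % (header, result[header]))
```

For the real file, the same function should work on the open file object, for example games_by_generation(open("file.txt")). Plain dicts only keep insertion order on Python 3.7 and later, so on older versions the generations may print out of order.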

Ok, that's why. Thanks.

The textfile I obtained from this code is:

1 game = 111
1 game = 222
1 game = 333
2 game = 121
2 game = 231
2 game = 432

500 game = 321
500 game = 311
500 game = 121
Where the first number is the Generation #.

To count how many games there are in each generation, I use this code:

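text = open("file.txt")  # the file name must be a quoted string
gen = 1
count = 0
for line in text:
    fields = line.split()
    if not fields:  # skip blank lines
        continue
    if int(fields[0]) > gen:
        # a new generation has started, so report the previous one
        print("%d %d" % (gen, count))
        gen = int(fields[0])
        count = 1
    else:
        count += 1
# the last generation is never followed by a larger number, so report it here
print("%d %d" % (gen, count))
text.close()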

Which works fine. But I couldn’t find a way to incorporate this code into the original code. It would save me a ton of time if I knew how to do it, as I wouldn't have to wait for the computer to print out 500 generations' worth of 'games'. Do you have any suggestions?

If you cannot assume that the number after the last # is the number of generations, you could count them by printing this in my code:

print('%i generations' % sum('#' in line for line in interesting))
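For the per-generation counts asked about above, one possible sketch does the counting in the same pass as the filtering, so the game lines never need to be printed. The function name count_games and the sample lines are hypothetical, and it assumes the headers look exactly like "Generation #N:" as in the file shown in the question.

```python
def count_games(lines):
    """Return (generation, total games, distinct games) tuples in file order."""
    generations = []  # list of (number, [game values]) pairs
    for line in lines:
        line = line.strip()
        if line.startswith("Generation #"):
            # "Generation #12:" -> 12
            number = int(line[len("Generation #"):].rstrip(":"))
            generations.append((number, []))
        elif line.startswith("game =") and generations:
            value = line.split("=", 1)[1].strip()
            generations[-1][1].append(value)
    return [(n, len(g), len(set(g))) for n, g in generations]

# Made-up sample in the same layout as the question's file.
sample = [
    "Generation #1:", "trash_jsdbjsabnf", "Game_List:",
    "game = 111", "game = 222", "game = 111",
    "Generation #2:", "trash_jlsndfknsf", "Game_List:",
    "game = 119",
]

summary = count_games(sample)
for number, total, distinct in summary:
    print("%d %d %d" % (number, total, distinct))
```

With the real file, count_games(open("file.txt")) should give one tuple per generation; the third value is the number of distinct games, in case the same game can appear twice in one generation.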