Hi experts, I have a new question following on from what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line
I am still a new Python user and hope to get your help again.

My task is the following:
remove the current line and the following 3 lines when two specified words appear in the same line.
I have a test.txt like this:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    2.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    3.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

I would like the output file to be:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

Can anyone help? In the last discussion, you helped by suggesting this:

target = '4.00000000E+00'
remove_next = False
with open('test1.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if remove_next:
                remove_next = False
                print 'removed next:\t', line,
            elif line.startswith('BNBCD') and target in line:
                remove_next = True
                print 'removed:\t', line,
            else:
                outfile.write(line)

Edited 4 Years Ago by bintony.ma_1

I have some code now, but Python seems to run very slowly once remove_index holds more than 100 entries.
Any ideas?

Here is the code, thank you!

target = '3.00000000E+00'
count = -1
remove_index = []
# First pass: record the index of each matching line and of the 3 lines after it.
with open('input.txt') as infile:
    for line in infile:
        count = count + 1
        if line.startswith('BEUSLO') and target in line:
            remove_index.extend([count, count + 1, count + 2, count + 3])
# Second pass: copy every line whose index was not recorded.
count = -1
with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            count = count + 1
            if count not in remove_index:
                outfile.write(line)

Also, you read the file twice. If you want to keep your existing code, use a set or dictionary for remove_index instead of a list: the time goes into the sequential search that each "count in remove_index" test performs on a list. Or use this instead:

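target = '3.00000000E+00'
recs_to_skip = 0

with open('input.txt') as infile:
    with open('output.txt', 'w') as output_file:
        for line in infile:
            if line.startswith('BEUSLO') and target in line:
                # drop this line and the 3 lines after it
                recs_to_skip = 4
            if recs_to_skip:
                recs_to_skip -= 1
            else:
                output_file.write(line)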
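
For the first suggestion (keeping the two-pass structure but storing the indices in a set), a minimal sketch could look like the following. The function name remove_blocks and its parameters are made up for illustration; the block size of 4 (the matching line plus the 3 after it) comes from the task described in the question.

```python
def remove_blocks(in_path, out_path, prefix, target, block_size=4):
    """Copy in_path to out_path, dropping each matching line and the
    block_size - 1 lines after it. All names here are illustrative."""
    remove_index = set()  # membership tests on a set are O(1) on average
    # First pass: record the indices of the lines to drop.
    with open(in_path) as infile:
        for count, line in enumerate(infile):
            if line.startswith(prefix) and target in line:
                remove_index.update(range(count, count + block_size))
    # Second pass: write every line whose index was not recorded.
    with open(in_path) as infile:
        with open(out_path, 'w') as outfile:
            for count, line in enumerate(infile):
                if count not in remove_index:
                    outfile.write(line)
```

It would be called as remove_blocks('input.txt', 'output.txt', 'BEUSLO', '3.00000000E+00'). The file is still read twice, so the single-pass counter version should remain the cheaper option.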

Edited 4 Years Ago by woooee


You can also try reading the file in large chunks:

import io
MB = 1 << 20
# read the file through a 64-megabyte buffer (opened read-only, in binary mode)
with io.open('input.txt', mode="rb", buffering = 64 * MB) as infile:
    # etc
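
To make the "# etc" part concrete, here is one possible way to combine the large buffer with the counter approach from the earlier reply. It is only a sketch under some assumptions: the function name filter_file and its parameters are invented for illustration, the input is opened read-only ('rb'), and because the file is read in binary mode the prefix and target are given as bytes so that the comparisons also work on Python 3.

```python
import io

MB = 1 << 20

def filter_file(in_path, out_path, prefix=b'BEUSLO',
                target=b'3.00000000E+00', block_size=4,
                buffer_size=64 * MB):
    """Drop each matching line plus the block_size - 1 lines after it.
    Function and parameter names are illustrative, not from the thread."""
    to_skip = 0
    with io.open(in_path, mode='rb', buffering=buffer_size) as infile:
        with io.open(out_path, mode='wb', buffering=buffer_size) as outfile:
            for line in infile:  # each line is a bytes object in binary mode
                if line.startswith(prefix) and target in line:
                    to_skip = block_size
                if to_skip:
                    to_skip -= 1  # skip this line
                else:
                    outfile.write(line)
```

Whether the bigger buffer actually helps depends on the system and the file, so it is worth timing it against the default buffering on the real data.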

Thank you!
woooee's method is very smart.
I will also try Gribouilli's method on a huge file to see how it goes.
