Hi experts, I have a new question following on from what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line
I am still a new Python user and hope to get your help again.

My task is the following:
Python: remove the current line and the following 3 lines when two specified words are in the same line.
I have a test.txt like this:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    2.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    3.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

I hope the output file will be:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

Can anyone help? In the last discussion, you helped by suggesting this:

target = '4.00000000E+00'
remove_next = False
with open('test1.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if remove_next:
                remove_next = False
                print 'removed next:\t', line,
            elif line.startswith('BNBCD') and target in line:
                remove_next = True
                print 'removed:\t', line,
            else:
                outfile.write(line)

I have some code now, but Python seems to run very slowly once remove_index holds more than 100 entries.
Any ideas?

Here is the code, thank you!

target = '3.00000000E+00'
print '\n' * 100
count = -1
remove_index = []
# first pass: record the index of every line to drop
with open('input.txt') as infile:
    for line in infile:
        count = count + 1
        if line.startswith('BEUSLO') and target in line:
            # the matching line and the 3 lines after it
            remove_index.append(count)
            remove_index.append(count + 1)
            remove_index.append(count + 2)
            remove_index.append(count + 3)
# second pass: copy every line whose index was not recorded
count = -1
with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            count = count + 1
            if count not in remove_index:
                outfile.write(line)

Since input.txt is huge (6 GB), I did not use readlines().

The speed of reading and writing huge files is limited by your hardware.

Yes, but is there any other way to speed it up or improve the code?

Also, you read the file twice. If you want to keep your existing code, make remove_index a set or dictionary instead of a list: the time is taken up by the sequential lookup that the membership test on remove_index performs for every line. Or use this instead:

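target = '3.00000000E+00'
recs_to_skip = 0

with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if line.startswith('BEUSLO') and target in line:
                # drop the matching line plus the 3 lines that follow it
                recs_to_skip = 4
            if recs_to_skip:
                recs_to_skip -= 1
            else:
                outfile.write(line)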
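
To illustrate the set suggestion from this reply, here is a minimal sketch of the two-pass approach with the line numbers kept in a set, so that each membership test takes roughly constant time instead of scanning a list. The function name remove_blocks, its parameters and the default of 3 extra lines are assumptions made for this example, not code from the thread.

```python
def remove_blocks(in_path, out_path, prefix, target, extra_lines=3):
    """Copy in_path to out_path, dropping every line that starts with
    prefix and contains target, plus the extra_lines lines after it.
    Hypothetical helper, written only to illustrate the set idea."""
    remove_index = set()  # membership test is O(1) on average
    # First pass: collect the indices of all lines to drop.
    with open(in_path) as infile:
        for count, line in enumerate(infile):
            if line.startswith(prefix) and target in line:
                remove_index.update(range(count, count + extra_lines + 1))
    # Second pass: write only the lines whose index was not collected.
    with open(in_path) as infile:
        with open(out_path, 'w') as outfile:
            for count, line in enumerate(infile):
                if count not in remove_index:
                    outfile.write(line)
    # Number of line indices that were marked for removal.
    return len(remove_index)
```

Called as remove_blocks('input.txt', 'output.txt', 'BEUSLO', '3.00000000E+00'), this still reads the file twice, so a single pass with a counter is likely to be faster on a 6 GB file. The set only removes the cost of the repeated list scans.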

You can also try to read the file in chunks:

import io
MB = 1 << 20
# read the file in chunks of 64 megabytes (binary, read-only)
with io.open('input.txt', mode="rb", buffering = 64 * MB) as infile:
    # etc
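
As a rough sketch of how a large-buffer open like the one above could be combined with a single-pass filter: the function name filter_buffered, its parameters and the defaults are assumptions for illustration only. The files are opened in binary mode, so prefix and target must be byte strings. Whether a bigger buffer actually helps depends on the disk and the OS cache, so it is worth timing on the real file.

```python
import io

MB = 1 << 20

def filter_buffered(in_path, out_path, prefix, target,
                    extra_lines=3, buffer_size=64 * MB):
    """Hypothetical helper: copy in_path to out_path in one pass,
    dropping each matching line and the extra_lines lines after it."""
    skip = 0  # how many upcoming lines (including the current one) to drop
    with io.open(in_path, mode='rb', buffering=buffer_size) as infile:
        with io.open(out_path, mode='wb', buffering=buffer_size) as outfile:
            for line in infile:  # lines are bytes in binary mode
                if line.startswith(prefix) and target in line:
                    skip = extra_lines + 1
                if skip:
                    skip -= 1
                else:
                    outfile.write(line)
```

For the file in this thread the call would be something like filter_buffered('input.txt', 'output.txt', b'BEUSLO', b'3.00000000E+00').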

Thank you!
woooee's method is very smart.
I will also try Gribouilli's method on the huge file to see how it goes.