Python - Remove current and following 3 lines when two specified words in t

Question

bintony.ma_1 0 Newbie Poster

12 Years Ago

Hi experts, I have a new question following on from what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line
I am still a new Python user and hope to get your help again.

My task is the following: remove the current line and the following 3 lines when two specified words appear in the same line.
I have a test.txt like this:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    2.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    3.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

I hope the output file will be:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

Can anyone help? In the last discussion, you helped by suggesting this:

target = '4.00000000E+00'
remove_next = False
with open('test1.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if remove_next:
                remove_next = False
                print 'removed next:\t', line,
            elif line.startswith('BNBCD') and target in line:
                remove_next = True
                print 'removed:\t', line,
            else:
                outfile.write(line)
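
A minimal sketch of how the snippet above could be adapted to this task, replacing the boolean flag with a countdown so that the matching line and the 3 lines after it are dropped. It assumes Python 3; the names drop_blocks, keyword and following are made up for illustration and do not come from the earlier thread.

```python
def drop_blocks(lines, keyword, target, following=3):
    """Yield lines, skipping every line that starts with `keyword` and
    contains `target`, together with the `following` lines after it."""
    skip = 0  # how many upcoming lines still have to be dropped
    for line in lines:
        if line.startswith(keyword) and target in line:
            skip = following      # drop this line and start the countdown
        elif skip:
            skip -= 1             # drop one of the following lines
        else:
            yield line            # keep everything else


def filter_file(src, dst, keyword, target, following=3):
    """Stream src to dst line by line, so the whole file is never in memory."""
    with open(src) as infile, open(dst, 'w') as outfile:
        outfile.writelines(drop_blocks(infile, keyword, target, following))
```

With the sample above, filter_file('test.txt', 'output.txt', 'BEUSLO', '3.00000000E+00') should keep only the two BNBCD blocks, since both BEUSLO lines contain 3.00000000E+00 in the second column.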

python

Edited 12 Years Ago by bintony.ma_1


All 8 Replies

woooee 814 Nearly a Posting Maven

12 Years Ago

Also, you read the file twice. If you want to keep your existing code, use a set or dictionary for remove_index instead of a list, because the time is taken up by sequential lookups in "if count in remove_index:". Or use this instead:

target = '3.00000000E+00'
recs_to_write = 0

with open('input.txt') as infile:
    with open('output.txt', 'w') as output_file:
        for line in infile:
            # a matching line starts a run of 3 lines to write
            # (the matching line itself and the 2 lines after it)
            if line.startswith('BEUSLO') and target in line:
                recs_to_write = 3
            if recs_to_write:
                output_file.write(line)
                recs_to_write -= 1
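
For the first suggestion, here is a rough sketch of the two-pass approach with a set in place of the list, assuming Python 3. It mirrors the posted code, which writes the matching line and the 2 lines after it. The function names are only illustrative.

```python
def collect_indices(lines, keyword, target, span=3):
    """First pass: collect the index of every matching line and of the
    span - 1 lines after it. A set gives average O(1) membership tests,
    whereas `in` on a list scans the list element by element."""
    indices = set()
    for count, line in enumerate(lines):
        if line.startswith(keyword) and target in line:
            indices.update(range(count, count + span))
    return indices


def select_lines(lines, indices):
    """Second pass: keep only the lines whose index was collected."""
    return [line for count, line in enumerate(lines) if count in indices]
```

For a 6GB file both passes would iterate over the open file object rather than a list, and the second pass would write each line out instead of building a list.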

Edited 12 Years Ago by woooee

Gribouillis commented: nice +13

Gribouillis 1,391 Programming Explorer

12 Years Ago

You can also try to read the file by chunks

import io
MB = 1 << 20
# read the file by chunks of 64 megabytes
with io.open('input.txt', mode="r+b", buffering = 64 * MB) as infile:
    # etc
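
One way the "# etc" part might be filled in, as a sketch only: it assumes Python 3, opens the input with 'rb' because the file is only read, and drops the matching line plus the 3 lines after it. The function name and its parameters are hypothetical. Whether a 64 MB buffer actually beats the default depends on the machine, so it is worth timing both.

```python
MB = 1 << 20


def filter_binary(src, dst, keyword=b'BEUSLO', target=b'3.00000000E+00',
                  following=3, bufsize=64 * MB):
    """Copy src to dst, dropping each matching line and the
    `following` lines after it."""
    skip = 0
    # binary mode skips text decoding; bufsize sets each file object's buffer
    with open(src, 'rb', buffering=bufsize) as infile, \
         open(dst, 'wb', buffering=bufsize) as outfile:
        for line in infile:           # each line is a bytes object here
            if line.startswith(keyword) and target in line:
                skip = following      # drop the matching line itself
            elif skip:
                skip -= 1             # drop one of the following lines
            else:
                outfile.write(line)
```

Note that the keyword and target have to be bytes (b'...') because the file is opened in binary mode.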


bintony.ma_1 0 Newbie Poster · Answer 1 · 2012-12-03T07:06:36+00:00

Hi experts, I have a new question following on from what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line

bintony.ma_1 0 Newbie Poster · Answer 2 · 2012-12-03T15:04:59+00:00

I have some code now, but Python seems to run very slowly once remove_index holds more than 100 entries.
Any ideas?

Here is the code, thank you!

target = '3.00000000E+00'
print '\n' *100
remove_next = False
count=-1
remove_index=[]
with open('input.txt') as infile:
    for line in infile:
        count=count+1
        if line.startswith('BEUSLO') and target in line:
            remove_index.append(count)
            remove_index.append(count+1)
            remove_index.append(count+2)
        else:
            continue
tt=len(remove_index)
count=-1
with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            count=count+1
            if count in remove_index:
                outfile.write(line)
            else:
                continue

bintony.ma_1 0 Newbie Poster · Answer 3 · 2012-12-03T16:22:52+00:00

The input.txt file is huge (6GB), so I did not use readlines().

sneekula 969 Nearly a Posting Maven · Answer 4 · 2012-12-03T16:34:05+00:00

The speed of reading and writing huge files is limited by your hardware.

bintony.ma_1 0 Newbie Poster · Answer 5 · 2012-12-03T16:42:28+00:00

Yes, but is there any other method to speed it up or make the code better?

bintony.ma_1 0 Newbie Poster · Answer 6 · 2012-12-04T07:13:33+00:00

Thank you!
woooee's method is very smart.
I will also try Gribouillis's method on the huge file to see how it goes.
