Python - Remove current and following 3 lines when two specified words in the same line

bintony.ma_1

Hi experts, I have a new question that follows up on what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line
I am still a new Python user and hope to get your help again.

My task is the following:
Remove the current line and the following 3 lines when two specified words appear in the same line.
I have a test.txt like this:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    2.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    3.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

I want the output file to be:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

Can anyone help? In the last discussion, you helped me with this code:

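# drop every BNBCD line that contains target, plus the one line after it,
# and echo the dropped lines to the console
target = '4.00000000E+00'
remove_next = False  # True when the previous line was a match
with open('test1.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if remove_next:
                # the line right after a match: drop it as well
                remove_next = False
                print('removed next:\t' + line, end='')
            elif line.startswith('BNBCD') and target in line:
                remove_next = True
                print('removed:\t' + line, end='')
            else:
                outfile.write(line)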
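
For the task as stated (drop each matching line together with the three lines after it), the same idea can be generalised from a True/False flag to a counter. This is a minimal sketch, assuming the two "words" are the BEUSLO tag at the start of a line and the number 3.00000000E+00, as in the sample data; the names remove_blocks and filter_file are hypothetical:

```python
def remove_blocks(lines, keyword='BEUSLO', target='3.00000000E+00', extra=3):
    """Yield every line except each line that starts with `keyword` and
    contains `target`, plus the `extra` lines that follow it."""
    skip = 0  # how many more lines still have to be dropped
    for line in lines:
        if line.startswith(keyword) and target in line:
            skip = extra + 1  # the matching line itself plus `extra` more
        if skip:
            skip -= 1
            continue
        yield line


def filter_file(in_path, out_path):
    """Stream in_path to out_path line by line, dropping the matched blocks."""
    with open(in_path) as infile:
        with open(out_path, 'w') as outfile:
            outfile.writelines(remove_blocks(infile))
```

Because remove_blocks is a generator and the file is iterated line by line, only one line is held in memory at a time. On the test.txt sample above, filter_file('test.txt', 'output.txt') should keep only the two BNBCD blocks.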

bintony.ma_1

I have some code now, but Python runs very slowly once remove_index holds more than 100 entries.
Any ideas?

Here is the code, thank you!

target = '3.00000000E+00'
print('\n' * 100)  # print blank lines to push old console output away
count = -1
remove_index = []  # a list: "count in remove_index" scans it element by element
# first pass: record each matching line's index plus the next two indices
with open('input.txt') as infile:
    for line in infile:
        count = count + 1
        if line.startswith('BEUSLO') and target in line:
            remove_index.append(count)
            remove_index.append(count + 1)
            remove_index.append(count + 2)
count = -1
# second pass: write out only the lines whose index was recorded
with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            count = count + 1
            if count in remove_index:
                outfile.write(line)

bintony.ma_1

Since input.txt is huge (6 GB), I did not use readlines().
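
Iterating over the file object, as both loops in the code above do, already reads the file lazily, one line at a time, so memory use stays small even for 6 GB; readlines() would build a list of every line first. A small sketch, assuming the same BEUSLO/target test, that also uses enumerate() in place of the manual count variable (the function name matching_line_numbers is hypothetical):

```python
def matching_line_numbers(path, keyword='BEUSLO', target='3.00000000E+00'):
    """Return the 0-based numbers of the lines that start with `keyword`
    and contain `target`, reading the file one line at a time."""
    found = []
    with open(path) as infile:
        # enumerate() supplies the line number, replacing count = count + 1
        for number, line in enumerate(infile):
            if line.startswith(keyword) and target in line:
                found.append(number)
    return found
```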

sneekula

The speed of reading and writing huge files is limited by your hardware.

bintony.ma_1

Yes, but is there any other way to speed it up or improve the code?

woooee

Also, you read the file twice. If you want to keep your existing code, use a set or dictionary for remove_index instead of a list, since the time is taken up by the sequential lookups in "if count in remove_index:". Or use this instead:

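target = '3.00000000E+00'
recs_to_write = 0  # lines of the current block still to be written

with open('input.txt') as infile:
    with open('output.txt', 'w') as output_file:
        for line in infile:
            if line.startswith('BEUSLO') and target in line:
                recs_to_write = 3  # write this line and the next two
            if recs_to_write:
                output_file.write(line)
                recs_to_write -= 1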
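
woooee's first suggestion, keeping the two passes but storing the indices in a set, could look like the following sketch. Membership tests on a set are hash lookups (constant time on average), whereas "count in remove_index" on a list compares against every stored index. It keeps the question's behaviour of writing out each matching line and the two after it; copy_marked_lines and its parameters are hypothetical names:

```python
def copy_marked_lines(in_path, out_path, keyword='BEUSLO',
                      target='3.00000000E+00', block=3):
    """Two-pass version of the question's code, with a set instead of a list."""
    marked = set()  # indices of lines to write; O(1) average lookup
    with open(in_path) as infile:
        for count, line in enumerate(infile):
            if line.startswith(keyword) and target in line:
                marked.update(range(count, count + block))
    with open(in_path) as infile:
        with open(out_path, 'w') as outfile:
            for count, line in enumerate(infile):
                if count in marked:
                    outfile.write(line)
```

woooee's single-pass loop above avoids reading the file a second time, which this version still does.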

Gribouillis

You can also try reading the file in larger chunks:

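import io

MB = 1 << 20
# open the file with a 64 MB buffer, so each underlying read fetches a large chunk
# ("rb" is enough for reading; "r+b" would also demand write access)
with io.open('input.txt', mode='rb', buffering=64 * MB) as infile:
    for line in infile:  # in binary mode each line is a bytes object
        pass  # etc: process each line here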
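
Combining the larger buffer with woooee's single-pass loop might look like this sketch. In binary mode ("rb") each line is a bytes object, so the keyword and target have to be bytes as well; extract_blocks and buffer_size are hypothetical names, and 64 MB is just the figure from the post above:

```python
import io

MB = 1 << 20


def extract_blocks(in_path, out_path, keyword=b'BEUSLO',
                   target=b'3.00000000E+00', block=3, buffer_size=64 * MB):
    """Single pass: write each matching line and the next block - 1 lines,
    reading and writing through large buffers."""
    recs_to_write = 0
    with io.open(in_path, 'rb', buffering=buffer_size) as infile:
        with io.open(out_path, 'wb', buffering=buffer_size) as outfile:
            for line in infile:
                # bytes.startswith and `in` need bytes arguments here
                if line.startswith(keyword) and target in line:
                    recs_to_write = block
                if recs_to_write:
                    outfile.write(line)
                    recs_to_write -= 1
```

Whether a 64 MB buffer is noticeably faster than the default depends on the disk and operating system, so it is worth timing both on the real file.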

bintony.ma_1

Thank you!
woooee's method is very smart.
I will also try Gribouillis's method on the huge file to see how it goes.
