Python - Remove current and following 3 lines when two specified words in t

Hi experts, I have a new question following on from what I asked before:
http://www.daniweb.com/software-development/python/threads/436159/python-remove-lines-when-two-specified-words-in-the-same-line
I am still a new Python user and hope to get your help again.

My task is the following:
remove the current line and the following 3 lines when two specified words appear in the same line.
I have a test.txt like this:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    2.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BEUSLO    3.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

I would like the output file to be:

BNBCD     1.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00
BNBCD     4.00000000E+00  3.00000000E+00  0.00000000E+00  0.00000000E+00
          1.00000000E+00  1.00000000E+00  0.00000000E+00  2.00000000E+00

Can anyone help? In the last discussion, you helped by suggesting this:

target = '4.00000000E+00'
remove_next = False
with open('test1.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if remove_next:
                remove_next = False
                print 'removed next:\t', line,
            elif line.startswith('BNBCD') and target in line:
                remove_next = True
                print 'removed:\t', line,
            else:
                outfile.write(line)
bintony.ma_1
Newbie Poster
12 posts since Oct 2012
Reputation Points: 0
Solved Threads: 1
Skill Endorsements: 0

I have some code now, but Python seems to run very slowly once remove_index holds more than about 100 entries.
Any idea why?

Here is the code. Thank you!

target = '3.00000000E+00'
print '\n' *100
remove_next = False
count=-1
remove_index=[]
with open('input.txt') as infile:
    for line in infile:
        count=count+1
        if line.startswith('BEUSLO') and target in line:
            remove_index.append(count)
            remove_index.append(count+1)
            remove_index.append(count+2)
        else:
            continue
tt=len(remove_index)
count=-1
with open('input.txt') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            count=count+1
            if count in remove_index:
                outfile.write(line)
            else:
                continue
bintony.ma_1

Since the input.txt file is 6 GB, I did not use readlines().

bintony.ma_1

The speed of reading and writing huge files is limited by your hardware.

sneekula
Nearly a Posting Maven
2,483 posts since Oct 2006
Reputation Points: 1,000
Solved Threads: 231
Skill Endorsements: 2

Yes, but is there any other way to speed it up or improve the code?

bintony.ma_1

Also, you read the file twice. If you want to keep your existing code, use a set or dictionary for remove_index instead of a list: the time is taken up by "if count in remove_index:", which scans a list sequentially for every line of the file, whereas a set or dictionary does a hashed lookup. Or use this instead:

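target = '3.00000000E+00'
recs_to_write = 0

# single pass: no index list, and the output file is opened alongside the input
with open('input.txt') as infile, open('output.txt', 'w') as output_file:
    for line in infile:
        # a matching line starts a run of 3 lines to write
        if line.startswith('BEUSLO') and target in line:
            recs_to_write = 3
        if recs_to_write:
            output_file.write(line)
            recs_to_write -= 1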
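
The countdown above writes the matching records, while the original question asks for them to be removed. As a rough sketch only, the same single-pass idea can be turned around to skip the matching line plus the 3 lines after it, and the set-based lookup can be shown next to it. The names `matching_indices`, `drop_blocks` and `span` are made up for this illustration and do not come from the thread.

```python
def matching_indices(lines, prefix, target, span=4):
    """Collect, in a set, the index of every matching line and of the
    span - 1 lines that follow it. Membership tests on a set are hashed,
    so 'i in indices' does not scan all entries the way a list does."""
    indices = set()
    for i, line in enumerate(lines):
        if line.startswith(prefix) and target in line:
            indices.update(range(i, i + span))
    return indices


def drop_blocks(lines, prefix, target, span=4):
    """Yield every line except a matching line and the span - 1 lines
    after it. Works in one pass and keeps only a counter in memory."""
    to_skip = 0
    for line in lines:
        if line.startswith(prefix) and target in line:
            to_skip = span  # restart the countdown on every match
        if to_skip:
            to_skip -= 1
            continue
        yield line
```

With both files opened as in the thread, something like `outfile.writelines(drop_blocks(infile, 'BEUSLO', target))` should process the file line by line without reading it twice. Note that `span=4` assumes every matching record is exactly four lines long, as in the sample test.txt.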
woooee
Posting Maven
2,703 posts since Dec 2006
Reputation Points: 827
Solved Threads: 779
Skill Endorsements: 9

You can also try to read the file by chunks

import io
MB = 1 << 20
# use a 64 megabyte read buffer; "rb" is enough because the file is only read
with io.open('input.txt', mode="rb", buffering=64 * MB) as infile:
    # etc
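
To make the "etc" part concrete, here is one possible way to combine large buffers with a line filter. It is only a sketch: `copy_filtered`, `keep` and `buffer_size` are hypothetical names, and whether a bigger buffer actually helps depends on the disk and operating system, so it is worth timing on the real file.

```python
import io

MB = 1 << 20


def copy_filtered(src_path, dst_path, keep, buffer_size=64 * MB):
    """Copy the lines for which keep(line) is true from src_path to
    dst_path, using large buffers for both reading and writing."""
    with io.open(src_path, "rb", buffering=buffer_size) as infile, \
         io.open(dst_path, "wb", buffering=buffer_size) as outfile:
        for line in infile:  # in binary mode each line is a bytes object
            if keep(line):
                outfile.write(line)
```

Because the files are opened in binary mode, the filter has to compare against bytes, for example `keep=lambda line: not line.startswith(b'BEUSLO')`.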
Gribouillis
Posting Maven
Moderator
3,101 posts since Jul 2008
Reputation Points: 1,130
Solved Threads: 761
Skill Endorsements: 11

Thank you!
woooee's method is very smart.
I will also try Gribouillis's method on the huge file to see how it goes.

bintony.ma_1
