Confused about reading files

Rebecca_2

I have a series of (~950KB) '.txt' output files from a computational chemistry program. Each file contains the line '**** Optimisation achieved ****' at least once and, depending on the result of the calculation, possibly twice. I am trying to find specific lines and print them to a new file. Does the following code differentiate between the two occurrences of that line?

import os

with open('results.txt', 'a') as writer:
    for file in os.listdir('.'):
        if file.endswith('.out'):
            print(file + ' ', end= ' ', file=writer)
            with open(file, 'r') as reader:
                for line in reader.readlines():
                    s=line.strip()               
                    if s=='**** Optimisation achieved ****':
                        opt='y'                        
                    elif s.startswith('Final energy ='):
                        if opt=='y':
                            print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Total number of defects'):                        
                        print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Total charge on defect'):                        
                        print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Defect centre'):
                        print(s+ ' ', end=' ', file=writer)
                    elif s.startswith('Fractional'):
                        if s!='Fractional coordinates of asymmetric unit :':
                            print(s + ' ', end=' ', file=writer)                    
                    elif s.startswith('Final defect energy'):
                        if opt=='y':
                            print(s, file=writer)      

(Please be patient, I am relatively new to programming)

Gribouillis

I would rather count the optimisation achieved lines while reading:

import os

with open('results.txt', 'a') as writer:
    for file in os.listdir('.'):
        if file.endswith('.out'):
            print(file + ' ', end= ' ', file=writer)
            opt_cnt = 0 # <-- reset counter for each file
            with open(file, 'r') as reader:
                for line in reader.readlines():
                    s=line.strip()               
                    if s=='**** Optimisation achieved ****':
                        opt_cnt += 1 # <-- count those lines     
                    elif s.startswith('Final energy ='):
                        if opt_cnt >= 1: # <-- base decisions on the current value
                            print(s + ' ', end=' ', file=writer)
                    ...
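
To make the counting idea concrete, here is a minimal self-contained sketch that works on a list of lines instead of real files. The function name `energies_by_occurrence` and the sample lines are hypothetical, invented for illustration; only the marker text and the 'Final energy =' prefix come from the code above.

```python
MARKER = '**** Optimisation achieved ****'

def energies_by_occurrence(lines):
    """Return (marker_count, line) pairs for each 'Final energy =' line.

    marker_count is the number of marker lines seen before the energy
    line, so 1 means "after the first marker" and 2 "after the second".
    """
    opt_cnt = 0
    found = []
    for line in lines:
        s = line.strip()
        if s == MARKER:
            opt_cnt += 1
        elif s.startswith('Final energy =') and opt_cnt >= 1:
            found.append((opt_cnt, s))
    return found

# Made-up sample lines, not real program output
sample = [
    'Final energy = -1.0 eV\n',             # before any marker: skipped
    '  **** Optimisation achieved ****\n',
    'Final energy = -2.0 eV\n',
    '  **** Optimisation achieved ****\n',
    'Final energy = -3.0 eV\n',
]
result = energies_by_occurrence(sample)
print(result)
```

Because the counter is stored with each hit, a later step can choose to keep only the lines found after the first marker, only those after the second, or both.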
woooee

You can pass a tuple to startswith:

 ## note that opt is never set back to "" or "n"
 if s.startswith(('Total number of defects', 'Total charge on defect')):  # add the remaining prefixes to the tuple
     print(s + ' ', end=' ', file=writer)
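
As a small runnable illustration of the tuple form: `str.startswith` returns True if the string starts with any one of the prefixes in the tuple. The names `PREFIXES` and `wanted` and the sample strings are hypothetical; the prefixes themselves are taken from the question's code.

```python
# Prefixes copied from the question's code; the tuple replaces a chain of elifs
PREFIXES = (
    'Total number of defects',
    'Total charge on defect',
    'Defect centre',
)

def wanted(s):
    """True if s starts with any of the prefixes."""
    return s.startswith(PREFIXES)

# Made-up sample lines
checks = [wanted('Defect centre 0.5 0.5 0.5'), wanted('Final energy = -1.0')]
print(checks)
```

Note that the argument has to be a tuple; passing a list raises a TypeError.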

Are you saying that you want to stop looking after the second '**** Optimisation achieved ****' line?

if s=='**** Optimisation achieved ****':
    if opt=='y':
        opt='n'
    else:
        opt="y"            
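
To see how this toggle behaves, the sketch below records the flag after each marker line. It assumes the flag starts as 'n' before the first marker (the snippet above does not show an initial value), and the function name `flag_after_each_marker` is made up for illustration.

```python
def flag_after_each_marker(n_markers):
    """Return the value of the flag after each of n_markers marker lines."""
    opt = 'n'  # assumed starting value, not shown in the original snippet
    states = []
    for _ in range(n_markers):
        # same toggle as above: 'y' becomes 'n', anything else becomes 'y'
        opt = 'n' if opt == 'y' else 'y'
        states.append(opt)
    return states

states = flag_after_each_marker(3)
print(states)
```

With at most two markers per file the flag is on after the first and off after the second. A third marker would switch it back on, which is one reason a counter may be easier to reason about than a toggle.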
Rebecca_2

Sorry, but what prevents the code from only ever reaching the first '**** Optimisation achieved ****' line and counting that line repeatedly?

Gribouillis

Here is a complete (simplified) running example. Try it in the directory with the .out files:

#!/usr/bin/env python3
#-*-coding: utf8-*-
import os

# split the code into several functions to lighten it

def main():
    with open('results.txt', 'a') as writer:
        for file in os.listdir('.'):
            if not file.endswith('.out'):
                continue
            with open(file, 'r') as reader:
                handle_reader(reader, writer)

def handle_reader(reader, writer):
    print('reading file:', reader.name, file = writer)
    opt_cnt = 0
    for line in reader:
        s=line.strip()               
        if s=='**** Optimisation achieved ****':
            opt_cnt += 1 # <-- count those lines
            print('optimisation line number', opt_cnt, end ='\n', file = writer)
        else:
            pass

if __name__ == '__main__':
    main()
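
On the question of what stops the loop from seeing the first marker again: iterating over a file object hands out each line once and then moves on, so the counter can only go up by one for each marker line actually present. A minimal sketch, using io.StringIO as a stand-in for an open file (the sample text is made up):

```python
import io

MARKER = '**** Optimisation achieved ****'

# Made-up file content with the marker on lines 2 and 4
text = 'header\n' + MARKER + '\nmiddle\n' + MARKER + '\nfooter\n'
reader = io.StringIO(text)  # behaves like an open text file

# One pass over the lines; each line is seen exactly once
hits = [num for num, line in enumerate(reader, 1) if line.strip() == MARKER]

# The iterator is now exhausted, so nothing is left to read again
leftover = list(reader)

print(hits, leftover)
```

The second `list(reader)` comes back empty, which shows that the loop cannot revisit a line unless the file is reopened or rewound with `seek(0)`.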
snippsat

Sorry, but what prevents the code from only ever reaching the first '**** Optimisation achieved ****' line and counting that line repeatedly?

Here is another way to do this.
Here I want to get back the file name, the count, and the correct line numbers.

file_1.txt:

fire
fox
**** Optimisation achieved ****

file_2.txt:

**** Optimisation achieved ****
car
123
**** Optimisation achieved ****
**** Optimisation achieved ****
**** Optimisation achieved ****

file_3.txt:

**** Optimisation achieved ****
hello
world
**** Optimisation achieved ****

A manual count gives:
file_1 has 1 "Optimisation" line, at line 3
file_2 has 4 "Optimisation" lines, at lines 1, 4, 5, 6
file_3 has 2 "Optimisation" lines, at lines 1, 4

Some code for this.

import re
from glob import glob

count = {}
line_numb = []
# sorted() keeps the files in the same order as opt_count below
for files in sorted(glob('*.txt')):
    #print(files)
    with open(files) as f_in:
        for num, line in enumerate(f_in, 1):
            line = line.strip()
            if '**** Optimisation achieved ****' in line:
                count[f_in.name] = count.get(f_in.name, 0) + 1
                line_numb.append(num)
        # the file name marks the end of this file's line numbers
        line_numb.append(f_in.name)

line_numb = ' '.join(str(i) for i in line_numb)
# split on the file names so each item holds one file's line numbers
line_numb = re.split(r'\w+\.txt', line_numb)
line_numb.pop()  # drop the empty string left after the last file name
opt_count = (sorted(count.items(), key=lambda x: x[0]))

print('-'*5)
print(line_numb)
print(opt_count)
print('-'*5)

with open('result.txt', 'w') as f_out:
    # note: a .txt file with no matching line would throw off the pairing in this zip
    for line, (name, total) in zip(line_numb, opt_count):
        print('{} has "Optimisation" count of {}\n"Optimisation" occur at line nr: {}\n'.format(name, total, line.strip()))
        #f_out.write('{} has "Optimisation" count of {}\n"Optimisation" occur at line nr: {}\n'.format(name, total, line.strip()))


"""Output -->
-----
['3 ', ' 1 4 5 6 ', ' 1 4 ']
[('file_1.txt', 1), ('file_2.txt', 4), ('file_3.txt', 2)]
-----
file_1.txt has "Optimisation" count of 1
"Optimisation" occur at line nr: 3

file_2.txt has "Optimisation" count of 4
"Optimisation" occur at line nr: 1 4 5 6

file_3.txt has "Optimisation" count of 2
"Optimisation" occur at line nr: 1 4
"""
Question Answered as of 4 Months Ago by Gribouillis, snippsat and woooee
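
As a possible simplification of the last approach, the line numbers can be kept in one list per file, which avoids joining and re-splitting strings. This is only a sketch: the helper name `marker_lines` is hypothetical, and the three sample files from snippsat's post are reproduced in memory with io.StringIO so that the example runs without any files on disk.

```python
import io

MARKER = '**** Optimisation achieved ****'

def marker_lines(stream):
    """Return the 1-based numbers of the lines in stream that contain MARKER."""
    return [num for num, line in enumerate(stream, 1) if MARKER in line]

# In-memory copies of the sample files from the post above
samples = {
    'file_1.txt': 'fire\nfox\n' + MARKER + '\n',
    'file_2.txt': MARKER + '\ncar\n123\n' + (MARKER + '\n') * 3,
    'file_3.txt': MARKER + '\nhello\nworld\n' + MARKER + '\n',
}

# One entry per file: the name maps to its list of marker line numbers
report = {name: marker_lines(io.StringIO(text))
          for name, text in sorted(samples.items())}

for name, nums in report.items():
    print('{} has "Optimisation" count of {} at line nr: {}'.format(
        name, len(nums), ' '.join(str(n) for n in nums)))
```

For real files, `marker_lines` could be called on the object returned by `open(name)` instead of on an `io.StringIO`. A file with no marker simply gets an empty list, so names and line numbers cannot get out of step.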