Member Avatar for Rebecca_2

I have a series of (~950KB) '.txt' output files from a computational chemistry program. Each file will contain the line '**** Optimisation achieved ****' at least once and, depending on the result of the calculation, possibly twice. Does the following code, in which I am trying to find specific lines and print them to a new file, differentiate between the two occurrences of that line?

import os

with open('results.txt', 'a') as writer:
    for file in os.listdir('.'):
        if file.endswith('.out'):
            print(file + ' ', end= ' ', file=writer)
            with open(file, 'r') as reader:
                for line in reader.readlines():
                    s=line.strip()               
                    if s=='**** Optimisation achieved ****':
                        opt='y'                        
                    elif s.startswith('Final energy ='):
                        if opt=='y':
                            print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Total number of defects'):                        
                        print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Total charge on defect'):                        
                        print(s + ' ', end=' ', file=writer)            
                    elif s.startswith('Defect centre'):
                        print(s+ ' ', end=' ', file=writer)
                    elif s.startswith('Fractional'):
                        if s!='Fractional coordinates of asymmetric unit :':
                            print(s + ' ', end=' ', file=writer)                    
                    elif s.startswith('Final defect energy'):
                        if opt=='y':
                            print(s, file=writer)      

(Please be patient, I am relatively new to programming)


I would rather count the optimisation achieved lines while reading:

import os

with open('results.txt', 'a') as writer:
    for file in os.listdir('.'):
        if file.endswith('.out'):
            print(file + ' ', end= ' ', file=writer)
            opt_cnt = 0 # <-- reset counter for each file
            with open(file, 'r') as reader:
                for line in reader.readlines():
                    s=line.strip()               
                    if s=='**** Optimisation achieved ****':
                        opt_cnt += 1 # <-- count those lines     
                    elif s.startswith('Final energy ='):
                        if opt_cnt >= 1: # <-- base decisions on the current value
                            print(s + ' ', end=' ', file=writer)
                    ...

You can send a tuple to startswith:

# note that opt is never set back to "" or "n"
if s.startswith(('Total number of defects', 'Total charge on defect', 'Defect centre')):
    print(s + ' ', end=' ', file=writer)

Are you saying that you want to stop looking after the second '**** Optimisation achieved ****' line? If so, you could flip the flag each time that line is seen:

if s=='**** Optimisation achieved ****':
    if opt=='y':
        opt='n'
    else:
        opt="y"            
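
Putting the counter and the tuple form of startswith together, a minimal sketch on an in-memory sample might look like the following. The helper name extract and the sample text are made up for illustration (real output files will have other lines around these), and each kept line is tagged with the number of optimisation lines seen so far, which is what tells the first occurrence from the second.

```python
import io

MARKER = '**** Optimisation achieved ****'
# Prefixes that are always copied, regardless of the optimisation state.
ALWAYS = ('Total number of defects', 'Total charge on defect', 'Defect centre')

def extract(reader):
    """Return (opt_cnt, kept) for one file-like object (hypothetical helper)."""
    opt_cnt = 0  # reset for every file
    kept = []
    for line in reader:
        s = line.strip()
        if s == MARKER:
            opt_cnt += 1
        elif s.startswith(ALWAYS):
            kept.append((opt_cnt, s))
        elif s.startswith('Final energy =') and opt_cnt >= 1:
            # only kept once at least one optimisation line has been seen
            kept.append((opt_cnt, s))
    return opt_cnt, kept

# Invented sample, not real program output.
sample = io.StringIO(
    'Final energy = -1.0\n'
    '**** Optimisation achieved ****\n'
    'Final energy = -2.0\n'
    'Total number of defects = 1\n'
    '**** Optimisation achieved ****\n'
    'Final energy = -3.0\n'
)
opt_cnt, kept = extract(sample)
for n, s in kept:
    print(n, s)
```

To keep only what follows the second occurrence, the test would become opt_cnt == 2 instead of opt_cnt >= 1.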
Member Avatar for Rebecca_2

Sorry, but what prevents the code from only ever reaching the first '**** Optimisation achieved ****' and counting that line repeatedly?

Here is a complete (simplified) running example. Try it in the directory with the .out files.

#!/usr/bin/env python3
#-*-coding: utf8-*-
import os

# split the code into several functions to lighten it

def main():
    with open('results.txt', 'a') as writer:
        for file in os.listdir('.'):
            if not file.endswith('.out'):
                continue
            with open(file, 'r') as reader:
                handle_reader(reader, writer)

def handle_reader(reader, writer):
    print('reading file:', reader.name, file = writer)
    opt_cnt = 0
    for line in reader:
        s=line.strip()               
        if s=='**** Optimisation achieved ****':
            opt_cnt += 1 # <-- count those lines
            print('optimisation line number', opt_cnt, end ='\n', file = writer)
        else:
            pass

if __name__ == '__main__':
    main()
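
On the question of re-reading: a file object is an iterator, so each pass of the for loop fetches the next unread line and nothing rewinds to the top. Here is a small sketch with an in-memory file (io.StringIO stands in for a real .out file, and the sample text is invented) that shows the line number advancing:

```python
import io

MARKER = '**** Optimisation achieved ****'

def marker_lines(reader):
    """Return the 1-based line numbers on which MARKER appears."""
    hits = []
    # enumerate numbers the lines as the iterator hands them out;
    # every line is delivered exactly once, in order.
    for num, line in enumerate(reader, 1):
        if line.strip() == MARKER:
            hits.append(num)
    return hits

reader = io.StringIO(
    'start\n'
    '**** Optimisation achieved ****\n'
    'middle\n'
    '**** Optimisation achieved ****\n'
)
hits = marker_lines(reader)
print(hits)

# The iterator is now exhausted: a second pass yields nothing
# unless the file is rewound with reader.seek(0).
leftover = list(reader)
print(leftover)
```

The two hits land on different line numbers, so the same line is not being counted twice.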


Here is another way to do this.
Here I want to get back the file name and the count, and also the correct line numbers.

file_1.txt:

fire
fox
**** Optimisation achieved ****

file_2.txt:

**** Optimisation achieved ****
car
123
**** Optimisation achieved ****
**** Optimisation achieved ****
**** Optimisation achieved ****

file_3.txt:

**** Optimisation achieved ****
hello
world
**** Optimisation achieved ****

So a manual count would be:
file_1 has 1 "Optimisation" line, at line 3
file_2 has 4 "Optimisation" lines, at lines 1, 4, 5, 6
file_3 has 2 "Optimisation" lines, at lines 1, 4

Some code for this.

import re
from glob import glob

count = {}
line_numb = []
for files in sorted(glob('*.txt')):  # sorted, so the order matches opt_count below
    #print(files)
    with open(files) as f_in:
        for num, line in enumerate(f_in, 1):
            line = line.strip()
            if '**** Optimisation achieved ****' in line:
                count[f_in.name] = count.get(f_in.name, 0) + 1
                line_numb.append(num)
        line_numb.append(f_in.name)

line_numb = ' '.join(str(i) for i in line_numb)
line_numb = re.split(r'\w+\.txt', line_numb)
line_numb.pop()
opt_count = (sorted(count.items(), key=lambda x: x[0]))

print('-'*5)
print(line_numb)
print(opt_count)
print('-'*5)

with open('result.txt', 'w') as f_out:
    for line, count in zip(line_numb, opt_count):
        print('{} has "Optimisation" count of {}\n"Optimisation" occur at line nr: {}\n'.format(count[0], count[1], line.strip()))
        #f_out.write('{} has "Optimisation" count of {}\n"Optimisation" occur at line nr: {}\n'.format(count[0], count[1], line.strip()))


"""Output-->
-----
['3 ', ' 1 4 5 6 ', ' 1 4 ']
[('file_1.txt', 1), ('file_2.txt', 4), ('file_3.txt', 2)]
-----
file_1.txt has "Optimisation" count of 1
"Optimisation" occur at line nr: 3

file_2.txt has "Optimisation" count of 4
"Optimisation" occur at line nr: 1 4 5 6

file_3.txt has "Optimisation" count of 2
"Optimisation" occur at line nr: 1 4
"""
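
A shorter variant, as a sketch: keep one list of line numbers per file, so the count is just the length of that list and no string joining or regex splitting is needed. The helper name find_marker_lines is made up here, and the demo recreates the three sample files in a temporary directory so it can be run anywhere.

```python
import os
import tempfile
from glob import glob

MARKER = '**** Optimisation achieved ****'

def find_marker_lines(pattern):
    """Map each matching file name to the line numbers containing MARKER."""
    result = {}
    for path in sorted(glob(pattern)):
        with open(path) as f_in:
            result[os.path.basename(path)] = [
                num for num, line in enumerate(f_in, 1) if MARKER in line
            ]
    return result

# Recreate the three sample files in a throwaway directory.
samples = {
    'file_1.txt': ['fire', 'fox', MARKER],
    'file_2.txt': [MARKER, 'car', '123', MARKER, MARKER, MARKER],
    'file_3.txt': [MARKER, 'hello', 'world', MARKER],
}
with tempfile.TemporaryDirectory() as tmp:
    for name, lines in samples.items():
        with open(os.path.join(tmp, name), 'w') as f_out:
            f_out.write('\n'.join(lines) + '\n')
    found = find_marker_lines(os.path.join(tmp, '*.txt'))

for name, nums in found.items():
    print('{} has "Optimisation" count of {} at line nr: {}'.format(
        name, len(nums), ' '.join(map(str, nums))))
```

A file with no match still gets an entry with an empty list, so file names and line numbers cannot drift out of step.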