Hi,
I have an output file (.txt) from a computational chemistry program. At some point in this file, following an unknown number of iterative steps, the following table will be found:

Comparison of initial and final structures : 

--------------------------------------------------------------------------------
  Parameter   Initial value   Final value   Difference    Units      Percent
--------------------------------------------------------------------------------
    Volume       
    a            
    b             
    c             
    alpha         
    beta         
    gamma         
      1 x         
      1 y          
      1 z          
      2 x          
      2 y          
      2 z          
      3 x          
      3 y          
      3 z          
      4 x          
      4 y          
      4 z          
      5 x          
      5 y          
      5 z          
      6 x          
      6 y          
      6 z          
      7 x          
      7 y          
      7 z          
      8 x          
      8 y          
      8 z          
--------------------------------------------------------------------------------

Is there an easy method for:
1. numbering the lines in the file?
2. finding this table
3. copying the table exactly
4. extracting each row (or rather rows 1-6) into separate files

I know how to open and read a file, [i.e. with open('output.txt', 'r') as f] but am a little bemused by the rest.

I ask because, although this is a very easy 'point and click using a mouse' task for one file, it would be very tedious and time consuming to do for the several hundred files I actually have.

Any help would be appreciated, although please be patient as I am not fully up to speed with programming yet - particularly with regard to formatting.

Cheers

Rebecca_2
Deleted Member

Please be more specific.

Some ideas:

  1. numbering the lines in the file?
with open('output.txt', 'r') as f:
    for line_number, line in enumerate(f):
        pass # do something with the line
  2. finding this table
if line.startswith("Comparison of initial and final structures :"):
    table_begin=True
if table_begin and line.startswith("------"):
    table_end=True
  3. copying the table exactly

Copy to where?

if table_begin and not table_end:
    pass # copy the line to wherever
  4. extracting each row (or rather rows 1-6) into separate files

Open separate files for writing. I don't know what is expected here.

Hi Slate,

Please be more specific.

Can I ask what is ambiguous?
What I have given in my original post is a sample (albeit empty) table: all of my output files will have one occurrence of this table in the same formatting. The only difference will be the numbers it contains, which are not known in advance and are specific to each file, so they are not useful information here.

Taking each point in turn:
1. Numbering lines in a file
I ask as this is potentially useful and I don't know how to do it.
For instance (as per another of my posts):
The output file will have a line which gives the final (lattice) energy. However, this is only useful if an energy minimum is found which, if so, is designated by the phrase '**** Optimisation achieved ****' appearing three lines above the final energy.
i.e.

     **** Optimisation achieved ****


      Final energy =    -385.41833439 eV

I was thinking about a way of extracting this data without having multiple if/elif statements as I have been doing. So I was thinking: if I number the lines of the file and search for the line '**** Optimisation achieved ****', then, if it is found, I can return the line number and use it to extract the final (lattice) energy.

Whilst I am at it, I know there is a feature where I can strip white space from the start and end of a line but, is there a way of removing blank lines entirely?
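
One possible way to do the energy lookup described above, offered as a minimal sketch rather than a definitive answer. It assumes the marker text is exactly "Optimisation achieved" and that the energy line starts with "Final energy" once whitespace is stripped; the function name final_energy is made up for illustration. Blank lines are dropped first, which also shows one way of removing them entirely.

```python
def final_energy(lines):
    """Return the final energy in eV as a float, or None if no minimum was found."""
    # Strip each line and keep only the non-blank ones, so the energy
    # line sits directly after the marker line.
    kept = [line.strip() for line in lines if line.strip()]
    for number, line in enumerate(kept):
        if "Optimisation achieved" in line and number + 1 < len(kept):
            following = kept[number + 1]
            if following.startswith("Final energy"):
                # "Final energy =    -385.41833439 eV" -> "-385.41833439"
                return float(following.split("=")[1].split()[0])
    return None


# The fragment quoted in the post above.
sample = [
    "     **** Optimisation achieved ****\n",
    "\n",
    "\n",
    "      Final energy =    -385.41833439 eV\n",
]
print(final_energy(sample))
```

An open file can be passed in place of the list, for example final_energy(f) inside a with open('output.txt') as f: block, because a file object yields its lines one at a time.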

2. Finding the table
3. Copying the table exactly
When I say copy, I mean into a new text file, 'table.txt', which of course needs creating.
The table gives me important information that I need. However, thinking about it, copying it exactly may not be the most useful of things to do. Still, it would be useful to know how to extract a complete table from the file. Again, I was wondering if numbering the lines of the file may in fact be useful for this.

if table_begin and line.startswith("------"):
    table_end=True
if table_begin and not table_end:
    pass # copy the line to wherever

Joining together the bits of code you have posted - won't this simply copy the table headers and not the rows which make up the table (and thus contain the data)?
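
For what it is worth, here is a minimal sketch of one way to collect the data rows rather than the headers. It assumes the layout of the sample table in the first post: the heading line, then a dashed rule, the column header, a second dashed rule, the data rows, and a third dashed rule that closes the table. The name extract_table and the numbers in the sample are made up for illustration.

```python
HEADING = "Comparison of initial and final structures"


def extract_table(lines):
    """Return the data rows found between the second and third dashed rules."""
    rows = []
    in_table = False
    rules_seen = 0
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(HEADING):
            in_table = True
            continue
        if not in_table:
            continue
        if stripped.startswith("------"):
            rules_seen += 1
            if rules_seen == 3:
                break  # closing rule: the table is complete
            continue
        if rules_seen == 2:
            rows.append(line.rstrip("\n"))
    return rows


# Shortened stand-in for a real output file; the values are placeholders.
sample = """some earlier output
Comparison of initial and final structures :

--------------------
  Parameter   Initial value   Final value
--------------------
    Volume    100.0   101.0
    a         4.0     4.1
--------------------
trailing text
""".splitlines(True)

for row in extract_table(sample):
    print(row)
```

Writing the result to 'table.txt' would then be a matter of opening that file with mode "w" and writing each row followed by a newline.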

4. Extracting each row (or rather rows 1-6) into separate files
This is going to be far more useful to me than copying the whole table. As I said towards the end of the original post, whilst this may seem like a simple 'point and click' task for one file, I actually have several hundred output files, so doing it by hand would be time consuming - not to mention tedious.
I literally want to split the table by each row into separate text files (or maybe even a csv file) - which I can then use in another program.
If I create 6 textfiles:
'a.txt', 'b.txt', 'c.txt', 'alpha.txt', 'beta.txt', 'gamma.txt'
I then extract the corresponding row from the table I am trying to find/copy along with the output filename. If I iterate this over all my output files, I can then build a potential energy surface etc.
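
A rough sketch of that per-parameter split, assuming the table rows have already been collected as a list of strings and that the first whitespace-separated field of each row is the parameter name. The names WANTED, rows_by_parameter and append_rows are invented for this example, and the output format (filename, two spaces, row) is only one option.

```python
import os

WANTED = ("a", "b", "c", "alpha", "beta", "gamma")


def rows_by_parameter(rows):
    """Map each wanted parameter name to its (stripped) table row."""
    found = {}
    for row in rows:
        fields = row.split()
        # Rows such as "Volume" or "1 x" are skipped because their
        # first field is not one of the wanted names.
        if fields and fields[0] in WANTED and fields[0] not in found:
            found[fields[0]] = row.strip()
    return found


def append_rows(rows, source_name, out_dir="."):
    """Append 'source_name  row' to a.txt, b.txt, ... inside out_dir."""
    for name, row in rows_by_parameter(rows).items():
        path = os.path.join(out_dir, name + ".txt")
        # Mode "a" creates the file if needed and appends otherwise.
        with open(path, "a") as out:
            out.write("%s  %s\n" % (source_name, row))
```

Calling append_rows(rows, fname) once per output file, for example inside a loop over glob("output*.txt"), would build up one line per output file in each of the six text files.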

The bits I am asking about here are the finding and copying of the table; what happens after that is not important here.

Hope this clarifies any ambiguity....

  2. Finding the table
  3. Copying the table exactly

    with open('data.txt') as data:
        data_parts = data.read().split("""Comparison of initial and final structures : 
    
    --------------------------------------------------------------------------------""")
    print(data_parts[1])
    

Edited 2 Years Ago by pyTony

First of all:

Can I ask what is ambiguous?

The requirements are ambiguous, incomplete and vague.

  • Is there only one of this table in the input file?
  • Does the input end at the end of the given sample?
  • We should number the lines in the file: every line, or only specific lines? And what should be done with the numbers?

And so on.

I have the impression you are better at writing essays than specifications.

Secondly:
In the first post your requirements seem to be the following.

  • There is an input file called output.txt which ends as the given sample.
  • Find the table given in the example
  • Extract the table without the first 3 lines and write it into a file named table.txt
  • All sentences with "or better" or "instead of how I am doing now".

This can be done with pyTony's code, in the following way:

with open('output.txt') as fdata:
    data_parts = fdata.read().split("""Comparison of initial and final structures : 

--------------------------------------------------------------------------------""")

with open('table.txt',"w") as ftable:
    ftable.write(data_parts[1])
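
Note that data_parts[1] holds everything after the first rule, including the column header, the remaining rules and any text that follows the table. A possible variation that keeps only the data rows is sketched below. It assumes each rule is exactly 80 dashes, as it appears to be in the sample, and that the table has three rules; RULE and table_body are names made up for this example.

```python
RULE = "-" * 80  # assumption: every rule in the table is exactly 80 dashes


def table_body(text):
    """Return the data rows of the table as one string, or None if not found."""
    _, found, after = text.partition("Comparison of initial and final structures")
    if not found:
        return None
    pieces = after.split(RULE)
    # pieces[0]: rest of the heading line, pieces[1]: column header,
    # pieces[2]: data rows, pieces[3]: whatever follows the table
    if len(pieces) < 4:
        return None
    return pieces[2].strip("\n")
```

Using str.partition instead of indexing the result of split avoids an IndexError when a file does not contain the table at all.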

In the second post these requirements seemed to change into this:

  • There are input files called output1.txt, output2.txt and so on.
  • All input files end as given in the sample in this table.
  • In every input file find this "table" in the file.
  • Find the first line in the "table" that contains only "a" (ignoring leading and trailing whitespace) and append this line to a file "a.txt". Create the file if necessary.

That is achieved by this code:

from glob import glob


afile=None
for fname in glob("output*.txt"):
    with open(fname) as fdata:
        table_found=False
        for line in fdata:
            # the heading line marks the start of the table
            if line.startswith("Comparison of initial and final structures :"):
                table_found=True
            # first line after the heading that contains only "a"
            if table_found and line.strip()=="a":
                if not afile:
                    afile=open("a.txt","w")
                afile.write(line)
                break

if afile: afile.close()