Greetings all! I'm a new python programmer working for a physics lab. I've got 11 numbered .txt files, all laid out in the exact same format shown below:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
55
ITEM: BOX BOUNDS
0 20.4
0 20.4
-0.0001 20.4001
ITEM: ATOMS
0.883194 0.969209 -1.02474
0.0378892 -0.209928 -2.53493
-1.91866 -0.00183195 -0.148284
-2.55231 1.87999 -0.0287856
1.10493 1.81091 -1.28696
2.55788 -1.23035 0.582496

This repeats for however many timesteps I have. Everything above and including ITEM: ATOMS is just a header, used to organize the lists of data. Each coordinate triple represents the velocity components of a particular atom. I need to a) take all these velocities for one particular timestep and find the average, b) repeat that for all 11 files for a given timestep, c) iterate over every timestep. The end result I'm trying to achieve is a 2-column list of data where the first column is the file number and the 2nd is the average velocity for that file for that timestep. Each timestep in the final output would need to be separated by a blank space. I'm not sure how to start attacking this problem, so if any of you could point me in a direction to go, I would be very appreciative!

Many thanks!
Connor

Here are a few things to get you on your way... First, iterating over a list of files using the os module:

import os

my_dir = '/home/AtomData/'
## path example for Windows:
## my_dir = 'C:\\XXX\\AtomData\\'

my_files = os.listdir(my_dir)

for each_file in my_files:
    ## Here we will do our file actions
    pass

Note: The Windows path looks funny because in Python '\' is an escape character, so you need to escape the escape character to use it normally in a string (if that makes any sense).

So in the previous example we see how to list the contents of a directory. Note that this command, much like ls or dir, will list all contents including sub-directories, which is why it may be appropriate to check the following condition before opening a file: if os.path.isfile(os.path.join(my_dir, each_file)): . The join is needed because os.listdir returns bare names rather than full paths. This way you don't try to open a directory as a file handle!
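To make that check concrete, here is a minimal sketch of the listing step with the isfile test built in. The function name list_data_files is just something made up for illustration, and it assumes the directory holds nothing but your data files and possibly some sub-directories.

```python
import os

def list_data_files(my_dir):
    """Return the full paths of the regular files directly inside my_dir."""
    paths = []
    for name in sorted(os.listdir(my_dir)):
        full_path = os.path.join(my_dir, name)  # listdir gives bare names only
        if os.path.isfile(full_path):           # skip sub-directories
            paths.append(full_path)
    return paths
```

One thing to watch: sorted() orders names as text, so '10.txt' would come before '2.txt'. If your file numbers are not zero-padded you may want to sort on the number instead.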

Next we'll look at iterating over a file. There are many approaches to doing this, I will show you the one I most often use.

f = open( os.path.join( my_dir, each_file ), 'r' )
lines = f.readlines()
f.close()

for each_line in lines:
    ## Do your data parsing here
    pass

So the open() function returns a file handle, which points to the file that you provided, and since we provide 'r' for the mode, the file will open in 'read' mode. Similarly you could use 'w' for write (this will begin a new file if it does not exist, or clear the contents of a file if it does exist, so be careful). There is also 'a' for append if you would like to add to an existing file, and finally there are modifiers like '+' and 'b', which I will leave up to you to research in the Python docs.
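If you want to see the difference between the modes for yourself, here is a small sketch you can run. The scratch file name is invented for the demo, and it uses the with statement, which closes the file for you when the block ends (same effect as calling f.close() yourself).

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, just for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(demo_path, 'w') as f:    # 'w': create the file, or empty it if it exists
    f.write('first\n')

with open(demo_path, 'a') as f:    # 'a': keep what is there and add to the end
    f.write('second\n')

with open(demo_path, 'r') as f:    # 'r': read only
    after_append = f.read()        # both lines are present

with open(demo_path, 'w') as f:    # 'w' again: the earlier lines are discarded
    f.write('third\n')

with open(demo_path, 'r') as f:
    after_rewrite = f.read()       # only the last line is left
```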

A very useful function, os.path.join, was demonstrated in the previous example. This is a platform-independent path generator, so you don't need to worry about \\ on Windows or / on *nix systems. Here are some examples of using join on a Windows box:

>>> import os
>>> os.path.join('C:\\', 'My Docs', 'Fun Stuff', 'Foo', 'Bar')
'C:\\My Docs\\Fun Stuff\\Foo\\Bar'
>>> os.path.join('C:\\', 'My Docs', 'Fun Stuff', 'Foo', 'my_data.txt')
'C:\\My Docs\\Fun Stuff\\Foo\\my_data.txt'
>>>

As you can see, this is what we used when opening the file, which provided us with an absolute path. This way you don't need to worry about your current working directory (which tends to get me). Anyway, sorry for the long post but I hope something in here has helped get you on your way. Let us know if you have more questions!!
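P.S. For the parsing step itself, here is one possible sketch. It takes the list you get back from readlines() and returns one average per timestep. Two assumptions on my part: "average velocity" is taken to mean the average speed (the magnitude of each triple), and every data row under ITEM: ATOMS has exactly three numbers. The name average_speeds is made up for the example.

```python
import math

def average_speeds(lines):
    """Return one average speed per timestep block found in lines."""
    averages = []
    speeds = []        # speeds collected for the block currently being read
    in_atoms = False   # True while we are on the rows under ITEM: ATOMS
    for line in lines:
        line = line.strip()
        if line.startswith('ITEM:'):
            if line.startswith('ITEM: TIMESTEP'):
                # A new block starts, so close out the previous one.
                if speeds:
                    averages.append(sum(speeds) / len(speeds))
                speeds = []
            in_atoms = line.startswith('ITEM: ATOMS')
        elif in_atoms and line:
            vx, vy, vz = (float(v) for v in line.split())
            # Assumption: we average the magnitude of each velocity vector.
            speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    if speeds:
        averages.append(sum(speeds) / len(speeds))   # last block in the file
    return averages
```

If you actually need the average of each component separately, you would keep three running sums instead of one list of magnitudes.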

Oh man - thanks! That helps a lot, I was brute-forcing my files open by concatenating parts of strings that I defined and blech. It was a mess. I do still have one question, however. Getting the computation done on a particular part of a file is easy, but how would I write the result of that computation to a specific line in an output file, based on the file it came from? For example, the first computation from file1 would be on line 0, the 2nd would be on line 11, 3rd on line 22, and the first computation from file2 would be on line 1, 2nd on line 12, and 3rd on line 23...etc.

Again, Thanks A MILLION for your input so far! Very helpful!

Hmm... there is no way to simply tell Python to insert a line at a specific "index" of lines (that I know of, although it would be a great way to contribute to the on-going Python development!!)... We could go about this two ways:

Approach 1: keep each line that we want to write to output in a list (this is the format in which using readlines() returns our data). That way, we can make use of the list's insert([index],[value]) function.
When you are done with your calculations, you can iterate over the list and write each line back to an output file.

Approach 2 (much more complicated): Open the output file in a read/write mode such as 'r+' (in append mode, many systems send every write to the end of the file no matter where you seek), and use a combination of seek() and tell() commands [tell returns an int that represents the position the file handle is pointing at within the file], [seek moves the file handle to a specific spot within the file (i.e., seek(0) returns to the beginning of the file)] to actively modify the file. Keep in mind that writing in the middle of a file overwrites what is already there rather than pushing it down. Don't forget to flush() !! [flush writes any changes to the file that are currently stored in the buffer]

I would suggest Approach number 1, as number 2 has too much potential for bugs and silly mistakes! I hope it makes sense to you what I've suggested and if not I will clarify and provide an example if you don't know how to work with lists.
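Here is a rough sketch of Approach 1, with one change worth pointing out: rather than calling insert() with computed line numbers (insert() simply appends when the index is past the end of the list, so the order can come out wrong), it builds the list in its final order, one timestep at a time. It assumes you already have one list of averages per file, that every file has the same number of timesteps, and that "blank space" means an empty line between timesteps. The function names are invented for the example.

```python
def build_output_lines(per_file_averages):
    """per_file_averages[i][t] is the average for file i+1 at timestep t."""
    n_steps = len(per_file_averages[0])   # assumes all files have the same length
    lines = []
    for step in range(n_steps):
        for file_number, averages in enumerate(per_file_averages, start=1):
            lines.append('%d %.6f\n' % (file_number, averages[step]))
        lines.append('\n')                # empty line between timesteps
    return lines

def write_output(path, lines):
    """Write the finished list of lines to path in one go."""
    with open(path, 'w') as out:
        out.writelines(lines)
```

So for two files with two timesteps each, build_output_lines([[1.0, 2.0], [3.0, 4.0]]) gives the lines for file 1 and file 2 at the first timestep, an empty line, then both files at the second timestep.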
