The basic framework would look something like this:
# find selected files in the working folder using the glob module
# glob works with both Windows and Unix; note that matching follows the
# file system: case-insensitive on Windows, case-sensitive on Unix
import os
import glob
# pick the directory/folder where your .dat data files are
folder = "C:/temp"
# change to that folder
os.chdir(folder)
# process all the .dat data files in that folder
for fname in glob.glob('*.dat'):
    # read the file; "with" closes it automatically
    with open(fname, "r") as fh:
        # creates a list of data line items as strings
        data_list = fh.readlines()
    # now process the data_list
    for count, item in enumerate(data_list):
        #
        # do something with each data line item
        #
        # and save the result
        #
        # optionally show progress
        print("file %s line %d processed" % (fname, count + 1))
It assumes that each data file contains one piece of processable data per line. In the future, give your thread a more meaningful title; more people will help. This sounds more like a last-minute homework problem.
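A minimal sketch of the same framework that also fills in the "save the result" step, assuming one data item per line. The helper names process_line and process_folder and the .out output naming are hypothetical, and the per-line work (stripping and upper-casing) is only a placeholder for your real processing. It uses os.path.join instead of os.chdir so the current directory is left alone.

```python
import glob
import os


def process_line(line):
    # hypothetical placeholder for the real per-line work
    return line.strip().upper()


def process_folder(folder, pattern="*.dat"):
    """Process every file matching pattern in folder and write the
    results to <name>.out next to it. Returns the output paths."""
    written = []
    for in_path in sorted(glob.glob(os.path.join(folder, pattern))):
        out_path = os.path.splitext(in_path)[0] + ".out"
        count = 0  # stays 0 for an empty file
        with open(in_path) as fin, open(out_path, "w") as fout:
            for count, line in enumerate(fin, start=1):
                fout.write(process_line(line) + "\n")
        # optionally show progress, one message per file
        print("file %s: %d lines processed" % (in_path, count))
        written.append(out_path)
    return written
```

For example, process_folder("C:/temp") would read every .dat file there and leave a matching .out file beside each one. Iterating over the file object reads one line at a time, so very large files never have to fit in memory the way readlines() requires.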
vegaseat
DaniWeb's Hypocrite
5,989 posts since Oct 2004
Reputation Points: 1,345
Solved Threads: 1,417
Some information is missing from your posts:
1) What is the command that you want to run for each file? I read a few Google results about PBS scripts, and maybe you want to run commands like qsub script on the command line. So the question is: how would you run muscle by hand if you had only one file to process? Also, if the job must be completed for one file before you start the next file, how do you know that the job is finished?
2) What are your file names? Are they all in the same directory?
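When the command runs the job directly (as it would if you ran muscle by hand), Python's subprocess.run blocks until the child process exits, so each job is finished before the next one starts, and the exit code tells you whether it succeeded. The sketch below assumes each command is given as a list of arguments; the function name run_one_at_a_time is hypothetical, and the real muscle or qsub arguments are not filled in because they depend on your setup.

```python
import subprocess
import sys


def run_one_at_a_time(commands):
    """Run each command (a list of arguments) in turn and return the
    exit codes. subprocess.run waits for the process to exit, so one
    job is complete before the next one is started."""
    codes = []
    for cmd in commands:
        result = subprocess.run(cmd)
        codes.append(result.returncode)
        if result.returncode != 0:
            # report failures but keep going with the remaining files
            print("command failed: %r" % (cmd,), file=sys.stderr)
    return codes
```

Note that this only tells you the launched program has exited; if that program merely submits the job to a queue, finishing the submission is not the same as finishing the job.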
Gribouillis
Posting Maven
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
Perhaps you could follow this kind of design (if I correctly understand the meaning of the arguments to blastall):
#!/usr/bin/env python
# myscript.py
import subprocess as sp
import sys
from os.path import join as pjoin

def name_pairs():
    "yield a sequence of pairs (input file name, output file name)"
    for i in range(1, 2222):
        yield ("fam%d.mcl.fas" % i, "output%d" % i)

def path_pairs(input_dir, output_dir):
    "yield a sequence of pairs (input path, output path)"
    for iname, oname in name_pairs():
        yield pjoin(input_dir, iname), pjoin(output_dir, oname)

def commands(input_dir, output_dir, db_path):
    "yield the commands to run, as argument lists (no shell quoting needed)"
    for ipath, opath in path_pairs(input_dir, output_dir):
        yield ["blastall", "-p", "blastp", "-i", ipath, "-d", db_path, "-o", opath]

def run_commands(input_dir, output_dir, db_path):
    for cmd in commands(input_dir, output_dir, db_path):
        # run one job at a time: wait for it to finish before the next
        process = sp.Popen(cmd)
        process.wait()

def main():
    if len(sys.argv) != 4:
        sys.exit("usage: myscript.py input_dir output_dir db_path")
    input_dir, output_dir, db_path = sys.argv[1:]
    run_commands(input_dir, output_dir, db_path)

if __name__ == "__main__":
    main()
You could run this script in a shell as: python myscript.py input_dir output_dir db_path. You should also create a fresh directory to use as output_dir.
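Before launching 2221 blastall jobs, it can help to print a few of the generated commands and to let Python create the fresh output directory, failing loudly if it already exists. This is a standalone sketch that mirrors the generators in the script above; build_commands, prepare_output_dir and dry_run are hypothetical names, and the blastall arguments are copied from that script rather than checked against blastall's documentation.

```python
import os
from os.path import join as pjoin


def build_commands(input_dir, output_dir, db_path, first=1, last=2221):
    """Yield one blastall argument list per input file fam<first>..fam<last>."""
    for i in range(first, last + 1):
        ipath = pjoin(input_dir, "fam%d.mcl.fas" % i)
        opath = pjoin(output_dir, "output%d" % i)
        yield ["blastall", "-p", "blastp", "-i", ipath, "-d", db_path, "-o", opath]


def prepare_output_dir(output_dir):
    """Create output_dir. os.makedirs raises FileExistsError if it
    already exists, so results of an earlier run are not overwritten."""
    os.makedirs(output_dir)


def dry_run(input_dir, output_dir, db_path, limit=3):
    """Print the first `limit` commands without running anything."""
    for n, cmd in enumerate(build_commands(input_dir, output_dir, db_path)):
        if n >= limit:
            break
        print(" ".join(cmd))
```

For example, dry_run("input_dir", "output_dir", "db_path") prints the first three command lines; once they look right, call prepare_output_dir("output_dir") and start the real run.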
Gribouillis