I need to read lists of names from several files using Python and produce, for each file, the list of names that appear for the first time in that file. Help!!
Start by reading a list of names from a single file and print these names in the console. Post your code!
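As a starting point for that first step, here is a minimal sketch of reading names from one file. It assumes one name per line; the helper name read_names and the demo file names.txt are made up for illustration and are not from the original question.

```python
import os
import tempfile


def read_names(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path, 'r') as f:
        return [line.strip() for line in f if line.strip()]


# Hypothetical demo file, created only so that the sketch is runnable.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'names.txt')
with open(demo_path, 'w') as f:
    f.write('Alice\nBob\n\nCarol\n')

names = read_names(demo_path)
for name in names:
    print(name)
```

Blank lines are skipped and surrounding whitespace is removed, so the names can later be compared across files.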
I was doing something similar. I have all my files in a folder, so I was using the glob module and the wildcard (*) symbol. This may help point you in the right direction.
I am writing to an outfile... merge_BLASTP_results.out
import glob

for file in glob.glob('/Users/sueparks/BlastP_Results/*'):
    with open(file, 'r') as myfile:  # open each file
        lines = myfile.readlines()  # read lines of each file
    with open('merge_BLASTP_results.out', 'a') as f:
        for line in lines:
            f.write(line)  # write lines to new file
@Sue_2 You can improve this by opening the output file only once. Also, you can use fileinput to iterate over the lines of many files.
import fileinput
import glob

with open('merge_BLASTP_results.out', 'w') as f:
    for line in fileinput.input(glob.glob('/Users/sueparks/BlastP_Results/*')):
        f.write(line)  # write lines to new file
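Coming back to the original question, the same loop over many files can be adapted to collect, per file, the names that have not shown up in any earlier file. This is only a sketch under assumptions: one name per line, and "first time" meaning not seen in any previously processed file. The function name new_names_per_file and the demo files are hypothetical.

```python
import os
import tempfile


def new_names_per_file(paths):
    """Map each path to the names not seen in any earlier file."""
    seen = set()
    result = {}
    for path in paths:
        fresh = []
        with open(path, 'r') as f:
            for line in f:
                name = line.strip()
                if name and name not in seen:
                    seen.add(name)
                    fresh.append(name)
        result[path] = fresh
    return result


# Hypothetical demo files, created only so that the sketch is runnable.
demo_dir = tempfile.mkdtemp()
contents = {'a.txt': 'Alice\nBob\n', 'b.txt': 'Bob\nCarol\nAlice\nDave\n'}
paths = []
for fname in sorted(contents):
    path = os.path.join(demo_dir, fname)
    with open(path, 'w') as f:
        f.write(contents[fname])
    paths.append(path)

first_seen = new_names_per_file(paths)
for path in paths:
    print(os.path.basename(path), first_seen[path])
```

The result depends on the order in which the files are processed. glob.glob() does not promise any particular order, so wrapping it as sorted(glob.glob(...)) is one way to make the output reproducible.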
The * pattern may be a little too permissive, for example if the directory contains binary files. Depending on what you need, a narrower pattern such as *.dat is safer.
So, if I have 200 protein files that start with 'P_1' and end with '.txt', but I also have another 100 text files that are labeled rna1.txt, rna2.txt, ..., can you show me how to work exclusively with the P_1 files? I saw an example with the wildcard (*), but I was curious what you thought.
I saw something like this maybe a week ago...
'P_1*' + '*.txt'
Well, you can use P_1*.txt to work only with these files. For the rna files, you can use rna*.txt, or simply rna* if only .txt files start with this prefix.
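To see what these patterns select without touching the disk, fnmatch.filter() can be applied to a plain list of names, since the glob module relies on fnmatch for its pattern matching. The file names below are invented stand-ins for a directory listing.

```python
import fnmatch

# Hypothetical file names standing in for a real directory listing.
filenames = [
    'P_1.txt', 'P_10.txt', 'P_1_a.txt', 'P_2.txt',
    'rna1.txt', 'rna2.txt', 'P_1.dat',
]

protein_files = fnmatch.filter(filenames, 'P_1*.txt')
rna_files = fnmatch.filter(filenames, 'rna*.txt')

# The concatenation asked about above yields the pattern 'P_1**.txt'.
doubled = fnmatch.filter(filenames, 'P_1*' + '*.txt')

print(protein_files)
print(rna_files)
print(doubled == protein_files)
```

Two things show up here: the concatenated pattern selects the same names as P_1*.txt in this example, and P_1*.txt also picks up names such as P_10.txt, which may or may not be what you want.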
Note that if you know regular expressions (the re module), you can get a regular expression equivalent to your glob pattern by using fnmatch.translate(), for example:
>>> import fnmatch
>>> fnmatch.translate('P_1*.txt')
'P_1.*\\.txt\\Z(?ms)'
The re module could be used for more sophisticated filtering.
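As a rough illustration of such filtering, the sketch below compares the translated glob pattern with a stricter hand-written regular expression. The naming scheme (P_1_ followed by digits) and the file names are assumptions made up for the example. The exact string returned by fnmatch.translate() differs between Python versions, so the sketch compiles it rather than printing it.

```python
import fnmatch
import re

# Regular expression equivalent to the glob pattern.
glob_regex = re.compile(fnmatch.translate('P_1*.txt'))

# Stricter filter, assuming a hypothetical naming scheme: P_1_<digits>.txt
strict_regex = re.compile(r'P_1_\d+\.txt')

# Hypothetical file names standing in for a real directory listing.
filenames = ['P_1_001.txt', 'P_1_abc.txt', 'P_10.txt', 'rna1.txt']

glob_matches = [name for name in filenames if glob_regex.match(name)]
strict_matches = [name for name in filenames if strict_regex.fullmatch(name)]

print(glob_matches)
print(strict_matches)
```

The glob-derived expression keeps every name that starts with P_1 and ends with .txt, while the stricter one keeps only names with a numeric suffix, which a plain glob pattern cannot express as precisely.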