-1

I need to read lists of names from several files using Python and produce, for each file, the names that appear for the first time in that file. Help!

0

I was doing something similar. I have all my files in a folder, so I was using the glob module and the wildcard (*) symbol. This may help point you in the right direction.

  1. I am using Python 2.7 on Mac OS, so the path in the code points to my folder.
  2. My files have ONLY 1 line each. Could you post some details about the first line of your files?
  3. I am writing to an outfile: merge_BLASTP_results.out

    import glob

    for path in glob.glob('/Users/sueparks/BlastP_Results/*'):
        with open(path, 'r') as myfile:  # open each file (closed automatically)
            lines = myfile.readlines()   # read lines of each file

        # append to the output file, which is reopened for every input file
        with open('merge_BLASTP_results.out', 'a') as f:
            for line in lines:
                f.write(line)  # write lines to new file, newlines kept as-is

Edited by Sue_2

2

@Sue_2 You can improve this by opening the output file only once. Also, you can use fileinput to iterate over the lines of many files:

import fileinput
import glob

with open('merge_BLASTP_results.out', 'w') as f:
    for line in fileinput.input(glob.glob('/Users/sueparks/BlastP_Results/*')):
        f.write(line)  # write lines to new file

Also, * may be a little too permissive, for example if the directory contains binary files. Depending on what you need, a pattern such as *.txt or *.dat is safer.
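Coming back to the original question (names that appear for the first time in each file), here is a minimal sketch built on the same fileinput idea. It assumes one name per line and that "first time" means "not seen in any file processed earlier". The function name `first_appearances` is made up for illustration.

```python
import fileinput
from collections import OrderedDict


def first_appearances(paths):
    """Map each file to the names that were not seen in any earlier file.

    Assumes one name per line; blank lines are skipped.
    """
    result = OrderedDict((path, []) for path in paths)
    if not paths:
        # fileinput falls back to stdin when given an empty list, so stop here
        return result

    seen = set()  # every name met so far, across all files
    for line in fileinput.input(paths):
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            # fileinput.filename() is the file the current line came from
            result[fileinput.filename()].append(name)
    fileinput.close()
    return result
```

It could be called with something like `first_appearances(sorted(glob.glob('/Users/sueparks/BlastP_Results/*.txt')))`. The order of the list decides which file "owns" a name, so sorting the paths (or ordering them some other deliberate way) matters.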

Edited by Gribouillis

0

So, if I have 200 protein files that start with 'P_1' and end with '.txt', but I also have another 100 text files labeled rna1.txt, rna2.txt, ..., can you show me how to work exclusively with the P_1 files? I saw an example with the wildcard (*), but I was curious what you thought.

I saw something like this maybe a week ago...

'P_1*'  + '*.txt'  

Edited by Sue_2

2

Well, you can use P_1*.txt to work only with these files. For the rna files, you can use rna*.txt or simply rna* if only .txt files start with this prefix.
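To check what such a pattern selects without touching the disk, fnmatch.filter applies the same shell-style matching that glob uses to a plain list of names. The file names below are invented for illustration.

```python
import fnmatch

# hypothetical directory listing, made up for this example
names = ['P_1.txt', 'P_12.txt', 'P_1_a.txt', 'P_2.txt',
         'P_1.dat', 'rna1.txt', 'rna2.txt']

protein_files = fnmatch.filter(names, 'P_1*.txt')  # starts with P_1, ends with .txt
rna_files = fnmatch.filter(names, 'rna*.txt')      # starts with rna, ends with .txt

print(protein_files)
print(rna_files)
```

Note that P_1*.txt also accepts names such as P_12.txt or P_1_a.txt, since * stands for any run of characters.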

Note that if you know regular expressions (the re module), you can get a regular expression equivalent to your glob pattern by using fnmatch.translate(), for example

>>> import fnmatch
>>> fnmatch.translate('P_1*.txt')
'P_1.*\\.txt\\Z(?ms)'

The re module could be used for more sophisticated filtering.
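As a rough sketch of what that filtering could look like, the example below first compiles the translated glob pattern, then uses a stricter hand-written expression that only allows digits between 'P_1' and '.txt'. The file names and the stricter rule are assumptions chosen for illustration.

```python
import fnmatch
import re

# hypothetical file names, made up for this example
names = ['P_1.txt', 'P_12.txt', 'P_1_notes.txt', 'P_1.txt.bak', 'rna1.txt']

# Same selection as the glob pattern, via the translated regular expression
glob_like = re.compile(fnmatch.translate('P_1*.txt'))
glob_matches = [n for n in names if glob_like.match(n)]

# Stricter pattern: 'P_1', then only digits (possibly none), then '.txt'
digits_only = re.compile(r'P_1\d*\.txt\Z')
strict_matches = [n for n in names if digits_only.match(n)]

print(glob_matches)
print(strict_matches)
```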
