I need to read lists of names from several files using Python and, for each file, produce the names that appear there for the first time. Help!!

I was doing something similar. I have all my files in a folder, so I was using the glob module and the wildcard (*) symbol. This may help point you in the right direction.

  1. I am using Python 2.7 on Mac OS, so the path below points to my folder.
  2. My files have ONLY one line each. Could you post some details about the first line of your files?
  3. I am writing to an output file, merge_BLASTP_results.out.

    import glob

    for path in glob.glob('/Users/sueparks/BlastP_Results/*'):
        with open(path, 'r') as myfile:  # open each file; it is closed automatically
            lines = myfile.readlines()  # read lines of each file
        with open('merge_BLASTP_results.out', 'a') as f:
            for line in lines:
                f.write(line)  # write lines to new file

Edited by Sue_2


@Sue_2 You can improve this by opening the output file only once. You can also use the fileinput module to iterate over the lines of many files:

    import fileinput
    import glob

    with open('merge_BLASTP_results.out', 'w') as f:
        for line in fileinput.input(glob.glob('/Users/sueparks/BlastP_Results/*')):
            f.write(line)  # write lines to new file

Also, * may be a little too permissive, for example if the directory contains binary files. Depending on what you need, a pattern such as *.txt or *.dat is safer.
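
To tie this back to the original question, here is a rough sketch that combines a narrower pattern (*.txt) with fileinput and a set to collect the names that show up for the first time in each file. It assumes one name per line; the function name first_seen_names, the temporary demo directory and the sample names are all made up for illustration, not taken from anyone's real data.

```python
import fileinput
import glob
import os
import tempfile


def first_seen_names(pattern):
    """Map each file matching `pattern` to the names first seen in that file."""
    paths = sorted(glob.glob(pattern))    # sorted so the file order is predictable
    new_names = {p: [] for p in paths}    # file path -> names first seen there
    if not paths:
        return new_names                  # an empty list would make fileinput read stdin
    seen = set()                          # every name met so far, across all files
    for line in fileinput.input(paths):
        name = line.strip()               # assumption: one name per line
        if name and name not in seen:
            seen.add(name)
            new_names[fileinput.filename()].append(name)
    return new_names


# Hypothetical demo data written to a temporary directory.
demo_dir = tempfile.mkdtemp()
samples = {
    'a.txt': 'alice\nbob\n',
    'b.txt': 'bob\ncarol\nalice\n',
    'skip.dat': 'zed\n',                  # not matched by *.txt
}
for fname, text in samples.items():
    with open(os.path.join(demo_dir, fname), 'w') as f:
        f.write(text)

result = first_seen_names(os.path.join(demo_dir, '*.txt'))
for path in sorted(result):
    print('%s: %s' % (os.path.basename(path), result[path]))
```

With the sample data this should list alice and bob for a.txt and only carol for b.txt, while skip.dat is never read. The files are processed in sorted order here; if "first" should mean something else (for example by modification time), the sort would need to change.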

Edited by Gribouillis


So, if I have 200 protein files that start with 'P_1' and end with '.txt', but I also have another 100 text files labeled rna1.txt, rna2.txt, ..., can you show me how to work exclusively with the P_1 files? I saw an example with the wildcard (*), but I was curious what you thought.

I saw something like this maybe a week ago...

'P_1*'  + '*.txt'  

Edited by Sue_2


Well, you can use P_1*.txt to work only with these files. For the rna files, you can use rna*.txt or simply rna* if only .txt files start with this prefix.
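
A quick way to check what a pattern selects without touching the disk is fnmatch.filter(), which applies the same wildcard rules to a plain list of names. The file names below are invented for the sake of the example.

```python
import fnmatch

# Hypothetical file names standing in for a real directory listing.
names = ['P_1.txt', 'P_10.txt', 'P_1_hits.txt', 'P_2.txt',
         'rna1.txt', 'rna2.txt', 'P_1.dat']

protein = fnmatch.filter(names, 'P_1*.txt')   # starts with P_1, ends with .txt
rna = fnmatch.filter(names, 'rna*.txt')       # starts with rna, ends with .txt

print(protein)
print(rna)
```

Note that P_1*.txt also picks up P_10.txt, because * matches any run of characters, while P_2.txt and P_1.dat are left out.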

Note that if you know regular expressions (the re module), you can get a regular expression equivalent to your glob pattern by using fnmatch.translate(), for example

>>> import fnmatch
>>> fnmatch.translate('P_1*.txt')

The re module could be used for more sophisticated filtering.
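
As one possible illustration, suppose (purely as an assumption) that the protein files are named P_1_ followed by digits, such as P_1_001.txt. A regular expression can require the digits, which a plain wildcard cannot do:

```python
import re

# Hypothetical file names; the P_1_<digits>.txt scheme is an assumption.
names = ['P_1_001.txt', 'P_1_002.txt', 'P_10_001.txt',
         'P_1_notes.txt', 'rna1.txt']

# P_1_, then one or more digits, then .txt and nothing else.
pattern = re.compile(r'P_1_\d+\.txt$')

selected = [name for name in names if pattern.match(name)]
print(selected)
```

Here only P_1_001.txt and P_1_002.txt are kept; P_10_001.txt and P_1_notes.txt would both have matched the glob pattern P_1*.txt.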
