I need to read a list of names from different files using Python and produce the list of names that appear for the first time in each file. Help!!

All 7 Replies

This text should be a good start to read.

Start by reading a list of names from a single file and print these names in the console. Post your code!
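If it helps to have something to compare against, here is a minimal sketch of that first step. It assumes one name per line, which the question does not actually say, and `read_names` and `names.txt` are made-up names for illustration. The demo writes a throwaway file so it runs anywhere; point `path` at your own file instead.

```python
import os
import tempfile


def read_names(path):
    """Return the non-empty lines of a file, stripped of surrounding whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a throwaway file (hypothetical content).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'names.txt')
with open(path, 'w') as f:
    f.write('Alice\nBob\n\nCarol\n')

names = read_names(path)
for name in names:
    print(name)
```

Blank lines are skipped, so the empty line in the demo file does not show up as a name.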

I was doing something similar. I have all my files in a folder, so I was using the glob module and the wildcard (*) symbol. This may help point you in the right direction.

  1. I am using Python 2.7 on Mac OS, so the path below points to my folder.
  2. My files ONLY HAVE 1 line. Could you post some details about the first line of your files?
  3. I am writing to an outfile, merge_BLASTP_results.out

    import glob

    for file in glob.glob('/Users/sueparks/BlastP_Results/*'):
        with open(file, 'r') as myfile:  # open each file (closed automatically)
            lines = myfile.readlines()  # read lines of each file

        with open('merge_BLASTP_results.out', 'a') as f:  # append to the outfile
            for line in lines:
                # each line already ends with '\n', so it is written unchanged
                f.write(line)  # write lines to new file

@Sue_2 You can improve this by opening the output file only once. Also, you can use fileinput to iterate over the lines of many files:

import fileinput
import glob

with open('merge_BLASTP_results.out', 'w') as f:
    for line in fileinput.input(glob.glob('/Users/sueparks/BlastP_Results/*')):
        f.write(line)  # write lines to new file

Also, * may be a little too permissive, for example if the directory contains binary files. Depending on what you need, a pattern such as *.txt or *.dat is safer.
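Coming back to the original question (names that appear for the first time in each file), one possible approach is to keep a set of names already seen while walking through the files in order. This is only a sketch under assumptions: one name per line, and "first time" meaning "not seen in any earlier file or earlier line". The function name `first_appearances` and the demo files are hypothetical.

```python
import glob
import os
import tempfile


def first_appearances(paths):
    """For each file, list the names not seen in any earlier file or line."""
    seen = set()
    result = []
    for path in paths:
        new_names = []
        with open(path) as f:
            for line in f:
                name = line.strip()
                if name and name not in seen:
                    seen.add(name)
                    new_names.append(name)
        result.append((path, new_names))
    return result


# Demo with two throwaway files (hypothetical content).
tmp_dir = tempfile.mkdtemp()
for filename, content in [('a.txt', 'Alice\nBob\n'),
                          ('b.txt', 'Bob\nCarol\nAlice\n')]:
    with open(os.path.join(tmp_dir, filename), 'w') as f:
        f.write(content)

# glob returns files in arbitrary order, so sort to make "first" well defined.
paths = sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))
report = first_appearances(paths)
for path, new_names in report:
    print('%s: %s' % (os.path.basename(path), new_names))
```

The order of the files decides which file a name is credited to, so if the sorted filename order is not the order you want, build the list of paths yourself.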

So, if I have 200 protein files that start with 'P_1' and end with '.txt', but I also have another 100 text files that are labeled rna1.txt, rna2.txt, ... Can you show me how to work exclusively with the 'P_1' .txt files? I saw an example with the wildcard (*), but I was curious about what you thought.

I saw something like this maybe a week ago...

'P_1*'  + '*.txt'  

Well, you can use P_1*.txt to work only with these files. For the rna files, you can use rna*.txt or simply rna* if only .txt files start with this prefix.
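As a quick way to check those patterns, here is a small sketch that creates a few empty files in a temporary directory and globs them. The file names are invented for the demo; in your case you would use your own folder in place of `tmp_dir`.

```python
import glob
import os
import tempfile

# Create a few empty demo files (hypothetical names).
tmp_dir = tempfile.mkdtemp()
for filename in ['P_1a.txt', 'P_1b.txt', 'rna1.txt', 'rna2.txt']:
    open(os.path.join(tmp_dir, filename), 'w').close()

# Only files starting with 'P_1' and ending with '.txt'.
protein_files = sorted(glob.glob(os.path.join(tmp_dir, 'P_1*.txt')))
# Only files starting with 'rna' and ending with '.txt'.
rna_files = sorted(glob.glob(os.path.join(tmp_dir, 'rna*.txt')))

protein_names = [os.path.basename(p) for p in protein_files]
rna_names = [os.path.basename(p) for p in rna_files]
print(protein_names)
print(rna_names)
```

Note that a single pattern string is enough; there is no need to concatenate two wildcard pieces.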

Note that if you know regular expressions (the re module), you can get a regular expression equivalent to your glob pattern by using fnmatch.translate(), for example

>>> import fnmatch
>>> fnmatch.translate('P_1*.txt')
'P_1.*\\.txt\\Z(?ms)'

The re module could be used for more sophisticated filtering.
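For instance, here is a sketch comparing the translated glob pattern with a stricter hand-written regular expression. The file names in the list are hypothetical, and the strict pattern (only digits allowed after 'P_1') is just one example of a rule that a glob cannot express exactly.

```python
import fnmatch
import re

# Hypothetical file names, as returned by os.listdir() for example.
filenames = ['P_1.txt', 'P_12.txt', 'P_1_old.txt', 'rna1.txt', 'P_1.txt.bak']

# The translated glob pattern accepts the same names as the glob itself.
glob_regex = re.compile(fnmatch.translate('P_1*.txt'))
glob_matches = [n for n in filenames if glob_regex.match(n)]

# A hand-written pattern can be stricter: 'P_1', optional digits, then '.txt'.
strict_regex = re.compile(r'P_1\d*\.txt\Z')
strict_matches = [n for n in filenames if strict_regex.match(n)]

print(glob_matches)
print(strict_matches)
```

Here the strict pattern rejects P_1_old.txt, which the glob pattern accepts.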

NICE!
