python

Question

Abdullah_17 0 Newbie Poster

7 Years Ago

I need to read lists of names from several files using Python and, for each file, produce the list of names that appear for the first time in that file. Help!!


Gribouillis 1,391 Programming Explorer

7 Years Ago

@Sue_2 You can improve this by opening the output file only once. Also, you can use the fileinput module to iterate over the lines of many files:

import fileinput
import glob

with open('merge_BLASTP_results.out', 'w') as f:
    for line in fileinput.input(glob.glob('/Users/sueparks/BlastP_Results/*')):
        f.write(line)  # write lines to new file

Also, * may be a little too permissive, for example if the directory contains binary files. Depending on what you need, a pattern such as *.txt or *.dat is safer.
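To make the two suggestions above concrete, here is a minimal sketch that combines them: a narrower glob pattern and a single output file. The function name merge_files and its arguments are hypothetical, chosen only for illustration, and the sketch assumes the output file does not itself match the pattern.

```python
import fileinput
import glob


def merge_files(pattern, out_path):
    """Concatenate every file matching `pattern` into `out_path`.

    Returns the number of lines written.
    """
    # glob.glob() returns matches in arbitrary order, so sort them
    # to get a reproducible result.
    paths = sorted(glob.glob(pattern))
    if not paths:
        # fileinput.input() reads standard input when given an empty
        # list, so return early instead of blocking.
        return 0
    count = 0
    with open(out_path, 'w') as out:  # the output file is opened only once
        for line in fileinput.input(paths):
            out.write(line)
            count += 1
        fileinput.close()  # reset the module-level fileinput state
    return count
```

With the folder from the thread, a call could look like merge_files('/Users/sueparks/BlastP_Results/*.txt', 'merge_BLASTP_results.out'), assuming the result files carry a .txt extension.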

Edited 7 Years Ago by Gribouillis

Gribouillis 1,391 Programming Explorer

7 Years Ago

Well, you can use P_1*.txt to work only with these files. For the rna files, you can use rna*.txt or simply rna* if only .txt files start with this prefix.

Note that if you know regular expressions (the re module), you can get a regular expression equivalent to your glob pattern with fnmatch.translate(), for example:

>>> import fnmatch
>>> fnmatch.translate('P_1*.txt')
'P_1.*\\.txt\\Z(?ms)'

The re module could be used for more sophisticated filtering.
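As a rough sketch of what that filtering could look like: the file names below are made up for illustration, and the stricter digits-only pattern is just one example of something a regular expression can express. The exact string returned by fnmatch.translate() differs between Python versions, so the sketch compiles it instead of relying on its text.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
names = ['P_10.txt', 'P_1x.txt', 'P_12.dat', 'rna1.txt']

# Regular expression equivalent to the glob pattern 'P_1*.txt'.
glob_like = re.compile(fnmatch.translate('P_1*.txt'))
matched = [n for n in names if glob_like.match(n)]

# A stricter hand-written filter: 'P_1', then one or more digits, then '.txt'.
digits_only = re.compile(r'P_1\d+\.txt\Z')
numbered = [n for n in names if digits_only.match(n)]

print(matched)   # ['P_10.txt', 'P_1x.txt']
print(numbered)  # ['P_10.txt']
```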


ddanbe 2,724 Professional Procrastinator Featured Poster · Answer 1 · 2017-07-30T03:51:22+00:00

This text should be a good start to read.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 2 · 2017-07-30T07:46:18+00:00

Start by reading a list of names from a single file and print these names in the console. Post your code!
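For reference, here is a minimal sketch of one way the whole task could be approached. It assumes one name per line, and it reads "appear the first time" as "not seen in any earlier file"; both are assumptions, since the question does not say. The function name new_names_per_file is hypothetical.

```python
def new_names_per_file(paths):
    """For each file, list the names not seen in any earlier file.

    Returns a list of (path, names) pairs in the order of `paths`.
    """
    seen = set()    # every name met so far, across all files
    result = []
    for path in paths:
        fresh = []  # names that first appear in this file
        with open(path) as f:
            for line in f:
                name = line.strip()
                # Skip blank lines and names already recorded.
                if name and name not in seen:
                    seen.add(name)
                    fresh.append(name)
        result.append((path, fresh))
    return result
```

The order of the paths matters here: a name is credited to the first file in the list that contains it.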

Sue_2 0 Newbie Poster · Answer 3 · 2017-07-31T15:49:27+00:00

I was doing something similar. I have all my files in a folder, so I was using the glob module and the wildcard (*) symbol. This may help point you in the right direction.

I am using Python 2.7 on Mac OS, so the path below points to my folder.
My files ONLY HAVE 1 line. Could you post some details about the first line of your files?

I am writing to an outfile... merge_BLASTP_results.out

import glob

for path in glob.glob('/Users/sueparks/BlastP_Results/*'):
    with open(path, 'r') as myfile:  # open each file (closed automatically)
        lines = myfile.readlines()  # read lines of each file

    with open('merge_BLASTP_results.out', 'a') as f:
        for line in lines:
            f.write(line)  # write lines to new file

Sue_2 0 Newbie Poster · Answer 4 · 2017-08-01T13:44:22+00:00

So, if I have 200 protein files that start with 'P_1' and end with '.txt', but I also have another 100 text files named rna1.txt, rna2.txt, ..., can you show me how to work exclusively with the 'P_1' files? I saw an example with the wildcard (*), but I was curious what you thought.

I saw something like this maybe a week ago...

'P_1*'  + '*.txt'
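A quick way to check what that expression does: the concatenation is ordinary string joining and yields 'P_1**.txt', which selects the same names as 'P_1*.txt' in the sketch below. The file names are made up for illustration.

```python
import fnmatch

# Hypothetical file names, for illustration only.
names = ['P_1a.txt', 'P_1b.txt', 'P_1.dat', 'rna1.txt']

joined = 'P_1*' + '*.txt'  # plain string concatenation
print(joined)              # P_1**.txt

# The doubled * matches nothing that a single * would not.
same = fnmatch.filter(names, joined) == fnmatch.filter(names, 'P_1*.txt')
print(same)                # True
```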

Sue_2 0 Newbie Poster · Answer 5 · 2017-08-01T18:48:58+00:00

NICE!