Hi Guys,
This seems easy, but it just doesn't seem to work.
I want to:
loop through each file in a directory.
Each file contains a number of lines like this: ">XXXXX"
I want to extract each line that starts with this and write it to a new file named almost identically to the old one.

So basically: fam0.aln has 10000 lines containing just ">XXXXX", plus other lines. I want Nfam0.aln to contain just the 10000 ">XXXXX" lines and nothing else, and so on up to fam199.aln, whose lines should go to Nfam199.aln.

I came up with this:

#!/usr/bin/env python

import os
import sys
import glob

count = 0
file_list = glob.glob(fam*.aln)
for file in file_list:
        fileOPEN = open(file, 'r')
        for line in fileOPEN:
                  handle = "Nfam" + str(count) + ".aln"
                  test = open(handle, 'a')
                  if line[0]==">":
                                    test.write(line +'\n')
                  count+=1

The problem is that it doesn't move on to Nfam1.aln. It seems to just put everything from fam0.aln to fam199.aln into one file, Nfam0.aln.


In my script I have actually indented; I literally saw the CODE button 2 seconds ago!


Why didn't you go back and edit your post? It's impossible to read Python without indentation, you know.

Regardless, I found your problem and fixed it up for you.

#!/usr/bin/env python

import os
import sys
import glob

count = 0
file_list = glob.glob("fam*.aln")
for file in file_list:
    fileOPEN = open(file, 'r')
    handle = "Nfam" + str(count) + ".aln"
    test = open(handle, 'w')
    for line in fileOPEN:
        if line[0]==">":
            test.write(line)
    test.close()
    fileOPEN.close()
    count+=1

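You put your handle opener in the wrong spot. It should have been right next to where you open the parent file, outside the inner loop, so that each input file gets exactly one output file and count goes up once per file. I also dropped the extra '\n', since each line read from the file already ends with its own newline. Of course, without seeing the actual indentation, I have no idea what else could have been wrong.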
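One more thing to watch: glob.glob() returns files in arbitrary order, so a running counter is not guaranteed to pair fam0.aln with Nfam0.aln. Here is a rough sketch of the same idea that builds the output name from the input name instead. The function names extract_headers and process_directory are just made up for illustration, and it assumes the files are plain text where the wanted lines start with ">".

```python
import glob
import os


def extract_headers(src_path, dst_path):
    """Copy every line of src_path that starts with '>' into dst_path."""
    kept = 0
    # 'with' closes both files even if something goes wrong mid-loop.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)  # line already ends with its own newline
                kept += 1
    return kept


def process_directory(directory="."):
    """For each fam*.aln in directory, write the '>' lines to N<name>."""
    written = []
    pattern = os.path.join(directory, "fam*.aln")
    # sorted() only makes the processing order predictable; the output
    # name comes from the input name, so the order does not matter.
    for src_path in sorted(glob.glob(pattern)):
        name = os.path.basename(src_path)           # e.g. fam12.aln
        dst_path = os.path.join(directory, "N" + name)  # e.g. Nfam12.aln
        extract_headers(src_path, dst_path)
        written.append(dst_path)
    return written


if __name__ == "__main__":
    process_directory(".")
```

Because the pattern is "fam*.aln", the new Nfam*.aln files are not picked up again if you run the script a second time in the same directory.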
