Dear all.

I have a file with lines like this.

query    210    ACTTGGACTC 219
  query    311    ACTTGGACTC 320 ....

From every line, I need to extract the number that comes right after 'query', followed by the DNA sequence. So far I read the file line by line, but I have only managed to write the whole line to a file.

for line in filein.readlines():
    if line.startswith('@') or line.startswith('>'):
        continue
    else:
        strline = str(line)
        if 'query' in strline:
            fileout.write(strline)

If I try print strline[3], I get 'u'. I also tried .split(). The line did get split, but I could not access the 3rd element, since this is not an array. I must be making a mistake, as there must be some way to do this. Please help.

for line in filein:  # iterating over the file object yields one line at a time
    if line.startswith('@') or line.startswith('>'):
        continue  # skip lines starting with '@' or '>'
    if 'query' in line:  # line is already a string, so str(line) is not necessary
        tokens = line.split()  # split on whitespace into a list of fields
        fileout.write("%s %s\n" % (tokens[1], tokens[2]))  # This will write number and dna sequence
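On why strline[3] gave 'u': indexing a string returns a single character, while .split() returns a list whose elements can be indexed just like an array. The small sketch below uses the second sample line from the question and assumes it really starts with two spaces, which would explain the 'u'. The variable names are only for illustration.

```python
# Second sample line from the question, assumed to start with two spaces
line = "  query    311    ACTTGGACTC 320\n"

# Indexing a string picks out one character (counting starts at 0)
fourth_char = line[3]   # ' ', ' ', 'q', 'u' -> 'u'

# split() with no arguments splits on any run of whitespace
# and ignores leading/trailing whitespace, returning a list
tokens = line.split()

number = tokens[1]      # the field right after 'query'
sequence = tokens[2]    # the DNA sequence
```

So tokens[1] and tokens[2] are the two values wanted; the "3rd element" in everyday counting is tokens[2], because list indices start at 0.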

Concise version of the code:

with open('filein.txt') as filein:
    with open('fileout.txt', 'w') as fileout:
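        # Keep only 'query' lines that do not start with '@' or '>', split on whitespace
        rows = (thisline.split()
                for thisline in filein
                if 'query' in thisline and not thisline.startswith(('@', '>')))
        for tag, num1, dna, num2 in rows:  # unpacking assumes exactly four fields per line
            fileout.write("%s %s\n" % (num1, dna)) # This will write number and dna sequence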
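
One caveat with the concise version: unpacking into four names raises a ValueError if a line has more or fewer than four fields. If that can happen in the real file (an assumption, since only two sample lines were shown), a slightly more defensive sketch could look like the following. The helper name extract_query_fields and the sample lines are made up for illustration, and unlike the code above it requires 'query' to be the first field rather than appearing anywhere in the line.

```python
def extract_query_fields(lines):
    """Yield (number, sequence) for each 'query' line (hypothetical helper)."""
    for line in lines:
        # Skip lines starting with '@' or '>'
        if line.startswith(('@', '>')):
            continue
        tokens = line.split()
        # Require 'query' as the first field and at least three fields in total
        if len(tokens) >= 3 and tokens[0] == 'query':
            yield tokens[1], tokens[2]


# Made-up sample input, including a header, a line with extra fields and a short line
sample = [
    ">header\n",
    "query    210    ACTTGGACTC 219\n",
    "  query    311    ACTTGGACTC 320 ....\n",
    "query\n",
]
results = list(extract_query_fields(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, and each pair can then be written out with fileout.write("%s %s\n" % pair).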