I am fairly new to Python and trying to work on this problem. I want to split a file, which contains two sequences of letters, at the blank line that separates them, and then 'compare' them:

What you need to do: Text file genesequences.txt contains two gene sequences, separated
from each other by an empty line. Write a program that will read the gene sequences in
(make sure to discard the ‘\n’ characters when you read in the gene sequences), and find the longest region that is shared between the sequences that is also homozygous (has “A” and “B” but no “C”). You may assume that the shared regions will be in the same location in
both gene sequences, so you will only have to check regions starting at the same location in
both sequences.

I think you will want to read and store both gene sequences in two variables, and
discard any extraneous characters but A, B and C. (ii) Use a window that starts with a size of 1, but increases by 1 for each iteration, and goes up to the length of the entire string. In each
iteration, check window-sized regions of the two sequences. If a match is found, and it does
not contain a ‘C’, note the location and length of the match. (iii) Note that if you use an
increasing window size, any subsequent match will be bigger than previous matches.


Any ideas? Thanks.

All 3 Replies

Which part are you struggling with?

Here's how you open a file:

fh = open( 'file.txt' )

How do I split a file that has 3 separate sequences, all divided by a blank line? Essentially I want each sequence in a different variable.

There are many ways to do it; however if it were me this is the approach I would take:

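# Read every line of the file into a list, then close the file handle
fh = open( 'myfile.txt' )
lines = fh.readlines()
fh.close()

# Strip whitespace (including '\n') and drop lines that end up empty
my_lines = [ line.strip() for line in lines if line.strip() ]
if len( my_lines ) == 3:
    var1, var2, var3 = my_lines
else:
    # print() is a function in Python 3; the original post used Python 2 print statements
    print( 'File contains incorrect data:' )
    print( ''.join( lines ) )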

The file method readlines() returns a list containing each line in a file as an element.

After closing the file handle, I used a "list comprehension" to iterate through each element of lines (each line), strip() off any leading/trailing whitespace (such as \n newline separators) and then store the stripped lines if they still contained any characters.

Finally I just made sure that I ended up with three variables by checking the length of my_lines. If it contains more or fewer than three elements, I print the original lines so that I can check what the contents of the file were. The join() method simply joins the lines back together into a single string so that it is easier to read when printed.
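
Note that the code above assumes each sequence sits on a single line. If a sequence can wrap across several lines (which the instruction to discard the '\n' characters suggests), something along these lines might work instead. This is only a sketch: the function name split_sequences and the sample string are made up for illustration, and it takes the file contents as one string (for example from fh.read()).

```python
def split_sequences(text):
    """Split text into sequences separated by blank lines.

    Lines that belong to the same sequence are joined together,
    which also discards the newline characters between them.
    """
    sequences = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Still inside a sequence: collect this line.
            current.append(stripped)
        elif current:
            # A blank line ends the sequence collected so far.
            sequences.append(''.join(current))
            current = []
    if current:
        # The last sequence may not be followed by a blank line.
        sequences.append(''.join(current))
    return sequences


# Hypothetical sample data: two sequences, each wrapped over two lines.
sample = "ABAB\nCABA\n\nABBB\nCABA\n"
print(split_sequences(sample))  # ['ABABCABA', 'ABBBCABA']
```

You can then unpack the result the same way as before, e.g. seq1, seq2 = split_sequences(text), after checking that the list has the length you expect.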

HTH
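
For the comparison part, here is one possible sketch. It is not the growing-window approach from the hints: because the shared regions are in the same location in both sequences, a single pass that tracks the current run of matching, non-'C' positions should find the same longest region. The function name longest_shared_region is made up, and I am reading "homozygous" as "contains no C"; if a region must contain both an A and a B, you would need an extra check.

```python
def longest_shared_region(seq1, seq2):
    """Return (start, length) of the longest stretch where both sequences
    have the same letter at the same position and that letter is not 'C'.

    If several stretches tie for longest, the first one is returned.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a == b and a != 'C':
            if run_len == 0:
                # A new run of matching positions starts here.
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            # A mismatch or a 'C' ends the current run.
            run_len = 0
    return best_start, best_len


# Hypothetical example sequences
s1 = 'ABABCABA'
s2 = 'ABBBCABA'
start, length = longest_shared_region(s1, s2)
print(start, length, s1[start:start + length])  # 5 3 ABA
```

zip() stops at the end of the shorter sequence, so positions beyond that are never compared.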
