I need to take a text file containing a number of gene sequences in FASTA format, e.g.


and put it into:

geneA agctactactacgatcgaacgtagctactactacgatcgaacgtagctac

where the whole sequence is on one line. I can concatenate it in Excel for one sequence, but I have 200+ to fit into this two-column format so that I can use Python to open the text file and load its contents into an SQL file. I only need some ideas on how to get the sequences into the two-column format.
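For the loading step the question mentions, here is a rough sketch of reading the two-column lines into a database with Python's built-in sqlite3 module. It assumes SQLite is the target database; the table name `genes`, its columns, and the function name `load_two_column` are hypothetical. Splitting only on the first run of whitespace keeps the whole sequence in one field.

```python
import sqlite3


def load_two_column(lines, conn):
    """Insert 'name sequence' lines into a table named genes (hypothetical schema)."""
    conn.execute('CREATE TABLE IF NOT EXISTS genes (name TEXT PRIMARY KEY, sequence TEXT)')
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            # split on the first run of whitespace only: name, then the sequence
            name, sequence = line.split(None, 1)
            rows.append((name, sequence))
    # ? placeholders let sqlite3 handle the quoting of each value
    conn.executemany('INSERT INTO genes (name, sequence) VALUES (?, ?)', rows)
    conn.commit()


# ':memory:' keeps the example self-contained; a filename such as 'genes.db' would persist it
conn = sqlite3.connect(':memory:')
load_two_column(['geneA agctacgt\n', 'geneB ttaaggcc\n'], conn)
print(conn.execute('SELECT name, sequence FROM genes ORDER BY name').fetchall())
```

With a real file, the open file object can be passed as `lines`, since iterating over it yields one line at a time.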


Last Post by woooee

To take a series of lines in a text file that are separated by newline characters and mash them into a single line of text:

# Read from your input file using readlines(), then:
out_txt = '%s %s' % (my_txt[0], ''.join(my_txt[1:]))

Then you can write out_txt to your new file. Note that the code above assumes you read your file with readlines() into a variable named my_txt.
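Here is a self-contained sketch of the same idea wrapped in a function, run on made-up sample data. It assumes the header line starts with '>' as in standard FASTA (the '>' is removed so the output matches the geneA example), and it applies .strip() to every line so no newlines end up in the joined sequence. The name `join_record` is hypothetical.

```python
def join_record(my_txt):
    """Collapse one record into 'name sequence'.

    my_txt is a list of lines as returned by readlines(): the first
    line is the header, the rest are pieces of the sequence.
    """
    # strip() removes the trailing newline; lstrip('>') drops the FASTA marker if present
    header = my_txt[0].strip().lstrip('>')
    # join the stripped sequence lines with no separator
    sequence = ''.join(line.strip() for line in my_txt[1:])
    return '%s %s' % (header, sequence)


# Sample input shaped like readlines() output (made-up data)
my_txt = ['>geneA\n', 'agctactactacgatcgaacgt\n', 'agctactactacgatcgaacgt\n', 'agctac\n']
print(join_record(my_txt))
# geneA agctactactacgatcgaacgtagctactactacgatcgaacgtagctac
```

This handles a single record only; a file with many records needs the lines grouped per gene first.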


The above code doesn't strip the newlines, so you want to use .strip() as well. If the file contains multiple "geneX" records, append the lines to a list until the next "gene" header is found. When it is found, send the list to a function that concatenates it, similar to the code example above (if you .strip() each line before appending it to the list, you should be able to use that code as is), and writes the result to the SQL file. Then reset the list to an empty list and start appending again with the current "gene" line. After the loop that reads the records finishes, you will need one more call that sends the final list to the function.
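A minimal sketch of the loop described above, writing the two-column text format rather than SQL. Assumptions: header lines start with '>' (standard FASTA), the file begins with a header line, and the names `convert` and `write_record` are hypothetical. It accepts any iterable of lines, so an open file object or io.StringIO can be passed in.

```python
import io


def write_record(record, out):
    """Concatenate one record (header + sequence lines) and write it as one line."""
    header = record[0].lstrip('>')
    out.write('%s %s\n' % (header, ''.join(record[1:])))


def convert(infile, out):
    record = []
    for line in infile:
        line = line.strip()          # drop the newline before appending
        if not line:
            continue                 # skip blank lines
        if line.startswith('>'):     # a new header: flush the previous record
            if record:
                write_record(record, out)
            record = [line]          # start the new list with this header
        else:
            record.append(line)
    if record:                       # the loop never flushes the last record
        write_record(record, out)


# Usage with made-up data in memory
fasta = io.StringIO('>geneA\nagct\nacgt\n>geneB\nttaa\nggcc\n')
result = io.StringIO()
convert(fasta, result)
print(result.getvalue(), end='')
# geneA agctacgt
# geneB ttaaggcc
```

For real files, `with open('genes.fasta') as infile, open('genes.txt', 'w') as out: convert(infile, out)` would do the same job; both filenames are placeholders.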
