I'm trying to take a file that looks like this:

taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC

And get it to look like this:
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC

I'm using a python script, so far this is what I have:

#!/usr/bin/python

import sys

if len(sys.argv) < 2:
    print "usage: finalmyscript.py infile.txt"
    sys.exit(1)

fname = sys.argv[1]

handle = open(fname, "r")
list = handle.readlines()

for line in list:
    parts = line.rstrip('\n')
    linearr = parts.split()
    combine = ''.join(linearr[0])
    print combine

handle.close()

The script removes the '\n' at the end of each line, but it still won't join the lines all on a single line. Can anyone help with where I'm going wrong?
Thanks!

All 8 Replies

Hint ...

s = '''\
taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
'''

s2 = ""
for ix, line in enumerate(s.split('\n')):
    line = line.rstrip()
    if ix == 0:
        # add a space
        line += ' '
    s2 += line

print(s2)

''' result ...
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC
'''

If you get rid of lines 14 - 19 of your script (the for loop), you can print the input file as one line with:

print ''.join(list).replace("\n", " ")

Vegaseat:
that would work if I weren't pulling the input from a file. I can only pass the input file as a command-line argument when running my script; I can't hard-code any of the file's contents, like the actual DNA code.

Chris, I used a line like that before. It does bring them together, but the problem is that it then puts all the taxons on one line, and I need to split them by taxon.

Thank you guys!

Does the file have more than one taxon? In the example you posted there is only one, so all of the solutions so far handle only one. Try this as a hint, although there are other ways to do it.

handle = open(fname, "r")
all_data = handle.read()
print all_data.split("taxon")
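One way to carry that hint through, as a rough sketch in Python 3 syntax: split the text on "taxon", then rebuild each record by gluing its remaining lines together. The name join_by_split is made up for illustration, and the sketch assumes the word "taxon" appears only at the start of header lines and is followed directly by its number.

```python
def join_by_split(all_data, marker="taxon"):
    """Return one 'header sequence' string per record in all_data."""
    records = []
    # The first chunk is whatever precedes the first marker (usually empty).
    for chunk in all_data.split(marker)[1:]:
        parts = chunk.split()            # e.g. ['1', 'ACCGTGGATC', ...]
        if not parts:
            continue
        header = marker + parts[0]       # put the marker back: 'taxon1'
        records.append(header + " " + "".join(parts[1:]))
    return records


sample = (
    "taxon1\nACCGTGGATC\nCCTATTGATT\nGGATATTATC\n"
    "taxon2\nTTCATATGTA\nGGATTTCATA\nGATGGCCCCC\n"
)
for record in join_by_split(sample):
    print(record)
```

With a real file, the string passed in would be the result of handle.read() rather than the hard-coded sample.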

There are 3 taxon total, but I just printed 2 of them.

taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC

That last comment helped a lot! Thank you! The all_data.split got all 3 taxon on one line for me.

You can do something like this ...

''' infile_test.py
data processing from a file

file infile.txt has content ...
taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC 
'''

fname = "infile.txt"

with open(fname) as fin:
    s2 = ""
    for line in fin:
        line = line.rstrip()
        if "taxon" in line:
            # add a space
            line += ' '
            # start a new output line for every taxon after the first
            if s2:
                s2 += '\n'
        s2 += line
    print(s2)

''' result ...
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC
taxon2 TTCATATGTAGGATTTCATAGATGGCCCCC
'''
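Since the file name has to come from the command line, the same idea can be wrapped in a small script. This is only a sketch in Python 3 syntax: join_taxa is a made-up name, and it assumes every header line starts with "taxon" and that every line up to the next header belongs to that taxon.

```python
import sys


def join_taxa(lines, marker="taxon"):
    """Yield 'header sequence' strings, one per taxon."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                     # skip blank lines
        if line.startswith(marker):
            if header is not None:       # finish the previous record
                yield header + " " + "".join(chunks)
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)          # sequence line for the current taxon
    if header is not None:               # emit the last record
        yield header + " " + "".join(chunks)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fin:
            for record in join_taxa(fin):
                print(record)
    else:
        print("usage: finalmyscript.py infile.txt")
```

Because it reads the file line by line and starts a new record at each header, it does not depend on a length threshold and should handle any number of taxa.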

At this point it would be nice to know what your input data is. And what you expect your output data to look like.
