How to create a csv file with header row

Question

pythonbegin 0 Light Poster

13 Years Ago

Hi all

How can I create a CSV file with a header row? I have a text file made up of several blocks, each starting with "//" and ending with "//". I have attached a sample file.
I want to use the first column of this text (the two-letter tags) as the CSV header and append the associated values under it. If any of the headers is missing in a block, it should be created and appended anyway.
For example, from the text below I want a CSV with AC, ID, FA, OS, SF, BS and GE as the header and their values under that header. Could anyone help me with this? I have tried doing this with the code at the end, but I am not getting exactly what I want.

//
AC T876837378768
XX
ID T876837378768
XX
DT 16.09.1996 (created); ewi.
CO Copyright (C), Biobase GmbH.
XX
FA MNG345
XX
OS human, Homo sapiens
OC eukaryota; animalia; metazoa; chordata; vertebrata; tetrapoda; mammalia; eutheria; primates
XX
SF similar to MNG;
XX
FF induced by interferon-alpha (15-30'), inhibited by 2-AP;
XX
BS R02116; AAF$CONS; Quality: 6.
BS R03064; HS$GBP_02; Quality: 6; GBP, G000264; human, Homo sapiens.
XX
DR TRANSPATH: MO000026034.
XX
RN [1]; RE0000446.
RX PUBMED: 1901265.
RA Decker T., Lew D. J., Mirkowitch J., Darnell J. E.
RT Cytoplasmic activation of GAF, an IFN-gamma-regulated DNA-binding factor
RL EMBO J. 10:927-932 (1991).
RN [2]; RE0001471.
RX PUBMED: 1833631.
RA Decker T., Lew D. J., Darnell J. E.
RT Two distinct alpha-interferon-dependent signal transduction pathways may contribute to activation of transcription of the guanylate-binding protein gene
RL Mol. Cell. Biol. 11:5147-5153 (1991).
XX
//

tfid,fa,os,ge,osm,ins,inm = "","","","","","",""
for line in f1 :
    r1 = line.split()
    if line.startswith("ID"):
        tfid = r1[1]
        #print a
    if line.startswith("FA"):
        fa = r1[1]
        #print b
    if line.startswith("OS")and line.endswith("sapiens\n"):
        os = " ".join(r1[1:])
        #print os
    if line.startswith("GE"):
        ge = " ".join(r1[1:3])
        #print ge
    if line.startswith("OS")and line.endswith("Mammalia\n"):
        osm  = r1[1]
         #print c
    if line.startswith("IN") and line.endswith("sapiens.\n"):
        ins ="\t".join(r1[1:3])
        #print g
    if line.startswith("IN") and line.endswith("Mammalia.\n"):
        inm = "\t".join(r1[1:])
    if line.startswith("//"):
        tftable = os+"\t"+tfid+"\t"+fa+"\t"+"\t"+ge+"\t"+osm+"\t"+ins+"\t"+inm+"\n"
        
        
        #tfid,fa,os,ge,osm,ins,inm = "","","","","","",""


Beat_Slayer 17 Posting Pro in Training

13 Years Ago

Can you provide a sample file.

Your sample doesn't even have the GE.

I changed the second BS tag to GE.

f_in = open('blocks.txt').read()
f_out = open('output.csv', 'w')
f_out.write('AC\tID\tFA\tOS\tSF\tBS\tGE\n')

blocks = [x for x in f_in.split('//') if x]
for item in blocks:
    infos = [x for x in item.split('\n') if x and x != 'XX']
    for field in infos:
        if field.startswith('AC'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('ID'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('FA'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('OS'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('SF'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('BS'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('GE'):
            f_out.write('%s\t\n' % field[3:])

f_out.close()


TrustyTony 888 pyMod

13 Years Ago

Here is my preprocessing routine, which puts the data in a dict. From there it is simple to write the data to the output file, opened as of, with of.write().

filename='biodata.txt'
datadict= dict()

with open('data.csv','w') as of:
    data = ((ind,textline.strip().split(' ',1))
            for ind,block in enumerate(open(filename).read().split('//'))
            for textline in block.split('XX')
            if ' ' in textline)
    for ind,(key,info) in data:
        datadict[ind,key]=info.splitlines()
    for d, value in datadict.items():
        # print() with a single argument works in both Python 2 and Python 3
        print("datadict%s = %s" % (list(d), ''.join(value)))
    ## output to of goes here
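
The output step left open above could look roughly like the following. This is only a sketch, assuming Python 3 and assuming datadict maps (block index, tag) pairs to lists of text lines, as the routine above builds it. The names rows_from_datadict and write_rows are hypothetical and not part of the original post.

```python
def rows_from_datadict(datadict, columns):
    """Turn {(block_index, tag): [lines]} into one list of strings per block."""
    by_block = {}
    for (ind, key), lines in datadict.items():
        # Collect every tag of a block under that block's index.
        by_block.setdefault(ind, {})[key] = ' '.join(lines)
    # A tag that is missing from a block becomes an empty cell.
    return [[by_block[ind].get(col, '') for col in columns]
            for ind in sorted(by_block)]


def write_rows(path, columns, rows):
    """Write a tab-separated header line followed by one line per block."""
    with open(path, 'w') as of:
        of.write('\t'.join(columns) + '\n')
        for row in rows:
            of.write('\t'.join(row) + '\n')
```

Calling write_rows('data.csv', columns, rows_from_datadict(datadict, columns)) with columns = ['AC', 'ID', 'FA', 'OS', 'SF', 'BS', 'GE'] would then give one line per block, in block order.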


woooee 814 Nearly a Posting Maven · Answer 1 · 2010-08-08T23:35:51+00:00

Python usually has ways of replacing a bunch of if/elif/else statements.

    for field in infos:
        if field.startswith('AC'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('ID'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('FA'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('OS'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('SF'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('BS'):
            f_out.write('%s\t' % field[3:])
        elif field.startswith('GE'):
            f_out.write('%s\t\n' % field[3:])

    ## ---------  replace with  ----------
    for field in infos:
        test_2 = field[0:2]
        if test_2 in ["AC", "ID", "FA", "OS", "SF", "BS", "GE"]:
            f_out.write('%s\t' % field[3:])
            if test_2 == "GE":
                f_out.write('\n')   # GE ends the row, as in the if/elif version
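
The lookup idea can also be used to build the whole row before writing it, so that a block without a GE line still ends with a newline and repeated tags such as BS land in one cell. This is a minimal sketch, assuming infos is the list of lines of one block as in the code above. TAGS and block_to_row are hypothetical names, not from the thread.

```python
TAGS = ("AC", "ID", "FA", "OS", "SF", "BS", "GE")


def block_to_row(infos, tags=TAGS):
    """Build one tab-separated line from the lines of a single block."""
    values = dict((tag, []) for tag in tags)
    for field in infos:
        tag = field[:2]
        if tag in values:
            # Keep everything after the two-letter tag and its separator.
            values[tag].append(field[3:].strip())
    # Every tag gets a cell, in header order; missing tags give empty cells.
    return '\t'.join(' '.join(values[tag]) for tag in tags) + '\n'
```

With this, the inner loop reduces to f_out.write(block_to_row(infos)) for each block.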

pythonbegin 0 Light Poster · Answer 2 · 2010-08-10T13:22:01+00:00

Hi, some of the blocks do not have a GE tag.

It is not giving the expected output. It shows the headers and the first row, and the headers do not match the entries in the block.


Beat_Slayer 17 Posting Pro in Training · Answer 3 · 2010-08-10T23:23:22+00:00

I believe the problem lies in the input file.

It works here with the sample file you provided.

Can you provide more data and info?

Cheers

pythonbegin 0 Light Poster · Answer 4 · 2010-08-12T08:44:08+00:00

Hi, please find attached a sample file with more data in it. The actual file looks exactly like this, with more than 5000 entries.

Thanks.


Beat_Slayer 17 Posting Pro in Training · Answer 5 · 2010-08-12T23:47:28+00:00

It works with your file again. :)

f_in = open('blocks.txt').read()
f_out = open('output.csv', 'w')

f_out.write('AC\tID\tFA\tOS\tSF\tBS\tGE\n')

blocks = [x for x in f_in.split('//') if x]

for item in blocks:
    infos = [x for x in item.split('\n') if x and x != 'XX']
    AC = ''
    ID = ''
    FA = ''
    OS = ''
    SF = ''
    BS = ''
    GE = ''
    for field in infos:
        if field.startswith('AC'):
            AC += ' ' + field[3:]
        elif field.startswith('ID'):
            ID += ' ' + field[3:]
        elif field.startswith('FA'):
            FA += ' ' + field[3:]
        elif field.startswith('OS'):
            OS += ' ' + field[3:]
        elif field.startswith('SF'):
            SF += ' ' + field[3:]
        elif field.startswith('BS'):
            BS += ' ' + field[3:]
        elif field.startswith('GE'):
            GE += ' ' + field[3:]

    f_out.write('%s\t%s\t%s\t%s\t%s\t%s\t%s\n' % (AC, ID, FA, OS, SF, BS, GE))
    
f_out.close()
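
The code above writes tab-separated text. If a real comma-separated file is wanted, Python's standard csv module can write the header and quote values that contain commas (such as "human, Homo sapiens"). The following is a sketch only, assuming Python 3 and the same block layout as the sample file. parse_blocks and write_csv are hypothetical helper names, and splitting on '//' assumes, like the code above, that no value contains that sequence.

```python
import csv
import io

TAGS = ["AC", "ID", "FA", "OS", "SF", "BS", "GE"]


def parse_blocks(text, tags=TAGS):
    """Return one dict per block, with an empty string for each missing tag."""
    rows = []
    for block in text.split('//'):
        row = dict((tag, []) for tag in tags)
        seen = False
        for line in block.splitlines():
            tag = line[:2]
            if tag in row:
                row[tag].append(line[2:].strip())
                seen = True
        if seen:  # skip the empty pieces before and after the delimiters
            rows.append(dict((tag, ' '.join(vals)) for tag, vals in row.items()))
    return rows


def write_csv(rows, out, tags=TAGS):
    """Write the header row and one CSV row per block to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=tags)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == '__main__':
    sample = "//\nAC T1\nXX\nOS human, Homo sapiens\nBS R1; a\nBS R2; b\n//\n"
    buf = io.StringIO()
    write_csv(parse_blocks(sample), buf)
    print(buf.getvalue())
```

To write to disk in Python 3, open the target with open('output.csv', 'w', newline='') and pass that file object as out.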

Cheers and Happy coding

pythonbegin 0 Light Poster · Answer 6 · 2010-08-13T08:48:51+00:00

Hi,

Perfect, it is working!

The file in csv format is not giving proper output, but the txt format works well.

Thanks a lot for your help!
