Hi,

I'm new to Python and am having trouble reading data into my code from a text file. The text looks like this:

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,

There are several "cells", and I need to pull out the FlashCount column from each cell.

Thanks.

Drop the extra lines from the beginning and use my code snippet: http://www.daniweb.com/software-development/python/code/293490

# text based data input with data accessible
# with named fields or indexing
from __future__ import print_function ## Python 3 style printing
from collections import namedtuple
import string

filein = open("sample.dat")

datadict = {}
for line in filein:
    if line.startswith(('>INFO','\n')):
        continue
    headerline = line.lower().replace('-','').replace('(','').replace(')', '') ## lowercase field names Python style
    break
## first non-letter and non-number is taken to be the separator
separator = headerline.strip(string.ascii_lowercase + string.digits)[0]
print("Separator is '%s'" % separator)

headerline = [field.strip() for field in headerline.split(separator)]
Dataline = namedtuple('Dataline',headerline)
print('Fields are:', Dataline._fields, '\n')

for data in filein:
    data = [f.strip() for f in data.rstrip('\n '+separator).split(separator)]
    d = Dataline(*data)
    print(d.flashcount)


Thank you,

Each text file contains multiple cells, and I am interested in the FlashCount separated by cell. Would dropping the first few lines allow me to do that? Sorry I wasn't very clear about that before.

Also, I'm on version 2.4.3 so I can't use namedtuple. Is there something else that I could do this with?

A named tuple is a convenience that lets the column position vary. If the data is always in the same column, you can hard-code the index, or you can find the correct column from the header of each cell. An additional complication is the unconventional line ending: each line ends with the separator instead of only a newline.

filein = open("sample.dat")

for line in filein:
    if line.startswith('>INFO') or line == '\n':  ## tuple prefixes need Python 2.5+
        print(line.rstrip())
        continue
    headerline = line.split(', ')
    fieldno = headerline.index('FlashCount')
    break

for data in filein:
    d = data.split(', ')[fieldno]
    print(d)

filein.close()
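
To illustrate the trailing-separator complication and a way to get named access without namedtuple, here is a minimal sketch using one header line and one data row copied from the sample above. The helper name split_fields and the dict-based record are my own choices for illustration, not part of the snippet above, and I am assuming plain comma separation throughout.

```python
# Header and one data row as they appear in the sample file.
header = ("datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, "
          "CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH\n")
row = ("20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, "
       "0, 0, 0, 40.5807, 0.0332085, 27.41,\n")

def split_fields(line):
    # Hypothetical helper: drop the newline, spaces and the trailing
    # separator first, then split on commas and trim each field.
    return [field.strip() for field in line.rstrip('\n ,').split(',')]

# A plain split leaves the comma and the newline glued to the last field.
naive = row.split(', ')
print(repr(naive[-1]))

# Pairing header names with values gives access by name without namedtuple.
names = split_fields(header)
values = split_fields(row)
record = dict(zip(names, values))
print(record['FlashCount'])  # 40.5807
```

Nothing here needs anything newer than Python 2.4, and a dictionary keeps the original header spellings (for example 'Size(km2)') because keys do not have to be valid identifiers.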


Thank you,

This has helped tremendously! I have managed to get this to work for a text file containing only 1 cell. Next, I want to get this to work with a text file containing multiple cells. If my data looks like:

>INFO> CELLID, #763,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -93.7,37.78,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:42:29, 47, 40.5, 2.99706, 522.765, 383.863, -99900, -99900, -99900, -99900, -99900, 0.985357,
20:03:44:33, 49.5, 44, 3.88048, 807.916, 465.574, -99900, -99900, -99900, -99900, -99900, 2.5169,

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,

I was thinking I could somehow split the file up by searching for #'s, and then applying the bit of code I have to read a single cell. Is that a sound way of doing this? If so, how would I go about doing this?

Line 11 already checks for the info line that begins a block, which marks the start of a record. Put lines 3-13 in a proper outer loop and correct the break out of the for loop at lines 11-13. That part is your job. Of course, you should save the data in the loop instead of printing it, probably in a dictionary keyed by the cell ID given in the first info line.
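
One way the advice above could be put together is sketched below. It is not the poster's code: the function name read_flashcounts, the single-pass structure, and the conversion to float are my assumptions, and it assumes every cell starts with a ">INFO> CELLID, #..." line followed by a header line naming FlashCount.

```python
def read_flashcounts(lines):
    """Return {cell_id: [FlashCount values as floats]} for every cell."""
    cells = {}
    cell_id = None   # ID of the cell currently being read
    fieldno = None   # FlashCount column; None until this cell's header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # blank lines carry no data
        if line.startswith('>INFO>'):
            if 'CELLID' in line:
                # ">INFO> CELLID, #729," -> "729"
                cell_id = line.split('#')[1].rstrip(', ')
                cells[cell_id] = []
                fieldno = None       # a new header line follows
            continue
        # Drop the trailing separator before splitting, then trim fields.
        fields = [f.strip() for f in line.rstrip(', ').split(',')]
        if fieldno is None:
            fieldno = fields.index('FlashCount')   # this is the header line
        else:
            cells[cell_id].append(float(fields[fieldno]))
    return cells

# Two cells taken from the sample data in this thread.
sample = """>INFO> CELLID, #763,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -93.7,37.78,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:42:29, 47, 40.5, 2.99706, 522.765, 383.863, -99900, -99900, -99900, -99900, -99900, 0.985357,
20:03:44:33, 49.5, 44, 3.88048, 807.916, 465.574, -99900, -99900, -99900, -99900, -99900, 2.5169,

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,
"""

cells = read_flashcounts(sample.splitlines())
for cell_id in sorted(cells):
    print("%s %s" % (cell_id, cells[cell_id]))
```

To read a real file, pass the open file object instead of the sample lines, for example read_flashcounts(open("sample.dat")). The header is looked up again for each cell, so the FlashCount column may sit at a different position from one cell to the next. The sketch sticks to features that should also be available on Python 2.4.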

