complicated file parsing

Question

miac09 -3 Newbie Poster

15 Years Ago

Hi all,

I'm learning Python and hope someone can help me with a sort of tricky search/parse problem. I have a tab-delimited file like this:

2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172

2 :PAD_999) ... 1 :PAA_888)

3 :PAB_978) ... 1 :PAA_888)

4 :PCA_098) ... 1 :PAA_888)

I would like to search the 6th column (index 5 when counting from 0) for values >= 1.0 and print the results with the tag info, like this:

4 :PCA_098 1 :PAA_888 1.3513
4 :PCA_098 2 :PAD_999 1.3332
4 :PCA_098 3 :PAB_978 1.4675


AutoPython 5 Junior Poster

15 Years Ago

You may want to click the "disable smilies" option at the bottom of the post.

EDIT: After you post, you no longer have the option to click that button, so if you want an understandable post you're going to have to make a new thread.


jice 53 Posting Whiz in Training

15 Years Ago

Here is a solution...

datas = """2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172""".split("\n")
tags = """1 PAA_888
2 PAD_999
3 PAB_978
4 PCA_098""".split("\n")

# load the tags into a dictionary: id -> tag name
dtags = {}
for line in tags:
    t = line.split()
    dtags[t[0]] = t[1]

# print the rows whose value at index 5 is >= 1.0, along with their tags
for line in datas:
    sl = line.split()
    if float(sl[5]) >= 1.0:
        print(sl[0], dtags[sl[0]], sl[1], dtags[sl[1]], sl[5])


miac09 -3 Newbie Poster · Answer 1 · 2009-09-09T00:57:14+00:00

Not sure what's going on with the faces... they replaced the letter 'P' that is present in the file.

miac09 -3 Newbie Poster · Answer 2 · 2009-09-09T03:21:17+00:00

Thanks! Here it is again - minus the creepy smiley faces

Hi all,

I'm learning Python and hope someone can help me with a sort of tricky search/parse problem. I have a tab-delimited file like this:

2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172

2 PAD_999) ... 1 PAA_888)

3 PAB_978) ... 1 PAA_888)

4 PCA_098) ... 1 PAA_888)

I would like to search the 6th column (index 5 when counting from 0) for values >= 1.0 and print the results with the tag info, like this:

4 PCA_098 1 PAA_888 1.3513
4 PCA_098 2 PAD_999 1.3332
4 PCA_098 3 PAB_978 1.4675

miac09 -3 Newbie Poster · Answer 3 · 2009-09-11T03:42:04+00:00

Thanks a lot!! That gives the right output.

The other tricky parts are that (1) I need to do this for hundreds of files, and (2) I need to parse both the datas and the tags information out of the same file.

I started writing a script that would search for the column headers (i.e. "datas") and get the lines that follow (and maybe store these in a dictionary, datas = {}?), but I'm getting stuck on this:

>>> data=open("file000").readlines()
>>> ratios = ()
>>> for line in data:
...     if "datas" in line:
...         ds = line[line.find("datas") + 6:]
...         values += ({"datas" : datas.strip()},)

Thanks again for your help- I'm new to python.
-M


woooee 814 Nearly a Posting Maven · Answer 4 · 2009-09-11T04:27:16+00:00

I started writing a script that would search for the column headers (i.e. "datas") and get the lines that follow (and maybe store these in a dictionary, datas = {}?), but I'm getting stuck on this

This is a fairly common use for computers and a fairly common question.

def process_data(list_in):
    """
    Process one group of data: 'list_in' starts with the record
    containing 'datas' and holds all records up to, but not
    including, the next 'datas' record.
    """
    if list_in:     ## the first list will be empty
        ##  just print to show what is being processed
        for rec in list_in:
            print(rec.rstrip("\n"))
    print("-" * 50)

with open("file000") as f:
    data = f.readlines()

recs_list = []
for line in data:
    if "datas" in line:
        ## process the list before adding this record
        process_data(recs_list)

        ## redefine as an empty list
        recs_list = []

    recs_list.append(line)

##  process the final list since there won't be any
##  "datas" rec at the end
process_data(recs_list)
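
To tie this back to the original question, here is a rough sketch of how the grouping idea could feed the tag lookup when both sections live in one file. The thread never shows the real file layout, so this assumes (and it is only an assumption) that a line reading exactly "datas" starts the numeric rows and a line reading exactly "tags" starts the id/tag pairs. The names split_sections and tagged_rows are made up for the example, and it is written for Python 3.

```python
def split_sections(lines, headers=("datas", "tags")):
    """Group lines under the most recent header line seen.

    ASSUMPTION: a section starts at a line that is exactly one of the
    header words; the real files may be laid out differently.
    """
    sections = {name: [] for name in headers}
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped in headers:
            current = stripped              # switch to the new section
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections


def tagged_rows(sections, column=5, threshold=1.0):
    """Return report lines for rows whose value at `column` is >= threshold."""
    # map id -> tag name, e.g. "4" -> "PCA_098"
    tag_by_id = dict(line.split()[:2] for line in sections["tags"])
    report = []
    for line in sections["datas"]:
        fields = line.split()
        if float(fields[column]) >= threshold:
            first, second = fields[0], fields[1]
            report.append(" ".join(
                [first, tag_by_id[first], second, tag_by_id[second], fields[column]]))
    return report


# Sample input built from the data in this thread, in the assumed layout.
sample = """datas
2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172
tags
1 PAA_888
2 PAD_999
3 PAB_978
4 PCA_098
""".splitlines()

for row in tagged_rows(split_sections(sample)):
    print(row)
```

If your headers look different (for instance a line that merely contains the word "datas"), the test in split_sections is the only part that should need changing.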

jice 53 Posting Whiz in Training · Answer 5 · 2009-09-11T17:08:10+00:00

About the hundreds of files: if they are all in the same directory, you can use:

import os
for filename in os.listdir(path):  # path is the directory that holds your files
    # iterating over the file object avoids loading the whole file into memory
    for line in open(os.path.join(path, filename)):
        ...

If they are not, you should take a look at os.walk().
If you want to filter your files, you can use the fnmatch module...
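
For what it's worth, here is a minimal sketch of how os.walk() and fnmatch could be combined. The helper name find_files and the "file*" pattern are just assumptions for the example (the pattern is guessed from the "file000" name used earlier in the thread), so adjust both to your own files. It is written for Python 3.

```python
import fnmatch
import os


def find_files(root, pattern="file*"):
    """Yield the full path of every file under `root` whose name matches `pattern`."""
    # os.walk visits `root` and every directory below it
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


# Example: list every matching path below the current directory.
for path in find_files(".", "file*"):
    print(path)
```

Each path it yields can then be opened and read line by line, as in the os.listdir() loop.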