Hi all,

I'm learning Python and hope someone can help me with a somewhat tricky search/parse problem. I have a tab-delimited file like this:

2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172

2 PAD_999) ... 1 PAA_888)

3 PAB_978) ... 1 PAA_888)

4 PCA_098) ... 1 PAA_888)

I would like to search the 6th column (index 5 when counting from zero) for values >= 1.0 and print the matching rows with their tag info, like this:

4 PCA_098 1 PAA_888 1.3513
4 PCA_098 2 PAD_999 1.3332
4 PCA_098 3 PAB_978 1.4675

Here is a solution...

datas="""2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172""".split("\n")
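tags="""1 PAA_888
2 PAD_999
3 PAB_978
4 PCA_098""".split("\n")
dtags={}
# load the tags into a dictionary: index -> tag name
for l in tags:
    t=l.split()
    dtags[t[0]]=t[1]

# print the rows whose 6th column (index 5) is >= 1.0
for l in datas:
    sl=l.split()
    if float(sl[5]) >= 1.0:
        # join() gives the same output under Python 2 and Python 3
        print(" ".join([sl[0], dtags[sl[0]], sl[1], dtags[sl[1]], sl[5]]))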

Thanks a lot! That gives the right output.

The other tricky parts are: 1. I need to do this for hundreds of files, and 2. I need to parse the datas and tags information out of the same file.

I started writing a script that would search for the column headers (i.e. "datas") and get the lines that follow (and maybe store these in a dictionary, datas = {} ...?), but I'm getting stuck on this:

>>> data = open("file000").readlines()
>>> ratios = ()
>>> for line in data:
...     if "datas" in line:
...         ds = line[line.find("datas") + 6:]
...         ratios += ({"datas": ds.strip()},)

Thanks again for your help. I'm new to Python.
-M


Grouping the lines of a file by a marker record such as "datas" is a fairly common task and a fairly common question.

def process_data(list_in):
    """
    Process one group of data. 'list_in' starts with the record
    containing 'datas' and holds all records up to, but not
    including, the next 'datas' record.
    """
    if list_in:     ## the first list is empty if the file starts with 'datas'
        ##  just print to show what is being processed
        for rec in list_in:
            print(rec.rstrip("\n"))
    print("-" * 50)

recs_list = []
with open("file000") as data:
    for line in data:
        if "datas" in line:
            ## process the list before adding this record
            process_data(recs_list)

            ## re-define as an empty list
            recs_list = []

        recs_list.append(line)

##  process the final list since there won't be any
##  "datas" rec at the end
process_data(recs_list)
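
To tie this back to the original question, here is a rough sketch of reading the datas and the tags out of the same file and then printing the matching rows. The real file layout was never posted, so this assumes a header line containing "datas" followed by the data rows and a header line containing "tags" followed by "index name" lines. The names split_sections, report and sample are made up for the example, and the code is written for Python 3.

```python
def split_sections(lines, markers=("datas", "tags")):
    """Group lines under the most recent header line that contains a marker."""
    sections = {name: [] for name in markers}
    current = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        header = next((m for m in markers if m in stripped), None)
        if header is not None:
            current = header  # a new section starts here
        elif current is not None:
            sections[current].append(stripped)
    return sections


def report(lines, column=5, threshold=1.0):
    """Return 'idx tag idx tag value' strings for rows at or above threshold."""
    sections = split_sections(lines)
    # tag lines are assumed to look like "1 PAA_888"
    dtags = dict(rec.split()[:2] for rec in sections["tags"])
    rows = []
    for rec in sections["datas"]:
        sl = rec.split()
        if float(sl[column]) >= threshold:
            rows.append(" ".join([sl[0], dtags[sl[0]], sl[1], dtags[sl[1]], sl[column]]))
    return rows


# Assumed layout, built from the numbers posted in this thread.
sample = """datas
2 1 863.8 300.2 0.0131 0.0759 0.1727 0.0879 1.5821
3 1 874.5 289.5 0.0574 0.1292 0.4447 0.2258 1.1846
3 2 874.5 289.5 0.0573 0.0527 1.0857 0.1684 1.1760
4 1 844.3 319.7 0.1306 1.3513 0.0967 1.3976 2.2659
4 2 849.2 314.8 0.1350 1.3332 0.1013 1.3773 1.9990
4 3 846.0 318.0 0.1546 1.4675 0.1053 1.5399 2.1172
tags
1 PAA_888
2 PAD_999
3 PAB_978
4 PCA_098
""".splitlines()

for row in report(sample):
    print(row)
```

If your headers look different, only the markers tuple and the tag-line parsing should need to change. For a real file you would pass the open file object (or its lines) to report() instead of sample.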

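About the hundreds of files: if they are all in the same directory, you can use:

import os
path = "."  # placeholder: the directory that holds your data files
for filename in os.listdir(path):
    # os.listdir() returns bare names, so join them with the directory
    with open(os.path.join(path, filename)) as f:
        for line in f: # iterating the file avoids loading the whole file in memory...
            ...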

If they are not, you should take a look at os.walk().
If you want to filter your files, you can use the fnmatch module.
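
A minimal sketch of combining os.walk() and fnmatch, in case the files are spread over sub-directories. The helper name find_files and the "file*" pattern are only guesses based on the "file000" name used earlier in the thread, so adjust them to your real file names.

```python
import fnmatch
import os


def find_files(root, pattern="*"):
    """Yield the full path of every file under root whose name matches pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in sorted(fnmatch.filter(filenames, pattern)):
            yield os.path.join(dirpath, name)
```

You would then loop with something like "for path in find_files(top_dir, "file*"):" and open each path in turn, where top_dir is whatever directory holds your data.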