Hi,

My problem is that I have two files with this format:

File 1 has two columns:

protein id geneid

qqqq yyyy

tttt pppp

oooo llll

Now I have a second file, a cluster file, like this:

cluster 1 : yyyy,pppp
cluster 2 : llll, yyyy,
cluster n : pppp,yyyy,llll

I want to map the IDs and find out which clusters belong to qqqq instead of yyyy, tttt instead of pppp, ... and oooo instead of llll.

And what do those clusters look like? I see that you want to search for the value in the first column of the first file; I just don't know what would count as a valid match in the second one.

A cluster that doesn't have some of the values, or one that has some of the values in some order...?


Can you give an exact formula and/or an example with a few values and the correct result for that limited example?

This should do it. There could be some tweaks needed depending on the data input.

# Read both files up front; each line is one record.
with open('protein.txt') as f:
    proteins = f.readlines()
with open('cluster.txt') as f:
    clusters = f.readlines()

for protein in proteins:
    fields = protein.split()
    if len(fields) != 2:
        continue  # skip blank or malformed lines
    proteinid, geneid = fields
    for cluster in clusters:
        if ':' not in cluster:
            continue  # skip blank or malformed lines
        clusterid, genes = cluster.split(':', 1)
        # Strip whitespace around each gene ID and drop empty entries
        geneids = [x.strip() for x in genes.split(',') if x.strip()]
        if geneid in geneids:
            print(protein.rstrip('\n'))
            print(cluster.rstrip('\n'))
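
If what you actually want is each cluster rewritten with protein IDs in place of gene IDs, a dictionary lookup might be simpler than the nested loop. This is only a rough sketch under some assumptions: the function names parse_gene_to_protein and translate_cluster are made up for illustration, the sample lines are taken from the question rather than read from files, and a gene with no entry in the protein file is left unchanged.

```python
def parse_gene_to_protein(lines):
    """Map each gene ID to its protein ID from 'proteinid geneid' lines."""
    gene_to_protein = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        protein_id, gene_id = parts
        gene_to_protein[gene_id] = protein_id
    return gene_to_protein


def translate_cluster(line, gene_to_protein):
    """Return (cluster name, protein IDs) for a 'cluster 1 : yyyy,pppp' line."""
    name, _, genes = line.partition(':')
    gene_ids = [g.strip() for g in genes.split(',') if g.strip()]
    # Assumption: genes missing from the mapping are kept as-is.
    return name.strip(), [gene_to_protein.get(g, g) for g in gene_ids]


# Sample data copied from the question (stand-ins for the two files).
protein_lines = ["qqqq yyyy", "", "tttt pppp", "", "oooo llll"]
cluster_lines = [
    "cluster 1 : yyyy,pppp",
    "cluster 2 : llll, yyyy,",
    "cluster n : pppp,yyyy,llll",
]

mapping = parse_gene_to_protein(protein_lines)
for line in cluster_lines:
    name, proteins = translate_cluster(line, mapping)
    print("%s : %s" % (name, ",".join(proteins)))
```

With the sample data this prints "cluster 1 : qqqq,tttt", "cluster 2 : oooo,qqqq" and "cluster n : tttt,qqqq,oooo". To run it on the real files, pass the lines read from protein.txt and cluster.txt instead of the two sample lists.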