I have a file with entries like this:
BIG_CLUSTER106: cluster1150: CUB        CUB     CUB
BIG_CLUSTER106: cluster1627: CUB        Zona_pellucida
BIG_CLUSTER106: cluster1632: CUB        CUB     CUB     CUB     CUB
BIG_CLUSTER106: cluster1814: Kringle    WSC     CUB
BIG_CLUSTER106: cluster2768: CUB        CUB     F5_F8_type_C    F5_F8_type_C    MAM     DUF3481
BIG_CLUSTER106: cluster661: Astacin     CUB     CUB     CUB     CUB
BIG_CLUSTER106: cluster687: CUB PDGF
BIG_CLUSTER106: cluster701: CUB CUB     Zona_pellucida
BIG_CLUSTER106: cluster744: CUB CUB
BIG_CLUSTER106: cluster968: CUB Laminin_EGF
I have written a program that converts it to this format:
>BIG_CLUSTER106:	[['CUB', 'CUB', 'CUB'], ['CUB', 'Zona_pellucida'], ['CUB', 'CUB', 'CUB', 'CUB', 'CUB'], ['Kringle', 'WSC', 'CUB'], ['CUB', 'CUB', 'F5_F8_type_C', 'F5_F8_type_C', 'MAM', 'DUF3481'], ['Astacin', 'CUB', 'CUB', 'CUB', 'CUB'], ['CUB', 'PDGF'], ['CUB', 'CUB', 'Zona_pellucida'], ['CUB', 'CUB'], ['CUB', 'Laminin_EGF']]
The program:
from sys import argv

infile = open(argv[1], 'r')
bigcluster = ''
setlist = []
for line in infile:
        field = line.split()
        if not field:
                continue  # skip blank lines
        if bigcluster != field[0]:
                # a new big cluster starts: print the finished one first
                if setlist:
                        print ">" + bigcluster + "\t" + str(setlist)
                setlist = []
                bigcluster = field[0]
        # the domain names follow the two cluster names
        setlist.append(field[2:])
# print the last big cluster, which the loop itself never prints
if setlist:
        print ">" + bigcluster + "\t" + str(setlist)
infile.close()
Now I want to change it so that a set operation is applied to the list of lists this program produces, giving the strings that are common to all of the inner lists,
like >BIG_CLUSTER106:	'CUB'  from
>BIG_CLUSTER106:	[['CUB', 'CUB', 'CUB'], ['CUB', 'Zona_pellucida'], ['CUB', 'CUB', 'CUB', 'CUB', 'CUB'], ['Kringle', 'WSC', 'CUB'], ['CUB', 'CUB', 'F5_F8_type_C', 'F5_F8_type_C', 'MAM', 'DUF3481'], ['Astacin', 'CUB', 'CUB', 'CUB', 'CUB'], ['CUB', 'PDGF'], ['CUB', 'CUB', 'Zona_pellucida'], ['CUB', 'CUB'], ['CUB', 'Laminin_EGF']]


Use set.intersection()

>>> the_list = [['CUB', 'CUB', 'CUB'], ['CUB', 'Zona_pellucida'], ['CUB', 'CUB', 'CUB', 'CUB', 'CUB'], ['Kringle', 'WSC', 'CUB'], ['CUB', 'CUB', 'F5_F8_type_C', 'F5_F8_type_C', 'MAM', 'DUF3481'], ['Astacin', 'CUB', 'CUB', 'CUB', 'CUB'], ['CUB', 'PDGF'], ['CUB', 'CUB', 'Zona_pellucida'], ['CUB', 'CUB'], ['CUB', 'Laminin_EGF']]
>>> set.intersection(*(set(x) for x in the_list))
set(['CUB'])
>>> sorted(set.intersection(*(set(x) for x in the_list)))
['CUB']
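
The same expression can also be applied while reading the original file, without going through the printed list of lists. A minimal sketch, assuming every line has the shape "BIG_CLUSTER: cluster: domain domain ..." shown in the first post; the function name common_domains and the three sample lines are made up for illustration, and the code is written so that it should run on Python 2.7 as well as Python 3:

```python
from collections import OrderedDict


def common_domains(lines):
    """Map each big-cluster name to the sorted domains shared by all its rows."""
    groups = OrderedDict()  # big-cluster name -> list of domain sets, in file order
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and rows without any domain
        # fields[0] is the big cluster, fields[1] the sub-cluster, the rest are domains
        groups.setdefault(fields[0], []).append(set(fields[2:]))
    return OrderedDict(
        (name, sorted(set.intersection(*sets))) for name, sets in groups.items()
    )


# Hypothetical sample input, copied from three rows of the question.
sample = [
    "BIG_CLUSTER106: cluster1150: CUB        CUB     CUB",
    "BIG_CLUSTER106: cluster1627: CUB        Zona_pellucida",
    "BIG_CLUSTER106: cluster1814: Kringle    WSC     CUB",
]
for name, shared in common_domains(sample).items():
    print(">" + name + "\t" + repr(shared))
```

To use it on a real file, pass the open file object instead of sample, since iterating over a file yields its lines.
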
from sys import argv
file = open(argv[1],'r')
setlist = []
rec = file.readlines()
nestedlist = []
for line in rec :
        field = line.split('\t')
        #print field
        header = field[0]
        nestedlist = field[1].strip()
        #print a
        print header + "\t",
        print nestedlist + "\t",
        print sorted(set.intersection(*(set(x) for x in nestedlist)))
        #print set.intersection(*(set(x) for x in nestedlist))
        nestedlist = []

When I apply this program to

the_list = [['CUB', 'CUB', 'CUB'], ['CUB', 'Zona_pellucida'], ['CUB', 'CUB', 'CUB', 'CUB', 'CUB'], ['Kringle', 'WSC', 'CUB'], ['CUB', 'CUB', 'F5_F8_type_C', 'F5_F8_type_C', 'MAM', 'DUF3481'], ['Astacin', 'CUB', 'CUB', 'CUB', 'CUB'], ['CUB', 'PDGF'], ['CUB', 'CUB', 'Zona_pellucida'], ['CUB', 'CUB'], ['CUB', 'Laminin_EGF']]

instead of giving

['CUB']

it gives

[]

This code is very different from the one you posted before. I suspect that nestedlist is now a string instead of a list of lists. The line print nestedlist + "\t", seems to confirm it: adding a list and a string raises a TypeError, so if that line runs without complaint, nestedlist must be a string. The expression that I wrote applies only to a list of lists of the form you described before. Try print repr(nestedlist) to see what it really is.
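
If each line of the file really looks like ">BIG_CLUSTER106:", a tab, and then the printed list of lists, one way to get a real list back is ast.literal_eval from the standard library, which parses a string containing a Python literal. This is only a sketch under that assumption; shared_items is a hypothetical helper name and the sample line is shortened from the data in the question:

```python
import ast


def shared_items(line):
    """Split one 'header<TAB>list-of-lists' line and intersect the inner lists."""
    header, text = line.rstrip("\n").split("\t", 1)
    # Turn the text "[['CUB', ...], ...]" back into a real list of lists.
    nested = ast.literal_eval(text.strip())
    return header, sorted(set.intersection(*(set(x) for x in nested)))


# Hypothetical sample line in the assumed format.
line = ">BIG_CLUSTER106:\t[['CUB', 'CUB'], ['CUB', 'PDGF'], ['Kringle', 'WSC', 'CUB']]\n"
header, shared = shared_items(line)
print(header + "\t" + repr(shared))
```
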

The program doesn't complain when you pass a string because a string is iterable: its items are characters, and in Python a character is itself a one-character string, which is also iterable. So if I write [set(x) for x in "hello"], the interpreter computes [set(['h']), set(['e']), set(['l']), set(['l']), set(['o'])]. The intersection of these sets is empty. So be careful: set("CUB CUB CUB") is very different from set(["CUB","CUB","CUB"]).
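
The difference is easy to check in a few lines. This is just an illustration; the variable names are made up:

```python
as_string = "CUB CUB CUB"
as_list = ["CUB", "CUB", "CUB"]

# Iterating over a string yields single characters, so this set holds
# the characters 'C', 'U', 'B' and the space.
chars = set(as_string)

# Iterating over a list yields its items, so this set holds the one word 'CUB'.
words = set(as_list)

# One set per character of "hello": no character belongs to all five sets,
# so their intersection is the empty set.
per_char = [set(x) for x in "hello"]
nothing_shared = set.intersection(*per_char)

print(sorted(chars))
print(sorted(words))
print(len(nothing_shared))
```
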

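parijat24, you just need to look into sets more closely.
set() accepts a list, a tuple or any other iterable, and a list and a tuple with the same items give the same set. A string is different: it is split into its characters, so make sure the data you pass is a real list or tuple and not its printed text.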
