Hi All,
I'm Trying to solve this puzzle for my script:

Let's say we have 3 lists, A,B and C with the following elements
'A' = [5,3,1]
'B' = [6,5,2]
'C' = [2,0,-2]

I want to go through all 27 combination sequentially, looking for a combination that is 'valid' AND has the highest sum. Let's say that A=3 B=2 and C = 2 is valid. Testing all 27 and computing the score (sum) will work, but is computationally too expensive for larger problems.

I mean to say that the program should test the highest combination first:
A=5 , B=6, C= 2 then A=5 B=5 C=2 etc

Thanks for your help!

Something like this:

from itertools import product
a = [5,3,1]
b = [6,5,2]
c = [2,0,-2]

def valid(*args):
    """ define the requirement here"""
    print('Checking: ', args)
    return args[0] == min(args)

print(next(comb for comb in sorted(product(a, b, c), key=sum, reverse=True) if valid(*comb)))

Edited 5 Years Ago by pyTony: n/a

one more thing, let's say now i got list A,B,C,....,N the number is not always the same. How can I put that in the product() function? it does not accept a list with [a,b,c,d] as input.

I've just tested this:
print next(comb for comb in sorted(product(*all), key=sum, reverse=True) if valid(*comb))

and it works! can anyone explain me what exactly the asterisk means here? I get what it does, but not exactly I'm afraid..

Actually I'm still stuck I guess I need to take it one step further, but I don't know how..

In reality this is my situation, I've got a dictionary containing dictionaries by position which indicate the distribution of nucleotides, in this case for the combination of
.
So position 0 has a A and a G, hence both are at 0.5
{0: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 1: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}, 2: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 3: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}}
I can make lists out of the above dictionaries, in such a way:
[[0.5, 0, 0, 0.5], [0, 0.5, 0.5, 0], [0.5, 0, 0, 0.5], [0, 0.5, 0.5, 0]]

and then do the above product() trick. The thing is that I'm interested in obtaining the next combination of nucleotides (A,C,T,G) which complements the above nucleotides as good as possible. this new combination has to be validated according to other criteria. My quest hence is, how do I sequentially validate four-letter words in the order that I would obtain if I used the above function for the nucleotide composition per position?

IE I want to see something like:
CACA first, then CACG etc. untill I reach GTGT which is the least desirable..

Actually I'm still stuck I guess I need to take it one step further, but I don't know how..

In reality this is my situation, I've got a dictionary containing dictionaries by position which indicate the distribution of nucleotides, in this case for the combination of
.
So position 0 has a A and a G, hence both are at 0.5
{0: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 1: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}, 2: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 3: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}}
I can make lists out of the above dictionaries, in such a way:
[[0.5, 0, 0, 0.5], [0, 0.5, 0.5, 0], [0.5, 0, 0, 0.5], [0, 0.5, 0.5, 0]]

and then do the above product() trick. The thing is that I'm interested in obtaining the next combination of nucleotides (A,C,T,G) which complements the above nucleotides as good as possible. this new combination has to be validated according to other criteria. My quest hence is, how do I sequentially validate four-letter words in the order that I would obtain if I used the above function for the nucleotide composition per position?

IE I want to see something like:
CACA first, then CACG etc. untill I reach GTGT which is the least desirable..

All you need to do is to compute the preferred order for each position:

from __future__ import print_function
import itertools

distribution = {0: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 1: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}, 2: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5}, 3: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}}

nucs = "ACTG"
nuc_order = dict((n, i) for (i, n) in enumerate(nucs))

def sort_by_scores(score_dict):
    return "".join(sorted(score_dict, key = lambda n: (score_dict[n], nuc_order[n])))

preferences = list(sort_by_scores(d) for (i, d) in sorted(distribution.items()))

print("preference for each position:", preferences, "\n")

for item in itertools.product(*preferences):
    print("".join(item), end=" ")
    
print("")

""" my output -->
preference for each position: ['CTAG', 'AGCT', 'CTAG', 'AGCT']

CACA CACG CACC CACT CATA CATG CATC CATT CAAA CAAG CAAC CAAT CAGA CAGG CAGC CAGT CGCA CGCG CGCC CGCT CGTA CGTG CGTC CGTT CGAA CGAG CGAC CGAT CGGA CGGG CGGC CGGT CCCA CCCG CCCC CCCT CCTA CCTG CCTC CCTT CCAA CCAG CCAC CCAT CCGA CCGG CCGC CCGT CTCA CTCG CTCC CTCT CTTA CTTG CTTC CTTT CTAA CTAG CTAC CTAT CTGA CTGG CTGC CTGT TACA TACG TACC TACT TATA TATG TATC TATT TAAA TAAG TAAC TAAT TAGA TAGG TAGC TAGT TGCA TGCG TGCC TGCT TGTA TGTG TGTC TGTT TGAA TGAG TGAC TGAT TGGA TGGG TGGC TGGT TCCA TCCG TCCC TCCT TCTA TCTG TCTC TCTT TCAA TCAG TCAC TCAT TCGA TCGG TCGC TCGT TTCA TTCG TTCC TTCT TTTA TTTG TTTC TTTT TTAA TTAG TTAC TTAT TTGA TTGG TTGC TTGT AACA AACG AACC AACT AATA AATG AATC AATT AAAA AAAG AAAC AAAT AAGA AAGG AAGC AAGT AGCA AGCG AGCC AGCT AGTA AGTG AGTC AGTT AGAA AGAG AGAC AGAT AGGA AGGG AGGC AGGT ACCA ACCG ACCC ACCT ACTA ACTG ACTC ACTT ACAA ACAG ACAC ACAT ACGA ACGG ACGC ACGT ATCA ATCG ATCC ATCT ATTA ATTG ATTC ATTT ATAA ATAG ATAC ATAT ATGA ATGG ATGC ATGT GACA GACG GACC GACT GATA GATG GATC GATT GAAA GAAG GAAC GAAT GAGA GAGG GAGC GAGT GGCA GGCG GGCC GGCT GGTA GGTG GGTC GGTT GGAA GGAG GGAC GGAT GGGA GGGG GGGC GGGT GCCA GCCG GCCC GCCT GCTA GCTG GCTC GCTT GCAA GCAG GCAC GCAT GCGA GCGG GCGC GCGT GTCA GTCG GTCC GTCT GTTA GTTG GTTC GTTT GTAA GTAG GTAC GTAT GTGA GTGG GTGC GTGT
"""

edit: see http://docs.python.org/tutorial/controlflow.html#unpacking-argument-lists for explanations about asterisks in function calls.

edit; removed a bug at line 10

Edited 5 Years Ago by Gribouillis: n/a

Other variation I did here for record:

from operator import itemgetter
from itertools import product

d= {0: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5},
    1: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0},
    2: {'A': 0.5, 'C': 0, 'T': 0, 'G': 0.5},
    3: {'A': 0, 'C': 0.5, 'T': 0.5, 'G': 0}}

s = [sorted((nucl for nucl in d[ind].items()), key=itemgetter(1)) for ind in range(len(d))]
priority = [''.join(a for a,b in n) for n in s]
print([''.join(comb) for comb in  product(*priority)])

Edited 5 Years Ago by pyTony: Fixed dict data layout

This question has already been answered. Start a new discussion instead.