Values of the relevant variables:

set(sum(french,())) = set(['mardi', 'pass\xc3\xa9', 'voyez', 'envoy\xc3\xa9', 'membres', 's\xc3\xa9lection', 'peut', 'remplissaient', '\xc3\xa9t\xc3\xa9', 'prononcent', 'travaux', 'd\xc3\xa9terminent', 'trop', 'lib\xc3\xa9raux', 'd\xc3\xa9clar\xc3\xa9', 'dont', 'le', 'mais', 'la', '(', ',', 'internationales', 'Les',.....])

tm = {('se', 'est', '-', 'il', 'pass\xc3\xa9'): [phrase(english='has happened', logprob=0.0)], ('pos\xc3\xa9e',): [phrase(english='asked', logprob=-0.261521458626)], ('le', 'cours', 'de', 'les', 'deux', 'prochaines'): [phrase(english='the next two', logprob=0.0)], ('sujet', 'de'): [phrase(english='about', logprob=-0.390253186226)], ('pla\xc3\xaet',): [phrase(english='pleasure', logprob=-0.0914471149445)],.....}


for word in set(sum(french,())):
  if (word,) not in tm:
    tm[(word,)] = [models.phrase(word, 0.0)]

Q. What exactly is the if condition trying to do?

Q. Is it comparing against the whole tuple keys in tm?

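The if condition checks whether the dictionary tm contains a key that is a tuple of length 1 with the word as its sole item. It does not look inside the longer tuples: (word,) has to match a key exactly. If no such key exists, the loop adds one, whose value is a one-element list holding models.phrase(word, 0.0), so every word ends up with at least one entry that maps it to itself with a log-probability of 0.0.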
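A minimal sketch of that check on toy data may make it clearer. Here phrase is a namedtuple used as a hypothetical stand-in for models.phrase, and the contents of french and tm are made up for illustration; only the loop itself comes from the question.

```python
from collections import namedtuple

# Hypothetical stand-in for models.phrase from the question.
phrase = namedtuple("phrase", "english, logprob")

# Toy data (assumed): each sentence is a tuple of words.
french = [("le", "chat", "dort"), ("le", "chien")]
tm = {
    ("le",): [phrase("the", -0.1)],
    ("le", "chat"): [phrase("the cat", 0.0)],
}

# sum(french, ()) concatenates the sentence tuples into one flat tuple.
for word in set(sum(french, ())):
    # (word,) is a one-element tuple. "in" is a lookup on the dict keys,
    # so a longer key such as ("le", "chat") never matches ("chat",).
    if (word,) not in tm:
        # Unknown single word: map it to itself with log-probability 0.0.
        tm[(word,)] = [phrase(word, 0.0)]
```

After the loop, ("le",) keeps its original entry, while ("chat",), ("dort",) and ("chien",) each get a new one-element list.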

It seems strange to me that you work with UTF-8-encoded byte strings rather than unicode strings. For example, in Python 2:

>>> s = 'pass\xc3\xa9'
>>> t = s.decode('utf8')
>>> print(t)
>>> print(repr(t))
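As a rough illustration of why decoding helps, the sketch below compares the byte string with its decoded form and decodes a whole key tuple. The b'' and u'' prefixes are used so that the same lines run on Python 2.6+ and Python 3; the key shown is just a sample, and decoding the keys of tm this way is a suggestion, not something the original code does.

```python
s = b'pass\xc3\xa9'      # UTF-8 bytes, as stored in the question's data
t = s.decode('utf8')     # unicode text: u'pass\xe9'

# The byte string counts the two bytes of the accented letter separately.
byte_length = len(s)     # 6
text_length = len(t)     # 5

# Decoding every word of a key tuple gives a tuple of unicode strings.
raw_key = (b'se', b'est', b'-', b'il', b'pass\xc3\xa9')
key = tuple(w.decode('utf8') for w in raw_key)
```

If the keys of tm are decoded like this, the words in french need the same treatment, otherwise the (word,) lookups will not find matching keys.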