Hi
I am developing a spell checker for my language, and I came across
solving an interesing problem. It would be great if you can suggest
me an optimized solution for the problem described below:

I have certain group of letters like these:

Group #1: c,r,b
Group #2: a,z,k
Group #3: h,t
.
.

Letters within the same group can be substituted/replaced with the
rest of the letters in that group.

(ie. Group #1 means letter 'c' can be replaced with either 'r' or 'b',
and letter r can be replaced with either c or b, and letter b can be
replaced with either r or c)

A group can have upto 3 letters, and letters do not overlap between
groups. There can be upto 25 such groups.
Given a word containing above letters, (e.g. 'cat'.) I want to
generate all possible words by subsituting letters in the groups.

eg. expected output:

cat rat bat
czt rzt bzt
ckt rkt bkt
cah rah bah
czh rzh bzh
ckh rkh bkh

My code is given below. But for complex words like 'cccccccccczzk' it
thows the 'maximum recursion depth exceeded' error. Therefore, can you
suggest me an optimized solution to achive the above task? Thanks in
advance.

#-*- coding: UTF-8 -*-
import re
def getPermutations(word):
   def getSimilarLetters(letter):
       pattern=u"|c,r,b|a,z,k|h,t|"
       list=[]
       ptrn=re.compile("(?P<grp>\|(.?,)*?"+letter+"(,.)*?\|)")
       k=ptrn.search(pattern)
       if k:
           letterlist=k.group("grp")[1:-1]
           list=letterlist.split(",")
           list.remove(letter)
           return list
       else:
           return letter

   list=[]

   def perm(word):
       posi=0
       wrd=word
       for w in word:
           posi=posi+1
           lst=getSimilarLetters(w)
           if len(lst)>0:
               for l in lst:
                   newwrd=wrd[:posi-1]+l+wrd[posi:]
                   if not (newwrd in list):
                       list.append(newwrd)
                       perm(newwrd)

   try:
       perm(word)
   except RuntimeError:
       list=[]
       list.append("error")
   return list

list=getPermutations(u"cat")

for i in list:
   print i.encode('utf-8')

Here is a possible solution to this problem. I don't know if it's "optimal" (in which sense ?). However, you shouldn't reach a recursion limit, because the recursion depth is proportional to the logarithm of the length of the word.

#-*- coding: UTF-8 -*-

class PermutatorException(Exception):
	pass

class Permutator(object):

	def __init__(self, words):
		self.groups = {}
		words = [''.join(sorted(set(w))) for w in words]
		for w in words:
			for c in w:
				if c in self.groups:
					raise PermutatorException("same letter in 2 different words")
				else:
					self.groups[c] = w

	def permutations(self, word):
		n = len(word)
		if n <= 1:
			if n:
				for c in (self.groups[word] if word in self.groups else word):
					yield c
			else:
				yield ''
			return
		x = len(word)/2
		w0, w1 = word[:x], word[x:]
		for p0 in self.permutations(w0):
			for p1 in self.permutations(w1):
				yield p0 + p1

if __name__ == "__main__":
	perm = Permutator(["crb", "azk", "ht"])
	for word in perm.permutations("cat"):
		print word

Thank you very much for the prompt reply. Your code works fine with English text. But when I tried to modify it to include my language, it gives following error:

Traceback (most recent call last):
  File "F:/Downloads/combi good.py", line 34, in <module>
    perm = Permutator(["සෂශ", "නණ", "ළල"])
  File "F:/Downloads/combi good.py", line 14, in __init__
    raise PermutatorException("same letter in 2 different words")
__main__.PermutatorException: same letter in 2 different words

But actually, the same letter is not repeated in the different groups.

The modified code is given below: (The language is Sinhala, and its in Unicode).

#-*- coding: UTF-8 -*-

class PermutatorException(Exception):
    pass

class Permutator(object):

    def __init__(self, words):
        self.groups = {}
        words = [''.join(sorted(set(w))) for w in words]
        for w in words:
            for c in w:
                if c in self.groups:
                    raise PermutatorException("same letter in 2 different words")
                else:
                    self.groups[c] = w

    def permutations(self, word):
        n = len(word)
        if n <= 1:
            if n:
                for c in (self.groups[word] if word in self.groups else word):
                    yield c
            else:
                yield ''
            return
        x = len(word)/2
        w0, w1 = word[:x], word[x:]
        for p0 in self.permutations(w0):
            for p1 in self.permutations(w1):
                yield p0 + p1

if __name__ == "__main__":
    perm = Permutator(["සෂශ", "නණ", "ළල"])
    for word in perm.permutations("ලන"):
        print word

Here is a possible solution to this problem. I don't know if it's "optimal" (in which sense ?). However, you shouldn't reach a recursion limit, because the recursion depth is proportional to the logarithm of the length of the word.

#-*- coding: UTF-8 -*-

class PermutatorException(Exception):
	pass

class Permutator(object):

	def __init__(self, words):
		self.groups = {}
		words = [''.join(sorted(set(w))) for w in words]
		for w in words:
			for c in w:
				if c in self.groups:
					raise PermutatorException("same letter in 2 different words")
				else:
					self.groups[c] = w

	def permutations(self, word):
		n = len(word)
		if n <= 1:
			if n:
				for c in (self.groups[word] if word in self.groups else word):
					yield c
			else:
				yield ''
			return
		x = len(word)/2
		w0, w1 = word[:x], word[x:]
		for p0 in self.permutations(w0):
			for p1 in self.permutations(w1):
				yield p0 + p1

if __name__ == "__main__":
	perm = Permutator(["crb", "azk", "ht"])
	for word in perm.permutations("cat"):
		print word

I had the same problem in french, with accented letters, like "éab" . It worked when I put the u at the beginning of the strings. For example, I wrote perm = Permutator([u"éab", u"àzk", u"ht"]) . There should be more robust solutions using the methods encode and decode of the classes str and unicode .

Comments
great help

Thanks a million! It works perfectly. here goes the modified code for reference:

#-*- coding: UTF-8 -*-

class PermutatorException(Exception):
    pass

class Permutator(object):

    def __init__(self, words):
        self.groups = {}
        words = [''.join(sorted(set(w))) for w in words]
        for w in words:
            for c in w:
                if c in self.groups:
                    raise PermutatorException("same letter in 2 different words")
                else:
                    self.groups[c] = w

    def permutations(self, word):
        n = len(word)
        if n <= 1:
            if n:
                for c in (self.groups[word] if word in self.groups else word):
                    yield c
            else:
                yield ''
            return
        x = len(word)/2
        w0, w1 = word[:x], word[x:]
        for p0 in self.permutations(w0):
            for p1 in self.permutations(w1):
                yield p0 + p1

if __name__ == "__main__":
    perm = Permutator([u"සෂශ", u"නණ", u"ළල"])
    for word in perm.permutations(u"lලpppනනlලනනlලනලනනලනලනනlllll"):
        print word.encode("utf-8")

I had the same problem in french, with accented letters, like "éab" . It worked when I put the u at the beginning of the strings. For example, I wrote perm = Permutator([u"éab", u"àzk", u"ht"]) . There should be more robust solutions using the methods encode and decode of the classes str and unicode .

This article has been dead for over six months. Start a new discussion instead.