This code takes in a scrambled input word and gives out the unscrambled list of words.
Eg... Input = time
Output = mite, item, emit etc.
I had posted an unscrambler in java sometime ago. This one is the improvement over the previous one. In the sense, the java unscrambler took an input word, and produced all permutations of the input word, did a binary search of each permutation on the list of available words. The problem was that if the input word is sufficiently large, it would produce an OutOfMemory error, because the permutations would be large.
The improvement is that this one, maintains a dictionary of the the available set words whose key is a word with the letters sorted in ascending order as key and the value is a list of all the words that have the same letters as the key.
Eg... Key = eimt, Value = [time, mite, emit, item].
Attaching the code.

Recommended Answers

All 10 Replies

#!/usr/bin/python
# the simple idea behind the method is to sort the word read from the file
# and store it in a hashtable. Now when a scrambled input comes in from the user
# it is sorted and its hash determines the list of words with the same letters.
# this new idea obsoletes the method getValueHash.

def getdicthash():
	dicthash={}
	dicthash[0]=['a']
	inputfile = open('./list.txt', 'r')
	#Get the word and compute the hash of the word.
	word = "a"
	while(word!= ""): #make sure there is no space in the quotes. This almost killed me trying to debug.
		word = inputfile.readline()
		word = word.strip() # remove the new line char
		listword = tolist(word)
		backtoword = toword(listword)
		hashofword = backtoword.__hash__()
		if(dicthash.has_key(hashofword)):
			wordlist = dicthash[hashofword]
			wordlist.append(word)
		else:
			dicthash[hashofword] = [word]
	
	inputfile.close()
	return dicthash



def compareword(word,scrambledinput):
	if(len(word) == len(scrambledinput)):
		listword = tolist(word)
		listinput = tolist(scrambledinput)
		wordreturned = toword(listword)
		scrambledinputreturned = toword(listinput)
		if(wordreturned.__hash__() == scrambledinputreturned.__hash__()):
			return True
		else:
			return False
	else:
		return False

def tolist(word):
	finallist=[]
	for i in word:
		finallist.append(i)
	
	finallist.sort()
	return finallist


def toword(word):
	finalword=''
	for i in word:
		finalword = finalword+i
	
	return finalword


if __name__=="__main__":
	response = 'y'
	hashe = getdicthash()
	while response == 'y':
		input = raw_input("Enter a scrambled word : ")
		listinput =	tolist(input)
		backtoinput = toword(listinput)
		try :
			matchinglist = hashe[backtoinput.__hash__()]
			for words in matchinglist:
				print words
		except:
			print "Word not found"

		response = raw_input("Do u want to continue.. enter y or n :")

Good, but shouldn't this have been posted as a code snippet?

For permutations, you could have used itertools.permutations.

I have done full anagram program with weeks of performance testing about which I did one post to this forum. These both programs look overly complicated. I will publish one word version based on my algorithm shortly as code snipet together with speed comparision to these posted versions.

instead of while(word!= ""): you could have written while(word):

I have done full anagram program with weeks of performance testing about which I did one post to this forum. These both programs look overly complicated. I will publish one word version based on my algorithm shortly as code snipet together with speed comparision to these posted versions.

This one is slow for the very first input, but should be very fast for the subsequent inputs as it is just a dictionary lookup.
The idea behind this very simple, though the code may look a little complicated. I am quite new to python. This code was like an exercise for me to learn the language better. Therefore I am sure this can be improved to a good extent.

Here simplified part of my code, it is propably worse in looking massive amount of words against same dictionary, but for interactive use fast enough and super simple:

## my solution
def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    if (len(k) != len(s)) : return "" ## different lenghth 
    for c in s:
        i = k.find(c)
        if i==-1: return "" ## letter not contained in first one found
        k = k[0:i]+k[i+1:]
    return s
    
if __name__=="__main__":
    print "To stop: enter empty line"
    while True:
        i=raw_input('Give word: ')
        if i :
            print [j.rstrip()
                   for j in open('list.txt')
                   if isanaword(i, j.rstrip())
                   ]
        else: break

Here is my code with timing added. It appears from my tests that the response is less than 1/10th of second/looked word

rom time import clock
## my solution for one word anagram finder
def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    if (len(k) != len(s)) : return "" ## different lenghth 
    for c in s:
        i = k.find(c)
        if i==-1: return "" ## letter not contained in first one found
        k = k[0:i]+k[i+1:]
    return s
    
if __name__=="__main__":
    print "To stop: enter empty line"
    while True:
        i=raw_input('Give word: ')
        t=clock()
        if i :
            print [j.rstrip()
                   for j in open('list.txt')
                   if isanaword(i, j.rstrip())
                   ]
            print '%.3f s' % (clock()-t) ## < 0.1 s
        else: break

Here is my code with timing added. It appears from my tests that the response is less than 1/10th of second/looked word

rom time import clock
## my solution for one word anagram finder
def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    if (len(k) != len(s)) : return "" ## different lenghth 
    for c in s:
        i = k.find(c)
        if i==-1: return "" ## letter not contained in first one found
        k = k[0:i]+k[i+1:]
    return s
    
if __name__=="__main__":
    print "To stop: enter empty line"
    while True:
        i=raw_input('Give word: ')
        t=clock()
        if i :
            print [j.rstrip()
                   for j in open('list.txt')
                   if isanaword(i, j.rstrip())
                   ]
            print '%.3f s' % (clock()-t) ## < 0.1 s
        else: break

Does this program need to iterate through every word in list.txt for every input?

Yes it does and plenty of enough fast. If you want solution for every word you can prepare it solutions only one time and use it with lookup after that, different from multiword anagrams:

Included is one copy of anawords.txt prepared by the program. It is only necessary file in addition to the program itself. Or you can put 'list.txt' together with program in same directory and the program prepares the files analist.txt and from there anawords.txt.

## my solution for all anagrams 
from time import clock
import os,sys

dictionary = 'list.txt'

takeout=' \t\'-+\n\r' ## deleted letters from words
## choosing first argument for translate
if sys.version[:3]>='2.6':
    table=None #python 2.6
else:
    print 'Old python'
    table=''
    for i in range(256): t+=chr(i)


def letters(a):
    let=''.join(sorted(a))
    let = let.translate(table,takeout)
    return let

def writeout(inp,out):
    words= [w.rstrip() for w in open(inp)]
    words=[letters(a)+' '+a for a in words]
    words.sort(key=len)
    open(out,'w').write('\n'.join(words))

def getanawords(aw='anawords.txt',al='analist.txt'):
    if not os.path.isfile(aw) and not os.path.isfile(al): writeout(inp=dictionary,out=al)
    
    anawords=dict()
    
    if not os.path.isfile(aw):
        for l,w in [j.split() for j in open(al)]:
            if l in anawords: anawords[l].append(w)
            else: anawords[l]=[w]
        print "Dict prepared in %1.3f s" % (clock()-t)

        f=open(aw,'w')
        for i in sorted(anawords,key=len):
            f.write(i+' '+' '.join(anawords[(i)])+'\n')
        print "Dict saved for future"
    else:
        for i in open(aw):
            i=i.rstrip().split()
            anawords[i[0]]=i[1:]
        print "Saved dict loaded"
    return anawords
 
if __name__=="__main__":
    ## listing all words
    t=clock()

    anawords=getanawords()
    print "Preparations took %1.3f s" % (clock()-t)
               
    print "To stop: enter empty line"
    while True:
        i=raw_input('Give word: ')
        i=letters(i)
        t=clock()
        if i :
            if i in anawords: print anawords[i]
            else: print 'Word is not in vocabulary'
            print '%.3f s' % (clock()-t)
        else: break
""" Output:
>> 
Dict prepared in 0.755 s
Dict saved for future
Preparations took 0.989 s
To stop: enter empty line
Give word: item
['emit', 'item', 'mite', 'time']
0.011 s
Give word: sense
['sense']
0.010 s
Give word: dew
['dew', 'wed', "we'd"]
0.009 s
Give word: 
>>> 
## second run
>> 
Saved dict loaded
Preparations took 0.235 s
To stop: enter empty line
Give word: manhole
['manhole']
0.010 s
Give word: team
['mate', 'meat', 'tame', 'team']
0.010 s
Give word: conputer
Word is not in vocabulary
0.010 s
Give word: cotermup
['computer']
0.010 s
Give word: 
>>> 
"""

By directly double clicking the timing is really different:

Dict prepared in 0.560 s
Dict saved for future
Preparations took 0.716 s
To stop: enter empty line
Give word: item
['emit', 'item', 'mite', 'time']
0.001 s
Give word: date
['date']
0.000 s
Give word: deal
['dale', 'deal', 'lade', 'lead']
0.001 s
Give word: teba
['abet', 'bate', 'beat', 'beta']
0.001 s
Give word: erpa
['pare', 'pear', 'rape', 'reap']
0.001 s
Give word: conputeri
Word is not in vocabulary
0.000 s
Give word: retupmoc
['computer']
0.000 s
Give word:

And second time

Saved dict loaded
Preparations took 0.132 s
To stop: enter empty line
Give word: item
['emit', 'item', 'mite', 'time']
0.001 s
Give word:
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.