Super simple one word anagrams

TrustyTony 0 Tallied Votes 857 Views Share

Here is solution number one for one word anagrams, I gave in discussion thread, response time in my humble Athlon PC under 70 ms.

List of words is from original posting of the discussion thread.
http://www.daniweb.com/forums/post1206616.html#post1206616

This version does not give we'd for input dew.

Next time I give final speed demon solution: preparing lookup table file.

You can prove the program without the length check part it is optional, program becomes around 3 times slower without it in my computer.

## solution 1: fast enough response, simple function, fail fast
from time import clock
def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    ## different length, not anagram, makes function faster (fail fast principle)
    if (len(k) != len(s)) : return ""  ## optional
    for c in s:
        pos = k.find(c)
        if pos==-1: return "" ## letter not contained in first one found
        k = k[:pos]+k[pos+1:] ## drop the letter in found position pos by slicing
    if not k: return s ## condition enables to remove length check
    
if __name__=="__main__":
    print('To quit enter empty line')
    inputword=' '
    while inputword:
        inputword=raw_input('Give word: ')
        if inputword:
            t=clock()
            for wd in [w.rstrip() for w in open('list.txt') if isanaword(w.rstrip(),inputword)]:
                print wd,
            print
            print 'Took %.2f s'%(clock()-t)
TrustyTony 888 pyMod Team Colleague Featured Poster

Here little shorter alternative formulation, but still with timing, which could be removed to simplify:

## solution 1: fast enough response, simple function, fail fast
from time import clock
def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    for c in s:
        pos = k.find(c)
        if pos==-1: return "" ## letter not contained in first one found
        k = k[:pos]+k[pos+1:] ## drop the letter in found position pos by slicing
    if not k: return s ## if all letters used, full anagram
    
if __name__=="__main__":
    print('To quit enter empty line')
    inputword=' '
    while inputword:
        inputword=raw_input('Give word: ')
        if inputword:
            t=clock()
            for wd in [w.rstrip()
                       for w in open('list.txt')
                       if (len(w) == len(inputword)+1 ## newline longer
                           and isanaword(w.rstrip(),inputword))]:
                print wd,
            print
            print 'Took %i ms'%((clock()-t)*1000)

Speed also impoves:

To quit enter empty line
Give word: emit
emit item mite time
Took 40 ms
Give word: omit
omit
Took 41 ms
Give word: reset
ester reset steer terse
Took 43 ms
Give word:
TrustyTony 888 pyMod Team Colleague Featured Poster

Still faster:

To quit enter empty line
Give word: emit
emit
item
mite
time
Took 30 ms
Give word: omit
omit
Took 30 ms
Give word: reset
ester
reset
steer
terse
Took 33 ms
Give word: puretmoc
computer
Took 37 ms
Give word: purentoc
Took 37 ms
Give word:
## solution 1: fast enough response, simple function, fail fast
from __future__ import print_function
from time import clock

def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    for c in s:
        if c not in k:
            return "" ## letter not contained in first one found
        k=k.replace(c,'',1)
    if not k.rstrip():
        return s ## if all letters used except whitespace in end
    
if __name__=="__main__":
    print('To quit enter empty line')
    inputword=' '
    while inputword:
        inputword=raw_input('Give word: ')
        li=len(inputword)+1
        if inputword:
            t=clock()
            for wd in [w.rstrip()
                       for w in open('list.txt')
                       if (len(w)==li ## newline longer
                           and isanaword(w,inputword))]:
                print(wd)
            print('Took %i ms'%((clock()-t)*1000))
jcao219 18 Posting Pro in Training

Wouldn't using itertools.permutations be easier?

TrustyTony 888 pyMod Team Colleague Featured Poster

Very inefficient. Most of permutations are not valid words.

Key in efficient generation of anagrams is limiting the number of candidates for anagrams to those words that are possible.

For one word case see my lookup table generation version.
http://www.daniweb.com/code/snippet288327.html

I am planning to use my multiword anagram routine of mine commercially, so unfortunately I can not publish that.

This is nice solution for simplicity and memory use. With some tweaking, removing niceties of printing Python 3 support, It could be even made to an one-three liner.

jcao219 18 Posting Pro in Training

Oh, I see.

I notice that each time the user inputs some text,
the program will open the words file
and go through each word in that file, checking its length
against the input text's length.
Wouldn't it be more efficient if you loaded part or all of the words from the file into memory upon first user input?
(So the program doesn't have to read from the file as much)
Then, subsequent lookups may be faster.

jcao219 18 Posting Pro in Training

Indeed, I just converted part of your code to IronPython, tweaked it with the modification I mentioned above, and ran it.

Look at the attachment.

Big drawback is that the first time a word's length is inputted, there is a ten ms delay.

Ipy Code:

from System.Diagnostics import Stopwatch
from System.IO import File

def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    for c in s:
        if c not in k:
            return "" ## letter not contained in first one found
        k=k.replace(c,'',1)
    if not k.rstrip():
        return s ## if all letters used except whitespace in end

def populate_words_dict(length):
    wordsdata[length] = [w for w in File.ReadAllLines("list.txt")
                    if len(w) == length]
    
if __name__=="__main__":    
    wordsdata = {}
    
    print('To quit enter empty line')
    inputword=' '
    while inputword:
        inputword=raw_input('Give word: ')
        li=len(inputword)
        if inputword:
            sw = Stopwatch()
            sw.Start()
            if li not in wordsdata:
                populate_words_dict(li)            
            for wd in [w.rstrip()
                       for w in wordsdata[li]
                           if isanaword(w,inputword)]:
                print(wd)
            sw.Stop()
            print('Took {0} ms.'.format(sw.ElapsedMilliseconds))
TrustyTony 888 pyMod Team Colleague Featured Poster

I see, you do length by length saving version of the dictionary as in memory lookup table. Only your program should not run as wordsdata is not declared global in populate_words_dict, so it is local to function. Still it works, this I do not understand, maybe it generates the dict for every time? Looks faster after adding global for me.

Here is run after removing IronPython modules. Looks me not worth the loss in memory use and complexity. (My lookup version though is possible to simplify, even it is possible to order words by the letters function and use binary search instead of loading anything to memory)

The idea of this program is to give simple solution, as the original posted one word anagram did on the fly full dict of the whole word list in memory without saving it for later use. The lookup table version sacrifices space for time (only needing one time per dictionary anagram generation) and is surely faster than doing lookup table without saving. Even there I keep the plain human readable text format, not using pickle, which in my experience produces very big files. That is unacceptable as the programs should run even in mobile phone.

Main thing is that when you take the timing out, user of this program feels that response is immediate. For that response < 100 ms is quick enough.

I did use the lookup table solution to my full anagram program to deal with shortest anagrams (less than 2*minimum word lenghth). Also the method of one letter removing which is from user ihatehippies, helped to speed up my main program (in different function of course).

Unfortunately I found out that getting keys from anawords is for some reason slower than reading from file original word list and saving list of possible words (even it is dropping extra chars like ' and - on fly every time to allow free format of word list).

In full anagram program I only load list of possible words once, including smaller words producible from that list. After that it is not worth to reduce that set of words even all of them are not possible after one or more words have been selected.

I am using the psyco module to speed thigs up. I have not seen need to use IronPython as it is not possible to use psyco there and so it is much slower.


Here is program as I prepared to run with standard python functions:

## solution 1: fast enough response, simple function, fail fast
from __future__ import print_function ## for python 2.6
from time import clock

def isanaword(k,s):
    """ goes through the letters of second word (s) and returns it
        if first word (k) contains exactly same letters in same number
        """
    k=list(k) ## mutable list instead of string
    for c in s:
        if (c not in k):
            return "" ## letter not contained in first one found
        else:
            k.remove(c) ## remove first occurance
    if not k : return s
            
def populate_words_dict(length):
    global wordsdata
    wordsdata[length] = [w.rstrip() for w in open("list.txt")
                    if len(w) == length+1]
    
if __name__=="__main__":    
    wordsdata = {}
    
    print('To quit enter empty line')
    inputword=' '
    while inputword:
        inputword=raw_input('Give word: ')
        li=len(inputword)
        if inputword:
            t=clock()
            if li not in wordsdata:
                populate_words_dict(li)            
            for wd in [w.rstrip()
                       for w in wordsdata[li]
                           if isanaword(w,inputword)]:
                print(wd)
            print()
            print('Took %i ms'%((clock()-t)*1000))
To quit enter empty line
Give word: pucomret
computer
Took 51 ms

Give word: computer
computer
Took 27 ms

Give word: similar
similar
Took 51 ms

Give word: same
mesa
same
seam
Took 32 ms

Give word: item
emit
item
mite
time
Took 9 ms

Give word: smae
mesa
same
seam
Took 9 ms

Give word: quit
quit
Took 8 ms

Give word: eixt
exit
Took 9 ms

Give word:
jcao219 18 Posting Pro in Training

I see, you do length by length saving version of the dictionary as in memory lookup table. Only your program should not run as wordsdata is not declared global in populate_words_dict, so it is local to function. Still it works, this I do not understand, maybe it generates the dict for every time? Looks faster after adding global for me.

I never use global.
It does not generate the dict every time, whenever python's interpreter is inside a function, if a name is referenced inside, it first checks the local scope, and seeing that wordsdata is not there, it will then check the global scope.
Since wordsdata is mutable, it would just change that one instance.

Also, in some cases IronPython is actually faster than CPython because its compiled to CIL.

I have not seen need to use IronPython as it is not possible to use psyco there and so it is much slower.

I would much rather create a program like this in C# if speed was important because then it generates a single assembly that runs rather quickly.

jcao219 18 Posting Pro in Training

Support for a wildcard * would be good.
Then people can type in happ* and get happy as a result.

Are you working on a GUI for this? For some reasons non-technical users might prefer a GUI.

TrustyTony 888 pyMod Team Colleague Featured Poster

I have user interface ideas for Tkinter, Nokia'a own appuifw (only menus now) and also QT. Problem is that I would like to publish it in Symbian phones in OVI store, but neither QT (even it is supposed to be future of Nokia phones) nor Python programs can be put there, because programs can not have external dependencies and not to be too large. So I guess I must use the old interface and known to me part of C++ i.e. use it as C language which I know to use in basic level. Speed is not so much issue as I can so quickly 100 or more anagrams and continue to generate more.

For example, you could easily start to prepare dictionaries even in this simple program in other thread until user has inputted the words, the smallest word length, the dictionary used...

My program is multiword anagram program. You input Jimmy Cao, and English dictionary and length two as minimum word length, and the program spits answer in my lowly Sempron spits answer no anagrams found 0 ms processing (dictionary loading from simple text dictionary 15 ms, 15 fitting words). You have not very anagram friendly name.

My name is my test case, for English it is little slower (see screenshot of plain Tk interface version, number is test of progress indicator value, which updates only at end), in Finnish the program finds 1565 anagrams in total time of 640 ms (67 ms word selection, cleaning and loading from text file).

I have not yet progress bar working, I have one global variable updated now and then from function, but no threading implemented (I do have one after function updating the number once per second, but it seems not update during intensive calculation routine, maybe need 1 ms sleep now and then?)

jcao219 18 Posting Pro in Training

I have user interface ideas for Tkinter, Nokia'a own appuifw (only menus now) and also QT. Problem is that I would like to publish it in Symbian phones in OVI store, but neither QT (even it is supposed to be future of Nokia phones) nor Python programs can be put there, because programs can not have external dependencies and not to be too large. So I guess I must use the old interface and known to me part of C++ i.e. use it as C language which I know to use in basic level. Speed is not so much issue as I can so quickly 100 or more anagrams and continue to generate more.

For example, you could easily start to prepare dictionaries even in this simple program in other thread until user has inputted the words, the smallest word length, the dictionary used...

My program is multiword anagram program. You input Jimmy Cao, and English dictionary and length two as minimum word length, and the program spits answer in my lowly Sempron spits answer no anagrams found 0 ms processing (dictionary loading from simple text dictionary 15 ms, 15 fitting words). You have not very anagram friendly name.

My name is my test case, for English it is little slower (see screenshot of plain Tk interface version, number is test of progress indicator value, which updates only at end), in Finnish the program finds 1565 anagrams in total tiem of 640 ms (67 ms word selection, cleaning and loading from text file).

I have not yet progress bar working, I have one global variable updated now and then from function, but no threading implemented (I do have one after function updating the number once per second, but it seems not update during intensive calculation routine, maybe need 1 ms sleep now and then?)

Your program looks good.
Yes, you need threading whenever you have a GUI. That way, the program doesn't seem to freeze while you do calculations.

TrustyTony 888 pyMod Team Colleague Featured Poster

I have though plan to change from return to yield and that way I have control back often to main GUI and maybe can persuade tk GUI to show nice progress bar.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, learning, and sharing knowledge.