Simple multiword anagram candidate words

TrustyTony, 14 years ago

Since there was a C++ question about the speed of a simple hard-disk-based lookup, here is a version of my unscramble program that does its lookup entirely from a word list read from disk. It gives all possible candidate words for a multiword anagram.

The program is case-insensitive. It does not ignore special characters such as ' or space, but because a candidate only has to be a sub-word of the input, you can include any number of special characters in the input without affecting the result.
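The case folding and the sub-word condition can be sketched in Python 3 with `collections.Counter` instead of the string-replace loop used in the snippets in this thread. This is a minimal sketch, not the original program: `is_sub_anagram` is a hypothetical name. It treats the input as a multiset of characters, so extra spaces or punctuation in the input are harmless, while a special character in a candidate must actually be present in the input.

```python
from collections import Counter

def is_sub_anagram(candidate, letters):
    """Return True if every character of candidate is available in letters,
    counting repeats (a multiset-inclusion test). Hypothetical helper."""
    available = Counter(letters.lower())   # case-insensitive, like the original
    needed = Counter(candidate.lower())
    # Each needed character must occur at least as often in the input;
    # a missing key in a Counter counts as 0.
    return all(available[ch] >= n for ch, n in needed.items())

print(is_sub_anagram("joy", "Tony Veijalainen"))       # True
print(is_sub_anagram("joy", "Tony's  Veijalainen!"))   # True: extra characters do no harm
print(is_sub_anagram("o'neil", "Tony Veijalainen"))    # False: no apostrophe in the input
print(is_sub_anagram("tool", "Tony Veijalainen"))      # False: only one 'o' available
```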
For example:

Possible multiword anagram candidate words
To quit enter empty line

Give word: Tony Veijalainen
a ye to on no it in at an yon yet yen yea voe vie via vet vat van toy ton toe tin tie ten tee tea tan ova one oil oat not nit nil net nee nay lye lot lit lie let lee lea lay joy jot jet jay ivy ion inn eye eve eta eon eel eat aye ate any ant ana ale yeti vote volt vole viol vine vile vial veto vent vein veil veal vane vale vain tone toil tiny tile teen teal tale tail oven oval only oily nova note none nine neon neat navy nave nail love lone loin loan live lion lint lino line lien levy lent leat lean lava late lane lain jolt join joey jive jilt jail iota into evil even envy elan anon anal alto aeon voile vital viola vinyl venal valet tonne tonal tinny tenon talon ovate olive nylon novel novae navel naval natal naive liven linen lento leave leant laity joint inlet inlay inane event envoy enjoy elven elite elate eaten avian avail atone anvil annoy annal anion alone alive alien violin violet venial vanity vainly tannoy tannin ninety neatly neaten native nation litany linnet levity leaven jovial invite invent intone innate evenly entail enjoin atonal anyone anoint anneal violent violate valiant toenail tinnily olivine novelty neonate naivety naivete naively lenient jointly javelin inanity inanely enliven elation antenna aniline alanine aeolian venetian venality neonatal national innovate innately aviation antennae alienate valentine joviality inviolate invention elevation alienation
Took 4178 ms

Give word:

python

## Simple anagram candidate words
from time import clock
def isanaword(part,bigger, sub=False):
    """ goes through the letters of the second word (bigger) and returns it
        if the first word (part) contains exactly the same letters in the same numbers;
        if sub is true, bigger is also accepted when it is only a partial anagram of part

    """
    for c in bigger:
        if c not in part: return "" ## letter not available in the first word
        part=part.replace(c,'',1) ## use up one occurrence of the letter
    if sub or part in '\n': return bigger ## full anagram if nothing (or only a newline) remains
    return "" ## letters left over and sub is false: not a full anagram

if __name__=="__main__":
    print('Possible multiword anagram candidate words\nTo quit enter empty line')
    inputword=' '
    wordlist=sorted(open('uk.txt').read().lower().split(),key=len)
    while inputword:
        inputword=raw_input('\nGive word: ').lower()
        if inputword:
            t=clock()
            for wd in [w
                       for w in wordlist
                       if isanaword(inputword, w, sub=True)]:
                print wd,
            print
            print 'Took %i ms'%((clock()-t)*1000)
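For readers on Python 3, the same letter-consuming check with its full/partial switch might be sketched as a boolean function. This is a hedged port, not the author's code: `is_anagram_of` and its `partial` parameter are hypothetical renamings of `isanaword` and `sub`, and the input is assumed to be already stripped, so the `part in '\n'` allowance for a trailing newline is replaced by a plain empty-string test.

```python
def is_anagram_of(word, letters, partial=False):
    """Return True if word can be spelled from letters, each used at most once.
    With partial=False, word must also use up every letter (full anagram)."""
    remaining = letters
    for c in word:
        if c not in remaining:
            return False                          # letter missing or already used up
        remaining = remaining.replace(c, '', 1)   # consume one occurrence
    # A partial anagram may leave letters over; a full anagram may not.
    return partial or remaining == ''

print(is_anagram_of("listen", "silent"))                # True
print(is_anagram_of("list", "silent"))                  # False: 'e' and 'n' left over
print(is_anagram_of("list", "silent", partial=True))    # True
print(is_anagram_of("lists", "silent", partial=True))   # False: only one 's'
```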

TrustyTony (ex-Moderator), 14 years ago

Run directly from the command line, my example case (my name) actually takes 156 ms on my computer, so redirecting the output through IDLE carries quite a penalty. Could there also be a problem with the internal timer (time.clock())?

With Python 2.6 and psyco it takes 116 ms.
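On the timer question: `time.clock()` behaved differently by platform (processor time on Unix, wall-clock time on Windows) and was removed in Python 3.8; `time.perf_counter()` is the portable replacement. A minimal timing sketch follows, where `timed_ms` is a hypothetical helper and the filter is a stand-in for the anagram lookup. Keeping any printing outside the timed call means the measurement covers only the lookup, not the console or IDLE output.

```python
import time

def timed_ms(func, *args):
    """Call func(*args) and return (result, elapsed milliseconds).
    Uses time.perf_counter because time.clock was removed in Python 3.8."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

words = ["yeti", "joy", "tool", "venetian"]
# Stand-in lookup: keep words containing 'e'; printing happens after timing stops.
result, ms = timed_ms(lambda ws: [w for w in ws if "e" in w], words)
print(result, "took %.3f ms" % ms)
```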

TrustyTony (ex-Moderator), 14 years ago

This is an example of why you should name your variables correctly from the beginning: the old variable names meant the opposite of what they held (because my full anagram program takes its parameters in the opposite order), so I renamed them. The IDLE issue seems to be the dismal performance of many small print statements, so I simplified the code to build the whole output with ', '.join and print it once.
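The difference between many small prints and one joined string can be illustrated by counting how many `write()` calls reach the output stream. This sketch assumes Python 3 and uses a hypothetical `CountingWriter` helper; it only counts writes and says nothing about IDLE's actual cost per write.

```python
import io

class CountingWriter(io.StringIO):
    """A StringIO that counts how many times write() is called (hypothetical helper)."""
    def __init__(self):
        super().__init__()
        self.writes = 0

    def write(self, s):
        self.writes += 1
        return super().write(s)

words = ["joy", "jet", "ivy", "oven"]

per_word = CountingWriter()
for w in words:
    print(w, end=" ", file=per_word)   # one print call (and several writes) per word

joined = CountingWriter()
print(", ".join(words), file=joined)   # one print call for the whole line

print(per_word.writes, joined.writes)
print(joined.getvalue(), end="")
```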

This program reads the word list from disk and sorts it only once, not once for every input word as some earlier versions did. Loading typically takes around 50 ms on my computer for the 57,047-word UK dictionary.
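The load-once, sort-once step can be sketched as a small function that every later query reuses. In this sketch, `load_wordlist` is a hypothetical name and `io.StringIO` stands in for `open('uk.txt')` so it runs without the dictionary file. Python's sort is stable even with `reverse=True`, so words of equal length keep their dictionary order.

```python
import io

def load_wordlist(f):
    """Read whitespace-separated words from a file object, lowercase them,
    and sort them longest first. Called once; the list is reused for every query."""
    return sorted(f.read().lower().split(), key=len, reverse=True)

# Stand-in for open('uk.txt') so the sketch is self-contained.
sample = io.StringIO("Tea\nnovel\nat\nJOY\nelevation\n")
wordlist = load_wordlist(sample)
print(wordlist)   # ['elevation', 'novel', 'tea', 'joy', 'at']
```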

## Simple sub-anagrams
from time import clock

#{ python specialising compiler
try:
    import psyco
    psyco.full()
except ImportError:
    pass
#}


def part_of(bigger, part):
    """ goes through the letters of part and returns it
        if part is partial anagram of bigger
    
    """
    # each letter of part must be in bigger
    for c in part:
        if c not in bigger: return ""
        bigger=bigger.replace(c, '', 1)
    return part

if __name__=="__main__":
    t = clock()
    wordlist = sorted(open('uk.txt').read().lower().split(), key=len, reverse=True)
    t -= clock()
    print('Wordlist load and sort took %i ms'%(-t * 1000))
    
    print('''
Generating possible multiword anagram
candidate words in reverse length order.

(To quit enter empty line)''')

    while True:
        inputword=raw_input('\nGive word: ').lower()
        if not inputword:
            break
        t = clock()
        print(', '.join(wd for wd in (part_of(inputword, w)
                                       for w in wordlist)
                        if wd))                  
        t -= clock()
        print('Took %i ms'%(-t * 1000))
        
    print "Bye, bye!"

(Edited by TrustyTony: timing now stops before the print.)

TrustyTony (ex-Moderator), 14 years ago

Here, finally, is a version that runs in both Python 2 and Python 3. It can also be translated to C++ with the Shedskin 0.7 translator (the load/sort time is then longer: 188 ms without the sort, around 125 to 140 ms with it).

EDIT: Also added the possibility to give the name of the dictionary file as a command-line argument.

## Simple sub-anagrams
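try:
    from time import perf_counter as clock  # Python 3.3+ (time.clock was removed in 3.8)
except ImportError:
    from time import clock  # Python 2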
import sys

#{ python specialising compiler
try:
    import psyco
    psyco.full()
except ImportError:
    pass
#}

try:
    input = raw_input
except NameError:
    pass

def part_of(bigger, part):
    """ goes through the letters of part and returns it
        if part is partial anagram of bigger

    """
    # each letter of part must be in bigger
    for c in part:
        if c not in bigger: return ""
        bigger=bigger.replace(c, '', 1)
    return part

if __name__=="__main__":
    t = clock()
    wordlist = sorted(open('dict/uk.txt' if len(sys.argv) == 1 else sys.argv[1]).read().lower().split(), 
                      key=len,
                      reverse=True)
    t -= clock()
    print('Wordlist load and sort took %i ms'%(-t * 1000))

    print('''
Generating possible multiword anagram
candidate words in reverse length order.

(To quit enter empty line)''')

    while True:
        inputword = input('\nGive word: ').lower()
        if not inputword:
            break
        t = clock()
        print(', '.join(wd for wd in (part_of(inputword, w)
                                       for w in wordlist)
                        if wd))
        t -= clock()
        print('Took %i ms'%(-t * 1000))

    print("Bye, bye!")
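Instead of indexing `sys.argv` directly, the optional dictionary argument could also be handled with the standard-library `argparse` module, which adds a `--help` message for free. This is a sketch of an alternative, not how the snippet above does it; `parse_args` is a hypothetical helper, and the default path simply mirrors the snippet's `dict/uk.txt`.

```python
import argparse

def parse_args(argv=None):
    """Parse an optional dictionary file path (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sub-anagram candidate words")
    # nargs="?" makes the positional argument optional; the default is used when it is absent.
    parser.add_argument("dictionary", nargs="?", default="dict/uk.txt",
                        help="word list file (default: %(default)s)")
    return parser.parse_args(argv)

print(parse_args([]).dictionary)              # dict/uk.txt
print(parse_args(["words.txt"]).dictionary)   # words.txt
```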

svfox2000 (Newbie Poster), 12 years ago

Good stuff there. Thanks!
