Case of big difference from the small change in Python code

TrustyTony 888 ex-Moderator

15 Years Ago

I optimized very carefully program I was producing by producing the main tight loop functions half a dozen times with various implementation styles. Also i considered amount of passing information in call hierarchy in parameters or letting functions to trust proper initialization of the global variables.

I got few interesting lessons. First, simple one is this:

Len is expensive for tight loops

Len is not O(1) (constant time) operation, but takes travelling through the list linked list style or otherwise it is slow.

Often if you keep your logic straight you have knowledge of any change of length of list and can use fast add/substract to have local variable to remember the value. In my case I did also string processing operation in function and with little extra effort I could not only return the result itself, but it's known length as tupple.

The curious case

Here is a diff file telling the only difference between two versions of my program, which had huge difference in performance:

89c89
< def find_words(kirj,smallest=1):
---
> def find_words(kirj):
113c113
< def find_words3(kirj,smallest=1):
---
> def find_words3(kirj):
152c152
< def ana_recursive(word, smallest=1, wordlist=[]):
---
> def ana_recursive(word, wordlist=[]):
169c169
<                 an=ana_recursive(c,wordlist=wordlist[j:])
---
>                 an=ana_recursive(c,wordlist[j:])
197c197
< def anagrams(origword, smallest):
---
> def anagrams(origword):
199c199
<     global words, delta, limit, Cancel, printing, start
---
>     global words,smallest, delta, limit, Cancel,printing, start
202,203d201
<     Cancel=False
<     print 'Cancel',Cancel
204a203
>     setlang(lang)
213c212
<     wordlist=find_words3(words,smallest)
---
>     wordlist=find_words3(words)
218c217
<     res = ana_recursive(words, wordlist=wordlist)
---
>     res = ana_recursive(words, wordlist)
230d228
<
266c264
<         res=anagrams(k,smallest)
---
>         res=anagrams(k)
275d272
<

Doesn't look much, does it?

How about performance.

Only difference in results was the time information in the beginning, and I saved those also for two test cases I used: 2 letter or more anagrams of my own and my brothers name.

2,3c2,3
< Processing took 904 ms.
< Finnishing time :Tue Mar 30 22:43:55 2010.
---
> Processing took 698 ms.
> Finnishing time :Tue Mar 30 22:42:17 2010.

2,3c2,3
< Processing took 10.333 s.
< Finnishing time :Tue Mar 30 22:39:43 2010.
---
> Processing took 12.962 s.
> Finnishing time :Tue Mar 30 22:44:40 2010.

The slower version is the one which has smallest anagram word size as parameter instead of global variable.

These programs were/are using psyco module to accelerate the code, but usually switching off psyco does not change the order of goodness of the code.

Comments?

Tony

P.S: OK, after the len removal optimisations etc. the times are now down to:
352 words loaded in 95 ms, 1564 anagrams found in 658 ms
and
1005 words loaded in 101 ms, 27315 anagrams found in 9.956 s
(but the dictionary has some additional previously missing words)

And I am not using hashes for words and dictionaries are straight text files not cPickles.

Edited 15 Years Ago by TrustyTony because: n/a

1 Contributor
0 Replies
116 Views

Be the first to reply

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.