Timing: Lists vs Dictionaries

Question

woooee 814 Nearly a Posting Maven

12 Years Ago

The following code does the same lookups using a list and a dictionary. Lists are sequential in memory, meaning that the program has to get the offset for the beginning of the item it wants to use, go to that offset, compare to what it wants to find, and go back and do it all over again for each element in the list. A dictionary uses a hash which creates small groups, for lack of a better term, so there is a relatively small number of lookups. For each tenfold increase in data, the dictionary's time increases by ten times also, but is still pretty fast. The list's lookup time increases by 100+ times for the same increase. One would expect the same increases for a tenfold increase in the size of each individual item being stored as that would also increase the anount of memory that has to be traversed. So it appears that you can't really decide on using a list vs. dictionary based on the number of items but have to consider the total amount of memory to be consumed.

import datetime

def add_to_list(num):
    a_list=[]
    start_time=datetime.datetime.now()
    for ctr in xrange(num):
        ##  lookup up each time to show the "cost" of lookups
        if ctr not in a_list:
            a_list.append(ctr)

    print "elapsed time:", datetime.datetime.now()-start_time


def add_to_dict(num):
    a_dict={}
    start_time=datetime.datetime.now()
    for ctr in xrange(num):
        ##  lookup up each time to show the "cost" of lookups
        if ctr not in a_dict:
            a_dict[ctr]=ctr

    print "elapsed time:", datetime.datetime.now()-start_time


for num in [1000, 10000, 100000]:
    print "list of %6d" % (num), 
    add_to_list(num)
    print "dict of %6d" % (num), 
    add_to_dict(num)
    print

""" compare execute times for list vs. dictionary for different
    lengths of input data

list of   1000 elapsed time: 0:00:00.012220
dict of   1000 elapsed time: 0:00:00.000264  Approx 61 times faster

list of  10000 elapsed time: 0:00:01.210761
dict of  10000 elapsed time: 0:00:00.003146  Approx 390 times

list of 100000 elapsed time: 0:02:30.698131
dict of 100000 elapsed time: 0:00:00.03296   Approx 4566.7 times
"""

python

Edited 12 Years Ago by woooee

2 Contributors
1 Reply
205 Views
14 Hours Discussion Span
Latest Post 12 Years Ago Latest Post by sepp2k

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

sepp2k 378 Practically a Master Poster · Answer 1 · 2013-03-19T11:49:13+00:00

What exactly were you trying to measure here? If you just wanted to measure the cost of lookups, wouldn't it have made more sense to just measure a function that makes a bunch of lookups in an already-built list or dictionary without measuring the time it takes to build the list or dictionary?

By the way: When you say that the time of the list code increases by factor 100+ when the input size increases by 10, that's not really accurate. It increases by around 100 the first time you increase the input size. The second time, the increase factor is about twice that. And if you increased the input size again, the runtime would increase by an even larger factor. That is to say: the runtime of your list code is quadratic, not linear.

Lists are sequential in memory, meaning that the program has to get the offset for the beginning of the item it wants to use, go to that offset, compare to what it wants to find, and go back and do it all over again for each element in the list.

That's not really accurate. It starts at the memory address where the list begins and then increases the address by the size of the list elements (i.e. sizeof(void*) in Python) until it finds the element it's looking for. There's no going back involved.

One would expect the same increases for a tenfold increase in the size of each individual item being stored as that would also increase the anount of memory that has to be traversed. So it appears that you can't really decide on using a list vs. dictionary based on the number of items but have to consider the total amount of memory to be consumed.

I'm not following that. If the size of the individual items increases, we can assume that the runtime of performing an equality operations also increases (though of course it's possible to define types for which that would not be the case - in which case I would not expect any difference in runtime), which would slow down the list code significantly because it performs O(n) equality comparisons every time you check whether an item is included in the list. For a dictionary there will be only a small number of equality comparisons per inclusion check (only one if there are no hash collisions), so it will be slowed down to a much smaller degree. However none of this has anything to do with traversing memory.