I'm writing an application that needs to store a ton of objects and some attributes.

I was wondering, based on Python's inner workings, whether it's more memory efficient to have one huge dictionary, or many small dictionaries plus one dictionary that just references the smaller ones?

Any information would be helpful.


Personally I think if you're that worried about efficiency then you should try a language like C++, which is a lot faster than Python anyway.

But my bet would be on the larger one, as that would take up less memory on your computer than lots of smaller ones.

Are you using binary trees?

For something like a dictionary that's probably the most efficient.

It halves the search field each time it does a comparison, so for lots and lots of nodes you only need a small number of compares (assuming it's balanced).

A dictionary has hashed lookup and is highly optimized in Python, since the language uses dictionaries internally too. I would go with one dictionary.
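If you'd rather measure than guess, here is a rough sketch using the standard-library tracemalloc module (Python 3.4+). The two layouts, the attribute names and the object count are all made up for illustration, and the numbers will differ between Python versions and with your real keys and values.

```python
import tracemalloc

N_OBJECTS = 10_000            # hypothetical object count
ATTRS = ("name", "x", "y")    # hypothetical attribute names


def build_flat():
    """One big dictionary keyed by (object_id, attribute_name)."""
    return {(i, a): i for i in range(N_OBJECTS) for a in ATTRS}


def build_nested():
    """One outer dictionary that references a small dictionary per object."""
    return {i: {a: i for a in ATTRS} for i in range(N_OBJECTS)}


def measure(build):
    """Return (bytes still allocated after building, the built store)."""
    tracemalloc.start()
    store = build()
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, store


flat_bytes, flat = measure(build_flat)
nested_bytes, nested = measure(build_nested)

print("one big dict     :", flat_bytes, "bytes")
print("many small dicts :", nested_bytes, "bytes")
```

Neither layout is guaranteed to win, so it is worth running this with data shaped like yours before deciding.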

C++ has the STL map<const Key, Data> container, but I don't know if that would be any memory saving, and don't think the lookup is hashed for speed.

Use whatever is easiest to write the code in.

Get it right. Profile; then see if you need to worry about optimisation.

Once you have a working program, you often find that optimisations are unnecessary, or, when you profile, you find that the bottleneck isn't where you thought it might be. This way you save wasted effort, and are more likely to have something working when your deadline looms.
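As a minimal sketch of what "profile first" can look like, the standard-library cProfile and pstats modules will tell you where the time actually goes. The functions build_store, total_x and workload below are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def build_store(n_objects):
    """Hypothetical stand-in: one small dict of attributes per object."""
    return {i: {"name": "obj%d" % i, "x": i, "y": -i} for i in range(n_objects)}


def total_x(store):
    """Hypothetical stand-in for the work done on the stored objects."""
    return sum(obj["x"] for obj in store.values())


def workload():
    store = build_store(50_000)
    return total_x(store)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The rows near the top of the report are the ones worth optimising, if any of them turn out to matter at all.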

- Paddy.

Actually I did a bit of research a while back on how Python allocates memory. For types like lists and dictionaries, once they are created (and the space allocated), Python doesn't free the memory for the remainder of the script's lifetime, but instead holds onto it for when newer ones are created.

This would suggest that several smaller dictionaries are more efficient than one large one IF the dictionaries are not being used concurrently (i.e. one is released and then another created). But for you this doesn't seem to be the case, so I would suggest that you go with one large one since it would probably be easier to manage.

Doesn't Python have automatic garbage collection?
Surely if you remove a node from the list it will therefore have nothing pointing to it, which should trigger disposal.
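A quick way to check this in CPython is to watch an object through a weak reference, which does not keep it alive. This is only a sketch: the Node class is made up for the example, and the immediate disposal shown relies on CPython's reference counting, so other Python implementations may collect the object later.

```python
import weakref


class Node:
    """Hypothetical node type, used only for this demonstration."""

    def __init__(self, label):
        self.label = label


nodes = [Node("a"), Node("b")]

# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(nodes[0])
alive_before = watcher() is not None

# Removing the node from the list drops the last strong reference to it.
del nodes[0]
alive_after = watcher() is not None

print("alive before removal:", alive_before)
print("alive after removal :", alive_after)
```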

Yeah JBennet, you are exactly right.
When you remove something from a list it is deallocated, freeing that memory up for later use.

Well, that's how it works in Java at least; maybe Python is different.

No no, you missed the point. Yes there is garbage collection, but for types like lists and dictionaries when they are garbage collected, the memory isn't actually freed, but it is kept by the interpreter (and not released back to the OS) for later use by new lists/dictionaries. The memory *is* available for use within the interpreter (so it has been garbage collected), it's just not released back to the OS.

Also note that this doesn't apply to *all* Python objects (at least that's the impression I got when I read about it). Memory held by int, string, tuple objects etc. is released back to the OS when they are garbage collected.

Seems inefficient though, especially considering lists in Python are used for some memory-intensive things like the AI in strategy games (Civ4 for example), where it would be beneficial to release the memory back to the OS.

When you run a CPython program you won't see the size of the process shrink, from the OS point of view. Unused objects of all types are garbage collected, and the memory is made available for later use by that process, but it is not given back for other processes to use.

The only exception might be if you memory map a large file (mmap).
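The interpreter-side half of that distinction can be watched with the standard-library tracemalloc module. Treat this as a sketch only: tracemalloc counts memory the interpreter has handed out to Python objects, so it shows the space becoming reusable inside the process and says nothing about what the OS reports for the process size. The record layout and count are invented for the example.

```python
import gc
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Hypothetical data: a list of small dictionaries with distinct strings.
records = [{"id": i, "payload": "item-%d" % i} for i in range(20_000)]
with_data, _ = tracemalloc.get_traced_memory()

# Drop the only reference; CPython can now reclaim every record.
del records
gc.collect()
after_delete, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print("baseline     :", baseline, "bytes")
print("with data    :", with_data, "bytes")
print("after delete :", after_delete, "bytes")
```

To see the OS side you would have to look at the process from outside, for example with your system's task manager or top.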

- Paddy.