0

I can't find the answer to this on google because I'm not sure exactly what to search for. I'm building a program where there will hardly ever be new data being added in however the data which is in the system will need to be displayed often.

So what I'm asking is... which collections framework will allow for me to get the fastest lookup/traversal on the objects stored within the collection, the speed of insertion/deletion is not top priority.

4
Contributors
3
Replies
4
Views
5 Years
Discussion Span
Last Post by mike_2000_17
1

Are you looking it up by key, or by value? Do you need to handle duplicates? Does it need to be sorted? What is the average tuple (element) size? How many elements will your collection contain? Before days of the STL (Standard Template Library) I implemented a wide variety of C and C++ collections such as maps, multimaps, hashmaps, hash tables, linked lists (single and double link types), sorted linked lists, circular lists, etc. Now, for C++ at least, one would just select the appropriate type of collection from the STL and be done with it. In Java there are generics (template classes) very similar to the collection template classes in C++.

0

The best way to find out is to try them out. There are many non-trivial issues beyond what you can reasonably predict based on theories (time-complexity or whatever else). It also depends on the factors that rubberman mentioned. The fastest traversal is always going to be with compact and contiguous memory (i.e. array) but you'll pay the price at insertions / deletions. The fastest lookup is, of course, hash tables (constant-time lookups, like looking up by index), but you pay the price in design complexity (finding a suitable hash function). The main factor for performance is predictability, meaning that the architecture (cache pre-fetchers and branch predictors) has to be able to easily predict what you will do next, this often results in 10x to 200x faster code (this is why linked-list traversals are usually too slow to actually be useful).

Basically, the holy grail is constant-time lookups and linear-time traversal (with low constant factor). A finely tuned hash function will give you constant-time lookups on average. Compact arrays will give you linear-time traversal. Having both at the same time might be very hard. There are trade-offs every step of the way.

Given a set of elements, you could, in theory, use the data to infer the best possible hash function, and then store the elements in an array, so you'll get the fastest possible lookups and traversals, but you pay a very high price if the elements change in any way (you have to recompute the hash function, and reorder everything).

On the other hand, you could store all the elements in a sorted array and look them up by binary search. You'll get as fast as possible in-order traversal, but your lookup speed will be reasonably good but far from optimal.

You can also store the elements in an array with a breadth-first layout of a binary search tree, giving you very fast lookup times (close to the theoretical logN time), with pretty close to best unordered traversal, but the ordered traversal will be good but not great.

And so on. It really all depends on the application, on the precise use-cases (number of elements, size of elements, duplicates, distribution, number of lookup vs. number of traversals, etc.). The best thing is to just try different standard ones and see what seems to be more beneficial, maybe later you can consider making a special container of your own that is best suited for your application. For example, in C++, you can easily test the unordered_set (or multi), set (or multi), and a sorted vector, and see what gives the best performance, that's a good start.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.