Hey, one of you guys suggested that I go on Wikipedia's page for hash tables. It was actually pretty informative. I'm pressed for time so I am doing separate chaining for my hash table because it seems like it will be one of the easiest to implement (and according to wikipedia, thats also true). Anyway, I have a couple of questions on separate chaining. Thanks for any suggestions in advance.

1. When does the hash table need to be resized. I know about load factor, so lets say I choose 75%. Doesn't checking to see how full the hash table is require looking at every index to see if something is there, and incrementing a counter. If so, how often should I check to see how full the hash table is? It seems like doing so every time would be inefficient.

2. How should I decide what the initial capacity of the array that my hash table uses should be? Is it related to my individual problem (I think so)? If so, any general strategies for choosing a good size? Also, are there any bad things that can occur (other than wasting too much memory) from picking an original size that is too large?

Recommended Answers

All 5 Replies

Does the "table size should be a good choice, such as a prime number" property of hash tables always need to be preserved, or does it only apply to when the table is first created? In other words, when rehashing, should the new hash table size be a prime number? From what I've read about hash tables, I think it should always be a prime number, but my lecture notes say otherwise - they say to "double the size of the old hash table and re-insert the items from the old hash table".

<EDIT> Ignore please ............. mixed my tabs again :@

1) You know the size of the table and you can keep track of the load factor by modifying a counter when you add or remove a chain. This is pretty efficient.

2) Separate chaining is much more friendly when it comes to table size. Making it too small isn't a crisis because the chains grow as you need them, and making it too large will just waste space. Despite what some people will tell you, a lot of short chains is still fast, so as long as your hash function has a good distribution, you can get away with setting the initial size to the expected number of keys.

If you choose to make the table size prime, you should treat that as an invariant and make it prime with each resize as well. I say "if you choose to make the table size prime" because there are practical reasons for doing so, such as the hash function is weak and a non-prime table size promotes clustering, your collision strategy is known to work well with a prime size, has known issues with non-prime sizes, or you can't prove that your hash table will work correctly without a prime size (this is the most common situation).

So yes, if you choose a prime size, and it's not an arbitrary decision, you should maintain that decision throughout the life of the table.

Thank you, much appreciated

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.