My research group is trying to implement a hash table to reduce the effects of a very costly step in our calculations.

Basically, it can take more than an hour to go from A to B, and B is something ridiculously simple, like a number between 1 and 100.

A in this case is an integer array with the following attributes:

  • any entry can be either positive or negative
  • average of 20 entries per array
  • all entries are even
  • with k elements in an array, values range from (+/-)2...2k, with no repeats

Examples:

  • {4, -2, 6, 10, -8}
  • {-2, -4, -6}
  • {6, 4, -2}

I don't have an exact figure, but I think we have around 5 million distinct arrays to hash here.

What is a good hashing function that would work and still be reversible?

Thanks,
natnit

You will need 5 million different hash tables.
Which meanns that if you are lokking for a given value, say -4, across all of your arrays you will have to consult each hash to see if -4 is in the array and which element it is.

This will result in some performance gain.
I'm assuming that the position of the element and the value in the in the array both have meaning.

It sounds like the data design needs to be rethought -- more than that a hash table will fix your problems. For example, a database can do
highly optimized lookups, like count the number of element #3 that equals 20.

I dunno, don't know enough to about your problem to help, really.

Narue has a site
www.eternallyconfuzzled.com
that has a good explanation of hashing algorithms.

UNIX systems usually come with hcreate, hsearch as an implementation of using Knuth's
original hashing work.

I personally think you've got a data design problem. Possibly you should have put in associative arrays upfront as part of the design.
Hashing added after the fact may only help a little. Depending on how far down the call tree your lookups are embedded in your code.

Are these arraya your keys to the map/list? Can there be any repetition?
If there is repetition, is it needed to hold all instances (e.g. like in a std::multimap) or only one key should be there (like in std::map)?

What's the max number of entries per array?

How abt picking up the first value bit (not the sign) from first 24 entries in the array? You'll have a 3 byte key that can represent values more than 5 mil. (of course for lesser number of entries you pickup more bits).

Completely agree with Jim on this. Add to this if you're hoping to make a lookup faster in an array of 20 elements, I don't think you're looking at major performance gains.

So I just hope tht these arrays represent your "keys" and not "values".

If this is NOT the case and you still wanna introduce hashing then read on.
If your key is ranging from -2k..+2k (lets discount -2..+2 for now) where avg k is 0 and max (guessing) is say 50. AND you don't have memory problems. Then you can simply strip the sign bit, divide by 2 and use the number itself as key, probability of collision is less than 5% if I'm not wrong, but that you gotta test anyway. :)

E.g. here are the hash values for your examples:

{4, -2, 6, 10, -8}
hash= {2,  1, 3,   5,  4}
{-2, -4, -6}
hash= { 1,  2,  3}
{6, 4, -2}
hash= {3, 2,  1}

Edited 3 Years Ago by happygeek: fixed formatting

Thanks for the reply. The picture you paint of the current situation is quite grim. :)

I was thinking more along the lines of a string conversion from the array. I'm assuming there are string to int hash functions, and a string is nothing more than an array of characters. So all I'd need is something to map -2, 8, 4 etc to chars.

I am still confused about your reasoning on why I need 5 million hash tables, though.

Thanks for the reply. The picture you paint of the current situation is quite grim. :)

I was thinking more along the lines of a string conversion from the array. I'm assuming there are string to int hash functions, and a string is nothing more than an array of characters. So all I'd need is something to map -2, 8, 4 etc to chars.

I am still confused about your reasoning on why I need 5 million hash tables, though.

Nothing easier than to map integers onto characters; a look up table would do it, but how would that help?

You did not explain your full data design. So the answer as to why 5 million hash tables =

one hash for one array.

Explain how you need to lookup values in each array. It would help. It might be possible to hash across all arrays.

BUT.

Your collision rate will be out of this world because you must have zillions of duplicates.
Either duplicate element # in the array or duplicate values in the array. This assumes you do not need to lookup based on the entire array's contents. In that second case, hashing might actually help somewhat.

Ok, let me refine my question, since I think there is a communication problem here somewhere.

Assume I have lots of array values that I want to be able to look up easily.

Each array is at most 60 integers long, and for the sake of argument, I have 1 million arrays.

All I'm looking for is a hash function to produce keys. For example:

{2, 4, 6} -> 3821
{2, 6, 4} -> 91313
{4, -2} -> 1243

I'm sure that there should be something, because going from int[] to int shouldn't be that different from going from char[] to int...

I hope that made it clearer.

Ok, let me refine my question, since I think there is a communication problem here somewhere.

Assume I have lots of array values that I want to be able to look up easily.

Each array is at most 60 integers long, and for the sake of argument, I have 1 million arrays.

All I'm looking for is a hash function to produce keys. For example:

{2, 4, 6} -> 3821
{2, 6, 4} -> 91313
{4, -2} -> 1243

I'm sure that there should be something, because going from int[] to int shouldn't be that different from going from char[] to int...

I hope that made it clearer.

I will suggest a possible approach rather than give an answer. Think of the values contained in the array as being the digits in a number with an appropriately large base. The hash value, or at least a basis for it, could then be the value of that number.

Are these arrays your keys to the map/list? Can there be any repetition?
If there is repetition, is it needed to hold all instances (e.g. like in a std::multimap) or only one key should be there (like in std::map)?

What's the max number of entries per array?

If you answer a few questions may be you'll get more/faster help..

Hi Natnit,

Have you found the solution?
I have exactly the same problem here.
Regards,
Mix

That thread is 4 years old, so I doubt any answer will be forthcoming. Please let old threads rest in peace, and start a new thread - hopefully with better communication skills than was used here by Natnit.

This article has been dead for over six months. Start a new discussion instead.