| | |
Simple Hash Function for Integer Arrays
Please support our C advertiser: Programming Forums - DaniWeb Sister Site
![]() |
•
•
Join Date: Mar 2007
Posts: 3
Reputation:
Solved Threads: 0
My research group is trying to implement a hash table to reduce the effects of a very costly step in our calculations.
Basically, it can take more than an hour to go from A to B, and B is something ridiculously simple, like a number between 1 and 100.
A in this case is an integer array with the following attributes:
What is a good hashing function that would work and still be reversible?
Thanks,
natnit
Basically, it can take more than an hour to go from A to B, and B is something ridiculously simple, like a number between 1 and 100.
A in this case is an integer array with the following attributes:
- any entry can be either positive or negative
- average of 20 entries per array
- all entries are even
- with k elements in an array, values range from (+/-)2...2k, with no repeats
- {4, -2, 6, 10, -8}
- {-2, -4, -6}
- {6, 4, -2}
What is a good hashing function that would work and still be reversible?
Thanks,
natnit
•
•
Join Date: May 2004
Posts: 178
Reputation:
Solved Threads: 10
You will need 5 million different hash tables.
Which meanns that if you are lokking for a given value, say -4, across all of your arrays you will have to consult each hash to see if -4 is in the array and which element it is.
This will result in some performance gain.
I'm assuming that the position of the element and the value in the in the array both have meaning.
It sounds like the data design needs to be rethought -- more than that a hash table will fix your problems. For example, a database can do
highly optimized lookups, like count the number of element #3 that equals 20.
I dunno, don't know enough to about your problem to help, really.
Narue has a site
www.eternallyconfuzzled.com
that has a good explanation of hashing algorithms.
UNIX systems usually come with hcreate, hsearch as an implementation of using Knuth's
original hashing work.
I personally think you've got a data design problem. Possibly you should have put in associative arrays upfront as part of the design.
Hashing added after the fact may only help a little. Depending on how far down the call tree your lookups are embedded in your code.
Which meanns that if you are lokking for a given value, say -4, across all of your arrays you will have to consult each hash to see if -4 is in the array and which element it is.
This will result in some performance gain.
I'm assuming that the position of the element and the value in the in the array both have meaning.
It sounds like the data design needs to be rethought -- more than that a hash table will fix your problems. For example, a database can do
highly optimized lookups, like count the number of element #3 that equals 20.
I dunno, don't know enough to about your problem to help, really.
Narue has a site
www.eternallyconfuzzled.com
that has a good explanation of hashing algorithms.
UNIX systems usually come with hcreate, hsearch as an implementation of using Knuth's
original hashing work.
I personally think you've got a data design problem. Possibly you should have put in associative arrays upfront as part of the design.
Hashing added after the fact may only help a little. Depending on how far down the call tree your lookups are embedded in your code.
Are these arraya your keys to the map/list? Can there be any repetition?
If there is repetition, is it needed to hold all instances (e.g. like in a std::multimap) or only one key should be there (like in std::map)?
What's the max number of entries per array?
How abt picking up the first value bit (not the sign) from first 24 entries in the array? You'll have a 3 byte key that can represent values more than 5 mil. (of course for lesser number of entries you pickup more bits).
If there is repetition, is it needed to hold all instances (e.g. like in a std::multimap) or only one key should be there (like in std::map)?
What's the max number of entries per array?
How abt picking up the first value bit (not the sign) from first 24 entries in the array? You'll have a 3 byte key that can represent values more than 5 mil. (of course for lesser number of entries you pickup more bits).
•
•
•
•
You will need 5 million different hash tables.........I personally think you've got a data design problem. Possibly you should have put in associative arrays upfront as part of the design.
Hashing added after the fact may only help a little. Depending on how far down the call tree your lookups are embedded in your code.
So I just hope tht these arrays represent your "keys" and not "values".
If this is NOT the case and you still wanna introduce hashing then read on.
If your key is ranging from -2k..+2k (lets discount -2..+2 for now) where avg k is 0 and max (guessing) is say 50. AND you don't have memory problems. Then you can simply strip the sign bit, divide by 2 and use the number itself as key, probability of collision is less than 5% if I'm not wrong, but that you gotta test anyway.

E.g. here are the hash values for your examples:
{4, -2, 6, 10, -8}
hash= {2, 1, 3, 5, 4}
{-2, -4, -6}
hash= { 1, 2, 3}
{6, 4, -2}
hash= {3, 2, 1}
•
•
Join Date: Mar 2007
Posts: 3
Reputation:
Solved Threads: 0
Thanks for the reply. The picture you paint of the current situation is quite grim. 
I was thinking more along the lines of a string conversion from the array. I'm assuming there are string to int hash functions, and a string is nothing more than an array of characters. So all I'd need is something to map -2, 8, 4 etc to chars.
I am still confused about your reasoning on why I need 5 million hash tables, though.

I was thinking more along the lines of a string conversion from the array. I'm assuming there are string to int hash functions, and a string is nothing more than an array of characters. So all I'd need is something to map -2, 8, 4 etc to chars.
I am still confused about your reasoning on why I need 5 million hash tables, though.
•
•
Join Date: Nov 2006
Posts: 134
Reputation:
Solved Threads: 3
•
•
•
•
Thanks for the reply. The picture you paint of the current situation is quite grim.
I was thinking more along the lines of a string conversion from the array. I'm assuming there are string to int hash functions, and a string is nothing more than an array of characters. So all I'd need is something to map -2, 8, 4 etc to chars.
I am still confused about your reasoning on why I need 5 million hash tables, though.
Nothing easier than to map integers onto characters; a look up table would do it, but how would that help?
•
•
Join Date: May 2004
Posts: 178
Reputation:
Solved Threads: 10
You did not explain your full data design. So the answer as to why 5 million hash tables =
one hash for one array.
Explain how you need to lookup values in each array. It would help. It might be possible to hash across all arrays.
BUT.
Your collision rate will be out of this world because you must have zillions of duplicates.
Either duplicate element # in the array or duplicate values in the array. This assumes you do not need to lookup based on the entire array's contents. In that second case, hashing might actually help somewhat.
one hash for one array.
Explain how you need to lookup values in each array. It would help. It might be possible to hash across all arrays.
BUT.
Your collision rate will be out of this world because you must have zillions of duplicates.
Either duplicate element # in the array or duplicate values in the array. This assumes you do not need to lookup based on the entire array's contents. In that second case, hashing might actually help somewhat.
Last edited by jim mcnamara; Mar 9th, 2007 at 4:15 pm.
•
•
Join Date: Mar 2007
Posts: 3
Reputation:
Solved Threads: 0
Ok, let me refine my question, since I think there is a communication problem here somewhere.
Assume I have lots of array values that I want to be able to look up easily.
Each array is at most 60 integers long, and for the sake of argument, I have 1 million arrays.
All I'm looking for is a hash function to produce keys. For example:
{2, 4, 6} -> 3821
{2, 6, 4} -> 91313
{4, -2} -> 1243
I'm sure that there should be something, because going from int[] to int shouldn't be that different from going from char[] to int...
I hope that made it clearer.
Assume I have lots of array values that I want to be able to look up easily.
Each array is at most 60 integers long, and for the sake of argument, I have 1 million arrays.
All I'm looking for is a hash function to produce keys. For example:
{2, 4, 6} -> 3821
{2, 6, 4} -> 91313
{4, -2} -> 1243
I'm sure that there should be something, because going from int[] to int shouldn't be that different from going from char[] to int...
I hope that made it clearer.
•
•
Join Date: Nov 2006
Posts: 134
Reputation:
Solved Threads: 3
•
•
•
•
Ok, let me refine my question, since I think there is a communication problem here somewhere.
Assume I have lots of array values that I want to be able to look up easily.
Each array is at most 60 integers long, and for the sake of argument, I have 1 million arrays.
All I'm looking for is a hash function to produce keys. For example:
{2, 4, 6} -> 3821
{2, 6, 4} -> 91313
{4, -2} -> 1243
I'm sure that there should be something, because going from int[] to int shouldn't be that different from going from char[] to int...
I hope that made it clearer.
I will suggest a possible approach rather than give an answer. Think of the values contained in the array as being the digits in a number with an appropriately large base. The hash value, or at least a basis for it, could then be the value of that number.
![]() |
Other Threads in the C Forum
- Previous Thread: Posix Timers
- Next Thread: help for writing a program that'll read in a line of text
| Thread Tools | Search this Thread |
* ansi api array arrays bash binarysearch calculate centimeter changingto char character convert copyanyfile copypdffile createcopyoffile createprocess() csyntax directory dynamic fflush file floatingpointvalidation fork forloop frequency getlasterror getlogicaldrivestrin givemetehcodez graphics gtkgcurlcompiling gtkwinlinux hardware highest homework i/o ide inches initialization intmain() iso km license linked linkedlist linux linuxsegmentationfault list logical_drives loopinsideloop. lowest match matrix microsoft motherboard mqqueue multi mysql oddnumber odf open opendocumentformat openwebfoundation pdf pointer pointers posix power program programming pyramidusingturboccodes read recursion recv recvblocked repetition reversing scanf scheduling segmentationfault send shape single socketprogramming stack standard strchr string strings suggestions test testautomation unix urboc user variable whythiscodecausesegmentationfault win32api windows.h windowsapi





