| | |
Linear Probing
Please support our Java advertiser: Programming Forums - DaniWeb Sister Site
![]() |
•
•
Join Date: Sep 2008
Posts: 1,597
Reputation:
Solved Threads: 202
If I have a hash function h(k, f(i)) and
f(int i){
return c * i;
}
And the first index that it tells me to insert the element at already has an element there, what do I do? Would the proper way to insert an element the first time be
h(k, f(0)) = the index I try to insert the element at.
Then, if there is a collision/an element is already at that index, I would do i += stepwise, and call h(k, f(i)) again?
f(int i){
return c * i;
}
And the first index that it tells me to insert the element at already has an element there, what do I do? Would the proper way to insert an element the first time be
h(k, f(0)) = the index I try to insert the element at.
Then, if there is a collision/an element is already at that index, I would do i += stepwise, and call h(k, f(i)) again?
•
•
Join Date: Dec 2008
Posts: 53
Reputation:
Solved Threads: 6
In linear probing, you do indeed just try the next space and so on. If you use linear probing (or indeed any type of probing to resolve collisions) then you have to think about strategies to avoid pathalogically high seek times as the table fills up. Things to consider:
- use a good-quality hash function
- instead of linear probing, try other strategies such as a second hash function to determine the next index to try (e.g. instead of just adding one, how about putting the index through a couple of cycles of an XORShift random number generator)
- define some probe limit after which you'll re-hash the table (or declare it "full")
But generally, it's not clear that probing is a good strategy for collision resolution. If you use the linked list per bucket approach, and "manually" allocate elements of your linked lists from a pre-allocated array, then you're effectively performing the same operations as with probing (cycling through elements of an array and checking against the key/hash code until you find a match), but without the risk of each collision itself increasing the chance of a further collision.
- use a good-quality hash function
- instead of linear probing, try other strategies such as a second hash function to determine the next index to try (e.g. instead of just adding one, how about putting the index through a couple of cycles of an XORShift random number generator)
- define some probe limit after which you'll re-hash the table (or declare it "full")
But generally, it's not clear that probing is a good strategy for collision resolution. If you use the linked list per bucket approach, and "manually" allocate elements of your linked lists from a pre-allocated array, then you're effectively performing the same operations as with probing (cycling through elements of an array and checking against the key/hash code until you find a match), but without the risk of each collision itself increasing the chance of a further collision.
•
•
Join Date: Sep 2008
Posts: 1,597
Reputation:
Solved Threads: 202
My question was about linear probing, not about another form of hashing. And your 'random number generator' comment doesn't make sense to me, because if you hash an element using randomness, then you cannot ever get back to that square in O(1) time, which is the goal of hashing.
Also, my lecture notes say that
f(i) = c * i, and that h(h(k), f(i)) is the hash function. My question is the following: How is an "i" value chosen for the first hash attempt? And if the first hash attempt is unsuccessful, do you simply increment "i" and try again?
Also, my lecture notes say that
f(i) = c * i, and that h(h(k), f(i)) is the hash function. My question is the following: How is an "i" value chosen for the first hash attempt? And if the first hash attempt is unsuccessful, do you simply increment "i" and try again?
•
•
Join Date: Dec 2008
Posts: 53
Reputation:
Solved Threads: 6
The aim of the hash function is to give the keys as "random" a distribution as possible over the indices in the hash table. So in a simple hash function such as the one you mention, where you're essentially multiplying by a constant, you need to pick a constant that "avoids an obvious pattern" and "spreads the bits" of the original number over a range of bits in the hash code.
To illustrate what this means, it's easier to start with a bad example of the multiplier: if we picked 32, then we'd just be shifting the number 5 bits to the left. But if we multiply by 31, then we're effectively shifting 5 bits to the left, and THEN subtracting the original number. The net effect is that we "mix in" the bits of the start value in more places in the hash code. 31 is also a prime number, so that whatever the size of our hash table, we will be mixing bits again when we calculate (hash code % table size) to get the hash table index. For a graphical illustration of this "bit mixing", I wrote an article a while ago on how hash functions work that may help. In practice, numbers either side of a power of 2 (but definitely NOT a power of 2), and which are prime or have few factors, work reasonably well. Obviously, it always depends a little bit on your data.
The random number generator is DETERMINISTIC: every time you give it the number x as input, it will always generate the SAME (but approximately "randomly distributed") number y.
This is a hash function, but there are an essentially infinite number of possibilities!
As I say, if you are using linear probing, then yes. But there are other possibilities, and I don't think that in practice linear probing is commonly used, due to problems I mentioned in my previous post.
To illustrate what this means, it's easier to start with a bad example of the multiplier: if we picked 32, then we'd just be shifting the number 5 bits to the left. But if we multiply by 31, then we're effectively shifting 5 bits to the left, and THEN subtracting the original number. The net effect is that we "mix in" the bits of the start value in more places in the hash code. 31 is also a prime number, so that whatever the size of our hash table, we will be mixing bits again when we calculate (hash code % table size) to get the hash table index. For a graphical illustration of this "bit mixing", I wrote an article a while ago on how hash functions work that may help. In practice, numbers either side of a power of 2 (but definitely NOT a power of 2), and which are prime or have few factors, work reasonably well. Obviously, it always depends a little bit on your data.
•
•
•
•
My question was about linear probing, not about another form of hashing. And your 'random number generator' comment doesn't make sense to me, because if you hash an element using randomness, then you cannot ever get back to that square in O(1) time, which is the goal of hashing.
•
•
•
•
f(i) = c * i, and that h(h(k), f(i)) is the hash function.
As I say, if you are using linear probing, then yes. But there are other possibilities, and I don't think that in practice linear probing is commonly used, due to problems I mentioned in my previous post.
![]() |
Similar Threads
- Reading txt file into Hash Table (C++)
- Dictionary ADT (Computer Science)
- Help needed in designing a c++ program using hashing and linear probing!! (C++)
- exam technique? (C++)
- Array required, but java.lang.String and java.util.Vector found (Java)
- array required, but java.lang.String found and java.util.Vector found? (Community Introductions)
Other Threads in the Java Forum
- Previous Thread: problem in solve query??
- Next Thread: getting the output of a terminal as input into a java program.
| Thread Tools | Search this Thread |
account android api applet application array arrays automation bidirectional binary birt bluetooth chat class classes client code columns component data database designadrawingapplicationusingjavajslider draw eclipse editor error errors event exception expand fractal game givemetehcodez graphics gui guidancer homework html ide image inetaddress inheritance input integer intellij j2me java javaprojects jlabel jme jni jpanel jtextfield julia linux list loop map method methods midlethttpconnection mobile mobiledevelopmentcreatejar monitoring myaggfun netbeans newbie nullpointerexception open-source plazmic print problem program programming project property recursion ria scanner screen search server set size sms smsspam sort sourcelabs splash sql sqlite static string subclass support swing testautomation threads tree windows






