If I have a hash function h(k, f(i)) and

f(int i){
return c * i;
}

And the first index that it tells me to insert the element at already has an element there, what do I do? Would the proper way to insert an element the first time be

h(k, f(0)) = the index I try to insert the element at.

Then, if there is a collision/an element is already at that index, I would do i += stepwise, and call h(k, f(i)) again?

In linear probing, you do indeed just try the next space and so on. If you use linear probing (or indeed any type of probing to resolve collisions) then you have to think about strategies to avoid pathalogically high seek times as the table fills up. Things to consider:
- …

## All 4 Replies

In linear probing, you do indeed just try the next space and so on. If you use linear probing (or indeed any type of probing to resolve collisions) then you have to think about strategies to avoid pathalogically high seek times as the table fills up. Things to consider:
- use a good-quality hash function
- instead of linear probing, try other strategies such as a second hash function to determine the next index to try (e.g. instead of just adding one, how about putting the index through a couple of cycles of an XORShift random number generator)
- define some probe limit after which you'll re-hash the table (or declare it "full")
But generally, it's not clear that probing is a good strategy for collision resolution. If you use the linked list per bucket approach, and "manually" allocate elements of your linked lists from a pre-allocated array, then you're effectively performing the same operations as with probing (cycling through elements of an array and checking against the key/hash code until you find a match), but without the risk of each collision itself increasing the chance of a further collision.

My question was about linear probing, not about another form of hashing. And your 'random number generator' comment doesn't make sense to me, because if you hash an element using randomness, then you cannot ever get back to that square in O(1) time, which is the goal of hashing.

Also, my lecture notes say that

f(i) = c * i, and that h(h(k), f(i)) is the hash function. My question is the following: How is an "i" value chosen for the first hash attempt? And if the first hash attempt is unsuccessful, do you simply increment "i" and try again?

The aim of the hash function is to give the keys as "random" a distribution as possible over the indices in the hash table. So in a simple hash function such as the one you mention, where you're essentially multiplying by a constant, you need to pick a constant that "avoids an obvious pattern" and "spreads the bits" of the original number over a range of bits in the hash code.

To illustrate what this means, it's easier to start with a bad example of the multiplier: if we picked 32, then we'd just be shifting the number 5 bits to the left. But if we multiply by 31, then we're effectively shifting 5 bits to the left, and THEN subtracting the original number. The net effect is that we "mix in" the bits of the start value in more places in the hash code. 31 is also a prime number, so that whatever the size of our hash table, we will be mixing bits again when we calculate (hash code % table size) to get the hash table index. For a graphical illustration of this "bit mixing", I wrote an article a while ago on how hash functions work that may help. In practice, numbers either side of a power of 2 (but definitely NOT a power of 2), and which are prime or have few factors, work reasonably well. Obviously, it always depends a little bit on your data.

My question was about linear probing, not about another form of hashing. And your 'random number generator' comment doesn't make sense to me, because if you hash an element using randomness, then you cannot ever get back to that square in O(1) time, which is the goal of hashing.

The random number generator is DETERMINISTIC: every time you give it the number x as input, it will always generate the SAME (but approximately "randomly distributed") number y.

f(i) = c * i, and that h(h(k), f(i)) is the hash function.

This is a hash function, but there are an essentially infinite number of possibilities!

If the first hash attempt is unsuccessful, do you simply increment "i" and try again?

As I say, if you are using linear probing, then yes. But there are other possibilities, and I don't think that in practice linear probing is commonly used, due to problems I mentioned in my previous post.

Ok, great. That's all I was looking for, hence posing such a specific question. Thanks.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, learning, and sharing knowledge.