neilcoffey 79 Junior Poster in Training

An important thing to understand is that there are two types of exceptions:

  • checked exceptions, which are thrown when your code (or some code in the Java libraries) explicitly throws them; in this case, methods that throw such exceptions must be declared as throwing them, and code that calls such methods is forced to deal with the exception in some way (either with a try/catch block, or by "throwing it up")
  • unchecked exceptions are special exceptions, such as IndexOutOfBoundsException, NullPointerException that don't need to be explicitly handled (you've probably seen that if part of your code accidentally has a null pointer, the exception "automatically" gets printed out without you having to catch it) -- what happens here is that something called the "default uncaught exception handler" takes care of it

Now, there are certain language operations that can throw certain unchecked exceptions (such as trying to access a out-of-bounds array index, doing an integer division by zero, or accessing a null object reference). So in these cases, no you don't need to explicitly throw anything -- you kind of have no choice. (Though you can catch them and deal with them).

You also can explicitly throw any type of exception from any method that you write -- you don't have to create your own exception class if you don't want to. For example, if you want to signal to the caller of your method that they've passed in an invalid argument, then Java already defines IllegalArgumentException

neilcoffey 79 Junior Poster in Training

The aim of the hash function is to give the keys as "random" a distribution as possible over the indices in the hash table. So in a simple hash function such as the one you mention, where you're essentially multiplying by a constant, you need to pick a constant that "avoids an obvious pattern" and "spreads the bits" of the original number over a range of bits in the hash code.

To illustrate what this means, it's easier to start with a bad example of the multiplier: if we picked 32, then we'd just be shifting the number 5 bits to the left. But if we multiply by 31, then we're effectively shifting 5 bits to the left, and THEN subtracting the original number. The net effect is that we "mix in" the bits of the start value in more places in the hash code. 31 is also a prime number, so that whatever the size of our hash table, we will be mixing bits again when we calculate (hash code % table size) to get the hash table index. For a graphical illustration of this "bit mixing", I wrote an article a while ago on how hash functions work that may help. In practice, numbers either side of a power of 2 (but definitely NOT a power of 2), and which are prime or have few factors, work reasonably well. Obviously, it always depends a little bit on your data.

My question was about linear …

neilcoffey 79 Junior Poster in Training

In linear probing, you do indeed just try the next space and so on. If you use linear probing (or indeed any type of probing to resolve collisions) then you have to think about strategies to avoid pathalogically high seek times as the table fills up. Things to consider:
- use a good-quality hash function
- instead of linear probing, try other strategies such as a second hash function to determine the next index to try (e.g. instead of just adding one, how about putting the index through a couple of cycles of an XORShift random number generator)
- define some probe limit after which you'll re-hash the table (or declare it "full")
But generally, it's not clear that probing is a good strategy for collision resolution. If you use the linked list per bucket approach, and "manually" allocate elements of your linked lists from a pre-allocated array, then you're effectively performing the same operations as with probing (cycling through elements of an array and checking against the key/hash code until you find a match), but without the risk of each collision itself increasing the chance of a further collision.

Ezzaral commented: Very helpful post. +15