I need help on an assignment, I don't know where to start, I have coded to so far for it to read in the words from a text file and an array for the words to be put into, after that I'm lost as to what the next step would be. Please Help!

we are to take a list of 28500 words (one word per line) and apply 2 hashing functions to each word, don't store in hash table but arrays initialized to zeros. Array size of 5701, ideal expected average size for the chains would be 28500/5701 = about 5
for each function report the average chain length, number of empty chains and the 5 longest
report on goodness of the functions:
number of chains of length zero is less than 10%
length of the longest chain will be less than eight times the expected length of the
average chain.

First Function:
add the Unicode values of each of the Characters in the word and save as an integer
and obtain the remainder using 5701

Second Function:
For each of the following character strings s, compute the location
h(s) = n(smaller subscript s) mod 5701
for this string, mapping each symbol in s to a number between 0 to 127 with base 128 integer

This may help you--

http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

-- furthermore, from the looks of the FAQ and your assignment, it's fairly obvious of what you need to do.

Since you're using Strings, I'd assume that you'd use their first character as a key and then store them through a hash by sorting them before inserting them. The sorting isn't required, but recommended. Sure inserting values may be difficult, but by knowing the chains are sorted it should provide easier access to the values inquestion.

I think I can provide a good example... shouldn't be too hard in Java, it'll just take time @_@

I think this may be of some help, though there are a few notable flaws--

import java.util.Arrays;

public class HashTest{

	// 4 chains, assuming 20 values and 5 values per chain
	private String[] chain1 = new String[5],
		      	     chain2 = new String[5],
		      	  	 chain3 = new String[5],
		      	     chain4 = new String[5];
	// since 4 chains, each key will be value%4
	int used[] = new int[4];

       public HashTest(){
           for(int i = 0; i < used.length; i++){
                 used[i] = 0; // not necessary, but leaving values uninitialized is sloppy
           }
       }

	public static void main(String... args){

		HashTest ht = new HashTest();

		ht.insert("Faker");
		ht.insert("People");
		ht.insert("Abby");
		ht.insert("Fools Gold!");
		ht.showChainsUsed();
		ht.showChains();
	}

	public void insert(String value){

		//get first char of String and use as key
		char c = value.charAt(0);

		int n = ((int)c)%4; // casting c as an int then modding by 4

		switch(n){
			case 0:
					chain1[used[0]] = value;
					used[0]++;
					Arrays.sort(chain1, 0, used[0]); // no necessary, but helpful
					break;
			case 1:
					chain2[used[1]] = value;
					used[1]++;
					Arrays.sort(chain2, 0, used[1]); // not necessary, but helpful
					break;
			case 2:
					chain3[used[2]] = value;
					used[2]++;
					Arrays.sort(chain3, 0, used[2]);
					break;
			case 3:
					chain4[used[3]] = value;
					used[3]++;
					Arrays.sort(chain4, 0, used[3]);
					break;
			// no default expected, values will only be between 0 and 3
		}
	}

	public boolean search(String arg){
		char c = arg.charAt(0);

		int n = ((int)c)%4;

		switch(n){
			case 0:
					for(int i = 0; i < used[0]; i++){
						if(chain1[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 1:
					for(int i = 0; i < used[1]; i++){
						if(chain2[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 2:
					for(int i = 0; i < used[2]; i++){
						if(chain3[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 3:
					for(int i = 0; i < used[3]; i++){
						if(chain4[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
		}
		return false;
	}

	public String getValAt(int chain, int indice){
		try{
			switch(chain){
				case 0:	return chain1[indice];
				case 1:	return chain2[indice];
				case 2:	return chain3[indice];
				case 3:	return chain4[indice];
			}
		}catch(Exception e){
			System.out.println(e);
		}
		return null;
	}

	public void showChainsUsed(){
		for(int i = 0; i < used.length; i++)
			System.out.println("chain " + i + " used: " + used[i]);
	}

	public void showChains(){
		for(int j = 0; j < 4; j++){
			for(int i = 0; i < used[j]; i++){
				switch(j){
					case 0:
							System.out.print(chain1[i] + " ");
							break;
					case 1:
							System.out.print(chain2[i] + " ");
							break;
					case 2:
							System.out.print(chain3[i] + " ");
							break;
					case 3:
							System.out.print(chain4[i] + " ");
							break;
				}
			}
			System.out.println();
		}
	}
}

--first this is a very VERY simplified hashing algorithm that only uses the first character for the hash. It is far, FAR more efficient to use a range of bitsets to do hashes. Here is a quote from the site provided that explains the reason--

Properties of an ideal hash

With the understanding that collisions will be inevitable in all but the most specialized of cases, a primary goal in developing a hash function for lookup is to minimize collisions. This typically means forcing a uniform distribution of the hash value, much like a random number generator. However, unlike a random number generator, the process must be repeatable so that the same key hashes to the same index, while different keys do not. This goal can be broken down into two general properties of an ideal hash.

An ideal hash will permute its internal state such that every resulting hash value has exactly one input that will produce it. Any hash function that uses every part of the key to calculate the hash value will typically meet this requirement, so general hash functions that only process a part of the key will almost always be very poor because the differing parts of the key may not be the parts involved in creating the hash value. A good example of this is only hashing the first four or five characters of a string, and then using the algorithm to hash URLs that start with “http://”.

A hash function is said to achieve avalanche if the resulting hash value is wildly different if even a single bit is different in the key. This effect aids distribution because similar keys will not have similar hash values. A hash function that distributes the hash values in a uniform manner will minimize collisions and fill the array more evenly. Avalanche is a concept derived from cryptographic hashing, and it offers a way to ensure that a hash function is good when used for table lookup.

Often you will see hash functions that claim to be "best for strings" or “best for integers”. These functions should be avoided because if a hash function is not good for all types of data then it is probably a poor algorithm in general. Sometimes, on the other hand, a hash function may be optimized for a single type for performance reasons. It is good to learn to tell the difference between the two, but a safe practice is to only use general hash functions that are known to be good. This tutorial offers several examples of good general hash functions.

-- the second notable error in my program is this function--

public boolean search(String arg){
		char c = arg.charAt(0);

		int n = ((int)c)%4;

		switch(n){
			case 0:
					for(int i = 0; i < used[0]; i++){
						if(chain1[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 1:
					for(int i = 0; i < used[1]; i++){
						if(chain2[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 2:
					for(int i = 0; i < used[2]; i++){
						if(chain3[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
			case 3:
					for(int i = 0; i < used[3]; i++){
						if(chain4[i].equalsIgnoreCase(arg))
							return true;
					}
					return false;
		}
		return false;
	}

--notice that the arrays are sorted, yet I still search through all values. This is for non-sorted compatibility. If you wanted to optimize the above code, you'd search until the argument was less than the chain indice evaluated.

Edit: I just reread the requirements. You need to add all of the Unicode values of the characters in each String then mod them by the number of chains to get their key, then add them.

So instead of using char c for the first char, you'll be iterating through the Strings and summing up each character in the String to a total, then hash it for the key.

Thanks so much! That was very helpful. I got everything sorted and read in, the only thing I'm having trouble with is reading in one word at a time and then applying the hash function. I just can not get in one word at a time then add the Unicode values of the letters, I think I'm confusing myself. please if you could give me any advice on what to do that would be greatly appreciated.

Thanks so much!

Thanks so much! That was very helpful. I got everything sorted and read in, the only thing I'm having trouble with is reading in one word at a time and then applying the hash function. I just can not get in one word at a time then add the Unicode values of the letters, I think I'm confusing myself. please if you could give me any advice on what to do that would be greatly appreciated.

Thanks so much!

Iterate through all of the values of the String using the String method String.charAt(int index) and sum them up then provide the hash function for the sum of the Unicode number values of the characters in the String.

All this requires is a simple cast during the summation.

public class StringSummation_Unicode{

	public static long sum(String arg){
		int total = 0;
		for(int i = 0; i < arg.length(); i++){
			total += (long)arg.charAt(i);
		}
		return total; // returns the sum of the chars after cast
	}

	public static void main(String... args){
		String myString = "AAA"; // sum is expected to be 65 * 3
		System.out.println(StringSummation_Unicode.sum(myString)); // 195 - successful
	}
}
This article has been dead for over six months. Start a new discussion instead.