First here is what I need to accomplish:
---------------------------------------------
The program

In this program you will create a direct access file on a simulated disk. A number of sectors will be allocated for the file and records will be written to the sectors using hashing. The sectors will be buckets, that is the hash function will give numbers which identify sectors, not individual records. When buckets fill, overflow buckets (sectors) will be allocated to hold the overflowing records.

The Disk class

You will create a Disk class. The disk used for this program will then be an instance of this class. A disk will be a two dimensional array of char, i.e. an array of char arrays. You should view this as an array of sectors where each sector is an array of characters. The number of sectors in a disk and the size of the sectors is arbitrary, but a default size is 10,000 sectors on a disk with 512 characters in each sector. The class definition for Disk is:

public class Disk
{
   private int sectorCount;   // sectors on the disk
   private int sectorSize;    // characters in a sector
   private char[][] store;    // all disk data is stored here
   public Disk()    // for default sectorCount and sectorSize
   {}
   public Disk(int sectorCount, int sectorSize)
   {}
   public void readSector(int sectorNumber, char[] buffer)   // sector to 
   {}                                                        // buffer
   public void writeSector(int sectorNumber, char[] buffer)  // buffer to
   {}                                                        // sector
   public int getSectorCount()
   {
      return sectorCount;
   }
   public int getSectorSize()
   {
      return sectorSize;
   }
}

sectorCount is the number of sectors in the disk and sectorSize is the number of characters in a sector. char[][] store is a reference to the 2-dimensional array which must be allocated by the constructor.

Reading and writing to the disk

You can only read and write to the disk a sector at a time. A disk buffer which is a character array of the same size as a sector must be available. The writeSector method will copy the contents of its second parameter (which should be your disk buffer) to the sector whose number is the first argument. The previous contents of the sector will be overwritten. The readSector method will copy the contents of the sector whose number is the first parameter to the character array which is the second parameter. All access to the disk will be through these two methods. (Note: The disk buffer you will use will be part of the file class described below.)

Direct access files

A direct access file stores records in such a way that a record can be read or written to the file without reading or writing the entire file. Each record must contain a unique key field. The value of this key field (the "key") will be used to determine where the record is stored on the disk. The record can then be read given only the key. The file is implemented as a hash table using sectors for buckets. The class definition for directFile is:

public class directFile
{
   public directFile(Disk disk, int recordSize, int keySize,
                     int firstAllocated, int bucketsAllocated)
   {}
   public boolean insertRecord(char[] record)
   {}   
   public boolean findRecord(char[] record)
   {}   
   private int hash(char[] key)
   {}
   private Disk disk;             // disk on which the file will be written
   private char[] buffer;         // disk buffer
   private int recordSize;        // in characters
   private int keySize;           // in characters
   private int recordsPerSector;
   private int firstAllocated;    // sector number
   private int bucketsAllocated;  // buckets originally allocated   
   private int firstOverflow;     // sector number
   private int overflowBuckets;   // count of overflow buckets in use
}

To insert a record (insertRecord method) the hash method takes the key for its parameter and returns a hash value in the range 0 ... (bucketsAllocated - 1). The record is then placed in the sector corresponding to the hash value. This is done by reading the sector from disk into the disk buffer (the buffer field of the directFile), writing the record into the first available position in the buffer, and then writing the buffer back to disk.

If the sector is full, the record is placed in the first overflow sector which is not full. Overflow sectors are allocated when needed and records are stored in them in the order in which the insertions occur. To allocate an overflow sector just increment the overflowBuckets field. If it is the first bucket allocated you must also assign the sector number to the firstOverflow field.

If a record already exists in the file with the given key no insertion will be done but insertRecord will return false. If an insertion is done true will be returned.

To read a record (findRecord method) given its key, hash the key, read the corresponding sector, and copy the record from its place in the disk buffer into the (char[]) parameter of findRecord and true is returned. If the record is not found in this sector, but the sector is not full, the record is not in the table and false is returned. If the record is not found in this sector and the sector is full you must search the overflow sectors.

Note that the parameter passed to findRecord is a reference to a character array which can hold an entire record. findRecord will only use the first part of this array (the key) to do its search. It will then copy the entire record (if found) into the array when it returns.
------------------------------------------------------------------

So if I'm understanding this correctly the Hash function will take a char array of length 27 which is the name of the mountain, and then based on that the hash function will generate a hash value somewhere between 0 and 599?

Creating a function to do this is where I'm lost. I've read many articles online and I still don't get it. If anyone could help get me started that would be great or post links to good tutorials/examples.

I just don't understand how I would have each key generate a different hash value and its REALLY frustrating lol. But thanks for any help/input!

Edited 7 Years Ago by __avd: Added [code] tags. Encase your code in: [code] and [/code] tags.

Here is my current hash function...any thoughts/help would be great.

private int hash(char[] key){
        int hashVal = 0;
        int tableSize = disk.getSectorCount();
        for(int i = 0; i < key.length-1; i++){
            hashVal = 37 * hashVal + key[i];
        }
        hashVal %= tableSize;
        System.out.println("Hash Value is: " + hashVal);

        return hashVal;
    }

Modify the hash method to

private int hash(char[] key){
        int hashVal = 1;
        int prime = 31;

        for(int i = 0; i < key.length-1; i++){
            hashVal = prime * hashVal + key[i];
        }
        
        System.out.println("Hash Value is: " + hashVal);
 
        return hashVal;
    }

You don't require those statements.
The given code will generate a hash value for you which you can use as key.

Edited 7 Years Ago by lookof2day: n/a

I thought the hash function is supposed to give the number of the sector to which the first record will be saved.

My understanding of what the hash function was supposed to do was that given a key, in this case a char array containing the name of the mountain, and then perform some operation(s) on that key and generate a hashval int that is the sector number to store that mountain data. If I'm not thinking of this correctly any clarification would be great. Thanks!

You are correct about this. But I cannot understand why you need firstAllocated to show you where the start is if you know that your hash table is the sectors between 0 and 599. Also the first overflow sector is going to start at 600. Why do you need firstOverflow? I mean i can see why but shouldn't it be private final int or something?

It looks like i forgot to post part of the assignment I think this might clear up those two variables? It looks like since I'm only allocating 600 sectors at first the overflow would be if I need more sectors? Though again this doesn't make sense since I'm using a char [][] array wouldn't I have to complete recreate the current char [] [] that is storing all the data?


---------------------------------------------------
The test program

You will create an instance of disk and an instance of directFile which uses that disk. The disk will have 2000 sectors, each having 512 characters. The file will be created initially allocating 600 (non-overflow) sectors and will begin at sector 0.

The records you store will store information on mountains. A record will consist of 60 characters and will have three fields:

name: 27 characters (the key field)
country: 27 characters
altitude (in feet): 6 characters
The characters will be either normal printing characters or NULLs ('\000') which will fill out the remainder of the fields. Altitudes will be stored as character strings, not ints.
Your program will open a (real) file containing records in the format:


Shasta, Mount#United States#14162
with each record on its own line. The program will reformat each line from the input into the 60-character record format and insert it into your direct file. The program will then go into a loop displaying a menu offering the choices:
Insert new record
Find record
Quit
and accept the user's choice. If the user chooses "Insert new record" the program will prompt for the three fields, format a 60-character record with the data, And insert the record. The program will display whether the insertion was successful or not. If the user chooses "Find record" the program will prompt for the mountain's name (the key) and will search for a record with that key. If a record is found its fields will be displayed in the format:

Shasta, Mount, United States, altitude: 14162 ft.
If no record is found a message to that effect will be displayed. The program will run until the user chooses "Quit."

And for this hash function should I not factor in the null values in the key? Just the locations in the key that have actual char values?

And the hash function I have now gives me a number wayyyy to big for me to use. Here is the current function:

private int hash(char[] key){
        int hashVal = 0;
        int prime = 31;
        int i = 0;
        //for(i = 0; i < key.length-1; i++){
        while(i < key.length-1){

            System.out.println("Hash Val: " + hashVal + "and i val: " + i);
            //Passes over null values in the key.
            if(key[i] == '\u0000'){
                while(key[i] == '\u0000' && i < key.length-1){
                    i++;}
                
            }
            hashVal = prime * hashVal + key[i];
            i++;
        }
        
        System.out.println("Hash Value is: " + hashVal);

        return hashVal;
    }

and here is a sample terminal session:
Okay well for some reason its not letting me copy all of it so i took a screen shot it should be attached.

I'm not sure how I should modify the hash function to give me a valid hashVal.

"The program will reformat each line from the input into the 60-character record format and insert it into your direct file. "
I think this means that first you reformat it and then do insertion which uses the hash function.

Right that's what I'm doing but using a 27 char key even without including spaces(nulls) I get a hashVal way too big. I don't see how I would incorporate that many values into generating a valid hashVal right now I have it limited to the first three values of the key but What I get from the assignment is to use all available values, unless I didnt read it correctly.

Your first hash function was correct. The only thing you need to fix in it is the negative values. Try Math class. You need %tableSize somewhere in the function. I don't think you can do it without it.
send me your email in a private message

This article has been dead for over six months. Start a new discussion instead.