HashTable Implementation Revised

Please support our C++ advertiser: Intel Parallel Studio Home
Thread Solved

Join Date: Nov 2008
Posts: 4
Reputation: a.self is an unknown quantity at this point 
Solved Threads: 0
a.self's Avatar
a.self a.self is offline Offline
Newbie Poster

HashTable Implementation Revised

 
0
  #1
Nov 9th, 2008
*Note: its not a fully functional hashTable for technical purposes
My priority is to get it to store the strings properly. I approached this a different
yet similar way to my other post and I have gotten much farther! The problem now is that my output prints duplicates each and table. More confounding is why only these words don't output properly. I will provide the source so you can try it out for yourself.
I compiled with visual c++ 2008 & 05 and received same output. My next goal would be to
implement it as an ADT, but for now solving this issue is crucial.
Once again thanks for any help I receive.

  1. #include <iostream>
  2. #include <string>
  3. #include <fstream>
  4. #include <cctype>
  5. #include <iomanip>
  6. using namespace std;
  7. //function prototypes
  8. int hashFunction(const char *key, int keyLenght, const int HTSIZE);
  9. void LinearProbing(int upindex, string words[], string item ,int noitems[]);
  10. string StringToLower(string strToConvert);
  11.  
  12. int main()
  13. {
  14. string getString; //used to read the strings
  15. const int HTSIZE = 131; //Holds up to 130 words
  16. string hashTable[HTSIZE]; //stores unique words
  17. const char *cphashTable; //holds c-string value for processing
  18. int IndexStatusList[131]; //holds the number of times a word is inserted at its location
  19. ifstream DataIn; //filestream object
  20. int size = 0; //no use yet
  21. int lenght; //stores lenght of a string
  22. int hIndex; //stores the index in where to store a string
  23.  
  24. //will be used as a constructor when implemented as ADT
  25. //initializes both arrays. The first for empty strings, the other for ints
  26. for(int c = 0; c < HTSIZE; c++)
  27. {
  28. hashTable[c] = " ";
  29. IndexStatusList[c] = 0;
  30. }
  31.  
  32. DataIn.open("blah.txt"); //Fail Safe test
  33. if(!DataIn)
  34. {
  35. cout <<"Error opening data file\n";
  36. }
  37. else //Sucess
  38. {
  39. while(DataIn >> getString) //Gets the first string and reads until
  40. { //there is no more data to be read
  41. lenght = static_cast<int>(getString.length()); //gets lenght of a string to be used for processing
  42. cphashTable = getString.data(); //returns a c-string copy of a string
  43. hIndex = hashFunction(cphashTable,lenght,HTSIZE); //computes the index. see def below
  44. LinearProbing(hIndex,hashTable, cphashTable, IndexStatusList); //cphashTable //getString
  45. } //tried both no difference in results
  46. }
  47.  
  48. cout << "Count\t" << "Index\t" << "WORD\n"; //Title of Columns
  49.  
  50. for(int i = 0; i < HTSIZE;i++) // Outputs number of occurences followed by
  51. { // the location of the word in the hashtable
  52. if(hashTable[i] != " ") // the content of that location e.g cat
  53. {
  54. cout << IndexStatusList[i] << "\t" << i << "\t" << hashTable[i] << endl ;
  55. }
  56. }
  57.  
  58. return 0;
  59. } //end of main
  60. /* Def of Linear Probring
  61. Places the item, in this case a string in the appropriate location in the array.
  62. The location hIndex is calculated by the function hashFunction and then passed
  63. to this function as an argument. If the location to where hIndex points is empty
  64. in the hashTable insert the item there and update its number of currence:(parralel array)
  65. StatusListIndex. While the location is occupied find next available location by incrementing
  66. the current location to one until an available space is found. My current list is less than
  67. 130 elements so no need to worry about space at this time. Then insert item at this new location
  68. and update the IndexStatusList. You also need to check if at a certain location it is the same
  69. instance of that item, to update its IndexStatusList. Eg, cat is inserted into location 1,
  70. IndexStatusList is 1. Then another cat is found just update the IndexStatusList by one rather than
  71. inserting it in a new location. Cat still at loc 1 IndexStatusList is 2
  72. */
  73. void LinearProbing(int upindex, string words[], string item ,int noitems[])
  74. {
  75. item = StringToLower(item); //i tried this in hopes of being able to compare both
  76. //strings properly but im not so sure.
  77. if(item == words[upindex])
  78. {
  79. int inc = noitems[upindex]; //this section takes care of duplicates
  80. inc++;
  81. noitems[upindex] = inc;
  82. }
  83. else if(words[upindex] == " ")
  84. {
  85. words[upindex] = item; //if location is empty
  86. noitems[upindex] = 1;
  87. }
  88. else
  89. {
  90. while(noitems[upindex] != 0)
  91. {
  92. upindex = ((upindex + 1) % 131); //collision
  93. }
  94. words[upindex] = item;
  95. noitems[upindex] = 1;
  96. }
  97.  
  98.  
  99. }
  100. /*
  101. Adds the values of each character in the string then % it with HTSIZe which
  102. is to be used as hIndex. There is no distinction between lowercase and uppercase
  103. A is converted to the ascii equivalent of a
  104. ex string ab = 97+98 then lets say htsize was 3
  105. hIndex would then be 0.
  106. */
  107. int hashFunction(const char *key, int keyLenght, const int HTSIZE)
  108. {
  109. int sum = 0;
  110. for(int j =0; j < keyLenght ; j++)
  111. {
  112. sum = sum + static_cast<int>(tolower(key[j]));
  113.  
  114. }
  115. return sum = sum % HTSIZE;
  116. }
  117. //change each element of the string to lower case
  118. string StringToLower(string strToConvert)
  119. {
  120. for(unsigned int i=0;i<strToConvert.length();i++)
  121. {
  122. strToConvert[i] = tolower(strToConvert[i]);
  123. }
  124. return strToConvert;//return the converted string
  125. }
Last edited by a.self; Nov 9th, 2008 at 6:59 pm. Reason: Forgot to include source files.
Attached Files
File Type: zip Test2.zip (2.3 KB, 6 views)
~Albert
Reply With Quote Quick reply to this message  
Join Date: Jul 2008
Posts: 2,001
Reputation: ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of ArkM has much to be proud of 
Solved Threads: 343
ArkM's Avatar
ArkM ArkM is offline Offline
Postaholic

Re: HashTable Implementation Revised

 
0
  #2
Nov 10th, 2008
1. Let's suppose that you have a collision with word "table". The LinearProbing finds a new free location then places the word into this cell. But the next occurence of the word "table" gets collision again - and linearProbing finds the next free location, it does not check if the word "table" was placed in the hash table! That's why you have next occurences of the same word in resulting hash table. Don't forget to wrap around search index in a new (correct) linearProbing loop...
2. DataIn >> getString gets not a word as a sequence of letters but all chars until whitespace occured. That's why you have such items as "539," or "word" and "word.". Add a simple item preprocessing (use isalpha() from <cctype>, for example) to select true words only.
3. Use c_str() string class member functions, not data(), The last one does not garantee that it's C-string with zero byte terminator.
4. Add hash table overfow check NOW! Never ever say that "My current list is less than
130 elements so no need to worry about space at this time" !!!
5. It seems you are writing a program in C++. Where is a class (at least a struct) HashTable? Why you have wrote obviously C-style code in C++? Why linearProbe does not know about hash table capacity?..

I think you are capable to correct all these defects yourself.
Good luck!
Reply With Quote Quick reply to this message  
Join Date: Nov 2008
Posts: 4
Reputation: a.self is an unknown quantity at this point 
Solved Threads: 0
a.self's Avatar
a.self a.self is offline Offline
Newbie Poster

Re: HashTable Implementation Revised

 
0
  #3
Nov 10th, 2008
Originally Posted by ArkM View Post
1. Let's suppose that you have a collision with word "table". The LinearProbing finds a new free location then places the word into this cell. But the next occurence of the word "table" gets collision again - and linearProbing finds the next free location, it does not check if the word "table" was placed in the hash table! That's why you have next occurences of the same word in resulting hash table. Don't forget to wrap around search index in a new (correct) linearProbing loop...
2. DataIn >> getString gets not a word as a sequence of letters but all chars until whitespace occured. That's why you have such items as "539," or "word" and "word.". Add a simple item preprocessing (use isalpha() from <cctype>, for example) to select true words only.
3. Use c_str() string class member functions, not data(), The last one does not garantee that it's C-string with zero byte terminator.
4. Add hash table overfow check NOW! Never ever say that "My current list is less than
130 elements so no need to worry about space at this time" !!!
5. It seems you are writing a program in C++. Where is a class (at least a struct) HashTable? Why you have wrote obviously C-style code in C++? Why linearProbe does not know about hash table capacity?..

I think you are capable to correct all these defects yourself.
Good luck!
I seemed to have finally solved my problems!
1) My code WAS checking to see if the item was already in the table. The problem came in the logic. When a duplicate item was found, it had no choice but to think collision was occurring because it first tested to see if the location in my string array was empty but not equal to it.
2) I didn't specify, for the purpose of this program digits are to be taken as strings not integers.
3) I knew strings could be compared at a character per character basis however, i was trying to convert an array of strings which would subsequently point to the next string rather than a single character in that string. My mistake ^_^ so the need for c_str() or data() was removed from my code with a simple loop. I do agree with you using c_str() as to assert a null string terminator is included.
4)Its good habit to take care of any business such as enough memory allocated for a variable however, i was not required to do so since were short on time. That explains why linearprobing doesnt take HTSIZE as an argument.
5)My previous code was written in c++ only a few functions were in c-style.
6)My updated code is now an Abstract Data Type
Last edited by a.self; Nov 10th, 2008 at 7:31 pm.
~Albert
Reply With Quote Quick reply to this message  
Reply

This thread has been marked solved.
Perhaps start a new thread instead?
Message:



Other Threads in the C++ Forum
Thread Tools Search this Thread



Tag cloud for C++
About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC