String Hash Function

Please support our C++ advertiser: Intel Parallel Studio Home
Reply

Join Date: Sep 2009
Posts: 19
Reputation: killerqb is an unknown quantity at this point 
Solved Threads: 0
killerqb killerqb is offline Offline
Newbie Poster

String Hash Function

 
0
  #1
Oct 21st, 2009
I am trying to optimize my hash function, for a separate chaining hash table that I have created from scratch. My hash function is as follows:
  1. int hash( const string &key, int tableSize)
  2. {
  3. int hashVal = 0;
  4. for(int i = 0; i<key.length(); i++)
  5. hashVal = 37*hashVal+key[i];
  6. hashVal %= tableSize;
  7. if(hashVal<0)
  8. hashVal += tableSize;
  9. return hashVal;
  10. }
I was wondering if one could point me to either a better hash function or how I should go about modifying this function.
Reply With Quote Quick reply to this message  
Join Date: Jun 2009
Posts: 681
Reputation: Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of 
Solved Threads: 132
Tom Gunn's Avatar
Tom Gunn Tom Gunn is offline Offline
Practically a Master Poster
 
0
  #2
Oct 21st, 2009
SAX was designed for hashing strings and is supposed to be really good for it:
  1. unsigned hash(string const& key, unsigned tableSize)
  2. {
  3. unsigned hashVal = 0;
  4.  
  5. for (int x = 0; x < key.length(); ++x)
  6. {
  7. hashVal ^= (hashVal << 5) +
  8. (hashVal >> 2) +
  9. key[x];
  10. }
  11.  
  12. return hashVal % tableSize;
  13. }
Your algorithm is a common variation of a good hash that uses 33 instead of 37. It should work well in the general case.
-Tommy (For Great Justice!) Gunn
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 19
Reputation: killerqb is an unknown quantity at this point 
Solved Threads: 0
killerqb killerqb is offline Offline
Newbie Poster
 
0
  #3
Oct 21st, 2009
Thanks, but when I implemented your hash function it took nearly twice as long. Any other functions I should give a whirl?
Reply With Quote Quick reply to this message  
Join Date: Jun 2009
Posts: 681
Reputation: Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of 
Solved Threads: 132
Tom Gunn's Avatar
Tom Gunn Tom Gunn is offline Offline
Practically a Master Poster
 
0
  #4
Oct 21st, 2009
Thanks, but when I implemented your hash function it took nearly twice as long.
OK, by optimize you mean speed and not collisions. Your algorithm is about as fast as it gets without having excessive collisions or doing micro optimizations. Are you trying to increase the speed because you have proven it to be too slow with a profiler? If not, do not bother because programmers are very bad at guessing where performance bottlenecks are.
-Tommy (For Great Justice!) Gunn
Reply With Quote Quick reply to this message  
Join Date: Jun 2005
Posts: 2,044
Reputation: Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice 
Solved Threads: 139
Team Colleague
Rashakil Fol's Avatar
Rashakil Fol Rashakil Fol is offline Offline
Super Senior Demiposter
 
0
  #5
Oct 21st, 2009
Originally Posted by killerqb View Post
Thanks, but when I implemented your hash function it took nearly twice as long. Any other functions I should give a whirl?
Is your call to .length() getting called every time or are you assuming that will get optimized away?
All my posts may be redistributed under the GNU Free Documentation License.
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 19
Reputation: killerqb is an unknown quantity at this point 
Solved Threads: 0
killerqb killerqb is offline Offline
Newbie Poster
 
0
  #6
Oct 21st, 2009
Yes, sorry I meant speed by optimize, I guess I did not clarify. I'm kinda new to hash functions.
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 19
Reputation: killerqb is an unknown quantity at this point 
Solved Threads: 0
killerqb killerqb is offline Offline
Newbie Poster
 
0
  #7
Oct 21st, 2009
I was trying to play around with the FNV function but my program keeps seg faulting

static const unsigned InitialFNV = 2166136261;
static const unsigned FNVMultiple = 16777619;

// /* Fowler / Noll / Vo (FNV) Hash */
unsigned HashTable::hashFunc(const string &s)const
{
unsigned hash = InitialFNV;
for(unsigned i = 0; i < s.length(); i++)
{
hash = hash ^ (s[i]); /* xor the low 8 bits */
hash = hash * FNVMultiple; /* multiply by the magic number */
}
return hash;
}
Reply With Quote Quick reply to this message  
Join Date: Sep 2009
Posts: 19
Reputation: killerqb is an unknown quantity at this point 
Solved Threads: 0
killerqb killerqb is offline Offline
Newbie Poster
 
0
  #8
Oct 21st, 2009
This was actually the original function I found online, but I was trying to modify it in order to get it to work with my code. However I am unfamiliar with size_t and the U at the end of the InitialFNV initialization.
So can someone explain the U and size_t please.
  1. static const size_t InitialFNV = 2166136261U;
  2. static const size_t FNVMultiple = 16777619;
  3.  
  4. /* Fowler / Noll / Vo (FNV) Hash */
  5. size_t myhash(const string &s)
  6. {
  7. size_t hash = InitialFNV;
  8. for(size_t i = 0; i < s.length(); i++)
  9. {
  10. hash = hash ^ (s[i]); /* xor the low 8 bits */
  11. hash = hash * FNVMultiple; /* multiply by the magic number */
  12. }
  13. return hash;
  14. }
Last edited by killerqb; Oct 21st, 2009 at 1:43 pm.
Reply With Quote Quick reply to this message  
Join Date: Jun 2005
Posts: 2,044
Reputation: Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice Rashakil Fol is just really nice 
Solved Threads: 139
Team Colleague
Rashakil Fol's Avatar
Rashakil Fol Rashakil Fol is offline Offline
Super Senior Demiposter
 
1
  #9
Oct 21st, 2009
The U means that it's an unsigned integer literal. size_t is an unsigned integer type that's guaranteed to be able to hold the size of an array. But not a std::string. If you're paranoid you should use std::string::size_type. Don't be.

Your program is not segfaulting because of the hash function. The error is elsewhere.
All my posts may be redistributed under the GNU Free Documentation License.
Reply With Quote Quick reply to this message  
Reply

Message:


Thread Tools Search this Thread



About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC