Hi folks ,

i am working on hastable nowdays and looking ways to optimize my
code . Please comment on my below observations and let me know if you
have any suggestions.

I have an existing implementation of chained hash table (A), which
uses singly link list for chaining.

typedef struct hnode HNODE;
struct hnode{
HNODE *link;
};

typedef struct {
unsigned long (*hash)(const void *);
int (*compare)(const void *, const void *);
const void *(*keyof)(const HNODE *);
HNODE **table;
unsigned int buckets;
unsigned int count;
} HASH;


The existing implementation caters a huge amount of data.
Well the deletion or search of a hash node based on key is not a quick
operation . As this would require
calculating the index using hash func , and then searching the linked
list on that index for the node based on values.

I would like to optimize the search and deletion .

so i thought to modify my existing hash table (A)implemetaion .
1 . using a doubly link list for chaining .
2 . using a separate data stucture , preferably an other hash table
(B)
which would store a address of node inserted in hashtable (A) .
Nodes in B would be inserted parallel to A .

Nodes in A would be deleted using B , and the searching of node would
also done using B .
Nodes in B would be deleted parallel to A .


Presently A has hash node structure as :

I would like to perform the modification in hnode as below

struct hnode{
HNODE *link_pre;
HNODE *link_post;
};

Would this give me the desired outcome.

Thanks
Aki

meet123321
Deleted Member

Well the deletion or search of a hash node based on key is not a quick
operation

That's always a possibility, but with a decent hash function you won't see enough collisions for the chains to grow especially long.

1 . using a doubly link list for chaining

This adds unnecessary overhead. If you're adding another link, you may as well use a balanced tree and ensure logarithmic search performance.

2 . using a separate data stucture , preferably an other hash table

Tricky to get right because now you need another hash function to handle collisions from the first hash function. If there's a high enough rate of collisions that a chained implementation becomes noticeably slow, you'll likely see similar issues with a sub hash table. Also keep in mind that your structure is (or appears to be) written such that the client code supplies a hash function. For a double hash, that starts to be a significant workload for the user of your library.

My question to you is this: Have you tested your hash table and confirmed that the load factor and chain lengths is so degenerate as to make searching the chains noticeably slow? If so, have you used multiple "good" hash algorithms and confirmed that they all perform poorly? I'm willing to bet that you haven't.

This article has been dead for over six months. Start a new discussion instead.