Hi,
I need to implement an efficient search algorithm for a comma-separated list of MSISDNs that is loaded into memory (the left side of the '|' character).
It is followed by a comma-separated series of MSISDNs that need to be searched for in that huge list of millions of entries (the right side of the '|' character).

Example:
9845012345, 9845054321, 9845011111|9845011110, +919845012345, 09845054321

I want to implement this in C++. Can anybody suggest which algorithm would be best suited for this, given that the number of MSISDNs to search for can run into the millions? I appreciate your suggestions.

The standard solution to this is to use an associative container to store the list of numbers. One obvious option is std::set, which is typically implemented as a balanced binary search tree and achieves look-up times of O(log(N)). In your case, however (numbers and millions of records), it might be more appropriate to use a hash table, which standard C++ provides as std::unordered_set. It achieves look-up times of O(1) on average through a hashing function, although many collisions can degrade this towards O(N). To really optimize the performance, you would probably need to design a good hashing function that is tailored to your application (MSISDN numbers).

There are other options, but I think that a hash-table is definitely the way to go in this case.
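
As a rough illustration of the hash-table approach, here is a minimal sketch that loads the left side of the '|' into a std::unordered_set and then checks each number from the right side. The function names (normalize, parse_list, search_line) are made up for this example. It also assumes that a subscriber is identified by the last 10 digits of the number, so that "+919845012345" and "09845054321" from the example match their bare forms; that rule is a guess based on the sample line, not something stated in the question.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Assumption: the last 10 digits identify the subscriber, so a leading
// "+91" or "0" is dropped. Returns 0 for tokens with fewer than 10 digits
// (a convention chosen only for this sketch).
std::uint64_t normalize(const std::string& token) {
    std::string digits;
    for (char c : token)
        if (c >= '0' && c <= '9') digits.push_back(c);
    if (digits.size() < 10) return 0;
    return std::stoull(digits.substr(digits.size() - 10));
}

// Splits a comma-separated list and normalizes every token.
std::vector<std::uint64_t> parse_list(const std::string& list) {
    std::vector<std::uint64_t> out;
    std::istringstream in(list);
    std::string token;
    while (std::getline(in, token, ','))
        out.push_back(normalize(token));
    return out;
}

// Loads the left side of '|' into a hash set, then reports for each
// number on the right side whether it is present in that set.
std::vector<bool> search_line(const std::string& line) {
    const std::string::size_type bar = line.find('|');
    if (bar == std::string::npos) return {};
    const std::vector<std::uint64_t> stored = parse_list(line.substr(0, bar));
    std::unordered_set<std::uint64_t> table(stored.begin(), stored.end());
    std::vector<bool> found;
    for (std::uint64_t query : parse_list(line.substr(bar + 1)))
        found.push_back(query != 0 && table.count(query) > 0);
    return found;
}
```

Storing the numbers as 64-bit integers instead of strings keeps each entry small and makes hashing cheap, which matters when the list holds millions of records. With the example line from the question, search_line would report the first searched number as missing and the other two as found, given the 10-digit assumption above.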

Thanks for your reply. I have two questions for you.
(1) What should be the strategy for designing a good hashing function?
(2) Is it possible to use a well-designed hash function with unordered_set, given that unordered_set has its own hash function?

(1) What should be the strategy for designing a good hashing function?

I am not an expert on hashing functions. The basic idea is that it should be a good way to map all the numbers you are storing to a compact and non-clashing set of numbers. The standard containers will use a basic one-size-fits-all hashing function. For example, if you are storing int objects, and let's say those are 32-bit, then the range is −2,147,483,648 to 2,147,483,647, and the default hashing function will probably assume that the integers you store are uniformly distributed over that entire range. So, if your numbers are all between 0 and 1000, that hash function may be a poor fit for your data.

This is all a matter of probability distributions of the data and minimizing collisions while maintaining compactness. As explained in the link I provided: hashing is an art. And I can't claim to know much about it.

(2) Is it possible to use a well-designed hash function with unordered_set, given that unordered_set has its own hash function?

Yes. The standard containers like unordered_set take a template argument that specifies the hash function to use. See the reference page, and notice the "Hash" argument to the template. By default, the std::hash functor is used (see the docs).
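
To show what passing your own "Hash" argument looks like, here is a small sketch. MsisdnHash, MsisdnSet and make_set are hypothetical names for this example, and the mixing step is borrowed from the finalizer of the SplitMix64 generator. It is only a placeholder: whether it beats the default std::hash for real MSISDN data is something you would have to measure.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical hash functor for MSISDNs stored as 64-bit integers.
// The bit mixing is the SplitMix64 finalizer; it spreads nearby
// numbers (e.g. the same operator prefix) across the hash range.
struct MsisdnHash {
    std::size_t operator()(std::uint64_t x) const noexcept {
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return static_cast<std::size_t>(x);
    }
};

// The second template argument replaces the default std::hash.
using MsisdnSet = std::unordered_set<std::uint64_t, MsisdnHash>;

// Reserving buckets up front avoids repeated rehashing while
// millions of numbers are being inserted.
MsisdnSet make_set(std::size_t expected_count) {
    MsisdnSet set;
    set.reserve(expected_count);
    return set;
}
```

Usage is the same as with the default hash: call insert() while loading the list, then count() or find() for each number to look up.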
