one suggestion: load them into a vector, sort them, then use a binary search algorithm to search for given string. But I suspect that is something like what a would do for you.
Another suggestion: If the file that contains the 1.5 million records rarely changes, don't read them into memory. Write another small program that will be run only when the primary file changes. It will create an index file that contains fixed-length records with key string and integer offset into primary file where the record starts, sort the key file. Then in the main program you are writing use binary search to search the keys in the file. Since the index file will already be sorted when your program starts, you could just read the whole thing into a vectory and then do in-memory binary search. C++ std::string is a nice class, but will kill a program that uses millions of them. When reading the structure below don't convert the strings to std::string, leave them as c-style strings because they will load into vector a lot quicker.
struct keys
{
char keystr[255];
unsigned long offset;
}
Ancient Dragon
Retired & Loving It
30,046 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,342
The best data structure to keep track of millions of record (i mean there is no such best but still...) B Trees.
B-Trees are specifically designed for managing indexes on secondary storage such as hard disks, compact discs, and so on, providing efficient insert, delete, and search operations. Like binary search trees, B-Trees contain nodes. Unlike binary search trees, however, the nodes of a
B-Tree contain not one, but multiple, keys, up to some defined maximum—usually determined by the size of a disk block. The keys in a node are stored in sorted order, with an associated child node holding keys that sort lower than it—every nonleaf node containing k keys must have k+1 children.
So if poss you can try to implement a class for B-Trees since they exhibit excellent performance where numerous records are concerned. Most Databases like MySQL also use some kind or variation of B+ Trees.
An implementation for ur reference can be found here:
http://www.cse.mrt.ac.lk/lecnotes/cs5224/Assignment5/Assignment5/index.html
Hope it helped, bye.[IMG]file:///C:/DOCUME%7E1/sos/LOCALS%7E1/Temp/moz-screenshot-1.jpg[/IMG]
[IMG]file:///C:/DOCUME%7E1/sos/LOCALS%7E1/Temp/moz-screenshot.jpg[/IMG]
~s.o.s~
Failure as a human
11,938 posts since Jun 2006
Reputation Points: 3,281
Solved Threads: 733
Then maybe you are looking for something along the lines of "Ternary Trees" which are used in games like Scrabble and solving crosswords which also require wildcard matching. Maybe you would want to look here:
http://www.cs.princeton.edu/~rs/strings/
Hope it helped, bye.
~s.o.s~
Failure as a human
11,938 posts since Jun 2006
Reputation Points: 3,281
Solved Threads: 733