I'm creating a code which will map two distict strings.

I used two ways for that:
1) created a MAP and added raw data to it from file.
It is not that efficient.:sad:

2) created an object holding mapped values:
eg.

class c
{
string str1,str2;
public:
/// opearations
};
and Added these objects to VECTOR.

I have to search it sequentially As there are around 1.5 million enrties or objects
.

Can enyone show me the way to improve my program.
Or some other approach towards the problem?:rolleyes:

Does anybody having a function which takes the input as STL object reference and file path And load the data to STL object.
Also include exception handellings.

Recommended Answers

All 4 Replies

one suggestion: load them into a vector, sort them, then use a binary search algorithm to search for given string. But I suspect that is something like what a <map> would do for you.

Another suggestion: If the file that contains the 1.5 million records rarely changes, don't read them into memory. Write another small program that will be run only when the primary file changes. It will create an index file that contains fixed-length records with key string and integer offset into primary file where the record starts, sort the key file. Then in the main program you are writing use binary search to search the keys in the file. Since the index file will already be sorted when your program starts, you could just read the whole thing into a vectory and then do in-memory binary search. C++ std::string is a nice class, but will kill a program that uses millions of them. When reading the structure below don't convert the strings to std::string, leave them as c-style strings because they will load into vector a lot quicker.

struct keys
{
   char keystr[255]; 
   unsigned long offset;
}

The best data structure to keep track of millions of record (i mean there is no such best but still...) B Trees.
B-Trees are specifically designed for managing indexes on secondary storage such as hard disks, compact discs, and so on, providing efficient insert, delete, and search operations. Like binary search trees, B-Trees contain nodes. Unlike binary search trees, however, the nodes of a
B-Tree contain not one, but multiple, keys, up to some defined maximum—usually determined by the size of a disk block. The keys in a node are stored in sorted order, with an associated child node holding keys that sort lower than it—every nonleaf node containing k keys must have k+1 children.

So if poss you can try to implement a class for B-Trees since they exhibit excellent performance where numerous records are concerned. Most Databases like MySQL also use some kind or variation of B+ Trees.

An implementation for ur reference can be found here:
http://www.cse.mrt.ac.lk/lecnotes/cs5224/Assignment5/Assignment5/index.html

Hope it helped, bye.[IMG]file:///C:/DOCUME%7E1/sos/LOCALS%7E1/Temp/moz-screenshot-1.jpg[/IMG]
[IMG]file:///C:/DOCUME%7E1/sos/LOCALS%7E1/Temp/moz-screenshot.jpg[/IMG]

I also thought about using the B-tree only. But problem is with the string match.

The key, which is used for searching is string
the format is AB12333 or DB4839.

I'm not getting, what should be the maximimum degree of each node.

I 'm supposed to implement the the wild match also.
ie for A* output may be AB3847, AG48783, A62674.....
and AB?473 will yield AB1367, AB2677,

Then maybe you are looking for something along the lines of "Ternary Trees" which are used in games like Scrabble and solving crosswords which also require wildcard matching. Maybe you would want to look here:

http://www.cs.princeton.edu/~rs/strings/

Hope it helped, bye.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.