Hi folks,

General question here... I'm writing a program which reads a ton of source data, crunches the numbers, and outputs a few nice summary reports. The source data is a lot of individual records:

00001,Item1,Item2,Item3,Item4,Item5,...
00002,Item1,Item2,Item3,Item4,Item5,...
00003,Item1,Item2,Item3,Item4,Item5,...

Originally, I created an object called "Record," which stored each Item. But I ran into serious trouble when I realized there are literally MILLIONS of records. My machine simply doesn't have enough system memory to handle the load. So I need a completely new approach to this problem.

Someone mentioned to me that my program should handle these records by building a hash table on the fly. Okay, sounds great. So I read up on hash tables in general and C++'s map container in particular, but I don't see a direct way to use them for my problem. A hash table/map seems great if you want to track a large amount of data that separates into two pieces of information (a person's name and phone number, for example). What do you do when you have ten, twenty, maybe thirty or more items of information you need to track?

So I'm generally asking... does anyone see a way to do what I'm trying to do? Any brainstorming ideas would be welcome...

Many thanks!


It's very rare that you absolutely need to keep everything loaded at the same time. I'd recommend changing your logic so that you only need to keep a few records in memory at once.
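
Something along these lines is what I mean (just a sketch; the file name "records.csv" and the idea that your reports can be built from running totals are assumptions on my part):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream in("records.csv");   // hypothetical input file
    std::string line;
    long long recordCount = 0;

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field;

        // Walk the comma-separated fields of this one record,
        // updating running totals instead of storing the record.
        while (std::getline(fields, field, ',')) {
            // ... accumulate whatever your reports need here ...
        }
        ++recordCount;
    }

    std::cout << "Processed " << recordCount << " records\n";
    return 0;
}

Each record is read, folded into the summaries, and then thrown away, so memory use stays flat no matter how many millions of lines there are.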

As your example shows, you have a number followed by a bunch of data. You could use a map defined as map<int, vector<something>>. This would let the number in each line be the key, and the vector would hold all of the other data in the line associated with that number.
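
Here's a rough sketch of that layout, assuming every field can simply be kept as a string and reusing the made-up "records.csv" file name from above:

#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("records.csv");              // hypothetical file name
    std::map<int, std::vector<std::string>> records;
    std::string line;

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field;

        // First field is the record number, used as the key.
        std::getline(fields, field, ',');
        int key = std::stoi(field);

        // Remaining fields become the vector for that key.
        std::vector<std::string> items;
        while (std::getline(fields, field, ','))
            items.push_back(field);

        records[key] = items;
    }
    return 0;
}

Keep in mind that if you insert every line into the map you're back to holding everything in memory, so in practice you'd only load the records you actually need, or combine this with the one-record-at-a-time approach above.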
