Hash Table for Tracking Multiple Pieces of Information?

Hi folks,

General question here... I'm writing a program which reads a ton of source data, crunches the numbers, and outputs a few nice summary reports. The source data is a lot of individual records:

00001,Item1,Item2,Item3,Item4,Item5,...
00002,Item1,Item2,Item3,Item4,Item5,...
00003,Item1,Item2,Item3,Item4,Item5,...

Originally, I created an object called "Record," which stored each Item. But I ran into serious trouble when I realized there are literally MILLIONS of records. My machine simply doesn't have enough system memory to handle the load. So I need a completely new approach to this problem.

Someone mentioned to me that my program should learn about these records by creating a hash table on-the-fly. Okay, sounds great. So I read up about hash tables in general and C++'s map container in particular, but I don't see a direct way to use these to address my problem. A hash table/map would seem great if you wanted to track a large amount of data which separates into two items of information (a person's name and phone number, for example). What do you do when you have ten, twenty, maybe more than thirty items of information you need to track?

So I'm just generally asking... does anyone see a way to do what I'm trying to do? I'm just asking for some general brainstorming ideas...

Many thanks!

phummon
 

It's very rare that one absolutely needs to keep everything loaded at the same time. I'd recommend changing your logic so that you only need to keep a few records in memory at once.
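
As a rough illustration of that idea, here is a minimal sketch that reads one line at a time and keeps only running totals. It assumes the summary report needs nothing more than per-column sums and that every item is numeric; the original post states neither. The names ColumnTotals and summarize are made up for this example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Running totals for the item columns; this is all that stays in memory.
// (Hypothetical type, assuming the report only needs per-column sums.)
struct ColumnTotals {
    std::size_t recordCount = 0;
    std::vector<double> sums;   // sums[i] is the total of item i over all records
};

// Reads "id,item1,item2,..." lines one at a time and folds each into the totals.
// Assumes every item is numeric; a non-numeric field makes std::stod throw.
ColumnTotals summarize(std::istream& in) {
    ColumnTotals totals;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;                 // skip blank lines
        std::istringstream fields(line);
        std::string field;
        std::getline(fields, field, ',');           // skip the record id
        std::size_t column = 0;
        while (std::getline(fields, field, ',')) {
            if (column >= totals.sums.size()) totals.sums.resize(column + 1, 0.0);
            totals.sums[column] += std::stod(field);
            ++column;
        }
        ++totals.recordCount;                       // the line itself is now discarded
    }
    return totals;
}
```

Passing a std::ifstream opened on the source file to summarize would process the whole file this way. Memory use then depends on the number of columns, not on the number of records.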

Narue
 

As your example shows, you have a number followed by a bunch of data. You could use a map that is defined as map<int, vector<something> >. This would allow the number in each line to be the key, and the vector would then hold all of the other data in the line associated with that number.
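
A minimal sketch of that layout, assuming the items are kept as plain strings (the "something" above is left open) and that the leading number is a valid integer. RecordTable and addRecord are hypothetical names for this example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Record number -> all remaining items on that line.
// Storing the items as std::string is an assumption made for this sketch.
typedef std::map<int, std::vector<std::string> > RecordTable;

// Parses one "id,item1,item2,..." line and stores the items under the id.
// Assumes the id is numeric; std::stoi throws if it is not.
void addRecord(RecordTable& table, const std::string& line) {
    std::istringstream fields(line);
    std::string field;
    if (!std::getline(fields, field, ',')) return;   // empty line: nothing to store
    int id = std::stoi(field);                       // "00001" becomes 1
    std::vector<std::string>& items = table[id];     // creates the entry if missing
    while (std::getline(fields, field, ',')) items.push_back(field);
}
```

After addRecord(table, "00001,a,b,c"), table[1] holds the three items "a", "b", and "c". Note that std::map is an ordered tree, not a hash table; std::unordered_map (C++11) is the hashed equivalent and is used in the same way. Either one still holds every record in memory, so on its own it does not reduce the memory footprint.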

NathanOliver
 

This question has already been solved
