943,866 Members | Top Members by Rank

Ad:
  • C++ Discussion Thread
  • Unsolved
  • Views: 476
  • C++ RSS
Aug 22nd, 2009
0

Design issues

Expand Post »
Hello,i am developing an indexer that indexes html pages.My problem lies in the aspect of creating a global index of all the stored pages in the reprository.I need a method that will be suitable for quick retrieval and insertion of new data.I have tried using STL MAP but i do not know how it will perform with so many data in it.I was told a b-tree can do it.

What are your suggestions for a problem like this?

Thanks
Similar Threads
Reputation Points: 10
Solved Threads: 0
Light Poster
codedhands is offline Offline
30 posts
since Dec 2008
Aug 22nd, 2009
0

Re: Design issues

STL MAP is already a b-tree and I think it is a suitable method for your purpose.
Moderator
Reputation Points: 2136
Solved Threads: 1228
Posting Genius
adatapost is offline Offline
6,527 posts
since Oct 2008
Aug 25th, 2009
0

Re: Design issues

Usually std::map is implemented using a rb-tree (red-black-tree) and not a B-tree (Bayer-tree)
Reputation Points: 395
Solved Threads: 71
Posting Whiz
jencas is offline Offline
362 posts
since Dec 2007
Aug 25th, 2009
0

Re: Design issues

Click to Expand / Collapse  Quote originally posted by codedhands ...
I have tried using STL MAP but i do not know how it will perform with so many data in it.
How much data is 'so many data'. And why haven't you just given it a try yet?
Moderator
Featured Poster
Reputation Points: 4142
Solved Threads: 394
Industrious Poster
Nick Evan is offline Offline
4,132 posts
since Oct 2006
Aug 25th, 2009
0

Re: Design issues

Click to Expand / Collapse  Quote originally posted by niek_e ...
How much data is 'so many data'. And why haven't you just given it a try yet?
Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform especially when trying to load from a file.
Reputation Points: 10
Solved Threads: 0
Light Poster
codedhands is offline Offline
30 posts
since Dec 2008
Aug 25th, 2009
0

Re: Design issues

Quote ...
Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform
The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.

Quote ...
especially when trying to load from a file.
Files are slow. There is no getting around that. If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.
Reputation Points: 1446
Solved Threads: 135
Practically a Master Poster
Tom Gunn is offline Offline
681 posts
since Jun 2009
Aug 25th, 2009
0

Re: Design issues

Click to Expand / Collapse  Quote originally posted by Tom Gunn ...
The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.


Files are slow. There is no getting around that. If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.
I love this.Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?

The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.I need your suggestions..

Thanks in advance
Last edited by codedhands; Aug 25th, 2009 at 5:17 pm.
Reputation Points: 10
Solved Threads: 0
Light Poster
codedhands is offline Offline
30 posts
since Dec 2008
Aug 25th, 2009
0

Re: Design issues

Quote ...
Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?
You can seek in a file, and load from a file into a map. Unless you want to use a third party external B-tree library, that is probably as good as it gets. As for the window idea, it is nothing more than looking at a smaller part of the file at once, and loading another small part when searching leaves that view.

Quote ...
The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.
There is no such thing as great performance working with huge files or huge amounts of files that I know of. How do you know the performance is not acceptable when your code has not worked with that amount yet? Are you sure you are not optimizing prematurely?
Reputation Points: 1446
Solved Threads: 135
Practically a Master Poster
Tom Gunn is offline Offline
681 posts
since Jun 2009
Aug 25th, 2009
0

Re: Design issues

Click to Expand / Collapse  Quote originally posted by Tom Gunn ...
Are you sure you are not optimizing prematurely?
I guess i am.thanks
Reputation Points: 10
Solved Threads: 0
Light Poster
codedhands is offline Offline
30 posts
since Dec 2008

This thread is more than three months old

No one has posted to this discussion for at least three months. Please let old threads die and do not reply to them unless you feel you have something new and valuable to contribute that absolutely must be added to make the discussion complete. Otherwise, please start a new thread in this forum instead.
Message:
Previous Thread in C++ Forum Timeline: openCV Image Transformation
Next Thread in C++ Forum Timeline: Writing algorithms help





About Us | Contact Us | Advertise | Acceptable Use Policy
Forum Index | Build Custom RSS Feed


Follow us on Twitter


© 2011 DaniWeb® LLC