954,492 Members — Technology Publication meets Social Media
Username:
Password:
Lost login information?
Have something to say? Contribute New Article Reply to this Article

Design issues

Hello,i am developing an indexer that indexes html pages.My problem lies in the aspect of creating a global index of all the stored pages in the reprository.I need a method that will be suitable for quick retrieval and insertion of new data.I have tried using STL MAP but i do not know how it will perform with so many data in it.I was told a b-tree can do it.

What are your suggestions for a problem like this?

Thanks

codedhands
Light Poster
30 posts since Dec 2008
Reputation Points: 10
Solved Threads: 0
 

STL MAP is already a b-tree and I think it is a suitable method for your purpose.

__avd
Posting Genius (adatapost)
Moderator
8,648 posts since Oct 2008
Reputation Points: 2,136
Solved Threads: 1,241
 

Usually std::map is implemented using a rb-tree (red-black-tree) and not a B-tree (Bayer-tree)

jencas
Posting Whiz
366 posts since Dec 2007
Reputation Points: 395
Solved Threads: 71
 
I have tried using STL MAP but i do not know how it will perform with so many data in it.

How much data is 'so many data'. And why haven't you just given it a try yet?

Nick Evan
Not a Llama
Moderator
10,112 posts since Oct 2006
Reputation Points: 4,142
Solved Threads: 403
 
How much data is 'so many data'. And why haven't you just given it a try yet?

Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform especially when trying to load from a file.

codedhands
Light Poster
30 posts since Dec 2008
Reputation Points: 10
Solved Threads: 0
 
Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform


The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.especially when trying to load from a file.
Files are slow. There is no getting around that. :) If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.

Tom Gunn
Master Poster
733 posts since Jun 2009
Reputation Points: 1,446
Solved Threads: 135
 

The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.

Files are slow. There is no getting around that. :) If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.

I love this.Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?

The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.I need your suggestions..

Thanks in advance

codedhands
Light Poster
30 posts since Dec 2008
Reputation Points: 10
Solved Threads: 0
 
Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?


You can seek in a file, and load from a file into a map. Unless you want to use a third party external B-tree library, that is probably as good as it gets. As for the window idea, it is nothing more than looking at a smaller part of the file at once, and loading another small part when searching leaves that view.The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.
There is no such thing as great performance working with huge files or huge amounts of files that I know of. How do you know the performance is not acceptable when your code has not worked with that amount yet? Are you sure you are not optimizing prematurely?

Tom Gunn
Master Poster
733 posts since Jun 2009
Reputation Points: 1,446
Solved Threads: 135
 
Are you sure you are not optimizing prematurely?

I guess i am.thanks

codedhands
Light Poster
30 posts since Dec 2008
Reputation Points: 10
Solved Threads: 0
 

This article has been dead for over three months

Post: Markdown Syntax: Formatting Help
You