Design issues

Please support our C++ advertiser: Intel Parallel Studio Home
Reply

Join Date: Dec 2008
Posts: 30
Reputation: codedhands is an unknown quantity at this point 
Solved Threads: 0
codedhands codedhands is offline Offline
Light Poster

Design issues

 
0
  #1
Aug 22nd, 2009
Hello,i am developing an indexer that indexes html pages.My problem lies in the aspect of creating a global index of all the stored pages in the reprository.I need a method that will be suitable for quick retrieval and insertion of new data.I have tried using STL MAP but i do not know how it will perform with so many data in it.I was told a b-tree can do it.

What are your suggestions for a problem like this?

Thanks
Reply With Quote Quick reply to this message  
Join Date: Oct 2008
Posts: 2,666
Reputation: adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of adatapost has much to be proud of 
Solved Threads: 476
Moderator
adatapost's Avatar
adatapost adatapost is offline Offline
Posting Maven

Re: Design issues

 
0
  #2
Aug 22nd, 2009
STL MAP is already a b-tree and I think it is a suitable method for your purpose.
Reply With Quote Quick reply to this message  
Join Date: Dec 2007
Posts: 360
Reputation: jencas is just really nice jencas is just really nice jencas is just really nice jencas is just really nice jencas is just really nice 
Solved Threads: 69
jencas jencas is offline Offline
Posting Whiz

Re: Design issues

 
0
  #3
Aug 25th, 2009
Usually std::map is implemented using a rb-tree (red-black-tree) and not a B-tree (Bayer-tree)
If you are forced to reinvent the wheel at least try to invent a better one!

Please use code tags - Please mark solved threads as solved
Reply With Quote Quick reply to this message  
Join Date: Oct 2006
Posts: 2,915
Reputation: niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute niek_e has a reputation beyond repute 
Solved Threads: 304
Moderator
Featured Poster
niek_e's Avatar
niek_e niek_e is offline Offline
Cenosillicaphobiac

Re: Design issues

 
0
  #4
Aug 25th, 2009
Originally Posted by codedhands View Post
I have tried using STL MAP but i do not know how it will perform with so many data in it.
How much data is 'so many data'. And why haven't you just given it a try yet?
Reply With Quote Quick reply to this message  
Join Date: Dec 2008
Posts: 30
Reputation: codedhands is an unknown quantity at this point 
Solved Threads: 0
codedhands codedhands is offline Offline
Light Poster

Re: Design issues

 
0
  #5
Aug 25th, 2009
Originally Posted by niek_e View Post
How much data is 'so many data'. And why haven't you just given it a try yet?
Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform especially when trying to load from a file.
Reply With Quote Quick reply to this message  
Join Date: Jun 2009
Posts: 681
Reputation: Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of 
Solved Threads: 133
Tom Gunn's Avatar
Tom Gunn Tom Gunn is offline Offline
Practically a Master Poster

Re: Design issues

 
0
  #6
Aug 25th, 2009
Am looking at up to a billion,i haven't such data yet,but i would like to know if someone has an idea on how it will perform
The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.

especially when trying to load from a file.
Files are slow. There is no getting around that. If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.
-Tommy (For Great Justice!) Gunn
Reply With Quote Quick reply to this message  
Join Date: Dec 2008
Posts: 30
Reputation: codedhands is an unknown quantity at this point 
Solved Threads: 0
codedhands codedhands is offline Offline
Light Poster

Re: Design issues

 
0
  #7
Aug 25th, 2009
Originally Posted by Tom Gunn View Post
The STL map class has performance guarantees that match a balanced search tree. The height will be small for a billion items, like 2*log(n) for a red black tree, so lookup is quick. When lookup is quick, insertion and deletion will be quick too.

Really all you need to worry about with performance is paging and locality of reference. If there are a lot of items in memory that are in different pages, the computer will be swapping pages in and out and that slows memory accesses. If there are so many items that RAM cannot hold it all, the whole machine can slow down as memory gets swapping and out of virtual memory.


Files are slow. There is no getting around that. If loading the whole file is too slow, you can use windows into the file and only process part of it at any time. Just make sure that the file format makes seeking possible.
I love this.Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?

The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.I need your suggestions..

Thanks in advance
Last edited by codedhands; Aug 25th, 2009 at 5:17 pm.
Reply With Quote Quick reply to this message  
Join Date: Jun 2009
Posts: 681
Reputation: Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of Tom Gunn has much to be proud of 
Solved Threads: 133
Tom Gunn's Avatar
Tom Gunn Tom Gunn is offline Offline
Practically a Master Poster

Re: Design issues

 
0
  #8
Aug 25th, 2009
Is it possible to seek a map while saved in a file.How do you suggest i make windows for the file?
You can seek in a file, and load from a file into a map. Unless you want to use a third party external B-tree library, that is probably as good as it gets. As for the window idea, it is nothing more than looking at a smaller part of the file at once, and loading another small part when searching leaves that view.

The main question now is,how do i ensure great perfomance when loading an STL map of a billion pages into memory.
There is no such thing as great performance working with huge files or huge amounts of files that I know of. How do you know the performance is not acceptable when your code has not worked with that amount yet? Are you sure you are not optimizing prematurely?
-Tommy (For Great Justice!) Gunn
Reply With Quote Quick reply to this message  
Join Date: Dec 2008
Posts: 30
Reputation: codedhands is an unknown quantity at this point 
Solved Threads: 0
codedhands codedhands is offline Offline
Light Poster

Re: Design issues

 
0
  #9
Aug 25th, 2009
Originally Posted by Tom Gunn View Post
Are you sure you are not optimizing prematurely?
I guess i am.thanks
Reply With Quote Quick reply to this message  
Reply

This thread is more than three months old.
Perhaps start a new thread instead?
Message:



Similar Threads
Other Threads in the C++ Forum
Thread Tools Search this Thread



About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC