Member Avatar for vick4523zf

Hye, I have started to read and implement Cache Oblivious Btree on RAM. Starting with the Weight Balanced BTree by Arge and Vitter. The Deletion algorithm explained says that after deletion of a leaf an ancestor will get unbalanced so to merge the ancestor with neighbor and split if the weight goes above the max range.

"Deletions. Deletions are similar to insertions. As before,we search down the tree to find which leaf w to delete. After deleting w, some ancestors of w may become unbalanced.That is, some ancestor node u at height h may have weight lower than (1/2)d^(h-1). As before, we bring the ancestors of w into balance starting from the ancestors closest to the leaves. We merge u with one of its neighbors. After merging u, however, it might now have a weight larger than its upper bound, so we immediately split it into two nodes as described in the insertion algorithm. (This may alternatively be viewed as u stealing children from its neighbor.)"

The same algo in the original paper says to use global rebuilding techniques.

My questions are:

1.What is the "Global Rebuilding" technique exactly?
2.Is the deletion algorithm correct? I used the deletion algo of Btree (as explained in the book by Thomas Cormen)and just replaced min with the minimum range of the Weight Balanced Btree required and cant seem to find where the above condition will occur.

Some clarity on the questions would really be appreciated. Any other inputs are also welcome.Thanks.

If anyone has implemented the same on windows than the code would be highly appreciated for detailed understanding.(Purely for knowledge purpose. No form of cheating or homework)

Recommended Answers

All 4 Replies

1.What is the "Global Rebuilding" technique exactly?

I would assume that it refers to a complete destruction of the tree and a complete reconstruction of it. This is what I usually call a "collect-and-rebuild" process. When you have a tree or a sub-tree that needs to be re-balanced, some traditional approaches (e.g., in red-black trees or AVL-trees) involve doing some tinkering with the structure, such as tree rotations to re-shape the tree by swapping or splitting some parent-child relationships. However, in some cases, this is too hard or not practical at all (e.g., in contiguous storage, like a cache-oblivious implementation). So, instead of trying to repair an unbalanced tree or subtree, it is easier to just collect all the node values that you want to keep (e.g., removing those you want to delete, or adding those you want to insert), destroy the tree, and re-build it back from the array of node values that you collected. In many kinds of search trees, this is the only option when you need to re-balance a tree or subtree.

It may sound like a horrible solution, because it seems really expensive to continuously destroy and rebuild the tree whenever you add or remove elements from it, but in reality, this is amortized by the fact that it is very rare that a tree becomes unbalanced enough to requires a complete reconstruction. Typically, the majority of insertions and deletions don't cause any significant imbalance in the affected subtree. But when it does, the general logic is to go up the tree to find the first unbalanced parent (unbalanced according to some balance criteria, like the weight-balanced criteria), and do a collect-and-rebuild operation on that subtree, and then continue to move up to see if any other parent was unbalanced, and so on. On very rare occasions, you end up finding that the root is unbalanced, and then, you rebuild the entire tree. In practice, this cost is amortized logarithmic (I've tested it).

I think that the authors Arge and Vitter are referring to a "lazy deletion" strategy, when they talk about "global rebuilding". This lazy-balancing strategy is a very simple yet effective means of balancing a search tree. Essentially, the idea is to not check the tree for balance at all, but instead, periodically re-construct it completely. Re-constructing a brand new tree always yields a perfectly balanced tree, unless you have a very terrible construction algorithm. And, if you have a balanced tree with N nodes, if you add something like N/2 nodes without doing any re-balancing work, the tree will probably be unbalanced, but not that much, at least, not so much that it would be noticeable in performance (you might get lookups in O(log(N)+1) instead of O(log(N)), which is nothing). So, the idea here is that you could just count the number of additions and removals, and when you have done enough of them that you would expect that, on average, the tree would be significantly unbalanced, then you just do a global re-construction of the tree, and reset the counter. This is, again, amortized logarithmic re-balancing cost, in a way similar to the way dynamic arrays have amortized constant reallocation cost. This is a very cheap and effective re-balancing strategy, that is probably why the authors mention it as a good alternative to the more "eager" balancing strategy.

The eager balancing strategy means that you check at every insertion or deletion whether there was any imbalancing generated and you immediately fix the problem by re-balancing it (as locally as possible). The so-called "self-balancing" trees are all examples of eager balancing strategies.

2.Is the deletion algorithm correct?

It seems correct to me. But I don't have much experience implementing B-trees in particular. I've done work mostly in space partitioning trees and in metric trees, such as vantage-point trees, and B-trees are not really applicable there (AFAIK). But overall, the strategy described seems correct, it is a traditional AVL-tree-style balancing strategy, where you just walk up the tree looking for subtrees to re-balance. The balance criteria must be correct because that is the entire foundation of the weight-balanced b-tree. By definition, a tree is balanced according to that criteria.

If anyone has implemented the same on windows than the code would be highly appreciated for detailed understanding.

I have a publicly available implementation (in C++) of a dynamic vantage-point tree (DVP-tree) which uses a similar balancing strategy and a collect-and-rebuild method to re-balancing sub-trees. It also relies on a separate tree container for the storage of the nodes of the tree. In practice, I mostly use a breadth-first layout tree container, because it performs the best, but I also have a von Emde Boas layout tree container (which is a cache-oblivious data-structure for a fixed-arity tree) that I can use (although, in practice, it takes a very large number of nodes to start performing better than the breadth-first layout). So, you might want to look at those. The DVP-tree implementation can be found as part of my main library here. The tree implementations can be found in an extension to the boost graph library that I'm working on, see bfl_d_ary_tree and vebl_d_ary_tree.

Other than that, I would just say, good luck to you, implementing a cache-oblivious b-tree is no small task.

commented: Good explanation Mike. +12
Member Avatar for vick4523zf

Thanks Mike that was some really helpful information.

Member Avatar for vick4523zf

do you mean to say that the van emboas layout only works for a complete binary tree? I have been trying to solve it for the weight balanced Btree and i am realising that it is next to impossible.

Further i serioualy have no clue if i am goin in the right direction. Do you have any advice on how i can verify if i am on the right path?

Member Avatar for vick4523zf

hye i have another question. The property of the WBtree by arge and vitter is
"Balance: Consider a nonroot node u at height h in the tree. (Leaves have
height 1.) The weight w(u) of u is the number of nodes in the subtree rooted
at u. This weight is bounded by
((d)^(h−1))/2 <= w(u) <= 2((d)^(h−1))"

If we use the property then my leaves will always have between 1 and 2 keys. Is that correct? I dont see the point of the Btree is that is the condition.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.