I am having trouble implementing a solution using a binary search tree that returns the key of the node based on it's rank in O(N) worst case running time.
Every idea i have (listed below) results in O(N^2) or sends me to a dead end. This almost seems impossible.
i would greatly appreciate any insight on how this might be done or if one of my ideas may have a work around.

  • Use a while loop to cycle down to the leftmost node and start from there counting upwards but there is no way to access a parent node while meeting maintaining O(N).
  • Use a placeholder node to traverse each node using a counter to countdown to the desired rank but i cant move to the parent node after checking if child's child's node is null.
  • Use the key values to cycle through instead of the ranks but this result just send gave me an O(N^2) running time as well.
  • I also thought of using an array/stack/queue but filling it is O(N) and then running through that again is another O(N) creating an O(N^2) scenario again.

This is my exisiting code that is currently O(N^2):

public int size() { // number of nodes inside this BST,
        if(root==null) { //first check if tree is empty
            return 0;
        }
        else {
            return rSize(root); //returns recursive size method below
        }
    }
    private int rSize(Node node){ //traverses through each left then right node and returns the size/rank count
        if(node==null){
            return 0;
        }
        else { 
            return (rSize(node.left) + rSize(node.right) + 1);
        }
    }

    public Key rank(int k) { //k is the desired rank of node
        Node node = select(root, k);
        if (node==null) {
            return null;
        } else {
            return node.key;
        }
    }   

    private Node rank(Node node, int k) { 
        if (node == null) return null;

        int t = rSize(node.left);
        if (t > k)
            return select(node.left, k);
        else if (t < k)
            return select(node.right, k - t - 1);
        else
            return node;
    }

What do you specifically mean by "rank"? How far down the tree it is?

A couple of things about b-trees - usually less-than values move to the left branch, and greater-than go right. Also, just start at the root and iterate: if key == node value, you are done. If key < node key, go left and continue to search. If key > node key, go right and continue until key == node key value. At each step, you increment the level you have gone (your rank I presume) until you have a match. If you don't find a match, then the rank should be -1 (not found).

In any case, look at Knuth volume 3, sorting and searching algorithms.

sorry i meant that the leftmost node will be ranked at 0 and its parent 1 etc etc.

I also thought of using an array/stack/queue but filling it is O(N) and then running through that again is another O(N) creating an O(N^2) scenario again

?
If there are two steps and each is O(n) then the whole thing is also O(n) - ie when you double n you double the run time. O(n^2) happens when you have two nested loops (or equivalent), not two consecutive ones.

Searching a b-tree (as described by rubberman) is O(log n) - each time you double n you add the same linear increment of runtime.

Edited 7 Months Ago by JamesCherrill

... after seeing another post by Teme64 I would have to qualify my previous post by saying I was assuming a balanced tree. In the worst case of an unbalanced tree you have effectively a linked list, and it would be O(N)

EDIT: I am using the term "size" to denote the number of elements, including itself, that a node has in its tree. A node's "size" would be its left tree's size plus its right tree's size + one for itself. Equivalent to the term "count". Just clarifying the terminology I am using.

I believe a node's "rank" is the size (number of elements) of its left tree. It is the equivalent of its array index if you changed the binary tree into an ordered array. A completely unbalanced tree where no node has a right child would be your worst case scenario here, I believe. Thus if you have a decreasing array of ten elements {9,8,7,6,5,4,3,2,1,0} and create the tree by inserting each array element in order, you have a tree where the first inserted node (9) is the root of the tree. 9's left child is 8, 8's left child is 7, etc., with no node having a right child. In this tree, each node's "rank" would be its value. I believe this is what the OP is intending? If so, read on. If not, well, read on anyway but it won't answer the OP's question. :)

Finding the size of a node is an O(N) task regardless of what shape the tree is. Thus finding the rank of a node is an O(N) task since rank is the size of a node's left node. Now as far as finding the NODE with rank x in this worst-case scenario, I believe that the normal "select" function algorithm is for a node to find its rank. If its rank is x, it returns itself. If its rank is more than x, if it has a left child, it will make a recursive call to its left child. If it has no left child, it returns null. If its rank is less than x, if it has a right child, it will make a recursive call to its right child. Otherwise it returns null.

Node select(int rank)
{
    if(left == null)
    {
        if(rank == 0)
            return this;
         return null;
    }
    int left_size = left.size();
    if(left_size == rank)
        return this;
    else if(rank < left_size)
    {
        return left.select(rank);
    }
    else if(rank > left_size && right != null)
    {
        return right.select(rank - left_size - 1);
    }
    return null;
}

Worst case scenario is O(N^2) if the rank looked for is 0 in this completely skewed-left tree since you'd have N calls to select while traversing leftward. Each call has a call to size(), which is of order O(N), so O(N^2) total.

However, I believe we can avoid this problem with a global flag that short-circuits the recursion. It might not be the most elegant solution, but I believe it reduces this task to O(N). Let's have a global flag called rankedNode and an two select functions. One is the initial call. It is called once and it's sole purpose is to set that global flag and call the other select functions. I have it as taking two integer parameters, not one, solely to differentiate it from the other select function. The second parameter, named dummyInt, is not used. Again, it's just to differentiate the two functions. Initial call would be to this two-parameter function at the root node.

Node rankedNode = null; // global flag to prevent recursive calls when node is found

Node select(int dummyInt, int rank)
{
    rankedNode = null;
    return this.select(rank);
}

Node select(int rank)
{
    if(rankedNode != null)
        return rankedNode;

    if(left == null)
    {
        if(rank == 0)
        {
            rankedNode = this;
            return this;
         }
         return null;
    }

    int left_size = left.size();
    if(left_size == rank)
    {
        rankedNode = this;
        return this;
    }
    else if(rank < left_size)
    {
        return left.select(rank);
    }
    else if(rank > left_size && right != null)
    {
        return right.select(rank - left_size - 1);
    }
    return null;
}

Edited 7 Months Ago by AssertNull: Edit: Added a short paragraph at the very top to clarify terms. Also fixed bug.

OK, found even MORE bugs. Really sorry. Edit window is about to end. Will post error-free (hopefully) code ASAP. Again, very sorry.

OK, this one should be OK. :) Again, sorry. Ignore the previous code please. Please note that when I use the term "size", I define it as the number of nodes in a tree, which is the size of a node's left tree plus the size of a node's right tree plus one for itself. I've also heard this refered to as "count".

I believe a node's "rank" is the size (number of elements) of its left tree. It is the equivalent of its array index if you changed the binary tree into an ordered array. A completely unbalanced tree where no node has a right child would be your worst case scenario here, I believe. Thus if you have a decreasing array of ten elements {9,8,7,6,5,4,3,2,1,0} and create the tree by inserting each array element in order, you have a tree where the first inserted node (9) is the root of the tree. 9's left child is 8, 8's left child is 7, etc., with no node having a right child. In this tree, each node's "rank" would be its value. I believe this is what the OP is intending? If so, read on. If not, well, read on anyway but it won't answer the OP's question. :)

Finding the size of a tree is an O(N) task regardless of what shape the tree is. Thus finding the rank of a node is an O(N) task since rank is the size of a node's left tree. Now as far as finding the NODE with rank x in this worst-case scenario, I believe that the normal "select" function algorithm is for a node to find its rank. If its rank is x, it returns itself. If its rank is more than x, if it has a left child, it will make a recursive call to its left child. If it has no left child, it returns null. If its rank is less than x, if it has a right child, it will make a recursive call to its right child. Otherwise it returns null.

Node select(int rank)
{
    int left_size = 0;
    if(left != null)
    {
        left_size = left.size();
    }

    if(rank == left_size)
    {
        return this;
    }
    else if(rank < left_size && left != null)
    {
        return left.select(rank);
    }
    else if(rank > left_size && right != null)
    {
        return right.select(rank - (left_size + 1)); // add one for this node
    }
    return null;
}

Worst case scenario is O(N^2) if the rank looked for is 0 in this completely skewed-left tree since you'd have N calls to select while traversing leftward. Each call has a call to size(), which is of order O(N), so O(N^2) total.

However, I believe we can avoid this problem with a global flag that short-circuits the recursion. It might not be the most elegant solution, but I believe it reduces this task to O(N). Let's have a global flag called rankedNode and an two select functions. One is the initial call. It is called once and it's sole purpose is to set that global flag and call the other select functions. I have it as taking two integer parameters, not one, solely to differentiate it from the other select function. The second parameter, named dummyInt, is not used. Again, it's just to differentiate the two functions. Initial call would be to this two-parameter function at the root node.

Node rankedNode = null; // global flag to prevent recursive calls when node is found

Node select(int dummyInt, int rank)
{
    rankedNode = null;
    return this.select(rank);
}

Node select(int rank)
{
    if(rankedNode != null)
        return rankedNode; // short-circuit unneeded recurcion

    int left_size = 0;
    if(left != null)
    {
        left_size = left.size();
    }

    if(rank == left_size)
    {
        rankedNode = this; // flag itself as the answer
        return this;
    }
    else if(rank < left_size && left != null)
    {
        return left.select(rank);
    }
    else if(rank > left_size && right != null)
    {
        return right.select(rank - (left_size + 1)); // add one for this node
    }
    return null;
}

Well, I did it again. I'm always lecturing people to make sure they actually read and understand what's being asked and this time I didn't follow my own advice. On top of all my edits/errors, after re-reading Roger's initial post, I see that the goal of all this was not to return the NODE with the desired rank. Assuming I am reading it right THIS time, Roger actually just needs the key/value at that rank. Now I returned the node itself, so it's quite easy to get the key/value given the node. I'm not sure whether that difference changes the optimal strategy or the O(N).

Anyway, as often happens, I glanced at Roger's post, didn't really read it, assumed I knew what he wanted, then went off and experimented with that since it seemed interesting and I'd never thought about it before. So all in all, it wasn't a waste of time and I sort of learned something. Not sure if it actually helps Roger or not though. :)

I AM interested, however, in knowing whether what I did was considered "cheating" or not and whether anyone knows how to do this at better than O(N^2) worst-case scenario with pure recursion, i.e. without using the global flag or something similar.

This article has been dead for over six months. Start a new discussion instead.