Finding last element of a binary heap

Finding last element of a binary heap - algorithm

quoting Wikipedia:
It is perfectly acceptable to use a
traditional binary tree data structure
to implement a binary heap. There is
an issue with finding the adjacent
element on the last level on the
binary heap when adding an element
which can be resolved
algorithmically...
Any ideas on how such an algorithm might work?
I was not able to find any information about this issue, for most binary heaps are implemented using arrays.
Any help appreciated.
Recently, I have registered an OpenID account and am not able to edit my initial post nor comment answers. That's why I am responding via this answer. Sorry for this.
quoting Mitch Wheat:
#Yse: is your question "How do I find
the last element of a binary heap"?
Yes, it is.
Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".
quoting Suppressingfire:
Is there some context in which you're
asking this question? (i.e., is there
some concrete problem you're trying to
solve?)
As stated above, I would like to know a good way to "find the last element of a non-array-based binary heap" which is necessary for insertion and deletion of nodes.
quoting Roy:
It seems most understandable to me to
just use a normal binary tree
structure (using a pRoot and Node
defined as [data, pLeftChild,
pRightChild]) and add two additional
pointers (pInsertionNode and
pLastNode). pInsertionNode and
pLastNode will both be updated during
the insertion and deletion subroutines
to keep them current when the data
within the structure changes. This
gives O(1) access to both insertion
point and last node of the structure.
Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node, when their locations change to another subtree due to an deletion/insertion. But I'll give this a try.
quoting Zach Scrivena:
How about performing a depth-first
search...
Yes, this would be a good approach. I'll try that out, too.
Still I am wondering, if there is a way to "calculate" the locations of the last node and the insertion point. The height of a binary heap with N nodes can be calculated by taking the log (of base 2) of the smallest power of two that is larger than N. Perhaps it is possible to calculate the number of nodes on the deepest level, too. Then it was maybe possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.

Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.
For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.
For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.
The trick is to be able to both calculate and reference those insertion/deletion points in constant time either by augmenting the data structure ("threading" the tree, as mention in the Wikipedia article) or using additional pointers.
The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).
EDIT: added pseudocode for an O(1) insert()
Here is pseudo code for an insert subroutine which is O(1), on average:
define Node = [T data, *pParent, *pLeft, *pRight]
void insert(T data)
{
do_insertion( data ); // do insertion, update count of data items in tree
# assume: pInsert points node location of the tree that where insertion just took place
# (aka, either shuffle only data during the insertion or keep pInsert updated during the bubble process)
int N = this->CountOfDataItems + 1; # note: CountOfDataItems will always be > 0 (and pRoot != null) after an insertion
p = new Node( <null>, null, null, null); // new empty node for the next insertion
# update pInsert (three cases to handle)
if ( int(log2(N)) == log2(N) )
{# #1 - N is an exact power of two
# O(log2(N))
# tree is currently a full complete binary tree ("perfect")
# ... must start a new lower level
# traverse from pRoot down tree thru each pLeft until empty pLeft is found for insertion
pInsert = pRoot;
while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; } # log2(N) iterations
p->pParent = pInsert;
pInsert->pLeft = p;
}
else if ( isEven(N) )
{# #2 - N is even (and NOT a power of 2)
# O(1)
p->pParent = pInsert->pParent;
pInsert->pParent->pRight = p;
}
else
{# #3 - N is odd
# O(1)
p->pParent = pInsert->pParent->pParent->pRight;
pInsert->pParent->pParent->pRight->pLeft = p;
}
pInsert = p;
// update pLastNode
// ... [similar process]
}
So, insert(T) is O(1) on average: exactly O(1) in all cases except when the tree must be increased by one level when it is O(log N), which happens every log N insertions (assuming no deletions). The addition of another pointer (pLeftmostLeaf) could make insert() O(1) for all cases and avoids the possible pathologic case of alternating insertion & deletion in a full complete binary tree. (Adding pLeftmost is left as an exercise [it's fairly easy].)

My first time to participate in stack overflow.
Yes, the above answer by Zach Scrivena (god I don't know how to properly refer to other people, sorry) is right. What I want to add is a simplified way if we are given the count of nodes.
The basic idea is:
Given the count N of nodes in this full binary tree, do "N % 2" calculation and push the results into a stack. Continue the calculation until N == 1. Then pop the results out. The result being 1 means right, 0 means left. The sequence is the route from root to target position.
Example:
The tree now have 10 nodes, I want insert another node at position 11. How to route it?
11 % 2 = 1 --> right (the quotient is 5, and push right into stack)
5 % 2 = 1 --> right (the quotient is 2, and push right into stack)
2 % 2 = 0 --> left (the quotient is 1, and push left into stack. End)
Then pop the stack: left -> right -> right. This is the path from the root.

You could use the binary representation of the size of the Binary Heap to find the location of the last node in O(log N). The size could be stored and incremented which would take O(1) time. The the fundamental concept behind this is the structure of the binary tree.
Suppose our heap size is 7. The binary representation of 7 is, "111". Now, remember to always omit the first bit. So, now we are left with "11". Read from left-to-right. The bit is '1', so, go to the right child of the root node. Then the string left is "1", the first bit is '1'. So, again go to the right child of the current node you are at. As you no longer have bits to process, this indicates that you have reached the last node. So, the raw working of the process is that, convert the size of the heap into bits. Omit the first bit. According to the leftmost bit, go to the right child of the current node if it is '1', and to the left child of the current node if it is '0'.
As you always to to the very end of the binary tree this operation always takes O(log N) time. This is a simple and accurate procedure to find the last node.
You may not understand it in the first reading. Try working this method on the paper for different values of Binary Heap, I'm sure you'll get the intuition behind it. I'm sure this knowledge is enough to solve your problem, if you want more explanation with figures, you can refer to my blog.
Hope my answer has helped you, if it did, let me know...! ☺

How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree. Thereafter, the first leaf you encounter with a shorter depth, or a parent with a missing child would indicate where you should place the new node before "bubbling up".
The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:
Let N be the total number of nodes in the tree, and H be the height of the tree.
Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3).
The general formula relating the two is H = ceil[log2(N+1)] - 1.
Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking".
We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.
If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.
However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot.
The number of nodes that are already at the last level of our tree is just (N - 2^H + 1).
This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.
Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):
0 0 0 0 0 X 0 0 <--- represents the last level in our tree, X marks the spot!
^
L L L L R R R R <--- at level 0, proceed to the R child
L L R R L L R R <--- at level 1, proceed to the L child
L R L R L R L R <--- at level 2, proceed to the R child
^ (which is the position of the new node)
this column tells us
if we should proceed to the L or R child at each level
EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.

Solution in case you don't have reference to parent !!!
To find the right place for next node you have 3 cases to handle
case (1) Tree level is complete Log2(N)
case (2) Tree node count is even
case (3) Tree node count is odd
Insert:
void Insert(Node root,Node n)
{
Node parent = findRequiredParentToInsertNewNode (root);
if(parent.left == null)
parent.left = n;
else
parent.right = n;
}
Find the parent of the node in order to insert it
void findRequiredParentToInsertNewNode(Node root){
Node last = findLastNode(root);
//Case 1
if(2*Math.Pow(levelNumber) == NodeCount){
while(root.left != null)
root=root.left;
return root;
}
//Case 2
else if(Even(N)){
Node n =findParentOfLastNode(root ,findParentOfLastNode(root ,last));
return n.right;
}
//Case 3
else if(Odd(N)){
Node n =findParentOfLastNode(root ,last);
return n;
}
}
To find the last node you need to perform a BFS (breadth first search) and get the last element in the queue
Node findLastNode(Node root)
{
if (root.left == nil)
return root
Queue q = new Queue();
q.enqueue(root);
Node n = null;
while(!q.isEmpty()){
n = q.dequeue();
if ( n.left != null )
q.enqueue(n.left);
if ( n.right != null )
q.enqueue(n.right);
}
return n;
}
Find the parent of the last node in order to set the node to null in case replacing with the root in removal case
Node findParentOfLastNode(Node root ,Node lastNode)
{
if(root == null)
return root;
if( root.left == lastNode || root.right == lastNode )
return root;
Node n1= findParentOfLastNode(root.left,lastNode);
Node n2= findParentOfLastNode(root.left,lastNode);
return n1 != null ? n1 : n2;
}

I know this is an old thread but i was looking for a answer to the same question. But i could not afford to do an o(log n) solution as i had to find the last node thousands of times in a few seconds. I did have a O(log n) algorithm but my program was crawling because of the number of times it performed this operation. So after much thought I did finally find a fix for this. Not sure if anybody things this is interesting.
This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).
Just wanted to add that if there is interest, i can provide my solution as well.
The solution is to add the nodes in the binary heap to a queue. Every queue node has front and back pointers.We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node in the binary heap will be in the rear of the queue.
Every time we need to find the last node, we dequeue from the rear,and the second-to-last now becomes the last node in the tree.
When we want to insert, we search backwards from the rear for the first node where we can insert and put it there. It is not exactly O(1) but reduces the running time dramatically.

Related

Avl tree- find first key missing from the tree after i

I have the above problem that I'm trying to solve. The Avl tree has also a size of substree for every node and I know the maximum. I need to find the next first number after i which isn't in the tree. I need to do it in O(logn) time.
I got to
if i bigger/equal the maximum then return i+1,
I tried to do the other cases to find the minimum after i that is in the tree and I know I can do it in O(logn) if the number I found is bigger than i+1 return i+1.
Now I understand that if i+1 is in the tree, I need to keep searching but I'm getting time complexity bigger than I need this way.
Would greatly appreciate any guidance. I'm not looking for code, only an idea or guidance how to solve it in the time specified

I think your problem might be more in within the time complexity analysis than the actual algorithm.
We know that, if properly done, the time of a search in a well formed AVL tree of height log[2](n) will always be log[2](n). Searching for a missing item in this case is no different than searching for an existing item.
Let's say you have an AVL tree A and it includes i and i+1. Then we know that i+1 must either be the parent node of i and i being the left child node, or that i+1 is the right child node of i. So we can conclude:
if i ^ i+1 in A => i+1(l)=i v i(r)=i+1
So if you find i and its parent node is not i+1 its right child node has to be i+1. You can extend this to i=i+1 after finding i+1 and keep checking for this condition. The cool thing here is there is only one place you need to look at for every value i+n after i if you keep track of the nodes you have traversed through.
If you go through [i+7, i+4, i] You immediately know that if A contains i it cannot contain i+1. This is due to i+1 < i+4 but i < i+1 < i+4.
If you go through [i-6, i-2, i] You also immediately know that if A contains i+1 it cannot contain i+1. This is due to i-2 < i+1 but i-2 < i < i+1.
If you were to go through [i+7, i+3, i+1, i] you found i, i+1 and since i+3 is not i+2 you know i+2 has to be the right child node of i+1 since it must not be a child of i+3 since it is smaller, but i+1 already took the left child position. So you check if i+1's right child is i+2, you continue checking for i+4 from i+3 on, essentially using the algorithm:
define stack //use your favourite stack implementaion
let n = root node
let i = yourval
while n.val != i
stack.push(n)
if i > n.val
n = n.right
else //equivalent to "else if i < n.val" since loop condition ensures they are not equal
n = n.left
while !stack.empty
if stack.peek.right.val != queue.peek.val + 1
//Implies parent holds value
temp = stack.pop.val + 1
if(temp != stack.peek.val) //If the parent does not hold the next value return it
return temp;
else //Right child holds value
stack.push(queue.peek.right)
i = stack.peek.val
return i+1 //if the stack is empty eventually return the next value
Due to how AVL trees are formed your stack will at most be 2*logn[2](n) elements large (if i is a leaf on the LHS and the last value is a leaf on the RHS). So in total your search will take log[2](n) for the initial search for i and another 2*log[2](n) combined that makes 3*log[2](n), which in Big Omicron is still O(log[2](n)).

As a hint, think about how you'd solve this problem if you had the elements in an array rather than an AVL tree. How would you solve this problem in time O(log n) in an array by using a modified binary search?
Once you've worked out how to do that, see if you can adjust your solution to work in a binary search tree rather than an array. The intuition will be pretty similar, except that instead of looking at the middle of the overall range of elements at each point, you'll look at the root of the current subtree, which, while close to the middle, isn't always exactly at the middle.

Counting p-cousins on a directed tree

We're given a directed tree to work with. We define the concepts of p-ancestor and p-cousin as follows
p-ancestor: A node is an 1-ancestor of another if it is the parent of it. It is the p-ancestor of a node, if it is the parent of the (p-1)-th ancestor.
p-cousin: A node is the p-cousin of another, if they share the same p-ancestor.
For example, consider the tree below.
4 has three 1-cousins i,e, 3, 4 and 5 since they all share the common
1-ancestor, which is 1
For a particular tree, the problem is as follows. You are given multiple pairs of (node,p) and are supposed to count (and output) the number of p-cousins of the corresponding nodes.
A slow algorithm would be to crawl up to the p-ancestor and run a BFS for each node.
What is the (asymptotically) fastest way to solve the problem?

If an off-line solution is acceptable, two Depth first searches can do the job.
Assume that we can index all of those n queries (node, p) from 0 to n - 1
We can convert each query (node, p) into another type of query (ancestor , p) as follow:
Answer for query (node, p), with node has level a (distance from root to this node is a), is the number of descendants level a of the ancestor at level a - p. So, for each queries, we can find who is that ancestor:
Pseudo code
dfs(int node, int level, int[]path, int[] ancestorForQuery, List<Query>[]data){
path[level] = node;
visit all child node;
for(Query query : data[node])
if(query.p <= level)
ancestorForQuery[query.index] = path[level - p];
}
Now, after the first DFS, instead of the original query, we have a new type of query (ancestor, p)
Assume that we have an array count, which at index i stores the number of node which has level i. Assume that, node a at level x , we need to count number of p descendants, so, the result for this query is:
query result = count[x + p] after we visit a - count[x + p] before we visit a
Pseudo code
dfs2(int node, int level, int[] result, int[]count, List<TransformedQuery>[]data){
count[level] ++;
for(TransformedQuery query : data[node]){
result[query.index] -= count[level + query.p];
}
visit all child node;
for(TransformedQuery query : data[node]){
result[query.index] += count[level + query.p];
}
}
Result of each query is stored in result array.

If p is fixed, I suggest the following algorithm:
Let's say that count[v] is number of p-children of v. Initially all count[v] are set to 0. And pparent[v] is p-parent of v.
Let's now run a dfs on the tree and keep the stack of visited nodes, i.e. when we visit some v, we put it into the stack. Once we leave v, we pop.
Suppose we've come to some node v in our dfs. Let's do count[stack[size - p]]++, indicating that we are a p-child of v. Also pparent[v] = stack[size-p]
Once your dfs is finished, you can calculate the desired number of p-cousins of v like this:
count[pparent[v]]
The complexity of this is O(n + m) for dfs and O(1) for each query

First I'll describe a fairly simple way to answer each query in O(p) time that uses O(n) preprocessing time and space, and then mention a way that query times can be sped up to O(log p) time for a factor of just O(log n) extra preprocessing time and space.
O(p)-time query algorithm
The basic idea is that if we write out the sequence of nodes visited during a DFS traversal of the tree in such a way that every node is written out at a vertical position corresponding to its level in the tree, then the set of p-cousins of a node form a horizontal interval in this diagram. Note that this "writing out" looks very much like a typical tree diagram, except without lines connecting nodes, and (if a postorder traversal is used; preorder would be just as good) parent nodes always appearing to the right of their children. So given a query (v, p), what we will do is essentially:
Find the p-th ancestor u of the given node v. Naively this takes O(p) time.
Find the p-th left-descendant l of u -- that is, the node you reach after repeating the process of visiting the leftmost child of the current node, p times. Naively this takes O(p) time.
Find the p-th right-descendant r of u (defined similarly). Naively this takes O(p) time.
Return the value x[r] - x[l] + 1, where x[i] is a precalculated value that records the number of nodes in the sequence described above that are at the same level as, and at or to the left of, node i. This takes constant time.
The preprocessing step is where we calculate x[i], for each 1 <= i <= n. This is accomplished by performing a DFS that builds up a second array y[] that records the number y[d] of nodes visited so far at depth d. Specifically, y[d] is initially 0 for each d; during the DFS, when we visit a node v at depth d, we simply increment y[d] and then set x[v] = y[d].
O(log p)-time query algorithm
The above algorithm should already be fast enough if the tree is fairly balanced -- but in the worst case, when each node has just a single child, O(p) = O(n). Notice that it is navigating up and down the tree in the first 3 of the above 4 steps that force O(p) time -- the last step takes constant time.
To fix this, we can add some extra pointers to make navigating up and down the tree faster. A simple and flexible way uses "pointer doubling": For each node v, we will store log2(depth(v)) pointers to successively higher ancestors. To populate these pointers, we perform log2(maxDepth) DFS iterations, where on the i-th iteration we set each node v's i-th ancestor pointer to its (i-1)-th ancestor's (i-1)-th ancestor: this takes just two pointer lookups per node per DFS. With these pointers, moving any distance p up the tree always takes at most log(p) jumps, because the distance can be reduced by at least half on each jump. The exact same procedure can be used to populate corresponding lists of pointers for "left-descendants" and "right-descendants" to speed up steps 2 and 3, respectively, to O(log p) time.

Counting number of nodes in a complete binary tree

I want to count the number of nodes in a Complete Binary tree but all I can think of is traversing the entire tree. This will be a O(n) algorithm where n is the number of nodes in the tree. what could be the most efficient algorithm to achieve this?

Suppose that we start off by walking down the left and right spines of the tree to determine their heights. We'll either find that they're the same, in which case the last row is full, or we'll find that they're different. If the heights come back the same (say the height is h), then we know that there are 2h - 1 nodes and we're done. (refer figure below for reference)
Otherwise, the heights must be h+1 and h, respectively. We know that there are then at least 2h - 1 nodes, plus the number of nodes in the bottom layer of the tree. The question, then, is how to figure that out. One way to do this would be to find the rightmost node in the last layer. If you know at which index that node is, you know exactly how many nodes are in the last layer, so you can add that to 2h - 1 and you're done.
If you have a complete binary tree with left height h+1, then there are between 1 and 2h - 1 possible nodes that could be in the last layer. The question is then how to determine this as efficiently as possible.
Fortunately, since we know the nodes in the last layer get filled in from the left to the right, we can use binary search to try to figure out where the last filled node in the last layer is. Essentially, we guess the index where it might be, walk from the root of the tree down to where that leaf should be, and then either find a node there (so we know that the rightmost node in the bottom layer is either that node or to the right) or we don't (so we know that the rightmost node in the bottom layer must purely be to the right of the current location). We can walk down to where the kth node in the bottom layer should be by using the bits in the number k to guide a search down: we start at the root, then go left if the first bit of k is 0 and right if the first bit of k is 1, then use the remaining bits in a corresponding manner to walk down the tree. The total number of times we'll do this is O(h) and each probe takes time O(h), so the total work done here is O(h2). Since h is the height of the tree, we know that h = O(log n), so this algorithm takes time O(log2 n) time to complete.
I'm not sure whether it's possible to improve upon this algorithm. I can get an Ω(log n) lower bound on any correct algorithm, though. I'm going to argue that any algorithm that is always correct in all cases must inspect the rightmost leaf node in the final row of the tree. To see why, suppose there's a tree T where the algorithm doesn't do this. Let's suppose that the rightmost node that the algorithm inspects in the bottom row is x, that the actual rightmost node in the bottom row is y, and that the leftmost missing node in the bottom row that the algorithm detected is z. We know that x must be to the left of y (because the algorithm didn't inspect the leftmost node in the bottom row) and that y must be to the left of z (because y exists and z doesn't, so z must be further to the right than y). If you think about what the algorithm's "knowledge" is at this point, the algorithm has no idea whether or not there are any nodes purely to the right of x or purely to the left of z. Therefore, if we were to give it a modified tree T' where we deleted y, the algorithm wouldn't notice that anything had changed and would have exactly the same execution path on T and T'. However, since T and T' have a different number of nodes, the algorithm has to be wrong on at least one of them. Inspecting this node takes time at least Ω(log n) because of the time required to walk down the tree.
In short, you can do better than O(n) with the above O(log2 n)-time algorithm, and you might be able to do even better than that, though I'm not entirely sure how or whether that's possible. I suspect it isn't because I suspect that binary search is the optimal way to check the bottom row and the lengths of the paths down to the nodes you'd probe, even after taking into account that they share nodes in common, is Θ(log2 n), but I'm not sure how to prove it.
Hope this helps!
Images source

public int leftHeight(TreeNode root){
int h=0;
while(root!=null){
root=root.left;
h++;
}
return h;
}
public int rightHeight(TreeNode root){
int h=0;
while(root!=null){
root=root.right;
h++;
}
return h;
}
public int countNodes(TreeNode root) {
if(root==null)
return 0;
int lh=leftHeight(root);
int rh=rightHeight(root);
if(lh==rh)
return (1<<lh)-1;
return countNodes(root.left)+countNodes(root.right)+1;
}
In each recursive call,we need to traverse along the left and right boundaries of the complete binary tree to compute the left and right height. If they are equal the tree is full with 2^h-1 nodes.Otherwise we recurse on the left subtree and right subtree. The first call is from the root (level=0) which take O(h) time to get left and right height.We have recurse till we get a subtree which is full binary tree.In worst case it can happen that the we go till the leaf node. So the complexity will be (h + (h-1) +(h-2) + ... + 0)= (h(h+1)/2)= O(h^2).Also space complexity is size of the call stack,which is O(h).
NOTE:For complete binary tree h=log(n).

If the binary tree is definitely complete (as opposed to 'nearly complete' or 'almost complete' as defined in the Wikipedia article) you should simply descend down one branch of the tree down to the leaf. This will be O(logn). Then sum the powers of two up to this depth. So 2^0 + 2^1... + 2^d

C# Sample might helps others. This is similar to the time complexity well explained above by templatetypedef
public int GetLeftHeight(TreeNode treeNode)
{
int heightCnt = 0;
while (treeNode != null)
{
heightCnt++;
treeNode = treeNode.LeftNode;
}
return heightCnt;
}
public int CountNodes(TreeNode treeNode)
{
int heightIndx = GetLeftHeight(treeNode);
int nodeCnt = 0;
while (treeNode != null)
{
int rightHeight = GetLeftHeight(treeNode.RightNode);
nodeCnt += (int)Math.Pow(2, rightHeight); //(1 << rh);
treeNode = (rightHeight == heightIndx - 1) ? treeNode.RightNode : treeNode.LeftNode;
heightIndx--;
}
return nodeCnt;
}

Using Recursion:
int countNodes(TreeNode* root) {
if (!root){
return 0;
}
else{
return countNodes(root->left)+countNodes(root->right)+1;
}
}

Select a Node at Random from Unbalanced Binary Tree

One of my friends had the following interview question, and neither of us are quite sure what the correct answer is. Does anyone have an idea about how to approach this?
Given an unbalanced binary tree, describe an algorithm to select a node at random such that each node has an equal probability of being selected.

You can do it with a single pass of the tree. The algorithm is the same as with a list.
When you see the first item in the tree, you set it as the selected item.
When you see the second item, you pick a random number in the range (0,2]. If it's 1, then the new item becomes the selected item. Otherwise you skip that item.
For each node you see, you increase the count, and with probability 1/count, you select it. So at the 101st node, you pick a random number in the range (0,101]. If it's 100, that node is the new selected node.
When you're done traversing the tree, return the selected node. The operation is O(n) in time, with n being the number of nodes in the tree, and O(1) in space. No preprocessing required.

We can do this recursively in one parse by selecting the random node while parsing the tree and counting the number of nodes in left and right sub tree. At every step in recursion, we return the number of nodes at the root and a random node selected uniformly randomly from nodes in sub tree rooted at root.
Let's say number of nodes in left sub tree is n_l and number of nodes in right sub tree is n_r. Also, randomly selected node from left and right subtree be R_l and R_r respectively. Then, select a uniform random number in [0,1] and select R_l with probability n_l/(n_l+n_r+1) or select root with probability 1/(n_l+n_r+1) or select R_r with probability n_r/(n_l+n_r+1).

Note
If you're only doing a single query, and you don't already have a count at each node, the best time complexity you can get is O(n), so the depth-first-search approach would be the best one.
For repeated queries, the best option depends on the given constraints
(the fastest per-query approach is using a supplementary array).
Supplementary array
O(n) space, O(n) preprocessing, O(1) insert / remove, O(1) query
Have a supplementary array containing all the nodes.
Also have each node store its own index (so you can remove it from the array in O(1) - the way to do this would be to swap it with the last element in the array, update the index of the node that was at the last index appropriately and decrease the size of the array (removing the last element).
To get a random node, simply generate a random index in the array.
Per-node count
Modified tree (O(n) space), N/A (or O(n)) preprocessing, O(depth) insert / remove, O(depth) query
Let each node contain the number of elements in its subtree.
When generating a random node, go left or right based on the value of a random number generated and the counts of the left or right subtrees.
// note that subtreeCount = leftCount + rightCount + 1
val = getRandomNumber(subtreeCount)
if val = 0
return this node
else if val <= leftCount
go left
else
go right
Depth-first-search
O(depth) space, O(1) preprocessing, O(1) insert / remove, O(n) query
Count the number of nodes in the tree (if you don't already have the count).
Generate a random number between 0 and the number of nodes.
Simply do a depth-first-search through the tree and stop when you've processed the desired number of nodes.
This presumes a node doesn't have a parent member - having this will make this O(1) space.

I implemented #jim-mischel's algorithm in C# and it works great:
private void SelectRandomNode(ref int count, Node curNode, ref Node selectedNode)
{
foreach( var childNode in curNode.Children )
{
++count;
if( random.Next(count) == count - 1 )
selectedNode = childNode;
SelectRandomNode(ref count, childNode, ref selectedNode);
}
}
Call it like this:
var count = 1;
Node selected = root;
SelectRandomNode(ref count, root, ref selected);

Create Binary Tree from Ancestor Matrix

The question is how to create a binary tree, given its ancestor matrix. I found a cool solution at http://www.ritambhara.in/build-binary-tree-from-ancestor-matrics/. Problem is that it involves deleting rows and columns from the matrix. Now how do I do that? Can anybody suggest a pseudocode for this? Or, is there any better algo possible?

You don't have to actually delete the rows and columns. You can either flag them as deleted in some additional array, or you can make them all zeros, which I think will be effectively the same (actually, you'll still need to know that they are removed, so you don't choose them again in step 4.c - so, flagging the node as deleted should be good enough).
Here are the modifications to the pseudocode from the page:
4.b.
used[temp] = true;
for (i = 0 to N)
Sum[i] -= matrix[i][temp]; (aka decrement sum if temp is a predecessor of i)
matrix[i][temp] = 0;
4.c. Look for all rows for which Sum[i] == 0 and used[i] == false.

This reminds me of the Dancing Links used by Doanld Knuth to implement his Algorithm X
It's basically a structure of circular doubly linked list. You could maintain a seperate Sum array and update it with removal of rows and columns as required.
Actually you don't need to maintain a separate Sum array.
Edit:
I meant -
You could use a structure made up of circular 2D linked lists.
The node structure would somewhat look like:
struct node{
int val;
struct node *left;
struct node *right;
struct node *down;
};
The Top-most and Left-most List is the header List for the vertices(Binary tree node values).
If vertex j is an ancestor of vertex i, build a (empty)new node such that j column's current down is assigned this new node and i's current left is assigned this new node. Note: Structure can be easily built by scanning each rows of ancestor matrix from left to right and inserting rows from 0 to N. (assuming N is the no. of vertices here)
I borrowed these images from Image1 and Image2 to give an idea of the grid. 2nd image is missing the Left-most header though.
If N is no. of vertices. There can be at worse O(N^2) entries in ancestor matrix(in case tree is skewed) or on average O(NlogN) entries.
To search for current Root: O(N)
Assuming a dummy node to start with, linearly scan the Leftmost header and choose a node with node->down->right == node->down.
To delete this vertex information: O(N)
Deleting row:O(1)
node->down = node->down->down;
Deleting column:O(N)
Goto the corresponding column - say(p):
node* q = p;
while(q->down != p){
q->down->left->right = q->down->right;
q->down->right->left = q->down->left;
q = q->down;
}
After discovering current Root you can assign it to it's parent node and insert them into a Queue to process the next level as per that link suggests you to.
Overall time complexity: N + (N-1) + (N-2) +.... = O(N^2).
Worst case space complexity O(N^2)
Though there is no big improvement in the asymptotic run-time from the solution you already have. I thought it's worth mentioning since, this kind of structure can be particularly useful for storing sparse matrices and defining operations like multiplication on them or if you are working with some backtracking algorithm which removes a row/column and later backtracks and adds it again like Knuth's AlgorithmX.

You don't have to update the matrix. Just decrement the values in the sum array for any descendents of the current node, and check if any of them reaches zero, which means the current noe is that last ancestor, e.g. the direct parent:
for (i = 0 to N)
if matrix[i][temp]==1:
Sum[i]=Sum[i]-1
if Sum[i]==0:
add i as child of temp
add i to queue

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio