Counting number of nodes in a complete binary tree - algorithm

I want to count the number of nodes in a complete binary tree, but all I can think of is traversing the entire tree. This is an O(n) algorithm, where n is the number of nodes in the tree. What would be the most efficient algorithm to achieve this?

Suppose that we start off by walking down the left and right spines of the tree to determine their heights. We'll either find that they're the same, in which case the last row is full, or we'll find that they're different. If the heights come back the same (say the height is h), then we know that there are 2^h - 1 nodes and we're done.
Otherwise, the heights must be h+1 and h, respectively. We know that there are then at least 2^h - 1 nodes, plus the number of nodes in the bottom layer of the tree. The question, then, is how to figure that out. One way to do this would be to find the rightmost node in the last layer. If you know the index of that node, you know exactly how many nodes are in the last layer, so you can add that to 2^h - 1 and you're done.
If you have a complete binary tree with left height h+1, then there are between 1 and 2^h possible nodes that could be in the last layer. The question is then how to determine this count as efficiently as possible.
Fortunately, since we know the nodes in the last layer get filled in from left to right, we can use binary search to figure out where the last filled node in the last layer is. Essentially, we guess the index where it might be, walk from the root of the tree down to where that leaf should be, and then either find a node there (so we know that the rightmost node in the bottom layer is either that node or to its right) or we don't (so we know that the rightmost node in the bottom layer must be strictly to the left of the current location). We can walk down to where the kth node in the bottom layer should be by using the bits in the number k to guide the search: we start at the root, go left if the first bit of k is 0 and right if the first bit of k is 1, then use the remaining bits in a corresponding manner to walk down the tree. The total number of probes is O(h) and each probe takes time O(h), so the total work done here is O(h^2). Since h is the height of the tree, we know that h = O(log n), so this algorithm takes O(log^2 n) time to complete.
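Here's a hedged Java sketch of that binary-search procedure, assuming the same TreeNode class (with left/right fields) used in the code below; treat it as an illustration of the technique rather than the answerer's exact implementation:

// Counts the nodes of a complete binary tree in O(log^2 n) time by
// binary-searching for the rightmost occupied slot in the last level.
public int countNodesBinarySearch(TreeNode root) {
    if (root == null) return 0;
    int h = 0;                              // number of edges on the left spine
    for (TreeNode cur = root.left; cur != null; cur = cur.left) h++;
    int lo = 0, hi = (1 << h) - 1;          // slot indices in the last level
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;   // bias right so the loop terminates
        if (slotExists(root, h, mid)) lo = mid;
        else hi = mid - 1;
    }
    return ((1 << h) - 1) + (lo + 1);       // full levels plus the last level
}

// Walks from the root to slot k of the last level using the bits of k
// (0 = go left, 1 = go right), as described above; O(h) per probe.
private boolean slotExists(TreeNode root, int h, int k) {
    TreeNode cur = root;
    for (int bit = h - 1; bit >= 0 && cur != null; bit--) {
        cur = ((k >> bit) & 1) == 0 ? cur.left : cur.right;
    }
    return cur != null;
}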
I'm not sure whether it's possible to improve upon this algorithm. I can get an Ω(log n) lower bound on any correct algorithm, though. I'm going to argue that any algorithm that is always correct in all cases must inspect the rightmost leaf node in the final row of the tree. To see why, suppose there's a tree T where the algorithm doesn't do this. Let's suppose that the rightmost node that the algorithm inspects in the bottom row is x, that the actual rightmost node in the bottom row is y, and that the leftmost missing node in the bottom row that the algorithm detected is z. We know that x must be to the left of y (because the algorithm didn't inspect the rightmost node in the bottom row) and that y must be to the left of z (because y exists and z doesn't, so z must be further to the right than y). If you think about what the algorithm's "knowledge" is at this point, the algorithm has no idea whether or not there are any nodes strictly to the right of x and strictly to the left of z. Therefore, if we were to give it a modified tree T' where we deleted y, the algorithm wouldn't notice that anything had changed and would have exactly the same execution path on T and T'. However, since T and T' have a different number of nodes, the algorithm has to be wrong on at least one of them. Inspecting this node takes time at least Ω(log n) because of the time required to walk down the tree.
In short, you can do better than O(n) with the above O(log^2 n)-time algorithm, and you might be able to do even better than that, though I'm not entirely sure how or whether that's possible. I suspect it isn't, because I suspect that binary search is the optimal way to check the bottom row, and the total length of the paths down to the nodes you'd probe, even after taking into account that they share nodes in common, is Θ(log^2 n), but I'm not sure how to prove it.
Hope this helps!

public int leftHeight(TreeNode root) {
    int h = 0;
    while (root != null) {
        root = root.left;
        h++;
    }
    return h;
}

public int rightHeight(TreeNode root) {
    int h = 0;
    while (root != null) {
        root = root.right;
        h++;
    }
    return h;
}

public int countNodes(TreeNode root) {
    if (root == null)
        return 0;
    int lh = leftHeight(root);
    int rh = rightHeight(root);
    if (lh == rh)                  // this subtree is perfect
        return (1 << lh) - 1;      // 2^h - 1 nodes
    return countNodes(root.left) + countNodes(root.right) + 1;
}
In each recursive call, we traverse the left and right boundaries of the subtree to compute its left and right heights. If they are equal, the subtree is full with 2^h - 1 nodes; otherwise we recurse on the left and right subtrees. The first call, from the root, takes O(h) time to compute the left and right heights. We recurse until we reach a subtree that is a full binary tree; in the worst case we descend all the way to a leaf. So the complexity is (h + (h-1) + (h-2) + ... + 0) = h(h+1)/2 = O(h^2). The space complexity is the size of the call stack, which is O(h).
Note: for a complete binary tree, h = O(log n).

If the binary tree is definitely complete (as opposed to 'nearly complete' or 'almost complete' as defined in the Wikipedia article), you should simply descend one branch of the tree down to a leaf. This is O(log n). Then sum the powers of two up to this depth: 2^0 + 2^1 + ... + 2^d = 2^(d+1) - 1.
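As a minimal Java sketch (assuming a TreeNode class with a left field, as in the other answers):

// Counts the nodes of a perfect binary tree: descend the left spine to
// find the depth d, then sum the levels: 2^0 + 2^1 + ... + 2^d = 2^(d+1) - 1.
public int countPerfect(TreeNode root) {
    int d = -1;                    // depth of the deepest level (root is depth 0)
    for (TreeNode cur = root; cur != null; cur = cur.left) d++;
    return (1 << (d + 1)) - 1;     // closed form of the geometric series
}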

This C# sample might help others. It has the same time complexity as the approach well explained above by templatetypedef.
public int GetLeftHeight(TreeNode treeNode)
{
    int heightCnt = 0;
    while (treeNode != null)
    {
        heightCnt++;
        treeNode = treeNode.LeftNode;
    }
    return heightCnt;
}

public int CountNodes(TreeNode treeNode)
{
    int heightIndx = GetLeftHeight(treeNode);
    int nodeCnt = 0;
    while (treeNode != null)
    {
        int rightHeight = GetLeftHeight(treeNode.RightNode);
        nodeCnt += (int)Math.Pow(2, rightHeight); // (1 << rightHeight)
        treeNode = (rightHeight == heightIndx - 1) ? treeNode.RightNode : treeNode.LeftNode;
        heightIndx--;
    }
    return nodeCnt;
}

Using Recursion:
int countNodes(TreeNode* root) {
    if (!root) {
        return 0;
    }
    else {
        return countNodes(root->left) + countNodes(root->right) + 1;
    }
}

Related

Runtime of the following recursive algorithm?

I am working through the book "Cracking the coding interview" by Gayle McDowell and came across an interesting recursive algorithm that sums the values of all the nodes in a balanced binary search tree.
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
Now Gayle says the runtime is O(N), which I find confusing, as I don't see how this algorithm will ever terminate. For a given node, when node.left is passed to sum in the first call, and node.right is consequently passed to sum in the second call, isn't the algorithm computing sum(node) a second time? Wouldn't this process go on forever? I'm still new to recursive algorithms, so it might just not be very intuitive yet.
Cheers!
The process won't go on forever. The data structure in question is a Balanced Binary Search Tree and not a Graph which can contain cycles.
Starting from the root, all the nodes will be explored in the order left -> itself -> right, like a depth-first search.
node.left will explore the left subtree of a node and node.right will explore the right subtree of the same node. The two subtrees do not intersect. Draw the trail of program control to see the order in which the nodes are explored and to confirm that there is no overlap in the traversal.
Since each node will be visited only once and the recursion will start unwinding when a leaf node will be hit, the running time will be O(N), N being the number of nodes.
The key to understanding a recursive algorithm is to trust that it does what it claims to do. Let me explain.
First admit that the function sum(node) returns the sum of the values of all nodes of the subtree rooted at node.
Then the code
if (node == null) {
    return 0;
}
return sum(node.left) + node.value + sum(node.right);
can do two things:
if node is null, return 0; this is a non-recursive case and the returned value is trivially correct;
otherwise, the function computes the sum for the left subtree, plus the value at node, plus the sum for the right subtree, i.e. the sum for the subtree rooted at node.
So in a way, if the function is correct, then it is correct :) Actually the argument isn't circular thanks to the non-recursive case, which is also correct.
We can use the same way of reasoning to prove the running time of the algorithm.
Assume that the time required to process the tree rooted at node is proportional to the size of this subtree, call it |T|. This is another act of faith.
Then if node is null, the time is constant, say 1 unit. And if node isn't null, the time is |L| + 1 + |R| units, which is precisely |T|. So if the time to sum a subtree is proportional to the size of the subtree, the time to sum a tree is proportional to the size of the tree!

What is an Efficient Algorithm to find Graph Centroid?

A graph centroid is a vertex such that every connected component obtained by removing it has size less than or equal to N/2, where N is the number of vertices connected through this vertex?! [Needs correction?!]
Here's a problem at CodeForces that asks to find whether each vertex is a centroid, but after removing and replacing exactly one edge at a time.
Problem Statement
I need help to refine this PseudoCode / Algorithm.
Loop over all vertices:
    Loop over all edges:
        Position each edge in every empty edge position between two unconnected nodes
        Count the size of each connected component (*1)
        If all sizes are less than or equal to N/2,
            then return true
The problem is that this algorithm will run in at least O(N*M^2) time. That's not acceptable.
I looked up the answers, but I couldn't come up with the high level abstraction of the algorithm used by others. Could you please help me understand how these solutions work?
Solutions' Link
(*1) DFS Loop
I will try to describe a not-so-complex algorithm for solving this problem in linear time; for future reference see my code (it has some comments).
The main idea is that you can root the tree T at an arbitrary vertex and traverse it; for each vertex V you can do this:
Cut subtree V from T.
Find the heaviest vertex H having size <= N/2 (H can be in either T or subtree V).
Move subtree H to become a child of V.
Re-root T at V and check whether the heaviest vertex has size <= N/2.
The previous algorithm can be implemented carefully to get linear time complexity; the issue is that it has a lot of cases to handle.
A better idea is to find the centroid C of T and root T at vertex C.
Having vertex C as the root of T is useful because it guarantees that every descendant of C has size <= N/2.
When traversing the tree we don't need to look for the heaviest vertex down the tree, only up: every time we visit a child W, we can pass down the heaviest size (being <= N/2) that would result if we re-rooted T at W.
Try to understand what I explained, and let me know if something is not clear.
Well, the Centroid of a tree can be determined in O(N) space and time complexity.
Construct a matrix representing the tree, with the row indices representing the N nodes and the elements in the i-th row representing the nodes to which the i-th node is connected. You can use any other representation as well.
Maintain 2 linear arrays of size N, with index i holding the depth of the i-th node (depth) and the parent of the i-th node (parent), respectively.
Also maintain 2 more linear arrays, the first one containing the BFS traversal sequence of the tree (queue), and the other one (leftOver) containing the value of [N - Number of nodes in the subtree rooted at that node]. In other words, the i-th index contains the number of nodes that is left in the whole tree when the i-th node is removed from the tree along with all its children.
Now, perform a BFS traversal taking any arbitrary node as root and fill the arrays 'parent' and 'depth'. This takes O(N) time complexity. Also, record the traversal sequence in the array 'queue'.
Starting from the leaf nodes, add the number of nodes present in the subtree rooted at each node to the value at its parent's index in the array 'leftOver'. This also takes O(N) time, since you can use the already prepared 'queue' array and traverse it backwards.
Finally, traverse through the array 'leftOver' and modify each value to [N-1 - Initial Value]. The 'leftOver' array is prepared. Cost: Another O(N).
Your work is almost done. Now, iterate over this 'leftOver' array and find the index whose value is closest to floor(N/2). However, this value must not exceed floor(N/2) at any cost.
This index is the index of the Centroid of the Tree. Overall time complexity: O(N).
Java Code:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Scanner;

class Find_Centroid
{
    static final int MAXN = 100_005;
    static ArrayList<Integer>[] graph;
    static int[] depth, parent;   // Step 2
    static int N;
    static Scanner io = new Scanner(System.in);

    public static void main(String[] args)
    {
        int i;
        N = io.nextInt();   // Number of nodes in the tree
        graph = new ArrayList[N];
        for (i = 0; i < graph.length; ++i)
            graph[i] = new ArrayList<>();   // Initialisation
        for (i = 1; i < N; ++i)
        {
            int a = io.nextInt() - 1, b = io.nextInt() - 1;   // Assuming 1-based indexing
            graph[a].add(b); graph[b].add(a);   // Step 1
        }
        int centroid = findCentroid(new java.util.Random().nextInt(N));
        // Arbitrary indeed... ;)
        System.out.println("Centroid: " + (centroid + 1));   // '+1' for output in 1-based index
    }

    static int[] queue = new int[MAXN], leftOver;   // Step 3

    static int findCentroid(int r)
    {
        leftOver = new int[N];
        int i, target = N / 2, ach = -1;
        bfs(r);   // Step 4
        for (i = N - 1; i >= 0; --i)
            if (queue[i] != r)
                leftOver[parent[queue[i]]] += leftOver[queue[i]] + 1;   // Step 5
        for (i = 0; i < N; ++i)
            leftOver[i] = N - 1 - leftOver[i];   // Step 6
        for (i = 0; i < N; ++i)
            if (leftOver[i] <= target && leftOver[i] > ach)
            {   // Closest to target (=N/2) but does not exceed it.
                r = i; ach = leftOver[i];
            }
        // Step 7
        return r;
    }

    static void bfs(int root)   // Iterative
    {
        parent = new int[N]; depth = new int[N];
        int st = 0, end = 0;
        parent[root] = -1; depth[root] = 1;
        // Parent of root is obviously undefined, hence -1.
        // Assuming depth of root = 1.
        queue[end++] = root;
        while (st < end)
        {
            int node = queue[st++], h = depth[node] + 1;
            Iterator<Integer> itr = graph[node].iterator();
            while (itr.hasNext())
            {
                int ch = itr.next();
                if (depth[ch] > 0)   // 'ch' is the parent of 'node'
                    continue;
                depth[ch] = h; parent[ch] = node;
                queue[end++] = ch;   // Recording the traversal sequence
            }
        }
    }
}
Now, for the problem http://codeforces.com/contest/709/problem/E, iterate over each node i, consider it as the root, keep descending down the child which has > N/2 nodes, and try to arrive at a node which has just under N/2 nodes (closest to N/2) under it. If removing this node along with all its children makes 'i' the centroid, print '1'; otherwise print '0'. This process can be carried out efficiently, as the 'leftOver' array is already there for you.
Actually, you are detaching the disturbing node (the node which is preventing i from being the centroid) along with its children and attaching it to the i-th node itself. The subtree is guaranteed to have at most N/2 nodes (as checked earlier) and so won't cause a problem now.
Happy Coding..... :)

AVL Trees: How to do index access?

I noticed on the AVL Tree Wikipedia page the following comment:
"If each node additionally records the size of its subtree (including itself and its descendants), then the nodes can be retrieved by index in O(log n) time as well."
I've googled and have found a few places mentioning accessing by index but can't seem to find an explanation of the algorithm one would write.
Many thanks
[UPDATE] Thanks, people. I found @templatetypedef's answer combined with one of @user448810's links particularly helpful, especially this snippet:
"The key to both these functions is that the index of a node is the size of its left child. As long as we are descending a tree via its left child, we just take the index of the node. But when we have to move down the tree via its right child, we have to adjust the size to include the half of the tree that we have excluded."
Because my implementation is immutable, I didn't need to do any additional work when rebalancing, as each node calculates its size on construction (same as the Scheme implementation linked).
My final implementation ended up being:
class Node<K,V> implements AVLTree<K,V> { ...
    public V index(int i) {
        if (left.size() == i) return value;
        if (i < left.size()) return left.index(i);
        return right.index(i - left.size() - 1);
    }
}

class Empty<K,V> implements AVLTree<K,V> { ...
    public V index(int i) { throw new IndexOutOfBoundsException(); }
}
This is slightly different from the other implementations; let me know if you think I have a bug!
The general idea behind this construction is to take an existing BST and augment each node by storing the number of nodes in the left subtree. Once you have done this, you can look up the nth node in the tree by using the following recursive algorithm:
To look up the nth element in a BST whose root node has k elements in its left subtree:
If n = k, return the root node (the root's index is exactly k, counting from zero).
If n < k, recursively look up the nth element in the left subtree.
Otherwise, look up the (n - k - 1)st element in the right subtree.
This takes time O(h), where h is the height of the tree; in an AVL tree, this is O(log n). In CLRS, this construction is explored as applied to red/black trees, and such trees are called "order statistic trees."
You have to put in some extra logic during tree rotations to adjust the cached number of elements in the left subtree, but this is not particularly difficult.
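For illustration, here is a hedged Java sketch of how a left rotation could repair the cached sizes; the node fields (key, leftSize, left, right) are assumptions for this example, not part of any specific library:

// Hypothetical order-statistic node caching the size of its left subtree.
class OstNode {
    int key;
    int leftSize;          // number of nodes in the left subtree
    OstNode left, right;
}

// Left-rotates around x and repairs the cached counts. Only y.leftSize
// changes: y's new left subtree is x plus x's entire old left subtree.
static OstNode rotateLeft(OstNode x) {
    OstNode y = x.right;
    x.right = y.left;                // y's old left subtree becomes x's right
    y.left = x;
    y.leftSize += x.leftSize + 1;    // x's left subtree plus x itself
    return y;                        // y is the new root of this subtree
}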
Hope this helps!

Print binary tree in BFS fashion with O(1) space

I was wondering if it's possible to print a binary tree in breadth first order while using only O(1) space?
The difficult part is that one has to use additional space to remember the next level to traverse, and that grows with n.
Since we haven't placed any limitation on the time part, maybe there are some inefficient (in terms of time) ways that can achieve this?
Any idea?
This is going to depend on some finer-grained definitions, for example whether the edges have back-links. If they do, it's easy, because you can just follow a back-link up the tree. Otherwise I can't think offhand of a way to do it without O(lg(number of nodes)) space, because you need to remember at least the nodes "above".
Update
Oh wait, of course it can be done in O(1) space with a space-time trade-off. Everywhere you would want to follow a back-link, you save your place and redo the BFS, tracking the most recently visited node, until you find yours. Then you back up to the most recently visited node and proceed.
Problem is, that's O(1) space but O(n^2) time.
Another update
Let's assume that we've reached node n_i, and we want to reach the parent of that node, which we'll call (without loss of generality) n_j. We have identified the distinguished root node n_0.
Modify the breadth-first search algorithm so that when it follows a directed edge (n_x, n_y), the node it came from is stored. Thus when you follow (n_x, n_y), you save n_x.
When you start the BFS again from n_0, you are guaranteed (assuming it really is a tree) that at SOME point you will traverse the edge (n_j, n_i). At that point you observe you're back at n_i. You've stored n_j, and so you know the reverse edge is (n_i, n_j).
Thus, you get that single backtrack with only two extra cells, one for n_0 and one for the "saved" node. This is O(1).
I'm not so sure of O(n^2) -- it's late and it's been a hard day, so I don't want to compose a proof. I'm sure it's O((|N|+|E|)^2), where |N| and |E| are the sizes of the sets of vertices and edges respectively.
An interesting special case is heaps.
From heapq docs:
Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, heap[0]. [explanation by François Pinard]
How a tree is represented in memory (indexes of the array):

                               0
               1                               2
       3               4               5               6
   7       8       9      10      11      12      13      14
 15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30

In this case, the nodes of the array are already stored in breadth-first order.
for value in the_heap:
    print(value)
O(1) in space.
I know that this is strictly not an answer to the question, but visiting the nodes of a tree in breadth-first order can be done using O(d) space, where d is the depth of the tree, by iterative deepening depth-first search (IDDFS). The space is required for the stack, of course. In the case of a balanced tree, d = O(lg n), where n is the number of nodes. I honestly don't see how you'd do it in constant space without the back-links suggested by @Charlie Martin.
It is easy to implement a recursive method that gets all the nodes of a tree at a given level. Hence, we can compute the height of the tree and fetch the nodes at each level in turn. This is a level order traversal of the tree, but the time complexity is O(n^2). Below is the Java implementation (source).
class Node
{
    int data;
    Node left, right;

    public Node(int item)
    {
        data = item;
        left = right = null;
    }
}

class BinaryTree
{
    Node root;

    public BinaryTree()
    {
        root = null;
    }

    void printLevelOrder()
    {
        int h = height(root);
        for (int i = 1; i <= h; i++)
            printGivenLevel(root, i);
    }

    // Height of the tree; a single leaf has height 1.
    int height(Node root)
    {
        if (root == null)
            return 0;
        int lheight = height(root.left);
        int rheight = height(root.right);
        return Math.max(lheight, rheight) + 1;
    }

    void printGivenLevel(Node root, int level)
    {
        if (root == null)
            return;
        if (level == 1)
            System.out.print(root.data + " ");
        else if (level > 1)
        {
            printGivenLevel(root.left, level - 1);
            printGivenLevel(root.right, level - 1);
        }
    }
}

Finding last element of a binary heap

quoting Wikipedia:
It is perfectly acceptable to use a traditional binary tree data structure to implement a binary heap. There is an issue with finding the adjacent element on the last level on the binary heap when adding an element, which can be resolved algorithmically...
Any ideas on how such an algorithm might work?
I was not able to find any information about this issue, for most binary heaps are implemented using arrays.
Any help appreciated.
Recently, I registered an OpenID account and am not able to edit my initial post nor comment on answers. That's why I am responding via this answer. Sorry for this.
quoting Mitch Wheat:
@Yse: is your question "How do I find the last element of a binary heap"?
Yes, it is.
Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".
quoting Suppressingfire:
Is there some context in which you're asking this question? (i.e., is there some concrete problem you're trying to solve?)
As stated above, I would like to know a good way to "find the last element of a non-array-based binary heap" which is necessary for insertion and deletion of nodes.
quoting Roy:
It seems most understandable to me to just use a normal binary tree structure (using a pRoot and Node defined as [data, pLeftChild, pRightChild]) and add two additional pointers (pInsertionNode and pLastNode). pInsertionNode and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This gives O(1) access to both insertion point and last node of the structure.
Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node when their locations change to another subtree due to a deletion/insertion. But I'll give this a try.
quoting Zach Scrivena:
How about performing a depth-first search...
Yes, this would be a good approach. I'll try that out, too.
Still, I am wondering if there is a way to "calculate" the locations of the last node and the insertion point. The height of a binary heap with N nodes can be calculated by taking the base-2 logarithm of the smallest power of two that is larger than N. Perhaps it is also possible to calculate the number of nodes on the deepest level. Then it might be possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.
Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.
For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.
For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.
The trick is to be able to both calculate and reference those insertion/deletion points in constant time, either by augmenting the data structure ("threading" the tree, as mentioned in the Wikipedia article) or by using additional pointers.
The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).
EDIT: added pseudocode for an O(1) insert()
Here is pseudo code for an insert subroutine which is O(1), on average:
define Node = [T data, *pParent, *pLeft, *pRight]

void insert(T data)
{
    do_insertion( data );   // do insertion, update count of data items in tree

    # assume: pInsert points to the node of the tree where the insertion just took place
    # (aka, either shuffle only data during the insertion, or keep pInsert updated during the bubble process)

    int N = this->CountOfDataItems + 1;   # note: CountOfDataItems will always be > 0 (and pRoot != null) after an insertion

    p = new Node( <null>, null, null, null );   // new empty node for the next insertion

    # update pInsert (three cases to handle)
    if ( int(log2(N)) == log2(N) )
    {
        # case #1 - N is an exact power of two: O(log2(N))
        # tree is currently a full complete binary tree ("perfect")
        # ... must start a new lower level
        # traverse from pRoot down the tree through each pLeft until an empty pLeft is found
        pInsert = pRoot;
        while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; }   # log2(N) iterations
        p->pParent = pInsert;
        pInsert->pLeft = p;
    }
    else if ( isEven(N) )
    {
        # case #2 - N is even (and NOT a power of 2): O(1)
        p->pParent = pInsert->pParent;
        pInsert->pParent->pRight = p;
    }
    else
    {
        # case #3 - N is odd: O(1)
        p->pParent = pInsert->pParent->pParent->pRight;
        pInsert->pParent->pParent->pRight->pLeft = p;
    }
    pInsert = p;

    // update pLastNode
    // ... [similar process]
}
So, insert(T) is O(1) on average: exactly O(1) in all cases except when the tree must be increased by one level, when it is O(log N), which happens every log N insertions (assuming no deletions). The addition of another pointer (pLeftmostLeaf) could make insert() O(1) for all cases and avoids the possible pathological case of alternating insertion and deletion in a full complete binary tree. (Adding pLeftmostLeaf is left as an exercise [it's fairly easy].)
This is my first time participating in Stack Overflow.
Yes, the above answer by Zach Scrivena (god, I don't know how to properly refer to other people, sorry) is right. What I want to add is a simplified way if we are given the count of nodes.
The basic idea is:
Given the count N of nodes in this complete binary tree, do the calculation "N % 2" and push the result onto a stack, then replace N by N / 2 (integer division). Continue the calculation until N == 1. Then pop the results out. A result of 1 means right, 0 means left; the sequence is the route from the root to the target position.
Example:
The tree currently has 10 nodes and I want to insert another node at position 11. How do I route to it?
11 % 2 = 1 --> right (the quotient is 5; push 'right' onto the stack)
5 % 2 = 1 --> right (the quotient is 2; push 'right' onto the stack)
2 % 2 = 0 --> left (the quotient is 1; push 'left' onto the stack. End.)
Then pop the stack: left -> right -> right. This is the path from the root.
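A small Java sketch of that routing, assuming a pointer-based Node with left/right children and a 1-based target position n (e.g. 11 above):

import java.util.ArrayDeque;
import java.util.Deque;

// Walks from the root to the node at 1-based position n in a complete
// binary tree, using the "n % 2, then divide by 2" trick described above.
static Node nodeAtPosition(Node root, int n) {
    Deque<Integer> stack = new ArrayDeque<>();
    while (n > 1) {
        stack.push(n % 2);   // 1 means go right, 0 means go left
        n /= 2;
    }
    Node cur = root;
    while (!stack.isEmpty()) {
        cur = (stack.pop() == 1) ? cur.right : cur.left;
    }
    return cur;
}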
You could use the binary representation of the size of the binary heap to find the location of the last node in O(log N). The size can be stored and incremented, which takes O(1) time. The fundamental concept behind this is the structure of the binary tree.
Suppose our heap size is 7. The binary representation of 7 is "111". Now, remember to always omit the first bit. So we are left with "11", read from left to right. The first bit is '1', so go to the right child of the root node. The remaining string is "1", and its first bit is '1', so again go to the right child of the current node. As you have no more bits to process, this indicates that you have reached the last node. In short: convert the size of the heap to binary; omit the first bit; then, according to each remaining bit from the left, go to the right child of the current node if it is '1' and to the left child if it is '0'.
As you always walk to the very bottom of the binary tree, this operation always takes O(log N) time. This is a simple and accurate procedure to find the last node.
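A hedged Java sketch of that bit walk (again assuming a pointer-based Node with left/right children):

// Follows the bits of the heap size from the second-highest bit down:
// a 1 bit goes to the right child, a 0 bit to the left child.
static Node lastNode(Node root, int size) {
    int highest = 31 - Integer.numberOfLeadingZeros(size);  // index of the leading 1 bit, which is omitted
    Node cur = root;
    for (int b = highest - 1; b >= 0; b--) {
        cur = ((size >> b) & 1) == 1 ? cur.right : cur.left;
    }
    return cur;
}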
You may not understand it on a first reading. Try working this method out on paper for different binary heap sizes; I'm sure you'll get the intuition behind it. This knowledge should be enough to solve your problem; if you want more explanation with figures, you can refer to my blog.
Hope my answer has helped you, if it did, let me know...! ☺
How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree? Thereafter, the first leaf you encounter at a shorter depth, or a parent with a missing child, indicates where you should place the new node before "bubbling up".
The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:
Let N be the total number of nodes in the tree, and H be the height of the tree.
Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3).
The general formula relating the two is H = ceil[log2(N+1)] - 1.
Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking".
We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.
If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.
However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot.
The number of nodes that are already at the last level of our tree is just (N - 2^H + 1).
This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.
Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):
0 0 0 0 0 X 0 0   <--- represents the last level in our tree, X marks the spot!
          ^
L L L L R R R R   <--- at level 0, proceed to the R child
L L R R L L R R   <--- at level 1, proceed to the L child
L R L R L R L R   <--- at level 2, proceed to the R child
          ^           (which is the position of the new node)
          this column tells us
          if we should proceed to the L or R child at each level
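One way the per-level computation could look in Java (a hedged sketch; X is the 1-based spot on the last level and H the height, both per the formulas above):

// Prints the L/R turns from the root to 1-based spot X on the last level
// of a tree of height H, halving the covered slot range at each level.
static void printTurns(int X, int H) {
    int slots = 1 << H;              // number of slots on the last level
    for (int level = 0; level < H; level++) {
        slots /= 2;                  // slots covered by each child subtree
        if (X > slots) {
            System.out.println("level " + level + ": R");
            X -= slots;              // continue within the right half
        } else {
            System.out.println("level " + level + ": L");
        }
    }
}

For the table above (H = 3, X = 6) this prints R, L, R, matching the marked column.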
EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.
Solution in case you don't have a reference to the parent!!!
To find the right place for the next node you have 3 cases to handle:
case (1): the tree is perfect, i.e. the last level is completely full
case (2): the tree's node count is even
case (3): the tree's node count is odd
Insert:
void Insert(Node root, Node n)
{
    Node parent = findRequiredParentToInsertNewNode(root);
    if (parent.left == null)
        parent.left = n;
    else
        parent.right = n;
}
Find the parent of the node in order to insert it
Node findRequiredParentToInsertNewNode(Node root) {
    Node last = findLastNode(root);
    // Case 1: the tree is perfect (the last level is completely full),
    // so the new node starts a new level; descend the left spine.
    if (NodeCount == Math.pow(2, levelNumber) - 1) {
        while (root.left != null)
            root = root.left;
        return root;
    }
    // Case 2: the node count N is even
    else if (Even(N)) {
        Node n = findParentOfLastNode(root, findParentOfLastNode(root, last));
        return n.right;
    }
    // Case 3: the node count N is odd
    else if (Odd(N)) {
        Node n = findParentOfLastNode(root, last);
        return n;
    }
}
To find the last node, perform a BFS (breadth-first search); the last element dequeued is the last node.
Node findLastNode(Node root)
{
    if (root.left == null)
        return root;
    Queue q = new Queue();
    q.enqueue(root);
    Node n = null;
    while (!q.isEmpty()) {
        n = q.dequeue();
        if (n.left != null)
            q.enqueue(n.left);
        if (n.right != null)
            q.enqueue(n.right);
    }
    return n;
}
Find the parent of the last node, in order to set its child pointer to null when the last node replaces the root in the removal case.
Node findParentOfLastNode(Node root, Node lastNode)
{
    if (root == null)
        return root;
    if (root.left == lastNode || root.right == lastNode)
        return root;
    Node n1 = findParentOfLastNode(root.left, lastNode);
    Node n2 = findParentOfLastNode(root.right, lastNode);
    return n1 != null ? n1 : n2;
}
I know this is an old thread, but I was looking for an answer to the same question. I could not afford an O(log n) solution, as I had to find the last node thousands of times in a few seconds. I did have an O(log n) algorithm, but my program was crawling because of the number of times it performed this operation. So after much thought, I did finally find a fix for this. Not sure if anybody finds this interesting.
This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).
Just wanted to add that if there is interest, I can provide my solution as well.
The solution is to add the nodes of the binary heap to a queue, where every queue node has front and back pointers. We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node of the binary heap is at the rear of the queue.
Every time we need to find the last node, we dequeue from the rear, and the second-to-last node becomes the last node in the tree.
When we want to insert, we search backwards from the rear for the first node where we can insert, and put the new node there. It is not exactly O(1), but it reduces the running time dramatically.
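The answer describes the queue only loosely, so here is a hedged Java sketch of how that bookkeeping might look; keeping the nodes in a list in BFS order and using index arithmetic ((i - 1) / 2 is the parent of position i) in place of the backward search is my assumption:

import java.util.ArrayList;
import java.util.List;

// Hypothetical pointer-based heap node.
class HeapNode {
    int value;
    HeapNode left, right, parent;
    HeapNode(int value) { this.value = value; }
}

// All nodes kept in BFS (left-to-right, level-by-level) order,
// so the last node is always at the rear.
class HeapOrder {
    private final List<HeapNode> order = new ArrayList<>();

    // O(1): the last node of the heap sits at the rear.
    HeapNode lastNode() {
        return order.isEmpty() ? null : order.get(order.size() - 1);
    }

    // O(1): attach a new node at the next free position and record it.
    HeapNode attach(int value) {
        HeapNode node = new HeapNode(value);
        int i = order.size();                 // BFS position the node will occupy
        if (i > 0) {
            HeapNode par = order.get((i - 1) / 2);
            if (par.left == null) par.left = node; else par.right = node;
            node.parent = par;
        }
        order.add(node);
        return node;                          // caller then bubbles the value up
    }

    // O(1): detach the last node (e.g. to replace the root on delete-min);
    // the second-to-last node automatically becomes the new last node.
    HeapNode detachLast() {
        HeapNode node = order.remove(order.size() - 1);
        if (node.parent != null) {
            if (node.parent.left == node) node.parent.left = null;
            else node.parent.right = null;
            node.parent = null;
        }
        return node;
    }
}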
