Average number of descendant nodes in a complete and full binary tree

Average number of descendant nodes in a complete and full binary tree - binary-tree

Given a complete and full binary tree with n nodes, what is the average number of descendants a node has? For example, the root node has n - 1 descendants and each leaf node has 0 descendants, but considering all nodes, what is the average?

Let's change a little your question for a short moment: we call "descendants" will be the number of descendants including the node itself.
A leaf has 1 descendant, its parent has 3 descendants, then 7, then 15, etc. These numbers are of kind 2^k - 1 where k is the number of levels from the bottom of the tree (with k=1 fot the leaves).
For K levels, you have obviously 2^K-1 nodes in your tree; since you called this value n, you have n=2^K-1 and K=log2(n+1).
Still calling a descendant as I previously did (see later for your exact question), you have 1 descendant for 2^(K-1) nodes (the leaves), 3 descendants for 2^(K-2) nodes, ... until n descendants for 2^(K-K)=1 node.
The average number of descendants will be:
Sum(k=1,K, 2^(K-k) * (2^k-1) ) / n
According to your definition of the "descendants" (excluding the node itself) you have to substract 1:
Sum(k=1,K, 2^(K-k) * (2^k-1) ) / n - 1
And by replacing K, you get:
Sum(k=1,log2(n+1), (2^k-1)(n+1) / 2^k ) / n - 1
With the program Pari-GP, I type:
f(n)=sum(k=1,round(log(n+1)/log(2)), (2^k-1)*(n+1) / 2^k ) / n - 1
and I get:
f(1)=0
f(3)=2/3
f(7)=10/7
f(15)=34/15
which looks like A036799(n)/n. If the sequence is actually the same (I didn't check carefully), you could write the simpler expression (without the sum):
f(n)= ((log(n+1)/log(2)-2)*(n+1)+2)/n

Related

Big O time complexity for two inserts in an array in a loop?

Each insert to a python list is 0(n), so for the below snippet of code is the worst case time-complexity O(n+ 2k) or O(nk)? Where k is the elements, we move during the insert.
def bfs_binary_tree(root):
queue=[root]
result=[]
while queue:
node = queue.pop()
result.append(node.val)
if node.left :
queue.insert(0, node.left)
if node.right:
queue.insert(0, node.right)
return result
I am using arrays as FIFO queue, but inserting each element at the start of the list has O(k) complexity, so trying to figure out the total complexity for n elements in the queue.

Since each node ends up in the queue at most once, the outer loop will execute n times (where n is the number of nodes in the tree).
Two inserts are performed during each iteration of the loop and these inserts will require size_of_queue + 1 steps.
So we have n steps and size_of_queue steps as the two variables of interest.
The question is: the size of the queue changes, so what is the overall runtime complexity?
Well, the size of the queue will continuously grow until it is full of leaf nodes, which is the upper bound of the size of the queue. Since the number of leaf nodes is the upper bound of the queue, we know that the queue will never be larger than that.
Therefore, we know that the algorithm will never take more than n * leaf nodes steps. This is our upper bound.
So let's find out what the relationship between n and leaf_nodes is.
Note: I am assuming a balanced complete binary tree
The number of nodes at any level of a balanced binary tree with a height of at least 1 (the root node) is: 2^level. The max level of a tree is called its depth.
For example, a tree with a root and two children has 2 levels (0 and 1) and therefore has a depth of 1 and a height of 2.
Thhe total number of nodes in a tree (2^(depth+1))-1 (-1 because level 0 only has one node).
n=2^(depth+1)-1
We can also use this relationship to identify the depth of the balanced binary tree, given the total number of nodes:
If n=2^(depth+1) - 1
n + 1 = 2^(depth+1)
log(n+1) = depth+1 = number of levels, including the root. Subtract 1 to get the depth (ie., the max level) (in a balanced tree with 4 levels, level 3 is the max level because root is level 0).
What do we have so far
number_of_nodes = 2^(depth+1) - 1
depth = log(number_of_nodes)
number_of_nodes_at_level_k = 2^k
What we need
A way to derive the number of leaf nodes.
Since the depth == last_level and since the number_of_nodes_at_level_k = 2^k, it follows that the number of nodes at the last level (the leaf nodes) = 2^depth
So: leaf_nodes = 2^depth
Your runtime complexity is n * leaf_nodes = n * 2^depth = n * 2^(log n) = n * n = n^2.

Number of comparisons to find an element in a BST with 635 elements?

I am a freshman in Computer Science University, so please give me a understandable justification.
I have a binary tree that is equilibrated by height which has 635 nodes. What is the number of comparisons that will occur in the worst case scenario and why?

Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2h + 1 - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2h+1 - 1
n + 1 = 2h+1
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height 8.312882955. Of course, trees can't have fractional height, so we can take the ceiling to find that the height of the tree would be 9. Since the maximum number of comparisons made is h + 1, there are at most 10 comparisons made when doing a lookup.
Hope this helps!

Without any loss of generality you can say the maximum no. of comparison will be the height of the BST ... you dont have to visit every node in the node because each comparison takes you closer to the node...

Let's say it is a balanced BST (all nodes except last have 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 3
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

Prove binary tree properties using induction

I am having trouble proving binary tree properties using induction:
Property 1 - A tree with N internal nodes has a maximum height of N+1
base case - 0 internal nodes has a height of 0
assume - a tree with k internal nodes has a maximum height of k+1 where k exists in n
show - true for all k+1 or true for all k-1
Property 2 - A tree with a tree with N internal nodes has N + 1 leaf nodes
base case - 0 internal nodes has 1 leaf node (null)
assume - a tree with k internal nodes has k + 1 leaf nodes where k exists in n
show - true for all k+1 or true for all k-1
Is my setup correct? And if so, how do I actually show these things. Everything I have tried has just ended up becoming a mess. Thanks for the help

You're missing a few things.
For property 1, your base case must be consistent with what you're trying to prove. So a tree with 0 internal nodes must have a height of at most 0+1=1. This is true: consider a tree with only a root.
For the inductive step, consider a tree with k-1 internal nodes. If its height is at most k (assumed), adding one more node can only raise it to k+1, and no higher. So the k-internal-node tree has height at most k+1.
Property 2 is false. Counterexample:
5
4 6
3 7
2 8
5 internal nodes, 2 leaves.
So there's probably something different you wanted to prove.

binary tree data structures

Can anybody give me proof how the number of nodes in strictly binary tree is 2n-1 where n is the number of leaf nodes??

Proof by induction.
Base case is when you have one leaf. Suppose it is true for k leaves. Then you should proove for k+1. So you get the new node, his parent and his other leaf (by definition of strict binary tree). The rest leaves are k-1 and then you can use the induction hypothesis. So the actual number of nodes are 2*(k-1) + 3 = 2k+1 == 2*(k+1)-1.

just go with the basics, assuming there are x nodes in total, then we have n nodes with degree 1(leaves), 1 with degree 2(the root) and x-n-1 with degree 3(the inner nodes)
as a tree with x nodes will have x-1 edges. so summing
n + 3*(x-n-1) + 2 = 2(x-1) (equating the total degrees)
solving for x we get x = 2n-1

I'm guessing that what you really want is something like a proof that the depth is log2(N), where N is the number of nodes. In this case, the answer is fairly simple: for any given depth D, the number of nodes is 2D.
Edit: in response to edited question: the same fact pretty much applies. Since the number of nodes at any depth is 2D, the number of nodes further up the tree is 2D-1 + 2D-2 + ...20 = 2D-1. Therefore, the total number of nodes in a balanced binary tree is 2D + 2D-1. If you set n = 2D, you've gone the full circle back to the original equation.

I think you are trying to work out a proof for: N = 2L - 1 where L is the number
of leaf nodes and N is the total number of nodes in a binary tree.
For this formula to hold you need to put a few restrictions on how the binary
tree is constructed. Each node is either a leaf, which means it has no children, or
it is an internal node. Internal nodes have 3
possible configurations:
2 child nodes
1 child and 1 internal node
2 internal nodes
All three configurations imply that an internal node connects to two other nodes. This explicitly
rules out the situation where node connects to a single child as in:
o
/
o
Informal Proof
Start with a minimal tree of 1 leaf: L = 1, N = 1 substitute into N = 2L - 1 and the see that
the formula holds true (1 = 1, so far so good).
Now add another minimal chunk to the tree. To do that you need to add another two nodes and
tree looks like:
o
/ \
o o
Notice that you must add nodes in pairs to satisfy the restriction stated earlier.
Adding a pair of nodes always adds
one leaf (two new leaf nodes, but you loose one as it becomes an internal node). Node growth
progresses as the series: 1, 3, 5, 7, 9... but leaf growth is: 1, 2, 3, 4, 5... That is why the formula
N = 2L - 1 holds for this type of tree.
You might use mathematical induction to construct a formal proof, but this works find for me.

Proof by mathematical induction:
The statement that there are (2n-1) of nodes in a strictly binary tree with n leaf nodes is true for n=1. { tree with only one node i.e root node }
let us assume that the statement is true for tree with n-1 leaf nodes. Thus the tree has 2(n-1)-1 = 2n-3 nodes
to form a tree with n leaf nodes we need to add 2 child nodes to any of the leaf nodes in the above tree. Thus the total number of nodes = 2n-3+2 = 2n-1.
hence, proved

To prove: A strictly binary tree with n leaves contains 2n-1 nodes.
Show P(1): A strictly binary tree with 1 leaf contains 2(1)-1 = 1 node.
Show P(2): A strictly binary tree with 2 leaves contains 2(2)-1 = 3 nodes.
Show P(3): A strictly binary tree with 3 leaves contains 2(3)-1 = 5 nodes.
Assume P(K): A strictly binary tree with K leaves contains 2K-1 nodes.
Prove P(K+1): A strictly binary tree with K+1 leaves contains 2(K+1)-1 nodes.
2(K+1)-1 = 2K+2-1
= 2K+1
= 2K-1 +2*
* This result indicates that, for each leaf that is added, another node must be added to the father of the leaf , in order for it to continue to be a strictly binary tree. So, for every additional leaf, a total of two nodes must be added, as expected.

int N = 1000; insert here the value of N
int sum = 0; // the number of total nodes
int currFactor = 1;
for (int i = 0; i< log(N); ++i) //the is log(N) levels
{
sum += currFactor;
currFactor *= 2; //in each level the number of node is double than the upper level
}
if(sum == 2*N - 1)
{
cout<<"wow that the number of nodes is 2*N-1";
}

Total number of nodes in a tree data structure?

I have a tree data structure that is L levels deep each node has about N nodes. I want to work-out the total number of nodes in the tree. To do this (I think) I need to know what percentage of the nodes that will have children.
What is the correct term for this ratio of leaf nodes to non-leaf nodes in N?
What is the formula for working out the total number nodes in the three?
Update Someone mention Branching factor in one of the answer but it then disappeared. I think this was the term I was looking for. So shouldn't a formula take the branching factor into account?
Update I should have said an estimate about a hypothetical datastructure, not the exact figure!

Ok, each node has about N subnodes and the tree is L levels deep.
With 1 level, the tree has 1 node.
With 2 levels, the tree has 1 + N nodes.
With 3 levels, the tree has 1 + N + N^2 nodes.
With L levels, the tree has 1 + N + N^2 + ... + N^(L-1) nodes.
The total number of nodes is (N^L-1) / (N-1).
Ok, just a small example why, it is exponential:
[NODE]
|
/|\
/ | \
/ | \
/ | \
[NODE] [NODE] [NODE]
|
/|\
/ | \

Just to correct a typo in the first answer: the total number of nodes for a tree of depth L is (N^(L+1)-1) / (N-1)... (that is, to the power L+1 rather than just L).
This can be shown as follows. First, take our theorem:
1 + N^1 + N^2 + ... + N^L = (N^(L+1)-1)/(N-1)
Multiply both sides by (N-1):
(N-1)(1 + N^1 + N^2 + ... + N^L) = N^(L+1)-1.
Expand the left side:
N^1 + N^2 + N^3 + ... + N^(L+1) - 1 - N^1 - N^2 - ... - N^L.
All terms N^1 to N^L are cancelled out, which leaves N^(L+1) - 1. This is our right hand side, so the initial equality is true.

If your tree is approximately full, that is every level has its full complement of children except for the last two, then you have between N^(L-2) and N^(L-1) leaf nodes and between N^(L-1) and N^L nodes total.
If your tree is not full, then knowing the number of leaf nodes doesn't help as a totally unbalanced tree will have one leaf node but arbitrarily many parents.
I wonder how precise your statement 'each node has about N nodes' is - if you know the average branching factor, perhaps you can compute the expected size of the tree.
If you are able to find the ratio of leaves to internal nodes, and you know the average number of children, you can approximate this as (n*ratio)^N = n. This won't give you your answer, but I wonder if someone with better maths than me can figure out a way to interpose L into this equation and give you something soluble.
Still, if you want to know precisely, you must iterate over the structure of the tree and count nodes as you go.

The formula for calculating the amount of nodes in depth L is: (Given that there are N root nodes)
NL
To calculate the number of all nodes one needs to do this for every layer:
for depth in (1..L)
nodeCount += N ** depth
If there's only 1 root node, subtract 1 from L and add 1 to the total nodes count.
Be aware that if in one node the amount of leaves is different from the average case this can have a big impact on your number. The further up in the tree the more impact.
* * * N ** 1
*** *** *** N ** 2
*** *** *** *** *** *** *** *** *** N ** 3
This is community wiki, so feel free to alter my appalling algebra.

Knuth's estimator [1],[2] is a point estimate that targets the number of nodes in an arbitrary finite tree without needing to go through all of the nodes and even if the tree is not balanced. Knuth's estimator is an example of an unbiased estimator; the expected value of Knuth's estimator will be the number of nodes in the tree. With that being said, Knuth's estimator may have a large variance if the tree in question is unbalanced, but in your case, since each node will have around N children, I do not think the variance of Knuth's estimator should be too large. This estimator is especially helpful when one is trying to measure the amount of time it will take to perform a brute force search.
For the following functions, we shall assume all trees are represented as lists of lists.
For example, [] denotes the tree with the single node, and [[],[[],[]]] will denote a tree with 5 nodes and 3 leaves (the nodes in the tree are in a one-to-one correspondence with the left brackets). The following functions are written in the language GAP.
The function simpleestimate gives an output an estimate for the number of nodes in the tree tree. The idea behind simpleestimate is that we randomly choose a path x_0,x_1,...,x_n from the root x_0 of the tree to a leaf x_n. Suppose that x_i has a_i successors. Then simpleestimate will return 1+a_1+a_1*a_2+...+a_1*a_2*…*a_n.
point:=tree; prod:=1; count:=1; list:=[];
while Length(point)>0 do prod:=prod*Length(point); count:=count+prod; point:=Random(point); od;
return count; end;
The function estimate will simply give the arithmetical mean of the estimates given by applying the function simpleestimate(tree) samplesize many times.
estimate:=function(samplesize,tree) local count,i;
count:=0;
for i in [1..samplesize] do count:=count+simpleestimate(tree); od;
return Float(count/samplesize); end;
Example: simpleestimate([[[],[[],[]]],[[[],[]],[]]]); returns 15 while
estimate(10000,[[[],[[],[]]],[[[],[]],[]]]); returns 10.9608 (and the tree actually does have 11 nodes).
Estimating Search Tree Size.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5569&rep=rep1&type=pdf
Estimating the Efficiency of Backtrack Programs. Donald E. Knuth
http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0373371-6/S0025-5718-1975-0373371-6.pdf

If you know nothing else but the depth of the tree then your only option for working out the total size is to go through and count them.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio