Total number of nodes in a tree data structure? - data-structures

I have a tree data structure that is L levels deep each node has about N nodes. I want to work-out the total number of nodes in the tree. To do this (I think) I need to know what percentage of the nodes that will have children.
What is the correct term for this ratio of leaf nodes to non-leaf nodes in N?
What is the formula for working out the total number nodes in the three?
Update Someone mention Branching factor in one of the answer but it then disappeared. I think this was the term I was looking for. So shouldn't a formula take the branching factor into account?
Update I should have said an estimate about a hypothetical datastructure, not the exact figure!

Ok, each node has about N subnodes and the tree is L levels deep.
With 1 level, the tree has 1 node.
With 2 levels, the tree has 1 + N nodes.
With 3 levels, the tree has 1 + N + N^2 nodes.
With L levels, the tree has 1 + N + N^2 + ... + N^(L-1) nodes.
The total number of nodes is (N^L-1) / (N-1).
Ok, just a small example why, it is exponential:
[NODE]
|
/|\
/ | \
/ | \
/ | \
[NODE] [NODE] [NODE]
|
/|\
/ | \

Just to correct a typo in the first answer: the total number of nodes for a tree of depth L is (N^(L+1)-1) / (N-1)... (that is, to the power L+1 rather than just L).
This can be shown as follows. First, take our theorem:
1 + N^1 + N^2 + ... + N^L = (N^(L+1)-1)/(N-1)
Multiply both sides by (N-1):
(N-1)(1 + N^1 + N^2 + ... + N^L) = N^(L+1)-1.
Expand the left side:
N^1 + N^2 + N^3 + ... + N^(L+1) - 1 - N^1 - N^2 - ... - N^L.
All terms N^1 to N^L are cancelled out, which leaves N^(L+1) - 1. This is our right hand side, so the initial equality is true.

If your tree is approximately full, that is every level has its full complement of children except for the last two, then you have between N^(L-2) and N^(L-1) leaf nodes and between N^(L-1) and N^L nodes total.
If your tree is not full, then knowing the number of leaf nodes doesn't help as a totally unbalanced tree will have one leaf node but arbitrarily many parents.
I wonder how precise your statement 'each node has about N nodes' is - if you know the average branching factor, perhaps you can compute the expected size of the tree.
If you are able to find the ratio of leaves to internal nodes, and you know the average number of children, you can approximate this as (n*ratio)^N = n. This won't give you your answer, but I wonder if someone with better maths than me can figure out a way to interpose L into this equation and give you something soluble.
Still, if you want to know precisely, you must iterate over the structure of the tree and count nodes as you go.

The formula for calculating the amount of nodes in depth L is: (Given that there are N root nodes)
NL
To calculate the number of all nodes one needs to do this for every layer:
for depth in (1..L)
nodeCount += N ** depth
If there's only 1 root node, subtract 1 from L and add 1 to the total nodes count.
Be aware that if in one node the amount of leaves is different from the average case this can have a big impact on your number. The further up in the tree the more impact.
* * * N ** 1
*** *** *** N ** 2
*** *** *** *** *** *** *** *** *** N ** 3
This is community wiki, so feel free to alter my appalling algebra.

Knuth's estimator [1],[2] is a point estimate that targets the number of nodes in an arbitrary finite tree without needing to go through all of the nodes and even if the tree is not balanced. Knuth's estimator is an example of an unbiased estimator; the expected value of Knuth's estimator will be the number of nodes in the tree. With that being said, Knuth's estimator may have a large variance if the tree in question is unbalanced, but in your case, since each node will have around N children, I do not think the variance of Knuth's estimator should be too large. This estimator is especially helpful when one is trying to measure the amount of time it will take to perform a brute force search.
For the following functions, we shall assume all trees are represented as lists of lists.
For example, [] denotes the tree with the single node, and [[],[[],[]]] will denote a tree with 5 nodes and 3 leaves (the nodes in the tree are in a one-to-one correspondence with the left brackets). The following functions are written in the language GAP.
The function simpleestimate gives an output an estimate for the number of nodes in the tree tree. The idea behind simpleestimate is that we randomly choose a path x_0,x_1,...,x_n from the root x_0 of the tree to a leaf x_n. Suppose that x_i has a_i successors. Then simpleestimate will return 1+a_1+a_1*a_2+...+a_1*a_2*…*a_n.
point:=tree; prod:=1; count:=1; list:=[];
while Length(point)>0 do prod:=prod*Length(point); count:=count+prod; point:=Random(point); od;
return count; end;
The function estimate will simply give the arithmetical mean of the estimates given by applying the function simpleestimate(tree) samplesize many times.
estimate:=function(samplesize,tree) local count,i;
count:=0;
for i in [1..samplesize] do count:=count+simpleestimate(tree); od;
return Float(count/samplesize); end;
Example: simpleestimate([[[],[[],[]]],[[[],[]],[]]]); returns 15 while
estimate(10000,[[[],[[],[]]],[[[],[]],[]]]); returns 10.9608 (and the tree actually does have 11 nodes).
Estimating Search Tree Size.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5569&rep=rep1&type=pdf
Estimating the Efficiency of Backtrack Programs. Donald E. Knuth
http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0373371-6/S0025-5718-1975-0373371-6.pdf

If you know nothing else but the depth of the tree then your only option for working out the total size is to go through and count them.

Related

Job interview question in 2-3 trees (B trees)

This is a part of job interview question which got harder in its second part.
Given two 2-3 trees T1 and T2 such that for each tree h in known (h for height) and m, M for each tree are known too (m for minimum and M for maximum), Plus that each node in T1 < every node in T2.
I was asked to find an algorithm to join both of them into one tree in O(|h1-h2|+1)
This one was quite easy, and I have to point that this algorithm may result in a tree with h bigger than the previous two.
Now, I was given k 2-3 trees (T1,T2,T3...Tk) with the same exact conditions plus knowing that h_1<=h_2<=...<=h_k and that no three trees share the same height to join them in O(h_k-h_1+k).
At first I thought about using the previou algorithm to join the first two together then to join the third to the result and so on but I felt that something is going wrong here since I didn't utilise the fact that "no three trees share the same height".
What I am missing here?
Your solution is correct, but it would not be if you had more than 2 trees of the same height. For example, if you have k trees of identical height, then the first two would be indeed merged in O(h_1 - h_1) = O(1) time, but the resulting height can become h_1 + 1. While it only might become, or it might not so let me show that it is possible that everything goes wrong.
The maximum amount of keys we can have inside a tree of height n is 3^(n+1)-1. That's because each vertex has at most 3 subtrees, therefore i-th level has 3^i vertices, adding n levels would result in (3^(n + 1) - 1)/2 vertices. Because each vertex has 2 keys in such a scenario the total number of keys is 3^(n + 1) - 1.
Therefore, if we merge 4 such maximum trees, we would for sure get a tree with height increased by 2, 16 merged trees to get height increased by 3, and so on. Thus, while the first 3 merges are done in constant time, the next 12 ones are done twice slower, and the next 48 are done 3 times slower, and so on. You would do Ω(i) operations Ω(3^(i+1) - 3^i) times for each i starting from 1 and up to log(k).
Because Ω(3^log(k)) = Ω(k) this sum is definitely Ω(k log k), therefore inappropriate for given asymptotic bounds.
When no 3 trees share the same height, this problem does not occur, because whenever you merge two trees the resulting height is max(h_i, h_(i+1)) + 1 = h_(i+1) + 1, and h_(i+3) >= h_(i+1) + 1, therefore the height of merged part never goes one above the next tree, and that's where +k part gets from in asymptotic bound.

Minimum nodes in a n-ary tree

I know the formula for finding the maximum number of nodes in a n-ary tree given the height ( a tree containing either n nodes or none per node): (N^h+1)-1/(N-1). But how would you find the minimum number of nodes in a tree given the height and N.
I am assuming that this isn't homework (since you didn't phrase it as if it were) but is to satisfy your curiosity.
To start the tree you need 1 node. To keep it growing you need a minimum of N new nodes per level. Since you are trying to minimize the total number of nodes, the total is thus
1 + N + N + .... + N = 1 + h*N
Intuitively, the minimum number is a tree in which only one node has children and the rest have none.

Number of nodes of a tree where each node has two children nodes

I have a tree that has the following form:
On the first pictures, the height of the tree is 1 and there are 3 total nodes. 2 for 7 on the next and 3 for 15 for the last one. How can I determine how many number of nodes a tree of this form of l height will have? Also, what kind of tree is that (is it called something in particular?)?
That is a perfect binary tree
You can get the number of node considering this recursive approch:
n(0) = 1
n(l+1) = n(l) + 2^l
so
n(l) = 2^(l+1) - 1
A complete binary tree at depth ‘d’ is the strictly binary tree, where all the leaves are at level d.
for fig1, d=1
for fig2, d=2
for fig3, d=3
So, Suppose a binary tree T of depth d. Then at most 2(d+1)-1 nodes n can be there in T.
for fig1, d=1; 2(1+1)-1=2(2)-1=4-1=3
for fig2, d=2; 2(2+1)-1=2(3)-1=8-1=7
for fig3 d=3; 2(3+1)-1=2(4)-1=16-1=15
The height(h) and depth(d) of a tree (the length of the longest path from the root to leaf node) are numerically equal.
Here's an answer detailing how to compute depth and height.
The number of nodes in a complete tree is...
n = 2^(h+1) - 1.
What you're describing sounds like "a perfect binary tree".
"a binary tree is a tree data structure in which each node has at most two children"
http://en.wikipedia.org/wiki/Binary_tree
A perfect tree is "A binary tree with all leaf nodes at the same depth."
http://xlinux.nist.gov/dads//HTML/perfectBinaryTree.html
height to maximum number of nodes in perfect binary tree
=2^(height+1)-1
number of nodes to minimum height
=CEILING(LOG(nodes+1,2)-1,1)
Definitions associated with Binary trees can be found at the Wikipedia wiki cited earlier.
This can also be understood like this.
In case of a perfect binary tree
total number of leaf nodes are 2^H (H = Height of the tree)
and total number of internal nodes are 2^H - 1
Hence the total number of nodes will be 2^H + 2^H - 1 which is 2^(H+1) - 1 as mentioned by others.
Hope this will help.

Number of comparisons to find an element in a BST with 635 elements?

I am a freshman in Computer Science University, so please give me a understandable justification.
I have a binary tree that is equilibrated by height which has 635 nodes. What is the number of comparisons that will occur in the worst case scenario and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2h + 1 - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2h+1 - 1
n + 1 = 2h+1
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height 8.312882955. Of course, trees can't have fractional height, so we can take the ceiling to find that the height of the tree would be 9. Since the maximum number of comparisons made is h + 1, there are at most 10 comparisons made when doing a lookup.
Hope this helps!
Without any loss of generality you can say the maximum no. of comparison will be the height of the BST ... you dont have to visit every node in the node because each comparison takes you closer to the node...
Let's say it is a balanced BST (all nodes except last have 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 3
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time, because of the fact that the height of the tree (with n nodes) would be log(n).
How would you prove this?
Now here I am not giving mathematical proof. Try to understand the problem using log to the base 2. Log2 is the normal meaning of log in computer science.
First, understand it is binary logarithm (log2n) (logarithm to the base 2).
For example,
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height the number of nodes in a fully balanced tree are
Height Nodes Log calculation
0 1 log21 = 0
1 3 log23 = 1
2 7 log27 = 2
3 15 log215 = 3
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to be height 3 because log2 of any number from 8 to 15 is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
I hope this helps.
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level the height is exactly N.
Generalization to all balanced binary trees: If the tree has less nodes than 2^N we for sure don't need more halvings. We might need less or the same amount but never more.
Assuming that we have a complete tree to work with, we can say that at depth k, there are 2k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree will increase the number of nodes in the entire tree by the number of nodes that were in the previous level times two.
The height k of the tree is log(N), where N is the number of nodes. This can be stated as
log2(N) = k,
and it is equivalent to
N = 2k
To understand this, here's an example:
16 = 24 => log2(16) = 4
The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Searching and Sorting Algorithms ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2N) for N nodes?
w = width of base (maximum number of leaves)
h = height of tree (maximum number of edges from root to leaf)
Divide w by 2 (h times) to get to 1, which counts the single root node at top.
N = w + w/2 + ... + 1
N = 2h + ... + 21 + 20
= (1-2h+1) / (1-2) = 2h+1-1
log2(N+1) = h+1
Check: if N=1, h=0. If h=1, N=3.
This formula is for if the bottom level is full. N will not always be so great, but would still have the same height, h. So we must take the log's ceiling.

Resources