Searching in a balanced binary search tree - data-structures

I was reading about balanced binary search tree. I found this statement about searching in such tree:
It is not true that when you are looking for something in a balanced binary search tree with n elements, it can in worst case needed n/2 comparisons.
Why it is not true?
Isn't it that we look either to the right side or the left side of the tree so the comparisons should be n/2?

The search worst case of Balanced Binary Search tree is governed by its height. It is O(height) where the height is log2(n) since it is balanced.
In worst case, the node that we looking for resides in a leaf or doesn't exist at all, and hence we need to traverse the tree from the root to its leafs which is O(lgn) and not O(n/2)

Consider the following balanced binary tree for n=7 (this is in fact a complete binary search tree, but lets leave that out of this discussion, as a complete binary search tree is also a balanced binary search tree).
5 depth 1 (root)
/----+----\
2 6 depth 2
/--+--\ /--+--\
1 3 4 7 depth 3
For searching of any number in this tree, the worst case scenario is that we reach the maximum depth of the tree (e.g., 3 in this case), until we terminate the search. At depth 3, we have performed 3 comparisons, hence, at arbitrary depth l, we would have performed l comparisons.
Now, for a complete binary search tree as the one above, of arbitrary size, we can hold 1 + 2^(maxDepth-1) different numbers. Now, let's say we have a complete binary search tree with exactly n (distinct) numbers. Then the following holds
n = 1 + 2^(maxDepth-1) (+)
Hence
(+) <=> 2^(maxDepth-1) = n - 1
<=> log2(2^(maxDepth - 1)) = log2(n - 1)
<=> maxDepth - 1 = log2(n - 1)
=> maxDepth = log2(n - 1) + 1
Recall from above that maxDepth told us the worst case number of comparisons for us to find a number (or it's non-existance) in our complete binary tree. Hence
worst case scenario, n nodes : log2(n-1) + 1
For studying asymptotic or limiting behaviour of this search, n can be considered sufficiently large, and hence log2(n) ~= log2(n-1) holds, and subsequently, we can say that a quite good (tight) upper bound for the algorithm is O(log2(n)). Hence
The time complexity for searching in a complete binary tree,
for n nodes, is O(log2(n))
For a non-complete binary search tree, an analogous reasoning as the one above leads to the same time complexity. Note that for a non-balanced search tree the worst case scenario for n nodes is n comparisons.
Answer: From above, it's clear that O(n/2) is not a proper bound for the time complexity of a binary search tree of size n; whereas however O(log2(n)) is. (Note that the prior might be a correct bound for sufficiently large n, but not a very good/tight one!)

Imagine the tree with 10 nodes: 1,2,3,4,5..10.
If you are looking for 5, how many comparisons would it take? How about if you look for 10?
It's actually never N/2.

The worst case scenario is that the element you are searching for is a leaf (or isn't contained in a tree), and the number of comparisons then is equal to tree height which is log2(n).

The best balanced binary tree is the AVL tree. I say "the best" conditioned to the fact that their modifying operations are O(log(n)). If the tree is perfectly balanced, then its height is still less (but it is not known a way for modifying it in O(log(n)).
It could be shown that the maximum height of an AVL tree is less than
1.4404 log(n+2) - 0.3277
Consequently the worst case for a search in an AVL tree is an unsuccessful search whose path from the root ends in the deepest node. But by the previous result, this path cannot be longer than 1.4404 log(n+2) - 0.3277.
And since 1.4404 log(n+2) - 0.3277 < n/2, the statement is false (assuming a n enough large)

lets first see the BST(binary search tree) properties which tell that..
-- root must be > then left_child
-- root must be < right child
10
/ \
8 12
/ \ / \
5 9 11 15
/ \ / \
1 7 14 25
height of given tree is 3(number of edges in longest path 10-14).
suppose you query to search 14 in given balanced BST
node-- 10 14 > 10
/ \ go to right sub tree because all nodes
8 12 in right sub tree are > 10
/ \ / \
5 9 11 15 n = 11 total node
/ \ / \
1 7 14 25
node-- 12 14 > 12
/ \ again go to right sub tree..
11 15
/ \ n = 5
14 25
node-- 15 14 > 15
/ \ this time node value is > required value so
14 25 goto left sub tree
n = 3
'node -- 14 14 == 14 value find
n = 1'
from above example we can see that at every comparison size of problem(number of nodes) halve we can also say that at every comparison we switch to next level thus height of tree is increased by 1 .
as max height of balanced BST is log(N) in worst case we need to go to leaf of tree hence we take log(N) step to do so..
hence O of BST search is log(N).

Related

Building an AVL Tree out of Binary Search Tree

I need to suggest an algorithm that takes BST (Binary Search Tree), T1 that has 2^(n + 1) - 1 keys, and build an AVL tree with same keys. The algorithm should be effective in terms of worst and average time complexity (as function of n).
I'm not sure how should I approach this. It is clear that the minimal size of a BST that has 2^(n + 1) - 1 keys is n (and that will be the case if it is full / balanced), but how does it help me?
There is the straight forward method that is to iterate over the tree , each time adding the root of T1 to the AVL tree and then removing it from T1:
Since T1 may not be balanced the delete may cost O(n) in worst case
Insert to the AVL will cost O(log n)
There are 2^(n + 1) - 1
So in total that will cost O(n*logn*2^n) and that is ridiculously expensive.
But why should I remove from T1? I'm paying a lot there and for no good reason.
So I figured why not using tree traversal over T1 , and for each node I'm visiting , add it to the AVL tree:
There are 2^(n + 1) - 1 nodes so traversal will cost O(2^n) (visiting each node once)
Adding the current node each time to the AVL will cost O(logn)
So in total that will cost O(logn * 2^n).
and that is the best time complexity I could think of, the question is, can it be done in a faster way? like in O(2^n) ?
Some way that will make the insert to the AVL tree cost only O(1)?
I hope I was clear and that my question belongs here.
Thank you very much,
Noam
There is an algorithm that balances a BST and runs in linear time called Day Stout Warren Algorithm
Basically all it does is convert the BST into a sorted array or linked list by doing an in-order traversal (O(n)). Then, it recursively takes the middle element of the array, makes it the root, and makes its children the middle elements of the left and right subarrays respectively (O(n)). Here's an example,
UNBALANCED BST
5
/ \
3 8
/ \
7 9
/ \
6 10
SORTED ARRAY
|3|5|6|7|8|9|10|
Now here are the recursive calls and resulting tree,
DSW(initial array)
7
7.left = DSW(left array) //|3|5|6|
7.right = DSW(right array) //|8|9|10|
7
/ \
5 9
5.left = DSW(|3|)
5.right = DSW(|6|)
9.left = DSW(|8|)
9.right = DSW(|10|)
7
/ \
5 9
/ \ / \
3 6 8 10

Number of comparisons to find an element in a BST with 635 elements?

I am a freshman in Computer Science University, so please give me a understandable justification.
I have a binary tree that is equilibrated by height which has 635 nodes. What is the number of comparisons that will occur in the worst case scenario and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2h + 1 - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2h+1 - 1
n + 1 = 2h+1
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height 8.312882955. Of course, trees can't have fractional height, so we can take the ceiling to find that the height of the tree would be 9. Since the maximum number of comparisons made is h + 1, there are at most 10 comparisons made when doing a lookup.
Hope this helps!
Without any loss of generality you can say the maximum no. of comparison will be the height of the BST ... you dont have to visit every node in the node because each comparison takes you closer to the node...
Let's say it is a balanced BST (all nodes except last have 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 3
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time, because of the fact that the height of the tree (with n nodes) would be log(n).
How would you prove this?
Now here I am not giving mathematical proof. Try to understand the problem using log to the base 2. Log2 is the normal meaning of log in computer science.
First, understand it is binary logarithm (log2n) (logarithm to the base 2).
For example,
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height the number of nodes in a fully balanced tree are
Height Nodes Log calculation
0 1 log21 = 0
1 3 log23 = 1
2 7 log27 = 2
3 15 log215 = 3
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to be height 3 because log2 of any number from 8 to 15 is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
I hope this helps.
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level the height is exactly N.
Generalization to all balanced binary trees: If the tree has less nodes than 2^N we for sure don't need more halvings. We might need less or the same amount but never more.
Assuming that we have a complete tree to work with, we can say that at depth k, there are 2k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree will increase the number of nodes in the entire tree by the number of nodes that were in the previous level times two.
The height k of the tree is log(N), where N is the number of nodes. This can be stated as
log2(N) = k,
and it is equivalent to
N = 2k
To understand this, here's an example:
16 = 24 => log2(16) = 4
The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Searching and Sorting Algorithms ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2N) for N nodes?
w = width of base (maximum number of leaves)
h = height of tree (maximum number of edges from root to leaf)
Divide w by 2 (h times) to get to 1, which counts the single root node at top.
N = w + w/2 + ... + 1
N = 2h + ... + 21 + 20
= (1-2h+1) / (1-2) = 2h+1-1
log2(N+1) = h+1
Check: if N=1, h=0. If h=1, N=3.
This formula is for if the bottom level is full. N will not always be so great, but would still have the same height, h. So we must take the log's ceiling.

Time complexity of next/previous functions on a BST

I'm interested in the worst-case efficiency of stepping forwards and backwards through binary search trees.
Unbalanced tree:
5
/
1
\
2
\
3
\
4
It looks like the worst case would be 4->5, which takes 4 operations.
Balanced tree:
2
/ \
1 4
/ \
3 5
Worst case is 2->3, which takes 2 operations.
Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?
Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?
Yes, you will only ever need to go up or down when travelling from k to k+1, never both (because the invariant is left child < parent < right child).
Although O(height-1) can be written O(height) (and similarly for O(n)).
If you are considering just traversing the tree in order, the complexity does not change with regards to balance. The algorithm is still
walk( Node n)
walk( n.left )
visit( n )
walk( n.right )
1 op per step.
It's when you start to apply lookups, inserts and deletes the balance comes into play.
And for these operations to be in O(log N ) a balanced tree is required.
If you are trying to find the next element in the sequence defined by the tree, you may be required to travel the entire height of the tree, and of course in a balanced tree this is O ( log N ), and in an unbalanced tree this is O( N)

Total number of nodes in a tree data structure?

I have a tree data structure that is L levels deep each node has about N nodes. I want to work-out the total number of nodes in the tree. To do this (I think) I need to know what percentage of the nodes that will have children.
What is the correct term for this ratio of leaf nodes to non-leaf nodes in N?
What is the formula for working out the total number nodes in the three?
Update Someone mention Branching factor in one of the answer but it then disappeared. I think this was the term I was looking for. So shouldn't a formula take the branching factor into account?
Update I should have said an estimate about a hypothetical datastructure, not the exact figure!
Ok, each node has about N subnodes and the tree is L levels deep.
With 1 level, the tree has 1 node.
With 2 levels, the tree has 1 + N nodes.
With 3 levels, the tree has 1 + N + N^2 nodes.
With L levels, the tree has 1 + N + N^2 + ... + N^(L-1) nodes.
The total number of nodes is (N^L-1) / (N-1).
Ok, just a small example why, it is exponential:
[NODE]
|
/|\
/ | \
/ | \
/ | \
[NODE] [NODE] [NODE]
|
/|\
/ | \
Just to correct a typo in the first answer: the total number of nodes for a tree of depth L is (N^(L+1)-1) / (N-1)... (that is, to the power L+1 rather than just L).
This can be shown as follows. First, take our theorem:
1 + N^1 + N^2 + ... + N^L = (N^(L+1)-1)/(N-1)
Multiply both sides by (N-1):
(N-1)(1 + N^1 + N^2 + ... + N^L) = N^(L+1)-1.
Expand the left side:
N^1 + N^2 + N^3 + ... + N^(L+1) - 1 - N^1 - N^2 - ... - N^L.
All terms N^1 to N^L are cancelled out, which leaves N^(L+1) - 1. This is our right hand side, so the initial equality is true.
If your tree is approximately full, that is every level has its full complement of children except for the last two, then you have between N^(L-2) and N^(L-1) leaf nodes and between N^(L-1) and N^L nodes total.
If your tree is not full, then knowing the number of leaf nodes doesn't help as a totally unbalanced tree will have one leaf node but arbitrarily many parents.
I wonder how precise your statement 'each node has about N nodes' is - if you know the average branching factor, perhaps you can compute the expected size of the tree.
If you are able to find the ratio of leaves to internal nodes, and you know the average number of children, you can approximate this as (n*ratio)^N = n. This won't give you your answer, but I wonder if someone with better maths than me can figure out a way to interpose L into this equation and give you something soluble.
Still, if you want to know precisely, you must iterate over the structure of the tree and count nodes as you go.
The formula for calculating the amount of nodes in depth L is: (Given that there are N root nodes)
NL
To calculate the number of all nodes one needs to do this for every layer:
for depth in (1..L)
nodeCount += N ** depth
If there's only 1 root node, subtract 1 from L and add 1 to the total nodes count.
Be aware that if in one node the amount of leaves is different from the average case this can have a big impact on your number. The further up in the tree the more impact.
* * * N ** 1
*** *** *** N ** 2
*** *** *** *** *** *** *** *** *** N ** 3
This is community wiki, so feel free to alter my appalling algebra.
Knuth's estimator [1],[2] is a point estimate that targets the number of nodes in an arbitrary finite tree without needing to go through all of the nodes and even if the tree is not balanced. Knuth's estimator is an example of an unbiased estimator; the expected value of Knuth's estimator will be the number of nodes in the tree. With that being said, Knuth's estimator may have a large variance if the tree in question is unbalanced, but in your case, since each node will have around N children, I do not think the variance of Knuth's estimator should be too large. This estimator is especially helpful when one is trying to measure the amount of time it will take to perform a brute force search.
For the following functions, we shall assume all trees are represented as lists of lists.
For example, [] denotes the tree with the single node, and [[],[[],[]]] will denote a tree with 5 nodes and 3 leaves (the nodes in the tree are in a one-to-one correspondence with the left brackets). The following functions are written in the language GAP.
The function simpleestimate gives an output an estimate for the number of nodes in the tree tree. The idea behind simpleestimate is that we randomly choose a path x_0,x_1,...,x_n from the root x_0 of the tree to a leaf x_n. Suppose that x_i has a_i successors. Then simpleestimate will return 1+a_1+a_1*a_2+...+a_1*a_2*…*a_n.
point:=tree; prod:=1; count:=1; list:=[];
while Length(point)>0 do prod:=prod*Length(point); count:=count+prod; point:=Random(point); od;
return count; end;
The function estimate will simply give the arithmetical mean of the estimates given by applying the function simpleestimate(tree) samplesize many times.
estimate:=function(samplesize,tree) local count,i;
count:=0;
for i in [1..samplesize] do count:=count+simpleestimate(tree); od;
return Float(count/samplesize); end;
Example: simpleestimate([[[],[[],[]]],[[[],[]],[]]]); returns 15 while
estimate(10000,[[[],[[],[]]],[[[],[]],[]]]); returns 10.9608 (and the tree actually does have 11 nodes).
Estimating Search Tree Size.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5569&rep=rep1&type=pdf
Estimating the Efficiency of Backtrack Programs. Donald E. Knuth
http://www.ams.org/journals/mcom/1975-29-129/S0025-5718-1975-0373371-6/S0025-5718-1975-0373371-6.pdf
If you know nothing else but the depth of the tree then your only option for working out the total size is to go through and count them.

Resources