Search operation in imbalanced binary search tree - algorithm

I cannot think of a scenario in which a search operation in an imbalanced binary search tree would be more efficient than in a balanced binary search tree. If the binary search tree is highly skewed, it effectively takes the form of a linked list, in which case the run time complexity is O(n). Is there any scenario where an imbalanced tree can really win? My professor insisted that there are some, but I simply cannot think of any.

There are some cases where an imbalanced tree could be better. Take the following two trees, each holding the values 1 through 6, as an example:

BALANCED                 UNBALANCED

      4                  1
     / \                  \
    2   5                  2
   / \   \                  \
  1   3   6                  3
                              \
                               4
                                \
                                 5
                                  \
                                   6
The idea behind a binary search tree is that if you're equally likely to search for any of the values in the set of nodes, then keeping the tree balanced minimizes the average number of comparisons you have to do.
For example, if I search for each of the values (1, 2, 3, 4, 5, 6), then I'd want to be searching the balanced tree on the left. Performing that sequence of searches on the balanced tree would result in (3 + 2 + 3 + 1 + 2 + 3) = 14 comparisons. Doing the same sequence of searches on the unbalanced tree would result in (1 + 2 + 3 + 4 + 5 + 6) = 21 comparisons.
But what if I knew that the values I need to search for weren't going to be evenly distributed, but that they skewed low? What if I wanted to search for the values (1, 2, 1, 2, 1, 3)? Which tree would give better performance then?
Performing those searches on the balanced tree would result in (3 + 2 + 3 + 2 + 3 + 3) = 16 comparisons. Not bad, but the same sequence of searches on the unbalanced tree would only take (1 + 2 + 1 + 2 + 1 + 3) = 10 comparisons.
This is a contrived example, but it shows that knowing your data and knowing the values that people are likely to search for most often can help you choose the right arrangement for that data to give better performance.
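The numbers above are easy to reproduce. Below is a minimal sketch (the tree type and the helpers `insert`, `comparisons`, and `cost` are my own illustrative names, not from the answer) that builds both trees over the values 1..6 and totals the comparisons for each access sequence:

```scala
sealed trait Tree
case object Leaf extends Tree
case class Node(key: Int, left: Tree, right: Tree) extends Tree

// Standard BST insertion without rebalancing; insertion order determines the shape.
def insert(t: Tree, k: Int): Tree = t match {
  case Leaf => Node(k, Leaf, Leaf)
  case Node(key, l, r) =>
    if (k < key) Node(key, insert(l, k), r) else Node(key, l, insert(r, k))
}

// Number of nodes inspected while searching for k (k is assumed to be present).
def comparisons(t: Tree, k: Int): Int = t match {
  case Leaf => 0
  case Node(key, l, r) =>
    if (k == key) 1
    else if (k < key) 1 + comparisons(l, k)
    else 1 + comparisons(r, k)
}

def cost(t: Tree, queries: Seq[Int]): Int = queries.map(comparisons(t, _)).sum

val balanced = List(4, 2, 5, 1, 3, 6).foldLeft(Leaf: Tree)((t, k) => insert(t, k))
val skewed   = (1 to 6).foldLeft(Leaf: Tree)((t, k) => insert(t, k))  // a "linked list"

println(cost(balanced, Seq(1, 2, 3, 4, 5, 6)))  // 14
println(cost(skewed,   Seq(1, 2, 3, 4, 5, 6)))  // 21
println(cost(balanced, Seq(1, 2, 1, 2, 1, 3)))  // 16
println(cost(skewed,   Seq(1, 2, 1, 2, 1, 3)))  // 10
```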


Building an AVL Tree out of Binary Search Tree

I need to suggest an algorithm that takes a BST (binary search tree) T1 with 2^(n + 1) - 1 keys and builds an AVL tree with the same keys. The algorithm should be efficient in terms of worst- and average-case time complexity (as a function of n).
I'm not sure how to approach this. It is clear that the minimal height of a BST with 2^(n + 1) - 1 keys is n (which is the case when the tree is full / balanced), but how does that help me?
There is the straightforward method of iterating over the tree, each time adding the root of T1 to the AVL tree and then removing it from T1:
Since T1 may not be balanced, a delete may cost O(2^n) in the worst case (T1 has 2^(n + 1) - 1 nodes)
An insert into the AVL tree costs O(log(2^(n + 1) - 1)) = O(n)
There are 2^(n + 1) - 1 keys to move
So in total that will cost O(2^n * (2^n + n)) = O(4^n), and that is ridiculously expensive.
But why should I remove from T1? I'm paying a lot there and for no good reason.
So I figured: why not use a tree traversal over T1 and, for each node visited, add it to the AVL tree:
There are 2^(n + 1) - 1 nodes, so the traversal will cost O(2^n) (visiting each node once)
Adding each node to the AVL tree costs O(n) (the logarithm of the number of keys)
So in total that will cost O(n * 2^n).
That is the best time complexity I could think of. The question is: can it be done faster, say in O(2^n), i.e., linear in the number of keys?
Is there some way to make each insert into the AVL tree cost only O(1)?
I hope I was clear and that my question belongs here.
Thank you very much,
Noam
There is an algorithm that balances a BST and runs in linear time, the Day–Stout–Warren (DSW) algorithm. (DSW proper works in place using rotations; the variant described here uses an auxiliary array instead, which is easier to picture.) First, convert the BST into a sorted array by doing an in-order traversal (O(n)). Then recursively take the middle element of the array, make it the root, and make its children the middle elements of the left and right subarrays respectively (O(n)). Here's an example:
UNBALANCED BST

    5
   / \
  3   8
     / \
    7   9
   /     \
  6       10

SORTED ARRAY

|3|5|6|7|8|9|10|
Now here are the recursive calls and the resulting tree:

DSW(initial array)

  7
7.left  = DSW(left array)  // |3|5|6|
7.right = DSW(right array) // |8|9|10|

  7
 / \
5   9

5.left  = DSW(|3|)
5.right = DSW(|6|)
9.left  = DSW(|8|)
9.right = DSW(|10|)

    7
   / \
  5   9
 / \ / \
3  6 8  10
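The array-to-tree step above can be sketched as follows (a simplified out-of-place variant with assumed names; the real DSW algorithm rebalances in place with rotations):

```scala
sealed trait BST
case object Empty extends BST
case class Branch(key: Int, left: BST, right: BST) extends BST

// Recursively make the middle element the root, as described above.
def fromSorted(a: Vector[Int]): BST =
  if (a.isEmpty) Empty
  else {
    val mid = a.length / 2
    Branch(a(mid), fromSorted(a.take(mid)), fromSorted(a.drop(mid + 1)))
  }

// Height as the number of edges on the longest root-to-leaf path.
def height(t: BST): Int = t match {
  case Empty           => -1
  case Branch(_, l, r) => 1 + math.max(height(l), height(r))
}

def inorder(t: BST): List[Int] = t match {
  case Empty           => Nil
  case Branch(k, l, r) => inorder(l) ::: k :: inorder(r)
}

val rebuilt = fromSorted(Vector(3, 5, 6, 7, 8, 9, 10))
println(inorder(rebuilt))  // List(3, 5, 6, 7, 8, 9, 10) -- still a valid BST
println(height(rebuilt))   // 2, versus height 3 for the unbalanced tree above
```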

Searching in a balanced binary search tree

I was reading about balanced binary search tree. I found this statement about searching in such tree:
It is not true that when you are looking for something in a balanced binary search tree with n elements, the worst case needs n/2 comparisons.
Why is that not true?
Isn't it that at each step we look at either the right half or the left half of the tree, so the number of comparisons should be n/2?
The worst-case search cost of a balanced binary search tree is governed by its height: it is O(height), and the height is log2(n) since the tree is balanced.
In the worst case, the node we are looking for is a leaf or doesn't exist at all, and hence we traverse the tree from the root down to a leaf, which is O(lg n), not O(n/2).
Consider the following balanced binary search tree for n = 7 (this is in fact a complete binary search tree, but let's leave that out of this discussion, as a complete binary search tree is also a balanced binary search tree).

        4          depth 1 (root)
      /   \
     2     6       depth 2
    / \   / \
   1   3 5   7     depth 3

When searching for any number in this tree, the worst-case scenario is that we reach the maximum depth of the tree (3 in this case) before we terminate the search. At depth 3 we have performed 3 comparisons; hence, at arbitrary depth l, we would have performed l comparisons.
Now, a complete binary search tree of maximum depth maxDepth can hold 2^maxDepth - 1 different numbers. Say we have a complete binary search tree with exactly n (distinct) numbers. Then the following holds

n = 2^maxDepth - 1    (+)

Hence

(+) <=> 2^maxDepth = n + 1
    <=> maxDepth = log2(n + 1)

Recall from above that maxDepth is the worst-case number of comparisons needed to find a number (or establish its non-existence) in our complete binary tree. Hence

worst case scenario, n nodes : log2(n + 1)

For studying the asymptotic or limiting behaviour of this search, n can be considered sufficiently large, so log2(n) ~= log2(n + 1) holds, and subsequently a quite good (tight) upper bound for the algorithm is O(log2(n)). Hence
The time complexity for searching in a complete binary tree,
for n nodes, is O(log2(n))
For a non-complete binary search tree, an analogous reasoning as the one above leads to the same time complexity. Note that for a non-balanced search tree the worst case scenario for n nodes is n comparisons.
Answer: From the above, it's clear that O(log2(n)) is a proper (tight) bound for the time complexity of searching a balanced binary search tree of size n, whereas O(n/2) is not. (Note that O(n/2) = O(n) is technically a valid upper bound for sufficiently large n, just not a good/tight one!)
Imagine a tree with 10 nodes: 1, 2, 3, ..., 10.
If you are looking for 5, how many comparisons would it take? How about if you look for 10?
It's actually never N/2.
The worst case scenario is that the element you are searching for is a leaf (or isn't contained in a tree), and the number of comparisons then is equal to tree height which is log2(n).
The best-known balanced binary search tree is the AVL tree. I say "the best" in the sense that its modifying operations are O(log(n)). A perfectly balanced tree has an even smaller height, but no way is known to maintain perfect balance under modification in O(log(n)).
It could be shown that the maximum height of an AVL tree is less than
1.4404 log(n+2) - 0.3277
Consequently the worst case for a search in an AVL tree is an unsuccessful search whose path from the root ends in the deepest node. But by the previous result, this path cannot be longer than 1.4404 log(n+2) - 0.3277.
And since 1.4404 log(n+2) - 0.3277 < n/2 for sufficiently large n, the statement is false.
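That height bound can be sanity-checked numerically. The sketch below is my own: it uses the standard recurrence for the minimum number of nodes in an AVL tree of height h and verifies the inequality for all heights up to 40:

```scala
// minNodes(h): fewest nodes an AVL tree of height h can have.
// minNodes(0) = 1, minNodes(1) = 2, minNodes(h) = 1 + minNodes(h-1) + minNodes(h-2)
val minNodes: Vector[Long] = {
  val a = new Array[Long](41)
  a(0) = 1L
  a(1) = 2L
  for (h <- 2 to 40) a(h) = 1L + a(h - 1) + a(h - 2)
  a.toVector
}

def log2(x: Double): Double = math.log(x) / math.log(2)

// For a given h, n = minNodes(h) is the worst case (the tallest tree for that n),
// so h itself must not exceed the quoted bound evaluated at that n.
val holds = (0 to 40).forall { h =>
  h <= 1.4404 * log2(minNodes(h).toDouble + 2) - 0.3277
}
println(holds)  // true
```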
Let's first recall the BST (binary search tree) property:

-- every key in a node's left subtree is smaller than the node's key
-- every key in a node's right subtree is larger than the node's key

          10
         /  \
        8    12
       / \   / \
      5   9 11  15
     / \       /  \
    1   7    14    25

The height of the given tree is 3 (the number of edges in the longest path, 10-12-15-14).
Suppose you search for 14 in the given balanced BST.

At node 10: 14 > 10, so go to the right subtree, because all
nodes in the right subtree are > 10.

          10
         /  \
        8    12
       / \   / \
      5   9 11  15
     / \       /  \
    1   7    14    25        n = 11 nodes in total

At node 12: 14 > 12, so again go to the right subtree.

      12
     /  \
    11    15
         /  \
       14    25              n = 5

At node 15: 14 < 15; this time the node's value is greater than the
required value, so go to the left subtree.

      15
     /  \
    14    25                 n = 3

At node 14: 14 == 14, value found.        n = 1
From the example above we can see that every comparison roughly halves the size of the problem (the number of candidate nodes); equivalently, every comparison moves us down one level, so the depth reached increases by 1.
Since the maximum height of a balanced BST is log(N), in the worst case we must walk all the way down to a leaf, which takes log(N) steps.
Hence searching a balanced BST is O(log N).
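The walk above can be written out as code; the tree literal mirrors the diagram, and the helper names are my own:

```scala
sealed trait T
case object E extends T
case class N(k: Int, l: T, r: T) extends T

// The example tree from the diagrams above.
val tree: T =
  N(10,
    N(8, N(5, N(1, E, E), N(7, E, E)), N(9, E, E)),
    N(12, N(11, E, E), N(15, N(14, E, E), N(25, E, E))))

// The sequence of node values compared against while searching for k.
def path(t: T, k: Int): List[Int] = t match {
  case E => Nil
  case N(key, l, r) =>
    if (k == key) List(key)
    else if (k < key) key :: path(l, k)
    else key :: path(r, k)
}

println(path(tree, 14))  // List(10, 12, 15, 14): four comparisons, one per level
```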

Number of comparisons to find an element in a BST with 635 elements?

I am a freshman in a Computer Science program, so please give me an understandable justification.
I have a binary tree that is height-balanced and has 635 nodes. What is the number of comparisons that will occur in the worst-case scenario, and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that a complete binary search tree of height h can have at most 2^(h+1) - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:

n = 2^(h+1) - 1
n + 1 = 2^(h+1)
lg(n + 1) = h + 1
lg(n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, h = 8.312882955. Of course, trees can't have fractional height, so we take the ceiling and find that the height of the tree is 9. Since the maximum number of comparisons made is h + 1, at most 10 comparisons are made when doing a lookup.
Hope this helps!
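The arithmetic above can be condensed into a few lines (a sketch; `lg` is a hypothetical base-2 log helper, since the standard library only provides the natural logarithm):

```scala
def lg(x: Double): Double = math.log(x) / math.log(2)

val n = 635
val h = math.ceil(lg(n + 1.0) - 1).toInt  // minimum height of a complete BST with n nodes

// A tree of height 8 holds at most 2^9 - 1 = 511 nodes, so 635 nodes force height 9.
println(h)      // 9
println(h + 1)  // 10 comparisons in the worst case
```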
Without loss of generality, you can say the maximum number of comparisons is the height of the BST. You don't have to visit every node in the tree, because each comparison takes you closer to the node you're looking for.
Let's say it is a balanced BST (every node except those in the last level has 2 child nodes).
For instance,

Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 4
Level 3 --> Height 4 --> Number of nodes = 8
......
Level n --> Height n+1 --> Number of nodes = 2^n, i.e. 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time, because of the fact that the height of the tree (with n nodes) would be log(n).
How would you prove this?
I am not giving a mathematical proof here. Try to understand the problem using logarithms to base 2; log2 is the usual meaning of log in computer science.
First, understand that we mean the binary logarithm log2(n) (logarithm to the base 2), rounded down to an integer.
For example,
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height, the number of nodes in a fully balanced tree is

Height   Nodes   Log calculation
0        1       log2(1)  = 0
1        3       log2(3)  = 1  (rounded down)
2        7       log2(7)  = 2  (rounded down)
3        15      log2(15) = 3  (rounded down)
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to be height 3 because log2 of any number from 8 to 15 is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
I hope this helps.
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level the height is exactly N.
Generalization to all balanced binary trees: if the tree has fewer than 2^N leaf nodes, we certainly don't need more halvings. We might need fewer or the same number, but never more.
Assuming that we have a complete tree to work with, we can say that at depth k there are 2^k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree increases the number of nodes in the entire tree by twice the number of nodes in the previously deepest level.
The height k of the tree is log(N), where N is the number of nodes at depth k (the leaves). This can be stated as

log2(N) = k,

and it is equivalent to

N = 2^k

To understand this, here's an example:

16 = 2^4 => log2(16) = 4

The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Sorting and Searching ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2(N)) for N nodes?

w = width of the base (maximum number of leaves)
h = height of the tree (maximum number of edges from root to leaf)

Divide w by 2, h times, to get down to 1, which counts the single root node at the top.

N = w + w/2 + ... + 1
N = 2^h + ... + 2^1 + 2^0
  = (1 - 2^(h+1)) / (1 - 2)
  = 2^(h+1) - 1

log2(N + 1) = h + 1

Check: if N = 1, h = 0. If h = 1, N = 3.

This formula holds when the bottom level is full. When it is not, N is smaller but the height h stays the same, so we must take the ceiling of the log.
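The identities above can be checked for small heights with a quick sketch (`log2` is a hypothetical helper, since `scala.math` only provides the natural logarithm):

```scala
def log2(x: Double): Double = math.log(x) / math.log(2)

val allLevelsCheck = (0 to 10).forall { h =>
  val n = (0 to h).map(d => 1L << d).sum  // 2^0 + 2^1 + ... + 2^h
  n == (1L << (h + 1)) - 1 &&             // N = 2^(h+1) - 1
  log2(n + 1.0).round == h + 1            // log2(N + 1) = h + 1
}
println(allLevelsCheck)  // true
```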

Counting Treaps

Consider the problem of counting the number of structurally distinct binary search trees:
Given N, find the number of structurally distinct binary search trees containing the values 1 .. N
It's pretty easy to give an algorithm that solves this: fix every possible number in the root, then recursively solve the problem for the left and right subtrees:
countBST(numKeys)
    if numKeys <= 1
        return 1
    else
        result = 0
        for i = 1 .. numKeys
            leftBST = countBST(i - 1)
            rightBST = countBST(numKeys - i)
            result += leftBST * rightBST
        return result
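The pseudocode above, transcribed directly into runnable form (the counts it produces are the Catalan numbers 1, 1, 2, 5, 14, 42, ...):

```scala
// Direct transcription of the pseudocode above; exponential without memoization,
// which is fine for small inputs.
def countBST(numKeys: Int): Long =
  if (numKeys <= 1) 1L
  else (1 to numKeys).map(i => countBST(i - 1) * countBST(numKeys - i)).sum

println((0 to 6).map(i => countBST(i)))  // Vector(1, 1, 2, 5, 14, 42, 132)
```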
I've recently been familiarizing myself with treaps, and I posed the following problem to myself:
Given N, find the number of distinct treaps containing the values 1 .. N with priorities 1 .. N. Two treaps are distinct if they are structurally different relative to EITHER the key OR the priority (read on for clarification).
I've been trying to figure out a formula or an algorithm that can solve this for a while now, but I haven't been successful. This is what I noticed though:
The answers for n = 2 and n = 3 seem to be 2 and 6, based on me drawing trees on paper.
If we ignore the part that says treaps can also be different relative to the priority of the nodes, the problem seems to be identical to counting just binary search trees, since we'll be able to assign priorities to each BST such that it also respects the heap invariant. I haven't proven this though.
I think the hard part is accounting for the possibility to permute the priorities without changing the structure. For example, consider this treap, where the nodes are represented as (key, priority) pairs:
        (3, 5)
       /      \
   (2, 3)    (4, 4)
   /              \
(1, 1)          (5, 2)
We can permute the priorities of both the second and third levels while still maintaining the heap invariant, so we get more solutions even though no keys switch place. This probably gets even uglier for bigger trees. For example, this is a different treap from the one above:
        (3, 5)
       /      \
   (2, 4)    (4, 3)    // swapped priorities
   /              \
(1, 1)          (5, 2)
I'd appreciate if anyone can share any ideas on how to approach this. It seemed like an interesting counting problem when I thought about it. Maybe someone else thought about it too and even solved it!
Interesting question! I believe the answer is N factorial!
Given a tree structure, there is exactly one way to fill in the binary search tree key values.
Thus all we need to do is count the different number of heaps.
Given a heap, consider an in-order traversal of the tree.
This corresponds to a permutation of the numbers 1 to N.
Now given any permutation of {1,2...,N}, you can construct a heap as follows:
Find the position of the largest element. The elements to its left form the left subtree and the elements to its right form the right subtree. These subtrees are formed recursively by finding the largest element and splitting there.
This gives rise to a heap, as we always choose the max element, and the in-order traversal of that heap is the permutation we started with. Thus we have a way of going from a heap to a permutation and back, uniquely.
Thus the required number is N!.
As an example:
  5
 / \
3   4      In-order traversal -> 35142
   / \
  1   2

Now start with 35142. Largest is 5, so 3 is the left subtree and 142 is the right.

  5
 / \
3  {142}

In 142, 4 is largest, 1 is left and 2 is right, so we get

    5
   / \
  3   4
     / \
    1   2
The only way to fill in binary search keys for this is:
      (2, 5)
      /    \
  (1, 3)   (4, 4)
           /    \
       (3, 1)  (5, 2)
For a more formal proof:
If H(N) is the number of heaps on 1...N, then we have

H(N) = Sum_{L=0 to N-1} H(L) * H(N-1-L) * (N-1 choose L)

(basically we pick the max and assign it to the root; choose the size of the left subtree, choose that many elements for it, and recurse on left and right).
Now,

H(0) = 1
H(1) = 1
H(2) = 2
H(3) = 6

If H(n) = n! for 0 <= n <= K, then

H(K+1) = Sum_{L=0 to K} L! * (K-L)! * K!/(L! * (K-L)!) = Sum_{L=0 to K} K! = (K+1) * K! = (K+1)!
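The recurrence and the induction claim can be checked mechanically for small N (a sketch; `choose`, `fact`, and `H` are my own helper names):

```scala
// Binomial coefficient via the identity C(n, k) = C(n-1, k-1) * n / k.
def choose(n: Int, k: Int): Long =
  if (k == 0) 1L else choose(n - 1, k - 1) * n / k

def fact(n: Int): Long = (1 to n).foldLeft(1L)(_ * _)

// H(n) built bottom-up from the recurrence above.
val H = scala.collection.mutable.ArrayBuffer(1L)  // H(0) = 1
for (n <- 1 to 8)
  H += (0 until n).map(l => H(l) * H(n - 1 - l) * choose(n - 1, l)).sum

println((0 to 8).forall(n => H(n) == fact(n)))  // true: H(n) = n!
```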
def countBST(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _  => (1L to numKeys).map { i => countBST(i - 1) * countBST(numKeys - i) }.sum
}
You didn't actually define structural similarity for treaps -- you just gave examples. I'm going to assume the following definition: two trees are structurally different if and only if they have a different shape, or there exist nodes a (from tree A) and b (from tree B) such that a and b are in the same position, and the priorities of the children of a are in the opposite order of the priorities of the children of b. (It's obvious that if two treaps on the same values have the same shape, then the values in corresponding nodes are the same.)
In other words, if we visualize the two trees by writing just the priorities at the nodes, the following two trees are structurally similar:

       7                 7
     /   \             /   \
    6     5           6     5
   / \   / \         / \   / \
  4   3 2   1       2   1 4   3

This does not change the relative order of the children of any node:
6's left child is still greater than 6's right child, and
5's left child is still greater than 5's right child.
but the following two trees are structurally different:

       7                 7
     /   \             /   \
    5     6           6     5
   / \   / \         / \   / \
  4   3 2   1       4   3 2   1

This changes the relative order of the children of node 7.
Thus for the treap problem, each internal node has 2 orderings, and these two orderings do not otherwise affect the shape of the tree. So...
def countTreap(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _  =>
    // 2 situations when the root has only 1 child
    2 * countBST(numKeys - 1) +
    // and for each situation where the root has 2 children, this node contributes
    // 2 orderings of the priorities of its children (which is independent of the
    // shape of the tree below this level)
    2 * (2L to (numKeys - 1)).map { i => countBST(i - 1) * countBST(numKeys - i) }.sum
}
