Big O Complexity in a Binary Search Tree (BST) - data-structures

I've been reviewing everything I've learned and came across a website saying that the worst case of searching in a binary search tree is O(n). As far as I knew, a binary search tree is a sorted tree in which we can search with binary search, which is O(log n) (log base 2, presumably).
Could anyone explain?

In the absolute worst case, a binary tree with N elements would be like a linked list.
Hence, there would be N levels, and a search would take N traversals.
~ Root ~
  ____
 | 42 |
 |____|
  /  \
 /    \
 ____   X
| 13 |
|____|
  /  \
 /    \
 ____   X
| 11 |
|____|
  /  \
 /    \
...     X
That's why it's O(N) in the worst case.
And this is why we need to balance the trees to achieve O(log N) search.
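To see the degenerate case concretely, here is a minimal sketch (the Node struct and insert routine are illustrations, not code from any answer above) showing how inserting keys in sorted order produces a linked-list-shaped BST:

#include <iostream>

// Hypothetical minimal BST node and insert, for illustration only.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    Node(int k) : key(k) {}
};

Node* insert(Node* root, int key) {
    if (!root) return new Node(key);
    if (key < root->key) root->left = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    return root;
}

int main() {
    // Inserting 1..5 in increasing order makes every node a right child,
    // so searching for 5 must visit all 5 nodes: O(n), not O(log n).
    Node* root = nullptr;
    for (int k = 1; k <= 5; ++k) root = insert(root, k);

    int depth = 0;
    for (Node* cur = root; cur; cur = cur->right) ++depth;
    std::cout << "nodes on the rightmost path: " << depth << "\n"; // prints 5
}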

O(log n) holds only if the tree is balanced.
If your insertions all land on the same side of the tree, then to find something you must traverse all the items, hence O(n).

Related

Building an AVL Tree out of Binary Search Tree

I need to suggest an algorithm that takes a BST (binary search tree) T1 that has 2^(n + 1) - 1 keys, and builds an AVL tree with the same keys. The algorithm should be efficient in terms of worst-case and average-case time complexity (as a function of n).
I'm not sure how I should approach this. It is clear that the minimal height of a BST that has 2^(n + 1) - 1 keys is n (which is the case when it is full / balanced), but how does that help me?
There is the straightforward method: iterate over the tree, each time adding the root of T1 to the AVL tree and then removing it from T1:
Since T1 may not be balanced, each delete may cost O(2^n) in the worst case (T1 holds 2^(n + 1) - 1 keys, and its height can be linear in that).
Each insert into the AVL tree will cost O(log(2^(n + 1) - 1)) = O(n).
There are 2^(n + 1) - 1 keys to move.
So in total that will cost on the order of O(2^n * (2^n + n)) = O(4^n), and that is ridiculously expensive.
But why should I remove from T1? I'm paying a lot there and for no good reason.
So I figured: why not use a tree traversal over T1, and for each node I visit, add it to the AVL tree:
There are 2^(n + 1) - 1 nodes, so the traversal will cost O(2^n) (visiting each node once).
Adding the current node to the AVL tree will cost O(log(2^n)) = O(n) each time.
So in total that will cost O(n * 2^n).
That is the best time complexity I could think of. The question is: can it be done faster, say in O(2^n)?
Is there some way to make each insert into the AVL tree cost only O(1)?
I hope I was clear and that my question belongs here.
Thank you very much,
Noam
There is an algorithm that balances a BST in linear time, called the Day-Stout-Warren (DSW) algorithm.
Basically, all it does is convert the BST into a sorted array or linked list by doing an in-order traversal (O(n)). Then, it recursively takes the middle element of the array, makes it the root, and makes its children the middle elements of the left and right subarrays respectively (O(n)). Here's an example:
UNBALANCED BST

    5
   / \
  3   8
     / \
    7   9
   /     \
  6       10

SORTED ARRAY

|3|5|6|7|8|9|10|
Now here are the recursive calls and the resulting tree:

DSW(initial array)

    7

7.left  = DSW(left array)   // |3|5|6|
7.right = DSW(right array)  // |8|9|10|

      7
     / \
    5   9

5.left  = DSW(|3|)
5.right = DSW(|6|)
9.left  = DSW(|8|)
9.right = DSW(|10|)

       7
     /   \
    5     9
   / \   / \
  3   6 8   10
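For reference, here is a minimal sketch of the rebuild step described above: take the middle element of the sorted array as the root and recurse on the two halves. (The classic in-place DSW formulation uses rotations instead; this array version just follows the idea in this answer. The Node struct is a hypothetical illustration.)

#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    Node(int k) : key(k) {}
};

// Build a balanced BST from sorted[lo..hi]: the middle element becomes
// the root, and the two halves become the left and right subtrees.
Node* buildBalanced(const std::vector<int>& sorted, int lo, int hi) {
    if (lo > hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* root = new Node(sorted[mid]);
    root->left  = buildBalanced(sorted, lo, mid - 1);
    root->right = buildBalanced(sorted, mid + 1, hi);
    return root;
}

// Usage: buildBalanced({3, 5, 6, 7, 8, 9, 10}, 0, 6) yields the tree rooted at 7.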

Time complexity of checking tweaked identical binary trees

I came across a problem on an online judge, as follows:
Check whether two given binary trees are identical, assuming any number of tweaks is allowed. A tweak is defined as a swap of the children of one node in the tree.
I came up with the following naive algorithm which was accepted.
/**
 * @param a, b the roots of the two binary trees.
 * @return true if they are tweaked identical, false otherwise.
 */
bool isTweakedIdentical(TreeNode* a, TreeNode* b) {
    if (!a || !b)
        return !a && !b;
    return a->val == b->val &&
           ((isTweakedIdentical(a->left, b->left) && isTweakedIdentical(a->right, b->right)) ||
            (isTweakedIdentical(a->left, b->right) && isTweakedIdentical(a->right, b->left)));
}
However, I can't figure out the time complexity of my solution. Can anyone teach me how to analyze it? I'm not sure when worst case happens.
For starters, let's imagine that you're working with a perfect binary tree with n nodes in it. Each call fires off at most four recursive calls, each on a subtree with (roughly) n / 2 nodes, and does O(1) work itself, so we get the recurrence relation
T(n) ≤ 4T(n / 2) + O(1)
Using the Master Theorem, we're in the case where a = 4, b = 2, and d = 0, and since log_b(a) = log_2(4) = 2 exceeds d = 0, we see that the runtime is O(n^2) in this case.
The problem with this upper bound is that it's not tight. In particular, looking at the structure of the recursive calls that get made, we only get four recursive calls if no short-circuiting kicks in. That would require the first call in each of the two cases to return true and the second to return false. I've spent the last half hour trying to craft a worst-case tree and prove that it's a worst-case tree, but I'm having a heck of a time doing so. I suspect that the answer is somewhere between O(n^(log_2 3)), which is what you get with three calls per node, and O(n^2), but I'm not certain about it. Sorry!
This depends on how "binary" you expect the tree to be.
If the trees constructed tend to be more linearly chained, then your algorithm will tend to perform linearly, O(N) (with N non-tail-call recursions which could potentially lead to stack overflow with large trees).
If your trees constructed tend to be more binary, as in both children having roughly equal tree sizes at each node, then your algorithm will perform quadratically, O(N^2).
Your worst case is when a is linear down the right side, and b is linear down the left side. You could improve your average case by pseudorandomly choosing which paths to check first, but at the cost of a call to rand() for each call to your function. I don't know that it would be worth the cost, but it's just a suggestion.
EDIT
Consider this example for binary trees:
        _______1_______
       /               \
     _2_               _3_
    /   \             /   \
   4     5           6     7
  / \   / \         / \   / \
 8   9 A   B       C   D E   F

        _______1_______
       /               \
     _3_               _2_
    /   \             /   \
   7     6           5     4
  / \   / \         / \   / \
 F   E D   C       B   A 9   8
These are two examples of heaps. And they match, except they're mirrored. This would take 43 calls to isTweakedIdentical() in order to return true. But this seems to be an O(N) operation, considering it averages 3 calls per node (except for the root). I'll finish editing this later.
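One way to check the call count empirically is to instrument the function with a counter. A minimal sketch, where the TreeNode struct and tree builders are illustrations rather than part of the question:

#include <iostream>

struct TreeNode {
    int val;
    TreeNode *left = nullptr, *right = nullptr;
    TreeNode(int v) : val(v) {}
};

static long calls = 0;  // instrumentation: counts invocations

bool isTweakedIdentical(TreeNode* a, TreeNode* b) {
    ++calls;
    if (!a || !b)
        return !a && !b;
    return a->val == b->val &&
           ((isTweakedIdentical(a->left, b->left) && isTweakedIdentical(a->right, b->right)) ||
            (isTweakedIdentical(a->left, b->right) && isTweakedIdentical(a->right, b->left)));
}

// Perfect tree with level-order keys 1..2^depth - 1, like the example above.
TreeNode* build(int idx, int depth) {
    if (depth == 0) return nullptr;
    TreeNode* n = new TreeNode(idx);
    n->left = build(2 * idx, depth - 1);
    n->right = build(2 * idx + 1, depth - 1);
    return n;
}

TreeNode* mirror(TreeNode* t) {
    if (!t) return nullptr;
    TreeNode* m = new TreeNode(t->val);
    m->left = mirror(t->right);
    m->right = mirror(t->left);
    return m;
}

int main() {
    TreeNode* a = build(1, 4);  // the 15-node tree from the example
    TreeNode* b = mirror(a);    // its mirror image
    std::cout << std::boolalpha << isTweakedIdentical(a, b)
              << " after " << calls << " calls\n";
}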

Searching in a balanced binary search tree

I was reading about balanced binary search tree. I found this statement about searching in such tree:
It is not true that when you are looking for something in a balanced binary search tree with n elements, it can in the worst case need n/2 comparisons.
Why is it not true?
Isn't it that we look at either the right side or the left side of the tree, so the number of comparisons should be n/2?
The worst-case search in a balanced binary search tree is governed by its height: it is O(height), and the height is log2(n) since the tree is balanced.
In the worst case, the node we are looking for is a leaf or doesn't exist at all, so we have to traverse the tree from the root down to a leaf, which is O(lg n), not O(n/2).
Consider the following balanced binary tree for n = 7 (this is in fact a complete binary search tree, but let's leave that out of this discussion, as a complete binary search tree is also a balanced binary search tree).
         4               depth 1 (root)
    /----+----\
    2         6          depth 2
 /--+--\   /--+--\
 1     3   5     7       depth 3
When searching for any number in this tree, the worst-case scenario is that we reach the maximum depth of the tree (3 in this case) before we terminate the search. At depth 3 we have performed 3 comparisons; hence, at arbitrary depth l we would have performed l comparisons.
Now, a complete binary search tree like the one above, of maximum depth maxDepth, holds 2^maxDepth - 1 different numbers. Say we have a complete binary search tree with exactly n (distinct) numbers. Then the following holds:
n = 2^maxDepth - 1    (+)
Hence
(+) <=> 2^maxDepth = n + 1
    <=> log2(2^maxDepth) = log2(n + 1)
    <=> maxDepth = log2(n + 1)
Recall from above that maxDepth told us the worst-case number of comparisons needed to find a number (or establish its non-existence) in our complete binary tree. Hence
worst case scenario, n nodes: log2(n + 1) comparisons
For studying the asymptotic or limiting behaviour of this search, n can be considered sufficiently large, so log2(n) ~= log2(n + 1) holds, and subsequently we can say that O(log2(n)) is a quite good (tight) upper bound for the algorithm. Hence
The time complexity for searching in a complete binary tree,
for n nodes, is O(log2(n))
For a balanced but non-complete binary search tree, reasoning analogous to the one above leads to the same time complexity. Note that for an unbalanced search tree, the worst-case scenario for n nodes is n comparisons.
Answer: From the above, it's clear that O(n/2) is not a proper description of the search time in a balanced binary search tree of size n, whereas O(log2(n)) is. (Note that the former is a correct upper bound for sufficiently large n, just not a very good/tight one!)
Imagine a tree with 10 nodes: 1, 2, 3, ..., 10.
If you are looking for 5, how many comparisons would it take? How about if you look for 10?
It's actually never N/2.
The worst-case scenario is that the element you are searching for is a leaf (or isn't contained in the tree at all), and the number of comparisons then equals the tree height, which is log2(n).
The best balanced binary search tree is the AVL tree. I say "the best" in the sense that its modifying operations are O(log(n)). A perfectly balanced tree has even smaller height, but no way is known to modify it in O(log(n)).
It can be shown that the maximum height of an AVL tree is less than
1.4404 log2(n + 2) - 0.3277
Consequently, the worst case for a search in an AVL tree is an unsuccessful search whose path from the root ends at the deepest node. By the previous result, this path cannot be longer than 1.4404 log2(n + 2) - 0.3277.
And since 1.4404 log2(n + 2) - 0.3277 < n/2 for sufficiently large n (for example, at n = 100 the bound is about 9.3, far below 50), the statement is false.
Let's first recall the BST (binary search tree) property:
-- every key in a node's left subtree is less than the node's key
-- every key in a node's right subtree is greater than the node's key
           10
          /    \
         8      12
        / \    /  \
       5   9  11   15
      / \         /  \
     1   7      14    25
The height of the given tree is 3 (the number of edges on the longest path, e.g. 10 -> 12 -> 15 -> 14).
Suppose you search for 14 in this balanced BST:
node 10:  14 > 10, go to the right subtree, because all nodes
          in the right subtree are > 10

           10                       n = 11 (total nodes)
          /    \
         8      12
        / \    /  \
       5   9  11   15
      / \         /  \
     1   7      14    25

node 12:  14 > 12, again go to the right subtree

           12                       n = 5
          /  \
         11   15
             /  \
           14    25

node 15:  14 < 15, this time the node's value is greater than the
          required value, so go to the left subtree

           15                       n = 3
          /  \
        14    25

node 14:  14 == 14, value found     n = 1
From the above example we can see that with every comparison the size of the problem (the number of nodes) roughly halves; equivalently, every comparison moves us down one level, one step closer to the maximum depth of the tree.
Since the maximum height of a balanced BST is log(N), and in the worst case we have to descend all the way to a leaf, the search takes log(N) steps.
Hence the complexity of searching a balanced BST is O(log N).
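The walkthrough above is exactly the standard iterative BST search. A minimal sketch, assuming a hypothetical Node struct:

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
};

// Each comparison discards one subtree, i.e. descends one level,
// so the work is O(height) = O(log N) in a balanced tree.
// E.g. search(root, 14) on the tree above follows 10 -> 12 -> 15 -> 14.
Node* search(Node* root, int target) {
    Node* cur = root;
    while (cur && cur->key != target)
        cur = (target < cur->key) ? cur->left : cur->right;
    return cur;  // nullptr if the target is absent
}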

Time complexity of a recursive algorithm which split into two per run mostly

Dynamic Programming | Set 33 (Find if a string is interleaved of two other strings)
http://www.geeksforgeeks.org/check-whether-a-given-string-is-an-interleaving-of-two-other-given-strings-set-2/
I found a question on this website which says "The worst case time complexity of the recursive solution is O(2^n)." So I tried to draw a tree diagram of the worst case for this question. I assume the worst case occurs when a and b have the same length and the same characters; the recursion then splits into 2 branches until the length of a/b is 0 (using substring).
                 aa,aa,aaaa
                /          \
        a,aa,aaa            aa,a,aaa
        /      \            /      \
  -,aa,aa    a,a,aa    a,a,aa    aa,-,aa
     |       /    \    /    \       |
   -,a,a  -,a,a a,-,a -,a,a a,-,a a,-,a
In this case there are 13 nodes, which really is a worst case, but how do I calculate that step by step? If the length of c increases by 2, the tree grows to 49 nodes, and for large inputs it becomes impractical to draw the diagram.
Can someone explain this in detail, please?
The recurrence for the running time is
T(n) = 2T(n - 1) + O(1)
If you draw the recursion tree you'll see that you have a binary tree of height n.
Since it is a binary tree, it has on the order of 2^n leaves, hence the worst-case scenario is O(2^n).
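For reference, a sketch of the kind of naive recursion being analyzed; the names are illustrative, not the exact code from the linked article. Each call spawns up to two recursive calls on inputs shortened by one character, which is where T(n) = 2T(n - 1) + O(1) comes from:

#include <string>

// Is c an interleaving of a and b? Naive recursion, O(2^n) worst case.
bool isInterleaved(const std::string& a, const std::string& b, const std::string& c) {
    if (c.empty()) return a.empty() && b.empty();
    // Branch 1: c's first character comes from a.
    bool fromA = !a.empty() && a[0] == c[0] &&
                 isInterleaved(a.substr(1), b, c.substr(1));
    // Branch 2: c's first character comes from b.
    bool fromB = !b.empty() && b[0] == c[0] &&
                 isInterleaved(a, b.substr(1), c.substr(1));
    return fromA || fromB;
}

// Usage: isInterleaved("aa", "aa", "aaaa") explores the 13-node tree drawn above.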

Time complexity of next/previous functions on a BST

I'm interested in the worst-case efficiency of stepping forwards and backwards through binary search trees.
Unbalanced tree:
  5
 /
1
 \
  2
   \
    3
     \
      4
It looks like the worst case would be 4->5, which takes 4 operations.
Balanced tree:
    2
   / \
  1   4
     / \
    3   5
Worst case is 2->3, which takes 2 operations.
Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?
Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?
Yes, you will only ever need to go up or down when travelling from k to k+1, never both (because the invariant is left child < parent < right child).
(Although O(height - 1) is simply O(height), and similarly O(n - 1) is O(n).)
If you are just traversing the tree in order, the complexity does not change with regard to balance. The algorithm is still

walk(Node n):
    walk(n.left)
    visit(n)
    walk(n.right)

1 op per step (amortized over the whole traversal).
It's when you start to apply lookups, inserts and deletes that balance comes into play.
For these operations to be O(log N), a balanced tree is required.
If you are trying to find the next element in the sequence defined by the tree, you may be required to travel the entire height of the tree; in a balanced tree this is O(log N), and in an unbalanced tree it is O(N).
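That observation translates directly into the usual successor routine on a BST with parent pointers. A minimal sketch (the Node struct with a parent pointer is an assumption, not something given in the question):

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
};

// In-order successor: either go down (leftmost node of the right
// subtree) or up (first ancestor reached from its left subtree),
// but never both, which is why one step costs at most O(height).
Node* successor(Node* n) {
    if (n->right) {
        Node* cur = n->right;
        while (cur->left) cur = cur->left;
        return cur;
    }
    Node* p = n->parent;
    while (p && n == p->right) {
        n = p;
        p = p->parent;
    }
    return p;  // nullptr if n held the maximum key
}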
