Time complexity of next/previous functions on a BST

I'm interested in the worst-case efficiency of stepping forwards and backwards through binary search trees.
Unbalanced tree:
        5
       /
      1
       \
        2
         \
          3
           \
            4
It looks like the worst case would be 4->5, which takes 4 operations.
Balanced tree:
    2
   / \
  1   4
     / \
    3   5
Worst case is 2->3, which takes 2 operations.
Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?

Am I right in thinking that the worst case for any BST is O(height-1), which is O(log n) for balanced trees, and O(n-1) for unbalanced trees?
Yes, you will only ever need to go up or down when travelling from k to k+1, never both (because the invariant is left child < parent < right child).
Although note that O(height - 1) is the same as O(height), and similarly O(n - 1) is O(n).

If you are considering just traversing the tree in order, the complexity does not change with regard to balance. The algorithm is still

walk(Node n):
    if n == null: return
    walk(n.left)
    visit(n)
    walk(n.right)

One operation per step.
It's when you start to apply lookups, inserts, and deletes that balance comes into play.
For these operations to be O(log N), a balanced tree is required.
If you are trying to find the next element in the sequence defined by the tree, you may have to travel the entire height of the tree. In a balanced tree this is O(log N); in an unbalanced tree it is O(N).
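
For concreteness, here is a minimal sketch of such a next() step, assuming nodes carry parent pointers (the Node shape, the parent field, and all names are assumptions for illustration, not from the question):

class Node {
    int key;
    Node left, right, parent;   // parent pointer is assumed
}

class BstStep {
    // Returns the next node in sorted order, or null if there is none.
    // Worst case follows a single root-to-leaf path: O(height).
    static Node successor(Node n) {
        if (n.right != null) {
            // Go down: the successor is the leftmost node of the right subtree.
            Node cur = n.right;
            while (cur.left != null) cur = cur.left;
            return cur;
        }
        // Go up: climb until we leave a left subtree.
        Node cur = n;
        while (cur.parent != null && cur == cur.parent.right) {
            cur = cur.parent;
        }
        return cur.parent;
    }
}

Note that exactly one of the two branches runs, matching the observation above that a step goes either down or up, never both.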

Related

Big O Time Complexity for Recursive Pattern

I have a question about the runtime of recursive patterns.
Example 1
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
I can understand that the runtime for the above code is O(2^N), because if I pass 5, it calls f(4) twice, then each f(4) calls f(3) twice, and so on until it reaches 1, i.e., something like O(branches^depth).
Example 2
Balanced Binary Tree
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
I read that the runtime for the above code is O(2^log N) since the tree is balanced, but I still see it as O(2^N). Can anyone explain?
When the number of elements gets halved each time, the runtime is log N. But how does that work for a binary tree here?
Is it 2^log N just because it is balanced?
What if it is not balanced?
Edit:
We can simplify O(2^log N) = O(N), but I am still seeing it as O(2^N).
Thanks!
A binary tree will have complexity O(n), like any other tree here, because you ultimately traverse every element of the tree. By halving we are not doing anything special other than computing the sum for each child's subtree separately.
The term comes about because, if the tree is balanced, it has log_2(n) levels, and 2^(log_2(n)) is then the number of elements in the tree (leaf + non-leaf); and 2^(log_2(n)) = n.
Again, if it is not balanced, it doesn't matter. We are doing an operation for which every element needs to be considered, making the runtime O(n).
Where could it have mattered? If we were searching for an element, then balance would have mattered.
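One way to convince yourself is to count the calls directly. Here is a minimal sketch (the counter, the Node class, and the tiny test tree are my additions for illustration): sum() is entered once per node plus once per null child, i.e. 2n + 1 times, which is O(n) whatever the shape of the tree.

class CallCounter {
    static int calls = 0;

    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static int sum(Node node) {
        calls++;                       // count every entry, including nulls
        if (node == null) return 0;
        return sum(node.left) + node.value + sum(node.right);
    }

    public static void main(String[] args) {
        // A small skewed tree: 1 -> 2 -> 3 (all right children), n = 3.
        Node root = new Node(1);
        root.right = new Node(2);
        root.right.right = new Node(3);
        sum(root);
        System.out.println(calls);     // prints 7 = 2 * 3 + 1, i.e. O(n)
    }
}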
I'll take a stab at this.
In a balanced binary tree, you should have half the child nodes to the left and half to the right of each parent node. The first layer of the tree is the root, with 1 element, then 2 elements in the next layer, then 4 elements in the next, then 8, and so on. So for a tree with L layers, you have 2^L - 1 nodes in the tree.
Reversing this, if you have N elements to insert into a tree, you end up with a balanced binary tree of depth L = log_2(N), so you only ever need to call your recursive algorithm for log_2(N) layers. At each layer, you are doubling the number of calls to your algorithm, so in your case you end up with 2^log_2(N) calls and O(2^log_2(N)) run time. Note that 2^log_2(N) = N, so it's the same either way, but we'll get to the advantage of a binary tree in a second.
If the tree is not balanced, you end up with depth greater than log_2(N), so you have more recursive calls. In the extreme case, when all of your children are to the left (or right) of their parent, you have N recursive calls, but each call returns immediately from one of its branches (no child on one side). Thus you would have O(N) run time, which is the same as before. Every node is visited once.
An advantage of a balanced tree is in cases like search. If the left-hand child is always less than the parent, and the right-hand child is always greater than, then you can search for an element n among N nodes in O(log_2(N)) time (not 2^log_2(N)!). If, however, your tree is severely imbalanced, this search becomes a linear traversal of all of the values and your search is O(N). If N is extremely large, or you perform this search a ton, this can be the difference between a tractable and an intractable algorithm.
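To contrast with the O(N) sum above, here is a minimal sketch of such a search (the Node shape and names are assumptions): unlike the traversal, it follows a single root-to-leaf path, doing at most one comparison per level.

class BstSearch {
    static class Node {
        int value;
        Node left, right;
    }

    // O(height): each comparison discards one entire subtree.
    static boolean contains(Node node, int target) {
        if (node == null) return false;           // fell off the tree: not present
        if (target == node.value) return true;
        return target < node.value
                ? contains(node.left, target)     // target can only be to the left
                : contains(node.right, target);   // target can only be to the right
    }
}

On a balanced tree that path has about log_2(N) nodes; on a severely imbalanced one it can have N.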

Why is the space complexity of a recursive inorder traversal O(h) and not O(n)

So I know that the space complexity of a recursive inorder traversal is O(h) and not O(n), where h = tree height and n = number of nodes in the tree.
Why is that? Let's say that this is the code for the traversal:
public void inorderPrint(TreeNode root) {
    if (root == null) {
        return;
    }
    inorderPrint(root.left);
    System.out.println(root.data);
    inorderPrint(root.right);
}
We are pushing n memory addresses to the call stack, therefore, the space complexity should be O(n).
What am I missing?
The addresses are removed from the stack when returning. This space is re-used when making a new call from a level closer to the root. So the maximum number of memory addresses on the stack at the same time is the tree height.
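A minimal sketch that makes this visible (the depth parameter, the counter, and the TreeNode shape are my additions for illustration): threading the current depth through the same traversal and recording its maximum shows that peak stack usage tracks the height, even though n calls are made overall.

class DepthProbe {
    static int maxDepth = 0;

    static class TreeNode {
        int data;
        TreeNode left, right;
    }

    // Same traversal as inorderPrint, but recording how deep the call
    // stack gets. At any instant only `depth` frames are live at once.
    static void inorder(TreeNode root, int depth) {
        if (root == null) return;
        maxDepth = Math.max(maxDepth, depth);
        inorder(root.left, depth + 1);
        System.out.println(root.data);
        inorder(root.right, depth + 1);
    }
}

For a balanced tree maxDepth ends up around log2(n); for a skewed one it reaches n, which is exactly the O(h) bound.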
IMHO, you should treat the space complexity as O(n) instead. While dealing with space and time complexities in Big O notation, we usually try to give the complexity as a function of the number of input elements, which is n in this case.
Also, if you consider a right-skewed or a left-skewed binary tree, you will find that O(n) is a fitting space complexity. Have a look at the cases of a right-skewed binary tree below:
1
 \
  2
   \
    3
Number of nodes, n = 3
Number of stack frames required in recursive traversal = 3
1
 \
  2
   \
    3
     \
      4
Number of nodes, n = 4
Number of stack frames required in recursive traversal = 4
So you can conclude that O(n) is a fitting space complexity in such a worst-case scenario (with respect to tree structure). In all other cases/types of trees, the number of stack frames required will always be less than n. That is how we express complexities: the actual space taken in all possible cases should always be less than or equal to the depicted function.
Also, in all cases O(h) <= O(n). So treating the space complexity as O(n) just gives us a uniform way of thinking in terms of the number of input elements, although O(h) is equally correct, for the reasons mentioned by @StefanHaustein in his answer.
The space complexity of recursion is the height/depth of the recursion, so following this general rule, an inorder traversal can use at most h stack frames, where h is the depth of the deepest node from the root. Space complexity of recursion = O(depth of recursion tree).

Building an AVL Tree out of Binary Search Tree

I need to suggest an algorithm that takes a BST (binary search tree) T1 that has 2^(n+1) - 1 keys and builds an AVL tree with the same keys. The algorithm should be efficient in terms of worst-case and average time complexity (as a function of n).
I'm not sure how I should approach this. It is clear that the minimal height of a BST that has 2^(n+1) - 1 keys is n (which is the case when it is full/balanced), but how does that help me?
There is the straightforward method, which is to iterate over the tree, each time adding the root of T1 to the AVL tree and then removing it from T1:
Since T1 may not be balanced, a delete may cost O(2^n) in the worst case (its height can be on the order of its 2^(n+1) - 1 keys)
An insert into the AVL tree will cost O(log(2^(n+1) - 1)) = O(n)
There are 2^(n+1) - 1 keys
So in total that will cost O(2^n * (2^n + n)) = O(4^n), and that is ridiculously expensive.
But why should I remove from T1? I'm paying a lot there and for no good reason.
So I figured, why not use a tree traversal over T1, and for each node visited, add it to the AVL tree:
There are 2^(n+1) - 1 nodes, so the traversal will cost O(2^n) (visiting each node once)
Adding each visited node to the AVL tree will cost O(log(2^(n+1) - 1)) = O(n)
So in total that will cost O(n * 2^n).
That is the best time complexity I could think of. The question is, can it be done faster, say in O(2^n), i.e., linear in the number of keys?
Is there some way to make each insert into the AVL tree cost only O(1)?
I hope I was clear and that my question belongs here.
Thank you very much,
Noam
There is an algorithm that balances a BST and runs in linear time, called the Day-Stout-Warren algorithm.
Basically, all it does is convert the BST into a sorted array or linked list by doing an in-order traversal (O(n)). Then it recursively takes the middle element of the array, makes it the root, and makes its children the middle elements of the left and right subarrays, respectively (O(n)). Here's an example:
UNBALANCED BST

      5
     / \
    3   8
       / \
      7   9
     /     \
    6       10

SORTED ARRAY

|3|5|6|7|8|9|10|
Now here are the recursive calls and the resulting tree:

DSW(initial array)
7
7.left  = DSW(left array)   // |3|5|6|
7.right = DSW(right array)  // |8|9|10|

       7
     /   \
    5     9

5.left  = DSW(|3|)
5.right = DSW(|6|)
9.left  = DSW(|8|)
9.right = DSW(|10|)

       7
     /   \
    5     9
   / \   / \
  3   6 8   10
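
A minimal sketch of this rebuild, assuming a plain Node class (all names here are mine; this is the sorted-array variant described above rather than the rotation-based in-place version):

class Rebuild {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Fill `out` with the keys of `root` in sorted order: O(n).
    static void inorder(Node root, java.util.List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);
        out.add(root.key);
        inorder(root.right, out);
    }

    // Middle element becomes the root, halves become subtrees: O(n) total.
    static Node build(java.util.List<Integer> keys, int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) / 2;
        Node root = new Node(keys.get(mid));
        root.left = build(keys, lo, mid - 1);
        root.right = build(keys, mid + 1, hi);
        return root;
    }

    static Node balance(Node root) {
        java.util.List<Integer> keys = new java.util.ArrayList<>();
        inorder(root, keys);
        return build(keys, 0, keys.size() - 1);
    }
}

For the array |3|5|6|7|8|9|10|, build() picks index 3 (the key 7) as the root, then 5 and 9 as the middles of the two halves, reproducing the tree above.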

Disjoint Set in a special ways?

We implement the disjoint-set data structure with trees. In this data structure, makeset() creates a set with one element, and merge(i, j) merges the trees of sets i and j in such a way that the tree with lower height becomes a child of the root of the other tree. If we do n makeset() operations and n-1 merge() operations in a random order, and then do one find operation, what is the cost of this find operation in the worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Can anyone give a good hint as to how the author arrives at this solution?
The O(log n) find is only true when you use union by rank (also known as weighted union). When we use this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily, but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of the tree. We can prove this by showing that a node that is i levels below the root (equivalent to being in a tree of rank >= i) is in a tree of at least 2^i nodes (this is the same as showing that a tree of size n has at most log n depth). This is easily done with induction.
Induction hypothesis: a node j levels below the root is in a tree of size >= 2^j, for all j < i.
Case i == 0: the node is the root, and the size is 1 = 2^0.
Case i + 1: the length of a path becomes i + 1 when it was i and the tree was then placed underneath another tree. By the induction hypothesis, the node was in a tree of size >= 2^i at that time. It is being placed under another tree, which by our merge rules means that tree also has at least rank i, and therefore also had >= 2^i nodes. The new tree therefore has >= 2^i + 2^i = 2^(i + 1) nodes.
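
A minimal array-based sketch of union by rank as described above (no path compression; all names are my additions for illustration):

class DisjointSet {
    int[] parent, rank;

    DisjointSet(int n) {            // n makeset() operations
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) {               // O(log n): depth is bounded by rank
        while (parent[x] != x) x = parent[x];
        return x;
    }

    void merge(int i, int j) {
        int ri = find(i), rj = find(j);
        if (ri == rj) return;
        if (rank[ri] < rank[rj]) {
            parent[ri] = rj;        // lower rank goes under higher rank
        } else if (rank[ri] > rank[rj]) {
            parent[rj] = ri;
        } else {
            parent[rj] = ri;        // equal ranks: pick one, bump its rank
            rank[ri]++;
        }
    }
}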

Searching in a balanced binary search tree

I was reading about balanced binary search tree. I found this statement about searching in such tree:
It is not true that when you are looking for something in a balanced binary search tree with n elements, the worst case can require n/2 comparisons.
Why it is not true?
Isn't it the case that we look at either the right side or the left side of the tree, so the number of comparisons should be n/2?
The worst case for searching a balanced binary search tree is governed by its height: the search is O(height), and the height is log2(n) since the tree is balanced.
In the worst case, the node we are looking for is a leaf or doesn't exist at all, and hence we need to traverse the tree from the root down to its leaves, which is O(lg n) and not O(n/2).
Consider the following balanced binary search tree for n = 7 (this is in fact a complete binary search tree, but let's leave that out of this discussion, as a complete binary search tree is also a balanced binary search tree).
            4             depth 1 (root)
       /----+----\
       2         6        depth 2
     /--+--\   /--+--\
     1     3   5     7    depth 3
When searching for any number in this tree, the worst-case scenario is that we reach the maximum depth of the tree (e.g., 3 in this case) before we terminate the search. At depth 3, we have performed 3 comparisons; hence, at arbitrary depth l, we will have performed l comparisons.
Now, a complete binary search tree of arbitrary size can hold 1 + 2 + ... + 2^(maxDepth-1) = 2^maxDepth - 1 different numbers. Say we have a complete binary search tree with exactly n (distinct) numbers. Then the following holds:

n = 2^maxDepth - 1            (+)

Hence

(+) <=> 2^maxDepth = n + 1
    <=> log2(2^maxDepth) = log2(n + 1)
    <=> maxDepth = log2(n + 1)

Recall from above that maxDepth told us the worst-case number of comparisons for us to find a number (or its non-existence) in our complete binary tree. Hence

worst case scenario, n nodes : log2(n + 1) comparisons

For studying the asymptotic or limiting behaviour of this search, n can be considered sufficiently large, and hence log2(n) ~ log2(n + 1) holds; subsequently, we can say that a quite good (tight) upper bound for the algorithm is O(log2(n)). Hence
The time complexity for searching in a complete binary tree,
for n nodes, is O(log2(n))
For a non-complete balanced binary search tree, analogous reasoning leads to the same time complexity. Note that for a non-balanced search tree, the worst-case scenario for n nodes is n comparisons.
Answer: From the above, it's clear that O(n/2) is not a proper description of the time complexity of searching a balanced binary search tree of size n, whereas O(log2(n)) is. (Note that the former might be a correct upper bound for sufficiently large n, but not a good/tight one!)
Imagine the tree with 10 nodes: 1,2,3,4,5..10.
If you are looking for 5, how many comparisons would it take? How about if you look for 10?
It's actually never N/2.
The worst-case scenario is that the element you are searching for is a leaf (or isn't contained in the tree at all), and the number of comparisons is then equal to the tree height, which is log2(n).
The best balanced binary search tree is the AVL tree. I say "the best" in the sense that its modifying operations are O(log(n)). If a tree is perfectly balanced, its height is even smaller, but no way is known to modify a perfectly balanced tree in O(log(n)).
It can be shown that the maximum height of an AVL tree is less than
1.4404 log(n+2) - 0.3277
Consequently the worst case for a search in an AVL tree is an unsuccessful search whose path from the root ends in the deepest node. But by the previous result, this path cannot be longer than 1.4404 log(n+2) - 0.3277.
And since 1.4404 log(n+2) - 0.3277 < n/2 for sufficiently large n, the statement is false.
Let's first recall the BST (binary search tree) property, which says:
-- a node must be > its left child
-- a node must be < its right child
            10
          /    \
         8      12
        / \    /  \
       5   9  11   15
      / \         /  \
     1   7      14    25
The height of the given tree is 3 (the number of edges in the longest path, 10-12-15-14).
Suppose you search for 14 in the given balanced BST.

node = 10:  14 > 10, so go to the right subtree, because all
            nodes in the right subtree are > 10   (n = 11 total nodes)

            10
          /    \
         8      12
        / \    /  \
       5   9  11   15
      / \         /  \
     1   7      14    25

node = 12:  14 > 12, again go to the right subtree   (n = 5)

         12
        /  \
      11    15
           /  \
         14    25

node = 15:  14 < 15, this time the node value is greater than
            the required value, so go to the left subtree   (n = 3)

         15
        /  \
      14    25

node = 14:  14 == 14, value found   (n = 1)
From the above example we can see that with every comparison the size of the problem (the number of nodes under consideration) is roughly halved; equivalently, with every comparison we move down one level, so the number of steps equals the depth we descend.
As the maximum height of a balanced BST is log(N), in the worst case we need to go all the way down to a leaf, which takes log(N) steps.
Hence the complexity of BST search is O(log N).
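
The trace above can be written as a simple loop. A minimal sketch (the Node shape and the comparison counter are my additions): counting the compared nodes confirms that the work tracks the depth of the path, not n.

class CountingSearch {
    static class Node {
        int value;
        Node left, right;
    }

    // Returns how many nodes were compared against `target`.
    // On a balanced BST this is at most height + 1, i.e. O(log n).
    static int comparisons(Node root, int target) {
        int count = 0;
        Node cur = root;
        while (cur != null) {
            count++;                              // one comparison per level
            if (target == cur.value) break;       // found it
            cur = target < cur.value ? cur.left : cur.right;
        }
        return count;
    }
}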
