Skewed binary tree vs Perfect binary tree - space complexity - data-structures

Does a skewed binary tree take more space than, say, a perfect binary tree?
I was solving question #654, Maximum Binary Tree, on LeetCode: given an array, you have to build a binary tree such that the root is the maximum number in the array, and the left and right subtrees are built on the same principle from the subarrays to the left and right of the maximum. The solution there concludes that in the average and best case (a perfect binary tree) the space taken would be O(log(n)), and in the worst case (a skewed binary tree) it would be O(n).
For example, given nums = [1,3,2,7,4,6,5],
the tree would be as such,
      7
    /   \
   3     6
  / \   / \
 1   2 4   5
and if given nums = [7,6,5,4,3,2,1],
the tree would be as such,
7
 \
  6
   \
    5
     \
      4
       \
        3
         \
          2
           \
            1
According to my understanding, they both should take O(n) space, since they both have n nodes. So I don't understand how they came to that conclusion.
Thanks in advance.

https://leetcode.com/problems/maximum-binary-tree/solution/
Under "Space complexity," it says:
Space complexity : O(n). The size of the set can grow upto n in the worst case. In the average case, the size will be nlogn for n elements in nums, giving an average case complexity of O(logn).
It's poorly worded, but it is correct. It's talking about the amount of memory required during construction of the tree, not the amount of memory that the tree itself occupies. As you correctly pointed out, the tree itself will occupy O(n) space regardless of whether it's balanced or degenerate.
Consider the array [1,2,3,4,5,6,7]. You want the root to be the highest number, and the left to be everything that's to the left of the highest number in the array. Since the array is in ascending order, what happens is that you extract the 7 for the root, and then make a recursive call to construct the left subtree. Then you extract the 6 and make another recursive call to construct that node's left subtree. You continue making recursive calls until you place the 1. In all, you have six nested recursive calls: O(n).
Now look what happens if your initial array is [1,3,2,7,5,6,4]. You first place the 7, then make a recursive call with the subarray [1,3,2]. Then you place the 3 and make a recursive call to place the 1. Your tree is:
    7
   /
  3
 /
1
At this point, your call depth is 2. You return and place the 2. Then return from the two recursive calls. The tree is now:
    7
   /
  3
 / \
1   2
Constructing the right subtree also requires a call depth of 2. At no point is the call depth more than two. That's O(log n).
It turns out that the call stack depth is the same as the tree's height. The height of a perfect tree is O(log n), and the height of a degenerate tree is O(n).
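To make the call-depth argument concrete, here is a minimal sketch of the recursive construction from the question, instrumented to record the deepest recursive call reached. The `TreeNode` class and the depth-tracking wrapper are illustrative, not the official solution code.

```python
# Sketch of the LeetCode #654 construction, tracking maximum recursion
# depth: the depth equals the height of the tree being built.

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def construct(nums):
    max_depth = 0

    def build(lo, hi, depth):
        nonlocal max_depth
        if lo > hi:
            return None
        max_depth = max(max_depth, depth)
        # The root of this subtree is the maximum of nums[lo..hi].
        m = max(range(lo, hi + 1), key=lambda i: nums[i])
        node = TreeNode(nums[m])
        node.left = build(lo, m - 1, depth + 1)
        node.right = build(m + 1, hi, depth + 1)
        return node

    root = build(0, len(nums) - 1, 1)
    return root, max_depth

# Balanced input: depth stays around log2(n).
_, d1 = construct([1, 3, 2, 7, 4, 6, 5])
# Descending input: every call recurses only on one side, so depth is n.
_, d2 = construct([7, 6, 5, 4, 3, 2, 1])
print(d1, d2)  # the balanced case reaches depth 3, the skewed case depth 7
```

Running it on the two arrays from the question shows exactly the contrast the answer describes: the call stack (the construction-time memory) is O(log n) for the balanced input and O(n) for the sorted one, even though both finished trees hold n nodes.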

Related

What can a binary heap do that a binary search tree cannot?

This is something I do not quite understand. When I read literature on heaps, it always says that the big advantage of a heap is that you have the top (max if max heap) element immediately available. But couldn't you just use a BST and store a pointer to the same node (bottom-rightmost) and update the pointer with insertions/deletions?
If I'm not mistaken, with the BST implementation I'm describing you would have
            |  Insert   | Remove Max
------------+-----------+------------
Special BST | O(log(n)) | O(1)
------------+-----------+------------
Max Heap    | O(log(n)) | O(log(n))
making it better.
Pseudo-code:

Insert:
    Same as regular BST insert, but can keep track of whether
    item inserted is max because traversal will be entirely
    in the right direction.

Delete:
    Set parent of max equal to null. Done.
What am I missing here?
But couldn't you just use a BST and store a pointer to the same node (bottom-rightmost) and update the pointer with insertions/deletions?
Yes, you could.
with the BST implementation I'm describing you would have [...] Remove Max O(1) [...] making it better.
[...] Set parent of max equal to null. Done.
No, Max removal wouldn't (always) be O(1), for the following reasons:
After you have removed the Max, you need to also update the pointer to reference the bottom right-most node. For example, take this tree, before the Max is removed:
      8
    /   \
   5     20   <-- Max pointer
  /     /
 2     12
      /  \
    10    13
            \
             14
You'll have to find the node with value 14 in order to update the Max pointer.
The above operation can be made to be O(1), by keeping the tree balanced, let's say according to the AVL rules. In that case the left child of the previous Max node would not have a right child, and the new Max node would be either its left child, or if it didn't have one, its parent. But as some deletions will make the tree unbalanced, they would need to be followed by a rebalancing operation. And that may involve several rotations. For instance, take this balanced BST:
          8
       /     \
      5       13
     / \     /  \
    2   6   9    15   <-- Max pointer
   / \   \   \
  1   4   7   10
     /
    3
After removal of node 15, it is easy to determine that 13 is the next Max, but the subtree rooted at 13 would not be balanced. After balancing it, the tree as a whole is unbalanced, and another rotation would be needed. The number of rotations could be O(logn).
Concluding, you can use a balanced BST with a Max pointer, but extraction of the Max node is still an O(logn) operation, giving it the same time complexity as the same operation in a binary heap.
What can a binary heap do that a binary search tree cannot?
Considering that a binary heap uses no pointers, and thus has much less "administrative" overhead than a self-balancing BST, the actual space consumption and the runtime of the insert/delete operations will be better by a constant factor, while their asymptotic complexity is the same.
Also, a binary heap can be built from a non-sorted array in O(n) time, while building a BST costs O(nlogn).
However, a BST is the way to go when you need to be able to traverse the values in their proper order, or find a value, or find a value's predecessor/successor. A binary heap has worse time complexities for such operations.
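The O(n) build and O(1) peek mentioned above can be illustrated with Python's heapq module, which implements a binary min-heap on a plain list (the max-heap case is symmetric). This is a quick sketch, not a benchmark:

```python
import heapq

# heapq stores a binary min-heap in a plain list: no child pointers,
# just index arithmetic (children of index i live at 2i+1 and 2i+2).
nums = [9, 4, 7, 1, 8, 2]

heapq.heapify(nums)       # builds the heap in O(n), as noted above
print(nums[0])            # the top element is available in O(1): prints 1

heapq.heappush(nums, 0)   # insert: O(log n)
smallest = heapq.heappop(nums)  # remove-min: O(log n)
print(smallest)           # prints 0
```

Note the lack of any node objects: the "administrative" overhead the answer mentions really is just one flat array.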
Both max heaps and balanced BSTs (e.g. AVL trees) perform these operations in O(log n) time. But BSTs take a constant factor more space due to pointers, and their code is more complicated.
Since you're talking about BST's and not Balanced BST's, consider the following skewed BST:
1
 \
  2
   \
    3
     \
      ...
       \
        n
You can hold a pointer reference to the max (n-th) element, but if you're inserting a value < n, the insertion will require O(n) time in the worst case. Also, to see the max value in a heap, you can simply read heap[0] (assuming the heap is implemented using an array), so the heap gets the max element in O(1) time as well.

Can I achieve begin insertion on a binary tree in O(log(N))?

Consider a binary tree and some traversal criterion that defines an ordering of the tree's elements.
Does there exist some particular traversal criterion that would allow a begin_insert operation, i.e. the operation of adding a new element that ends up at position 1 according to the ordering induced by the traversal criterion, with O(log(N)) cost?
I don't have any strict requirement, like the tree guaranteed to be balanced.
EDIT:
But I cannot accept lack of balance if that allows degeneration to O(N) in worst case scenarios.
EXAMPLE:
Let's try to see if in-order traversal would work.
Consider the BT (not a binary search tree)
     6
   /   \
  13    5
 /  \  /
2    8 9
In-order traversal gives 2-13-8-6-9-5
Perform begin_insert(7) in such a way that in-order traversal gives 7-2-13-8-6-9-5:
       6
     /   \
    13    5
   /  \  /
  2    8 9
 /
7
Now, I think this is not a legitimate O(log(N)) strategy, because if I keep adding values in this way the cost degenerates into O(N), as the tree becomes increasingly unbalanced:
            6
          /   \
         13    5
        /  \  /
       2    8 9
      /
     7
    /
   *
  /
 *
/
This strategy would work if I rebalance the tree by preserving ordering:
     8
   /   \
  2     9
 / \   / \
7  13 6   5
but this costs at least O(N).
According to this example, my conclusion would be that in-order traversal does not solve the problem, but since I received feedback that it should work, maybe I am missing something?
Inserting, deleting and finding in a binary search tree all rely on the same search algorithm to find the right position for the operation. The complexity of this is O(max height of the tree). The reason is that to find the right location you start at the root node and compare keys to decide whether to go into the left or the right subtree, and you repeat this until you find the right location. The worst case is when you have to travel down the longest chain, which is also the definition of the height of the tree.
If you don't have any constraints and allow any tree then this is going to be O(N) since you allow a tree with only left children (for example).
If you want to get better guarantees you must use algorithms that promise an upper bound on the height of the tree. For example, AVL guarantees that your tree is balanced, so the max height is always log N and all the operations above run in O(log N). Red-black trees don't guarantee a height of log N, but promise that the tree is not going to be too unbalanced (min height * 2 >= max height), so they keep O(log N) complexity (see the Red-black tree page for details).
Depending on your usage patterns you might be able to find more specialized data structures that give even better complexity (see Fibonacci heap).
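The degeneration the asker worries about is easy to demonstrate. The toy sketch below (illustrative names, not part of any answer above) implements begin_insert exactly as in the question's in-order example: hang the new element off the current leftmost node. The path walked grows by one with every insertion, so without rebalancing the cost is Theta(N) per insert.

```python
# Toy demonstration: naive begin_insert (attach as the new leftmost
# leaf) walks one step further with each insertion.

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def begin_insert(root, val):
    """Attach val as the new leftmost node; return (root, steps walked)."""
    node = Node(val)
    if root is None:
        return node, 0
    steps, cur = 1, root
    while cur.left is not None:
        cur = cur.left
        steps += 1
    cur.left = node
    return root, steps

root, total = None, 0
for v in range(10):
    root, steps = begin_insert(root, v)
    total += steps
# steps grew as 0, 1, 2, ..., 9: Theta(N) per insert in the worst case,
# which is why a self-balancing structure is needed for O(log N).
print(total)  # 0 + 1 + ... + 9 = 45
```

This confirms the question's own analysis: the strategy is sound for ordering but needs the rebalancing machinery of an AVL or red-black tree to keep the leftmost path, and hence begin_insert, at O(log N).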

Does every level order traversal uniquely define a BST?

Suppose I have to compare whether two binary search trees are similar. Now, the basic approach is the recursive formulation that checks for the root to be equal and then continues to check the equality of the corresponding right and left subtrees.
However, will it be correct to state that if the binary search trees have the same level order traversals then they are the same? Stated differently, does every BST have a unique level order traversal?
No, it doesn't.
The first one:
1
 \
  2
   \
    3
The second:
  1
 / \
2   3
Level order will give 1 - 2 - 3 for these two.
Since the information-theoretic lower bound on representing a binary tree with n nodes is 2n - Theta(log n) bits, I don't think any simple traversal should be able to identify a binary tree.
A Google search for "lower bound bits binary tree" confirms the lower bound.
There is a simple reduction from BSTs to binary trees. Consider the BSTs with node values 1..n. The number of these BSTs is the number of binary trees with n nodes (you can always do a pre-order traversal and insert the values in that order). If you could use a level-order traversal to identify such a BST, you could use 1 for an "in-level" node and 0 for an "end-level" node. The first tree becomes "000", the second one "010". This would let a BST be identified with just n bits, which does not fit the information-theoretic lower bound.
Well, I discussed this question with a friend of mine, so the answer isn't exactly mine! But here's what came up: the level-order traversal you have for a BST can be sorted, and thus you can get the in-order traversal of that particular BST. Now you have two traversals, which can then be used to uniquely identify the BST. Thus it wouldn't be incorrect to state that every BST has a unique level-order traversal.
Algorithm:

ConstructBST(levelorder[], int size)
1. Declare an array A of size n.
2. Copy levelorder into A.
3. Sort A.
4. From the two traversals A (in-order) and levelorder of the Binary Search Tree, construct the tree.
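For a BST specifically, there is an even simpler check than the sort-based reconstruction: inserting the level-order sequence back with ordinary BST insertion recreates every node in its original position, since each value meets the same comparisons on the way down. (Note that the counterexample trees earlier in this thread are general binary trees; the second one is not a valid BST.) A small sketch, with illustrative names:

```python
# Round-trip check: read off a BST's level order, rebuild by plain BST
# insertion in that order, and verify the same level order comes back.

from collections import deque

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = bst_insert(root.left, val)
    else:
        root.right = bst_insert(root.right, val)
    return root

def level_order(root):
    out, q = [], deque([root] if root else [])
    while q:
        node = q.popleft()
        out.append(node.val)
        if node.left:
            q.append(node.left)
        if node.right:
            q.append(node.right)
    return out

root = None
for v in [8, 3, 10, 1, 6, 14]:
    root = bst_insert(root, v)

seq = level_order(root)          # [8, 3, 10, 1, 6, 14]
rebuilt = None
for v in seq:
    rebuilt = bst_insert(rebuilt, v)
print(level_order(rebuilt) == seq)  # True: the same tree is recovered
```

One round trip is not a proof, of course, but the invariant behind it (each value is routed by comparisons with already-placed ancestors, all of which appear earlier in level order) is exactly why the sorted-level-order argument above works.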

Why lookup in a Binary Search Tree is O(log(n))?

I can see how, when looking up a value in a BST, we discard half the tree every time we compare a node with the value we are looking for.
However, I fail to see why the time complexity is O(log(n)). So, my question is:
If we have a tree of N elements, why is the time complexity of searching the tree to check whether a particular value exists O(log(n))? How do we get that?
Your question seems to be well answered here, but to summarise in relation to your specific question, it might be better to think of it in reverse: "what happens to the BST solution time as the number of nodes goes up?"
Essentially, in a BST every time you double the number of nodes you only increase the number of steps to solution by one. To extend this, four times the nodes gives two extra steps. Eight times the nodes gives three extra steps. Sixteen times the nodes gives four extra steps. And so on.
The base 2 log of the first number in these pairs is the second number in these pairs. It's base 2 log because this is a binary search (you halve the problem space each step).
For me the easiest way was to look at a graph of log2(n), where n is the number of nodes in the binary tree. As a table this looks like:
log2(n)  = d
log2(1)  = 0
log2(2)  = 1
log2(4)  = 2
log2(8)  = 3
log2(16) = 4
log2(32) = 5
log2(64) = 6
and then I draw a little binary tree, this one goes from depth d=0 to d=3:
d=0            O
             /   \
d=1         R     B
           / \   / \
d=2       R   B R   B
         /\  /\ /\  /\
d=3     R B R B R B R B
So as the number of nodes n in the tree effectively doubles (e.g. n increases from 7 to 15, which is almost a doubling, as the depth goes from d=2 to d=3), the additional amount of processing (or time) required increases by only 1 additional computation (or iteration), because the amount of processing is related to d.
We can see that we go down only 1 additional level of depth d, from d=2 to d=3, to find the node we want out of all the nodes n, after doubling the number of nodes. This is true because we've now searched the whole tree, well, the half of it that we needed to search to find the node we wanted.
We can write this as d = log2(n), where d tells us how much computation (how many iterations) we need to do (on average) to reach any node in the tree, when there are n nodes in the tree.
This can be shown mathematically very easily.
Before I present that, let me clarify something. The complexity of lookup or find in a balanced binary search tree is O(log(n)). For a binary search tree in general, it is O(n). I'll show both below.
In a balanced binary search tree, in the worst case, the value I am looking for is in a leaf of the tree. I basically traverse from the root to that leaf, looking at each layer of the tree only once, due to the ordered structure of BSTs. Therefore, the number of searches I need to do is the number of layers of the tree. Hence the problem boils down to finding a closed-form expression for the number of layers of a tree with n nodes.
This is where we'll do a simple induction. A tree with only 1 layer has only 1 node. A tree of 2 layers has 1+2 nodes, one of 3 layers has 1+2+4 nodes, etc. The pattern is clear: a tree with k layers has exactly
n=2^0+2^1+...+2^{k-1}
nodes. This is a geometric series, which implies
n=2^k-1,
equivalently:
k = log(n+1)
We know that big-oh is interested in large values of n, hence constants are irrelevant. Hence the O(log(n)) complexity.
I'll give another -much shorter- way to show the same result. Since while looking for a value we constantly split the tree into two halves, and we have to do this k times, where k is number of layers, the following is true:
(n+1)/2^k = 1,
which implies the exact same result. You have to convince yourself about where that +1 in n+1 is coming from, but it is okay even if you don't pay attention to it, since we are talking about large values of n.
Now let's discuss the general binary search tree. In the worst case, it is perfectly unbalanced, meaning all of its nodes have only one child (and it becomes a linked list); see e.g. https://www.cs.auckland.ac.nz/~jmor159/PLDS210/niemann/s_fig33.gif
In this case, to find the value in the leaf, I need to iterate on all nodes, hence O(n).
A final note is that these complexities hold true for not only find, but also insert and delete operations.
Whenever you see a runtime that has an O(log n) factor in it, there's a very good chance that you're looking at something of the form "keep dividing the size of some object by a constant." So probably the best way to think about this question is - as you're doing lookups in a binary search tree, what exactly is it that's getting cut down by a constant factor, and what exactly is that constant?
For starters, let's imagine that you have a perfectly balanced binary tree, something that looks like this:
        _____*_____
       /           \
      *             *
    /   \         /   \
   *     *       *     *
  / \   / \     / \   / \
 *   * *   *   *   * *   *
At each point in doing the search, you look at the current node. If it's the one you're looking for, great! You're totally done. On the other hand, if it isn't, then you either descend into the left subtree or the right subtree and then repeat this process.
If you walk into one of the two subtrees, you're essentially saying "I don't care at all about what's in that other subtree." You're throwing all the nodes in it away. And how many nodes are in there? Well, with a quick visual inspection - ideally one followed up with some nice math - you'll see that you're tossing out about half the nodes in the tree.
This means that at each step in a lookup, you either (1) find the node that you're looking for, or (2) toss out half the nodes in the tree. Since you're doing a constant amount of work at each step, you're looking at the hallmark behavior of O(log n) behavior - the work drops by a constant factor at each step, and so it can only do so logarithmically many times.
Now of course, not all trees look like this. AVL trees have the fun property that each time you descend down into a subtree, you throw away roughly a golden ratio fraction of the total nodes. This therefore guarantees you can only take logarithmically many steps before you run out of nodes - hence the O(log n) height. In a red/black tree, each step throws away (roughly) a quarter of the total nodes, and since you're shrinking by a constant factor you again get the O(log n) lookup time you'd like. The very fun scapegoat tree has a tuneable parameter that's used to determine how tightly balanced it is, but again you can show that every step you take throws away some constant factor based on this tuneable parameter, giving O(log n) lookups.
However, this analysis breaks down for imbalanced trees. If you have a purely degenerate tree - one where every node has exactly one child - then every step down the tree that you take only tosses away a single node, not a constant fraction. That means that the lookup time gets up to O(n) in the worst case, since the number of times you can subtract a constant from n is O(n).
If we have a tree of N elements, why the time complexity of looking up the tree and check if a particular value exists is O(log(n)), how do we get that?
That's not true. By default, a lookup in a Binary Search Tree is not O(log(n)), where n is a number of nodes. In the worst case, it can become O(n). For instance, if we insert values of the following sequence n, n - 1, ..., 1 (in the same order), then the tree will be represented as below:
        n
       /
    n - 1
     /
  n - 2
   /
  ...
  /
 1
A lookup for a node with value 1 has O(n) time complexity.
To make a lookup more efficient, the tree must be balanced so that its maximum height is proportional to log(n). In such case, the time complexity of lookup is O(log(n)) because finding any leaf is bounded by log(n) operations.
But again, not every Binary Search Tree is a Balanced Binary Search Tree. You must balance it to guarantee the O(log(n)) time complexity.
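The two cases described above can be contrasted directly by counting comparisons. The sketch below (illustrative names, plain unbalanced BST insertion) builds the same 15 values once in descending order, which yields the degenerate chain, and once in an order that happens to keep the tree balanced:

```python
# Count comparisons for a worst-case lookup in a skewed vs a balanced
# BST over the same values.

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root

def comparisons(root, target):
    count, cur = 0, root
    while cur is not None:
        count += 1
        if target == cur.val:
            return count
        cur = cur.left if target < cur.val else cur.right
    return count

skewed = None
for v in range(15, 0, -1):       # 15, 14, ..., 1: every node a left child
    skewed = insert(skewed, v)

balanced = None
for v in [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]:
    balanced = insert(balanced, v)

print(comparisons(skewed, 1))    # 15 comparisons: O(n)
print(comparisons(balanced, 1))  # 4 comparisons: O(log n) for n = 15
```

Same values, same lookup, same algorithm; only the shape of the tree differs, which is exactly why O(log(n)) must be stated as a property of balanced BSTs rather than of BSTs in general.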

Big O(h) vs. Big O(logn) in trees

I have a question on the time complexity of tree operations.
It's said (Data Structures, Horowitz et al.) that the time complexity for insertion, deletion, search, and finding the min, max, successor and predecessor nodes in BSTs is O(h), while for AVL trees it is O(logn).
I don't exactly understand what the difference is. With h = [logn] + 1 in mind, why do we say O(h) in one place and O(logn) in another?
h is the height of the tree. It is always Omega(logn) [not asymptotically smaller than logn]. It can be very close to logn in a complete tree (there you really get h = logn + 1), but in a tree that has decayed to a chain (each node has only one child) it is O(n).
For balanced trees, h=O(logn) (and in fact it is Theta(logn)), so any O(h) algorithm on those is actually O(logn).
The idea of self-balancing search trees (and AVL is one of them) is to prevent the cases where the tree decays to a chain (or something close to it); the balanced tree's invariants ensure an O(logn) height.
EDIT:
To understand this issue better consider the next two trees (and forgive me for being terrible ascii artist):
tree 1                 tree 2

            7
           /
          6
         /
        5                  4
       /                 /   \
      4                 2     6
     /                 / \   / \
    3                 1   3 5   7
   /
  2
 /
1
Both are valid binary search trees, and in both, searching for an element (say 1) will be O(h). But in the first, O(h) is actually O(n), while in the second it is O(logn).
O(h) means the complexity is linear in the tree height. If the tree is balanced, this becomes O(logn) (with n the number of elements). But it is not true for all trees. Imagine a very unbalanced binary tree where each node has only a left child; this tree is essentially a list, and the number of elements in it equals its height. The complexity of the described operations will then be O(n) instead of O(logn).
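The two shapes drawn in the answer above can be checked numerically. This short sketch (illustrative names, plain BST insertion) computes h for a chain and for a balanced tree over the same 7 values:

```python
# Measure h for the two trees from the figure: a chain (tree 1) and a
# balanced tree (tree 2), both holding the values 1..7.

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    if root is None:
        return Node(val)
    if val < root.val:
        root.left = insert(root.left, val)
    else:
        root.right = insert(root.right, val)
    return root

def height(root):
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

chain = None
for v in [7, 6, 5, 4, 3, 2, 1]:    # tree 1: decays to a chain
    chain = insert(chain, v)

balanced = None
for v in [4, 2, 6, 1, 3, 5, 7]:    # tree 2: stays balanced
    balanced = insert(balanced, v)

print(height(chain))     # 7 = n, so O(h) is O(n) here
print(height(balanced))  # 3 = log2(n + 1), so O(h) is O(log n) here
```

Same n, very different h: that gap is the whole point of stating tree operation costs as O(h) and then bounding h separately.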
