Suppose I have to compare two binary search trees for equality. The basic approach is the recursive formulation: check that the roots are equal, then recursively check the equality of the corresponding left and right subtrees.
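In Python, that basic check might look like this (a minimal sketch; I'm assuming nodes expose key, left, and right fields):

    # Sketch of the recursive comparison; the field names key/left/right
    # are assumptions about the node type, not a fixed API.
    def same_tree(a, b):
        if a is None and b is None:    # both subtrees empty: equal
            return True
        if a is None or b is None:     # exactly one empty: not equal
            return False
        return (a.key == b.key
                and same_tree(a.left, b.left)
                and same_tree(a.right, b.right))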
However, would it be correct to state that if two binary search trees have the same level order traversal then they are the same tree? Stated differently, does every BST have a unique level order traversal?
No, it isn't.
The first one:
1
 \
  \
   2
    \
     \
      3
The second:
   1
  / \
 /   \
2     3
Level order gives 1 - 2 - 3 for both of these. (Note, though, that the second tree is not a valid BST, since 2 sits in the left subtree of 1; this counterexample only applies to general binary trees.)
Since the information-theoretic lower bound for representing a binary tree with n nodes is 2n - Θ(log n) bits, I don't think any single simple traversal can identify a binary tree uniquely.
There is a simple reduction from BSTs to binary trees. Consider the BSTs with node values 1..n. The number of such BSTs equals the number of binary trees with n nodes (you can always do a pre-order traversal and insert the values in that order). If a level order traversal could identify such a BST, you could encode it using 1 for an "in-level" node and 0 for an "end-of-level" node. The first tree becomes "000", the second one "010". This would let a BST be identified with just n bits, which contradicts the information-theoretic lower bound.
Well, I discussed this question with a friend of mine, so the answer isn't exactly mine, but here's what came up: the level order traversal of a BST can be sorted, which gives you the inorder traversal of that BST. Now you have two traversals, which together uniquely identify the BST. Thus it wouldn't be incorrect to state that every BST has a unique level order traversal.
Algorithm:
ConstructBST(levelorder[], size)
1. Declare an array A of size n.
2. Copy levelorder into A.
3. Sort A.
4. From the two traversals A (the inorder) and levelorder, construct the tree.
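For what it's worth, here is a rough Python sketch of the reconstruction. Instead of spelling out the generic inorder-plus-levelorder construction, it uses an equivalent shortcut: in a level order traversal every node appears before its children, so inserting the values into an empty BST in that order reproduces the original tree. The Node and insert helpers are my own illustrative names:

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Standard BST insertion; returns the (possibly new) subtree root.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def bst_from_level_order(levelorder):
        # Parents precede children in level order, so inserting the keys
        # in that order rebuilds the original tree exactly.
        root = None
        for key in levelorder:
            root = insert(root, key)
        return root

In particular, two BSTs over distinct keys are identical exactly when their level order traversals are equal.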
Does a skewed binary tree take more space than, say, a perfect binary tree?
I was solving question #654, Maximum Binary Tree, on LeetCode: given an array, you have to build a binary tree such that the root is the maximum number in the array, and the left and right subtrees are built on the same principle from the subarrays to the left and right of the maximum. The solution there concludes that in the average and best case (perfect binary tree) the space taken would be O(log(n)), and in the worst case (skewed binary tree) O(n).
For example, given nums = [1,3,2,7,4,6,5],
the tree would be as such,
      7
    /   \
   3     6
  / \   / \
 1   2 4   5
and if given nums = [7,6,5,4,3,2,1],
the tree would be as such,
7
 \
  6
   \
    5
     \
      4
       \
        3
         \
          2
           \
            1
According to my understanding they both should take O(n) space, since they both have n nodes. So I don't understand how they came to that conclusion.
Thanks in advance.
https://leetcode.com/problems/maximum-binary-tree/solution/
Under "Space complexity," it says:
Space complexity : O(n). The size of the set can grow upto n in the worst case. In the average case, the size will be nlogn for n elements in nums, giving an average case complexity of O(logn).
It's poorly worded, but it is correct. It's talking about the amount of memory required during construction of the tree, not the amount of memory that the tree itself occupies. As you correctly pointed out, the tree itself will occupy O(n) space, regardless of whether it's balanced or degenerate.
Consider the array [1,2,3,4,5,6,7]. You want the root to be the highest number, and the left to be everything that's to the left of the highest number in the array. Since the array is in ascending order, what happens is that you extract the 7 for the root, and then make a recursive call to construct the left subtree. Then you extract the 6 and make another recursive call to construct that node's left subtree. You continue making recursive calls until you place the 1. In all, you have six nested recursive calls: O(n).
Now look what happens if your initial array is [1,3,2,7,5,6,4]. You first place the 7, then make a recursive call with the subarray [1,3,2]. Then you place the 3 and make a recursive call to place the 1. Your tree is:
    7
   /
  3
 /
1
At this point, your call depth is 2. You return and place the 2. Then return from the two recursive calls. The tree is now:
    7
   /
  3
 / \
1   2
Constructing the right subtree also requires a call depth of 2. At no point is the call depth more than two. That's O(log n).
It turns out that the call stack depth is the same as the tree's height. The height of a perfect tree is O(log n), and the height of a degenerate tree is O(n).
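To make the call-depth argument concrete, here is a rough Python sketch of the recursive construction (my own illustration, not the LeetCode reference solution), instrumented to return the deepest recursive call it makes:

    class Node:
        def __init__(self, val):
            self.val, self.left, self.right = val, None, None

    def build(nums, depth=1):
        # Build the maximum binary tree; return (root, deepest call depth).
        if not nums:
            return None, depth - 1
        i = nums.index(max(nums))            # the maximum becomes the root
        node = Node(nums[i])
        node.left, dl = build(nums[:i], depth + 1)
        node.right, dr = build(nums[i + 1:], depth + 1)
        return node, max(depth, dl, dr)

    print(build([1, 3, 2, 7, 4, 6, 5])[1])   # 3  (balanced: ~log n frames)
    print(build([7, 6, 5, 4, 3, 2, 1])[1])   # 7  (degenerate: n frames)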
For the time efficiency of inserting into a binary search tree: I know that the best/average case of insertion is O(log n), whereas the worst case is O(n).
What I'm wondering is whether there is any way to ensure we always get the best/average case when inserting, besides implementing an AVL tree (a balanced BST)?
Thanks!
There is no guaranteed log n complexity without balancing a binary search tree. While searching/inserting/deleting, you have to navigate through the tree to position yourself at the right place before performing the operation. The key question is: how many steps does it take to reach the right position? If the BST is balanced, you can expect on average 2^(i-1) nodes at level i. This further means that if the tree has k levels (k is called the height of the tree), the expected number of nodes in the tree is 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1 = n, which gives k = log n, and that is the average number of steps needed to navigate from the root to a leaf.
Having said that, there are various implementations of balanced BSTs. You mentioned AVL; another very popular one is the red-black tree, which is used e.g. in C++ to implement std::map and in Java to implement TreeMap.
The worst case, O(n), can happen when you don't balance the BST and the tree degenerates into a linked list. Clearly, to position yourself at the end of the list (the worst case), you have to iterate through the whole list, and this requires n steps.
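A quick way to convince yourself of this degeneration is to build a naive, unbalanced BST from sorted versus shuffled keys and compare the resulting heights. A throwaway Python sketch (helper names are mine):

    import random

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Iterative insertion, so a degenerate tree can't overflow the call stack.
        if root is None:
            return Node(key)
        cur = root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = Node(key)
                    return root
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = Node(key)
                    return root
                cur = cur.right

    def height(node):
        return 0 if node is None else 1 + max(height(node.left), height(node.right))

    keys = list(range(500))
    sorted_tree = None
    for k in keys:                  # sorted input: degenerates into a list
        sorted_tree = insert(sorted_tree, k)

    random.shuffle(keys)
    shuffled_tree = None
    for k in keys:                  # shuffled input: height stays near log n
        shuffled_tree = insert(shuffled_tree, k)

    print(height(sorted_tree))      # 500
    print(height(shuffled_tree))    # typically around 20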
I have 2 binary search trees T1 and T2 with the same number of nodes n >= 1. For each node P we have LEFT(P) and RIGHT(P) for the links between nodes and KEY(P) for the value of the node. The root of T1 is R1 and the root of T2 is R2.
I need a linear algorithm which will determine values which are found both in T1 and in T2.
My idea until now is to do an inorder traversal of T1 and search in T2 for current element, like this:
inorder(node)
    if node is not NULL
        inorder(LEFT(node))
        if find(KEY(node), R2)
            print KEY(node)
        inorder(RIGHT(node))
Where find(KEY(node), R2) implement a binary search for KEY(node) in tree T2.
Is this the correct solution? Is this a linear algorithm? (I know the traversal itself is O(n).) Or is there another method to intersect 2 binary search trees?
Thanks!
Your current inorder traversal uses recursion to perform the task. That makes it difficult to run more than one traversal at the same time.
So, first I would re-write the method to use an explicit stack (example here in C#). Now, duplicate all of the state so that we perform traversals of both trees at the same time.
At any point where we're ready to yield a value from both trees, we compare their KEY() values. If they are unequal then we carry on the traversal of the tree with the lower KEY() value.
If both values are equal then we yield that value and continue traversing both trees again.
This is similar in concept to merging two sorted sequences - all we need to do is to examine the "next" value to be yielded by each sequence, yield the lower of the two values and then move forward in that sequence.
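Here is a minimal Python sketch of that simultaneous traversal (the linked C# example isn't reproduced here; I'm assuming nodes expose key, left, and right). Generators over an explicit stack stand in for the duplicated traversal state:

    def inorder(root):
        # Iterative inorder traversal with an explicit stack; yields the
        # keys in ascending order, one at a time.
        stack, node = [], root
        while stack or node:
            while node:                    # slide down to the leftmost node
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right

    def intersect(r1, r2):
        # Merge the two sorted key streams, yielding values found in both.
        it1, it2 = inorder(r1), inorder(r2)
        a, b = next(it1, None), next(it2, None)
        while a is not None and b is not None:
            if a == b:
                yield a
                a, b = next(it1, None), next(it2, None)
            elif a < b:
                a = next(it1, None)
            else:
                b = next(it2, None)

This visits every node of each tree exactly once, so it runs in O(n) with O(height) extra space for the two stacks. (It uses None as an end-of-stream sentinel, so it assumes keys are never None.)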
In answer to your original proposal:
Is this a linear algorithm?
No. For every node you visit during your inorder traversal, you're calling find, which is O(log n) (assuming T2 is balanced; O(height) in general). So your complete algorithm is O(n log n).
I can see how, when looking up a value in a BST, we leave out half the tree every time we compare a node with the value we are looking for.
However, I fail to see why the time complexity is O(log(n)). So, my question is: if we have a tree of N elements, why is the time complexity of looking up the tree and checking whether a particular value exists O(log(n))? How do we get that?
Your question seems to be well answered here, but to summarise in relation to your specific question, it might be better to think of it in reverse: "what happens to the BST solution time as the number of nodes goes up?"
Essentially, in a BST every time you double the number of nodes you only increase the number of steps to solution by one. To extend this, four times the nodes gives two extra steps. Eight times the nodes gives three extra steps. Sixteen times the nodes gives four extra steps. And so on.
The base 2 log of the first number in these pairs is the second number in these pairs. It's base 2 log because this is a binary search (you halve the problem space each step).
For me the easiest way was to look at a graph of log2(n), where n is the number of nodes in the binary tree. As a table this looks like:
log2(n)  = d
log2(1)  = 0
log2(2)  = 1
log2(4)  = 2
log2(8)  = 3
log2(16) = 4
log2(32) = 5
log2(64) = 6
and then I draw a little binary tree, this one goes from depth d=0 to d=3:
d=0              O
              /     \
d=1          R       B
            / \     / \
d=2        R   B   R   B
          /\  /\  /\  /\
d=3      R B R B R B R B
So as the number of nodes n in the tree effectively doubles (e.g. n goes from 7 to 15, which is almost a doubling, as the depth d goes from d=2 to d=3), the amount of additional processing (or time) required increases by only 1 extra computation (or iteration), because the amount of processing is related to d.
We go down only 1 additional level of depth, from d=2 to d=3, to find the node we want out of all n nodes after doubling the number of nodes. This is true because we've now searched the whole tree, or rather the half of it that we needed to search to find the node we wanted.
We can write this as d = log2(n), where d tells us how much computation (how many iterations) we need to do (on average) to reach any node in the tree, when there are n nodes in the tree.
This can be shown mathematically very easily.
Before I present that, let me clarify something. The complexity of lookup or find in a balanced binary search tree is O(log(n)). For a binary search tree in general, it is O(n). I'll show both below.
In a balanced binary search tree, in the worst case the value I am looking for is in a leaf of the tree. I basically traverse from the root to that leaf, looking at each layer of the tree only once, due to the ordered structure of BSTs. Therefore, the number of comparisons I need is the number of layers of the tree. Hence the problem boils down to finding a closed-form expression for the number of layers of a tree with n nodes.
This is where we'll do a simple induction. A tree with only 1 layer has only 1 node. A tree with 2 layers has 1 + 2 nodes, one with 3 layers has 1 + 2 + 4 nodes, and so on. The pattern is clear: a tree with k layers has exactly
n = 2^0 + 2^1 + ... + 2^(k-1)
nodes. This is a geometric series, which implies
n = 2^k - 1,
equivalently:
k = log2(n + 1)
We know that big-oh is concerned with large values of n, so constants are irrelevant; hence the O(log(n)) complexity.
I'll give another (much shorter) way to show the same result. While looking for a value we repeatedly split the tree into two halves, and we have to do this k times, where k is the number of layers, so the following is true:
(n+1)/2^k = 1,
which implies the exact same result. You should convince yourself where that +1 in n + 1 comes from, but it is okay even if you don't pay attention to it, since we are talking about large values of n.
Now let's discuss the general binary search tree. In the worst case, it is perfectly unbalanced, meaning all of its nodes have only one child (and the tree becomes a linked list); see e.g. https://www.cs.auckland.ac.nz/~jmor159/PLDS210/niemann/s_fig33.gif
In this case, to find the value in the leaf, I need to iterate on all nodes, hence O(n).
A final note: these complexities hold true not only for find, but also for insert and delete operations.
Whenever you see a runtime that has an O(log n) factor in it, there's a very good chance that you're looking at something of the form "keep dividing the size of some object by a constant." So probably the best way to think about this question is - as you're doing lookups in a binary search tree, what exactly is it that's getting cut down by a constant factor, and what exactly is that constant?
For starters, let's imagine that you have a perfectly balanced binary tree, something that looks like this:
               *
            /     \
         *           *
       /   \       /   \
      *     *     *     *
     / \   / \   / \   / \
    *   * *   * *   * *   *
At each point in doing the search, you look at the current node. If it's the one you're looking for, great! You're totally done. On the other hand, if it isn't, then you either descend into the left subtree or the right subtree and then repeat this process.
If you walk into one of the two subtrees, you're essentially saying "I don't care at all about what's in that other subtree." You're throwing all the nodes in it away. And how many nodes are in there? Well, with a quick visual inspection - ideally one followed up with some nice math - you'll see that you're tossing out about half the nodes in the tree.
This means that at each step in a lookup, you either (1) find the node that you're looking for, or (2) toss out half the nodes in the tree. Since you're doing a constant amount of work at each step, you're looking at the hallmark of O(log n) behavior - the problem size drops by a constant factor at each step, and so it can only do so logarithmically many times.
Now of course, not all trees look like this. AVL trees have the fun property that each time you descend into a subtree, you throw away roughly a golden-ratio fraction of the total nodes. This therefore guarantees you can only take logarithmically many steps before you run out of nodes - hence the O(log n) height. In a red/black tree, each step throws away (roughly) a quarter of the total nodes, and since you're shrinking by a constant factor you again get the O(log n) lookup time you'd like. The very fun scapegoat tree has a tunable parameter that's used to determine how tightly balanced it is, but again you can show that every step you take throws away some constant factor based on this tunable parameter, giving O(log n) lookups.
However, this analysis breaks down for imbalanced trees. If you have a purely degenerate tree - one where every node has exactly one child - then every step down the tree that you take only tosses away a single node, not a constant fraction. That means that the lookup time gets up to O(n) in the worst case, since the number of times you can subtract a constant from n is O(n).
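In code, the "constant work, then discard a subtree" loop is short (a sketch; the key/left/right field names are assumptions):

    def contains(root, target):
        # Each iteration does O(1) work, then discards one whole subtree.
        # Balanced tree: each discard drops about half the nodes -> O(log n).
        # Degenerate tree: a discard may drop a single node -> O(n).
        node = root
        while node is not None:
            if target == node.key:
                return True
            node = node.left if target < node.key else node.right
        return False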
If we have a tree of N elements, why is the time complexity of looking up the tree and checking whether a particular value exists O(log(n))? How do we get that?
That's not true. By default, a lookup in a Binary Search Tree is not O(log(n)), where n is the number of nodes. In the worst case, it can become O(n). For instance, if we insert the values n, n - 1, ..., 1 (in that order), then the tree will look like this:
        n
       /
    n - 1
     /
  n - 2
   /
  ...
  /
 1
A lookup for a node with value 1 has O(n) time complexity.
To make a lookup more efficient, the tree must be balanced so that its maximum height is proportional to log(n). In that case, the time complexity of a lookup is O(log(n)), because finding any leaf is bounded by log(n) operations.
But again, not every Binary Search Tree is a Balanced Binary Search Tree. You must balance it to guarantee the O(log(n)) time complexity.
Why is it important that a binary tree be balanced
Imagine a tree that looks like this:
A
 \
  B
   \
    C
     \
      D
       \
        E
This is a valid binary tree, but now most operations are O(n) instead of O(lg n).
The balance of a binary tree is governed by a property called skewness. The more skewed a tree is, the longer it takes to access an element of the tree. Say we have the tree:
    1
   / \
  2   3
   \   \
    7   4
         \
          5
           \
            6
The above is also a binary tree, but right-skewed. It has 7 elements, so an ideal binary tree would need at most O(log 7) = 3 lookups; here you have to go deeper, so the worst case costs extra lookups. The skewness here is only a small constant, but consider a tree with thousands of nodes: the skewness could be far more considerable. So it is important to keep a binary tree balanced.
But then again, skewness is a topic of debate, since probability analysis of random binary trees shows that the average depth of a random binary tree with n elements is about 4.3 log n. So it is really a matter of balancing cost versus skewness.
One more interesting thing: computer scientists have even found an advantage in skewness and proposed a skewed data structure called the skew heap.
To ensure log(n) search time, you need to halve the number of remaining candidate nodes at each branch. For example, if you have a linear tree that never branches on the way from the root to the leaf node, then the search time will be linear, as in a linked list.
An extremely unbalanced tree, for example one where all nodes are linked to the left, means you still search through every single node before finding the last one; that defeats the point of a tree and has no benefit over a linked list. Balancing the tree makes for better search times, O(log(n)) as opposed to O(n).
As we know, most operations on a binary search tree are proportional to the height of the tree, so it is desirable to keep the height small; that keeps the search time within O(log(n)). Also, most of the available tree-balancing techniques apply best to trees that are perfectly full or close to perfectly balanced. In the end, you want predictable behavior from your tree, so go for a self-balancing binary tree such as a red-black tree or an AVL tree.