For the time efficiency of inserting into a binary search tree,
I know that the best/average case of insertion is O(log n), whereas the worst case is O(n).
What I'm wondering is whether there is any way to ensure that we always get the best/average case when inserting, besides implementing an AVL tree (a balanced BST)?
Thanks!
There is no guaranteed log n complexity without balancing a binary search tree. While searching/inserting/deleting, you have to navigate through the tree in order to position yourself at the right place and perform the operation. The key question is: what is the number of steps needed to get to the right position? If the BST is balanced, you can expect about 2^(i-1) nodes at level i. This further means that if the tree has k levels (k is called the height of the tree), the expected number of nodes in the tree is 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1 = n, which gives k = log2(n + 1), or roughly log n, and that is the number of steps needed to navigate from the root to a leaf.
Having said that, there are various implementations of balanced BSTs. You mentioned AVL; the other very popular one is the red-black tree, which is typically used in C++ standard library implementations of std::map and in Java for implementing TreeMap.
The worst case, O(n), can happen when you don't balance the BST and your tree degenerates into a linked list. It is clear that in order to reach the end of the list (which is the worst case), you have to iterate through the whole list, and this requires n steps.
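A minimal sketch of the two cases described above, assuming a plain unbalanced BST with no rebalancing. The names `Node`, `insert`, `height` and `build` are made up for this illustration. Inserting keys in sorted order produces one level per key, while inserting the same keys in shuffled order usually gives a much shallower tree.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into an unbalanced BST iteratively; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def height(root):
    """Number of levels, counted level by level (no deep recursion)."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

keys = list(range(1, 201))
sorted_height = height(build(keys))      # degenerate: one level per key
random.Random(0).shuffle(keys)           # fixed seed, only for repeatability
shuffled_height = height(build(keys))    # usually far fewer levels
print(sorted_height, shuffled_height)
```

Shuffling the input is not a guarantee, only a way to make the bad case unlikely; a self-balancing tree is still needed for a worst-case bound.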
I know that searching a balanced tree with n nodes is O(log n), but I don't even know why the tree the question states is also a balanced BST.
Well, as you said, a balanced BST with k nodes has a lookup time of O(log k). So all we have to do is plug in n * 2^n for k to see what we get:
log (n * 2^n)
= log n + log 2^n
= log n + n log 2
= O(n).
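As a quick numeric check of the derivation above, the following sketch compares log2(n * 2^n) with log2(n) + n for a few values of n. The function name `lookup_cost_bound` is hypothetical, and base 2 is an assumption (any fixed base only changes the constant).

```python
import math

def lookup_cost_bound(n):
    """log2 of the node count k = n * 2^n."""
    k = n * 2 ** n
    return math.log2(k)

for n in (4, 16, 64):
    # The two columns agree, and both grow linearly in n.
    print(n, lookup_cost_bound(n), math.log2(n) + n)
```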
And that makes sense, since a tree with exponentially many nodes in it hit with a logarithmic-time algorithm ought to take linear time.
The best case running time for binary search is O(log(n)) if the binary tree is balanced. The worst case would be if the binary tree is so unbalanced that it basically represents a linked list. In that case the running time of a binary search would be O(n).
However, what if the tree is only slightly unbalanced, as is the case for this tree:
Best case would still be O(log n) if I am not mistaken. But what would be the worst case?
Typically, when we say something like "the cost of looking up an element in a balanced binary search tree is O(log n)," what we mean is "in the worst case, we have to do O(log n) work in the course of performing a search on a balanced binary search tree." And since we're talking about big-O notation here, the previous statement is meant to be taken about balanced trees in general rather than a specific concrete tree.
If you have a specific BST in mind, you can work out the maximum number of comparisons required to find any element. Just find the deepest node in the tree, then imagine searching for a value that's bigger than that value but smaller than the next value in the tree. That will cause you to walk all the way down the tree as deeply as possible, making the maximum number of comparisons possible (specifically, h + 1 of them, where h is the height of the tree).
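Here is a small sketch of that procedure on one hypothetical, slightly unbalanced tree (the shape and keys are invented for illustration). `height` counts edges, and `comparisons` counts how many nodes a search visits; probing values between every pair of keys finds the maximum.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Height in edges; an empty tree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def comparisons(node, target):
    """Count the nodes whose key is compared against target during a search."""
    count = 0
    while node is not None:
        count += 1
        if target == node.key:
            break
        node = node.left if target < node.key else node.right
    return count

# Hypothetical tree:
#         8
#       /   \
#      4     12
#     / \
#    2   6
#   /
#  1
root = Node(8, Node(4, Node(2, Node(1)), Node(6)), Node(12))

# Probe keys and the gaps between them (0, 0.5, 1, ..., 14.5).
worst = max(comparisons(root, x / 2) for x in range(0, 30))
print(height(root), worst)
```

For this tree the deepest node is 1, so searching for 1 (or for a missing value next to it, such as 1.5) walks the longest path.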
To be able to talk about the big-O cost of performing lookups in a tree, you'd need to talk about a family of trees of different numbers of nodes. You could imagine "kinda balanced" trees whose depth is Θ(√n), for example, where lookups would take time O(√n). However, it's uncommon to encounter trees like that in practice, since generally you'd either (1) have a totally imbalanced tree or (2) use some sort of balanced tree that would prevent the height from getting that high.
In a sorted array of n values, the run-time of binary search for a value is O(log n) in the worst case. In the best case, the element you are searching for is in the exact middle, and the search finishes in constant time. In the average case too, the run-time is O(log n).
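To make the best and worst cases concrete, here is a sketch of binary search over a sorted Python list that also reports how many elements it probed. The function name and the sample data are assumptions chosen for illustration.

```python
def binary_search(values, target):
    """Return (index or -1, number of probes) for a sorted list."""
    lo, hi, probes = 0, len(values) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if values[mid] == target:
            return mid, probes
        if values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

values = list(range(0, 2046, 2))        # 1023 sorted even numbers
print(binary_search(values, 1022))      # the middle element: a single probe
print(binary_search(values, -1))        # a missing value: about log2(n) probes
```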
I have two questions:
What is the difference between a nearly balanced BST and a nearly complete binary tree? If the definition of the former were clear we could differentiate them, but I am not able to find a relevant article.
Today, in my class I was taught about the condition to be balanced as:
max(size(root.left), size(root.right)) <= 3*n/4    (eqn 1)
Hence H(n), the height of a tree of n nodes with the above property, satisfies H(n) <= 1 + H(3*n/4).
Continuing the recursion, we get an O(log n) bound.
My question is: is this a specific type of BST? For example, in the case of AVL trees, as I remember, the condition is that the heights of the left and right children differ by at most 1. Or is this a more general result, so that equation 1 as stated earlier can be reduced to prove the result for AVL trees as well? I.e., will any balanced BST have sibling heights that differ by at most 1?
In case it is different from AVL, how do we manage the insertion and deletion operations in this new kind of tree?
EDIT: Also, can you explain why 3*n/4 specifically?
My thought: it is because we can then surely say that H(n) <= 1 + H(3*n/4). If we take something like 3n/5, which is less than 3n/4, then H(3n/5) won't necessarily be less than H(2n/5), as the ratio of 3n/5 to 2n/5 is less than 2, and as we know, a factor of 2 in the number of nodes increases the height by 1.
So we can't be sure that H(n) <= 1 + H(3n/5); it may be H(2n/5) in place of H(3n/5) as well. Am I right?
A nearly complete BST is a BST where all levels are filled, except the last one. Definitions are kind of messed up here (some call this property perfect). Please refer to Wikipedia for this.
Being balanced is a less strict criterion, i.e. all (nearly) complete BSTs are balanced, but not all balanced BSTs are complete. That Wikipedia article has a definition for that too. In my world, a BST is balanced if it leads to O(log n) operation cost.
For example, one can say, a BST is balanced, if each subtree has at most epsilon * n nodes, where epsilon < 1 (for example epsilon = 3/4 or even epsilon = 0.999 -- which are practically not balanced at all).
The reason for that is that the height of such a BST is roughly log_{1/epsilon} n = log_2 n / (- log_2 epsilon) = O(log n), but the constant can be huge: for epsilon = 0.99 it is 1 / (- log_2 0.99), which is about 69. You can try to prove that with the usual ratio of epsilon = 1/2, where both subtrees have roughly the same size.
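The constant in front of log_2 n can be evaluated directly. This is a small sketch, with a hypothetical helper name, that prints the factor 1 / (- log_2 epsilon) for a few choices of epsilon, so the effect of a looser balance condition is visible.

```python
import math

def height_constant(epsilon):
    """Factor c in height ~ c * log2(n) when each subtree holds at most epsilon * n nodes."""
    return 1 / -math.log2(epsilon)

for eps in (0.5, 0.75, 0.99, 0.999):
    print(eps, round(height_constant(eps), 1))
```

With epsilon = 1/2 the factor is exactly 1, which matches the usual log_2 n height of an evenly split tree.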
I don't know of a common BST which uses this 3/4. Common BSTs are, for example, red-black trees, splay trees or, if you are on a hard disk, a whole family of B-trees. For your example, you can probably implement the operations by augmenting each node with two integers representing the number of nodes in the left and the right subtree respectively. When inserting or deleting something, you update the numbers as you walk from the root to the leaf (or back up), and if the condition is violated, you do a rotation.
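A rough sketch of the augmentation idea, under the assumption that each node stores the size of its own subtree (which carries the same information as the two per-child counts). It only checks the condition from the question; the rebalancing step itself is not shown, and all names here are hypothetical.

```python
def size(node):
    """Number of nodes in the subtree, read from the augmented field."""
    return node.size if node is not None else 0

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Augmented field: number of nodes in the subtree rooted here.
        self.size = 1 + size(left) + size(right)

def is_weight_balanced(node, epsilon=0.75):
    """True if every subtree satisfies max(size(left), size(right)) <= epsilon * size."""
    if node is None:
        return True
    if max(size(node.left), size(node.right)) > epsilon * node.size:
        return False
    return (is_weight_balanced(node.left, epsilon)
            and is_weight_balanced(node.right, epsilon))

balanced = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
chain = Node(5, Node(4, Node(3, Node(2, Node(1)))))
print(is_weight_balanced(balanced), is_weight_balanced(chain))
```

In a real implementation the check would run only on the nodes along the insertion or deletion path, since those are the only subtrees whose sizes change.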
In what situation does searching for a term using a binary search tree require time that is linear in the size of the term vocabulary (say M)? How can we ensure a worst-case time complexity of log M?
A complete binary tree is one for which every level, except possibly the last, is completely filled. The worst-case search performance is the height of the tree, which in this case would be O(log M), assuming M vocabulary terms in the tree.
One way to ensure this performance would be to use a self-balancing tree, e.g. a red-black tree.
Since binary search is a divide-and-conquer algorithm, we can ensure O(log M) if the tree is balanced, with an equal number of terms under the sub-trees of any node. O(log M) basically means that time goes up linearly while M goes up exponentially. If it takes 1 second to search a balanced binary tree with 10 nodes, it’d take 2 seconds to search an equally balanced tree with 100 nodes, 3 seconds to search 1000 nodes, and so on.
But if the binary search tree is extremely unbalanced, to the point where it looks a lot like a linked list, we would have to go through every node, requiring time that is linear in M.
I can see how, when looking up a value in a BST, we discard half the tree every time we compare a node with the value we are looking for.
However, I fail to see why the time complexity is O(log(n)). So, my question is:
If we have a tree of N elements, why is the time complexity of looking up the tree and checking whether a particular value exists O(log(n))? How do we get that?
Your question seems to be well answered here, but to summarise in relation to your specific question, it might be better to think of it in reverse: "what happens to the BST solution time as the number of nodes goes up?"
Essentially, in a BST every time you double the number of nodes you only increase the number of steps to solution by one. To extend this, four times the nodes gives two extra steps. Eight times the nodes gives three extra steps. Sixteen times the nodes gives four extra steps. And so on.
The base 2 log of the first number in these pairs is the second number in these pairs. It's base 2 log because this is a binary search (you halve the problem space each step).
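The doubling pattern can be checked in a few lines. This sketch assumes the best possible shape (every level full before the next one starts) and uses a hypothetical helper `levels_needed`; for a positive integer n, `n.bit_length()` equals floor(log2(n)) + 1.

```python
def levels_needed(n):
    """Levels in the shortest binary tree holding n nodes: floor(log2(n)) + 1."""
    return n.bit_length()

for n in (1, 2, 4, 8, 16, 32, 64):
    # Each doubling of n adds exactly one level.
    print(n, levels_needed(n))
```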
For me the easiest way was to look at a graph of log2(n), where n is the number of nodes in the binary tree. As a table this looks like:
log2(n) = d
log2(1) = 0
log2(2) = 1
log2(4) = 2
log2(8) = 3
log2(16)= 4
log2(32)= 5
log2(64)= 6
and then I draw a little binary tree, this one goes from depth d=0 to d=3:
d=0                O
               /       \
d=1        R               B
         /   \           /   \
d=2    R       B       R       B
      / \     / \     / \     / \
d=3  R   B   R   B   R   B   R   B
So as the number of nodes, n, in the tree roughly doubles (e.g. n increases by 8, from 7 to 15, which is almost a doubling), the depth d goes from d=2 to d=3, increasing by only 1. The additional amount of processing (or time) required therefore increases by only 1 additional computation (or iteration), because the amount of processing is related to d.
We can see that after doubling the number of nodes we go down only 1 additional level of depth, from d=2 to d=3, to find the node we want out of all n nodes. This works because at each level we only continue into the half of the remaining tree that can contain the node we want.
We can write this as d = log2(n) (approximately), where d tells us how much computation (how many iterations) we need to do to reach any node in the tree, when there are n nodes in the tree.
This can be shown mathematically very easily.
Before I present that, let me clarify something. The complexity of lookup or find in a balanced binary search tree is O(log(n)). For a binary search tree in general, it is O(n). I'll show both below.
In a balanced binary search tree, in the worst case, the value I am looking for is in a leaf of the tree. I'll basically traverse from the root to the leaf, looking at each layer of the tree only once, thanks to the ordered structure of BSTs. Therefore, the number of comparisons I need to do is the number of layers of the tree. Hence the problem boils down to finding a closed-form expression for the number of layers of a tree with n nodes.
This is where we'll do a simple induction, assuming every layer is completely filled. A tree with only 1 layer has only 1 node. A tree of 2 layers has 1+2 nodes. 3 layers, 1+2+4 nodes, etc. The pattern is clear: a tree with k layers has exactly
n=2^0+2^1+...+2^{k-1}
nodes. This is a geometric series, which implies
n=2^k-1,
equivalently:
k = log2(n+1)
We know that big-oh is interested in large values of n, hence constants are irrelevant. Hence the O(log(n)) complexity.
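For reference, the same derivation in LaTeX notation, assuming a tree whose k layers are all completely filled:

```latex
\begin{align*}
n &= \sum_{i=0}^{k-1} 2^{i} = 2^{k} - 1 \\
k &= \log_2 (n + 1) \\
  &= O(\log n)
\end{align*}
```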
I'll give another, much shorter, way to show the same result. Since while looking for a value we constantly split the tree into two halves, and we have to do this k times, where k is the number of layers, the following is true:
(n+1)/2^k = 1,
which implies the exact same result. You have to convince yourself about where that +1 in n+1 is coming from, but it is okay even if you don't pay attention to it, since we are talking about large values of n.
Now let's discuss the general binary search tree. In the worst case, it is perfectly unbalanced, meaning every node except the last has only one child (and it becomes a linked list). See e.g. https://www.cs.auckland.ac.nz/~jmor159/PLDS210/niemann/s_fig33.gif
In this case, to find the value in the leaf, I need to iterate over all n nodes, hence O(n).
A final note is that these complexities hold true for not only find, but also insert and delete operations.
Whenever you see a runtime that has an O(log n) factor in it, there's a very good chance that you're looking at something of the form "keep dividing the size of some object by a constant." So probably the best way to think about this question is - as you're doing lookups in a binary search tree, what exactly is it that's getting cut down by a constant factor, and what exactly is that constant?
For starters, let's imagine that you have a perfectly balanced binary tree, something that looks like this:
              *
          /       \
      *               *
    /   \           /   \
  *       *       *       *
 / \     / \     / \     / \
*   *   *   *   *   *   *   *
At each point in doing the search, you look at the current node. If it's the one you're looking for, great! You're totally done. On the other hand, if it isn't, then you either descend into the left subtree or the right subtree and then repeat this process.
If you walk into one of the two subtrees, you're essentially saying "I don't care at all about what's in that other subtree." You're throwing all the nodes in it away. And how many nodes are in there? Well, with a quick visual inspection - ideally one followed up with some nice math - you'll see that you're tossing out about half the nodes in the tree.
This means that at each step in a lookup, you either (1) find the node that you're looking for, or (2) toss out half the nodes in the tree. Since you're doing a constant amount of work at each step, you're looking at the hallmark of O(log n) behavior: the work drops by a constant factor at each step, and so it can only do so logarithmically many times.
Now of course, not all trees look like this. AVL trees have the fun property that each time you descend down into a subtree, you throw away roughly a golden ratio fraction of the total nodes. This therefore guarantees you can only take logarithmically many steps before you run out of nodes - hence the O(log n) height. In a red/black tree, each step throws away (roughly) a quarter of the total nodes, and since you're shrinking by a constant factor you again get the O(log n) lookup time you'd like. The very fun scapegoat tree has a tuneable parameter that's used to determine how tightly balanced it is, but again you can show that every step you take throws away some constant factor based on this tuneable parameter, giving O(log n) lookups.
However, this analysis breaks down for imbalanced trees. If you have a purely degenerate tree - one where every node has exactly one child - then every step down the tree that you take only tosses away a single node, not a constant fraction. That means that the lookup time gets up to O(n) in the worst case, since the number of times you can subtract a constant from n is O(n).
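The contrast between "divide by a constant" and "subtract a constant" can be sketched directly. The two helper names below are hypothetical, and integer halving stands in for "toss out about half the nodes".

```python
def steps_dividing(n, factor=2):
    """How many times n can be cut down by `factor` before only one item is left."""
    steps = 0
    while n > 1:
        n //= factor
        steps += 1
    return steps

def steps_subtracting(n, amount=1):
    """How many times `amount` can be removed from n before nothing is left."""
    steps = 0
    while n > 0:
        n -= amount
        steps += 1
    return steps

for n in (1024, 1_000_000):
    # Logarithmic versus linear growth in the number of steps.
    print(n, steps_dividing(n), steps_subtracting(n))
```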
"If we have a tree of N elements, why the time complexity of looking up the tree and check if a particular value exists is O(log(n)), how do we get that?"
That's not true. By default, a lookup in a binary search tree is not O(log(n)), where n is the number of nodes. In the worst case, it can become O(n). For instance, if we insert the values n, n - 1, ..., 1 (in that order), then the tree will look like this:
          n
         /
      n - 1
       /
    n - 2
     /
   ...
  1
A lookup for a node with value 1 has O(n) time complexity.
To make a lookup more efficient, the tree must be balanced so that its maximum height is proportional to log(n). In that case, the time complexity of a lookup is O(log(n)), because reaching any leaf takes a number of steps bounded by the height.
But again, not every Binary Search Tree is a Balanced Binary Search Tree. You must balance it to guarantee the O(log(n)) time complexity.
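As one illustration of what "balanced" buys you, here is a sketch that builds a minimal-height BST from already sorted keys by always picking the middle key as the root. This is a one-off construction, not a self-balancing tree that stays balanced under later inserts, and the names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build_balanced(sorted_keys, lo=0, hi=None):
    """Build a BST from sorted_keys[lo:hi], using the middle key as the root."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(sorted_keys[mid],
                build_balanced(sorted_keys, lo, mid),
                build_balanced(sorted_keys, mid + 1, hi))

def levels(node):
    """Number of levels in the tree; 0 for an empty tree."""
    return 0 if node is None else 1 + max(levels(node.left), levels(node.right))

def contains(node, key):
    """Standard BST lookup; visits at most one node per level."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False

root = build_balanced(list(range(1, 1001)))
print(levels(root), contains(root, 1), contains(root, 0))
```

With 1000 keys the tree has 10 levels, so a lookup for 1 visits at most 10 nodes instead of the 1000 needed in the degenerate chain.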