One way to find the lower bound of a comparison-based algorithm is to use a decision tree. I have two questions regarding this method:
1) We know that the height of the tree is the length of the path that connects the root node to the farthest leaf node (the longest path), which is equal to the number of comparisons made from the root node to that leaf node. Therefore, when we draw the tree for a comparison-based algorithm we simply need to find the worst-case time, which corresponds to the longest path and therefore to the height of the tree. Now, for any tree, height <= log2(number of leaf nodes), which is the same as Cworst(n) <= log2(n), and now we have a lower bound for Cworst(n), and therefore the lower bound of the problem = log2(n). Is my understanding right?
2) What is the meaning of having an inequality for Cworst(n) for a specific comparison problem? Does this mean that for a specific comparison problem we can draw many trees, and that every time the height along the worst-case path will have a value that satisfies the inequality? Does this mean that for a specific problem we can draw many different trees?
A decision tree illustrates the possible executions of an algorithm on a specific category of inputs. In the case of comparison-based sorting, a category would consist of all input lists of a certain size, so there's one tree for n = 1, one for n = 2, one for n = 3, and so on. Each tree is agnostic to the exact input values, but traces the possible directions the computation might go in depending on how the input values compare to each other.
One application of a decision tree is to reason about upper and lower bounds on the runtime complexity. As you mentioned, the height of a tree represents the upper bound on the runtime for that tree's input size, and if you can find a general relation between the input size and the tree height for all the decision trees of an algorithm, you have found an expression for the upper bound on the runtime. For example, if you analyze Bubble Sort, you will find that the decision tree for input size n has a height of roughly n * (n - 1) / 2, so now you know that its runtime is bounded by O(n^2). And since we have an upper bound on the general runtime, that also becomes an upper bound on the worst-case runtime.
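To see where that quadratic height comes from, here is a small sketch (my own example, not from the answer above): a plain bubble sort instrumented with a comparison counter, run on a reversed list, which is its worst case. The count matches n * (n - 1) / 2.

    # Sketch: count the comparisons bubble sort makes on a reversed list
    # (its worst case) and compare with n * (n - 1) / 2.
    def bubble_sort_comparisons(items):
        a = list(items)
        comparisons = 0
        for i in range(len(a) - 1):
            for j in range(len(a) - 1 - i):
                comparisons += 1          # one comparison per inner-loop step
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return comparisons

    for n in (4, 8, 16):
        print(n, bubble_sort_comparisons(range(n, 0, -1)), n * (n - 1) // 2)
        # the two counts agree for every n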
When looking at a specific algorithm, we're usually interested in how fast it could possibly be (best case), how fast it usually is (average case), and how slow it could possibly be (worst case). And we often express the best case using a lower bound (Omega), because a lower bound for the best case is also a lower bound for the general runtime; similarly, we often express the worst case using an upper bound (O), because an upper bound for the worst case is also an upper bound for the general runtime. But we don't have to - O, Omega, and Theta are only mathematical tools that say something about a function, not caring that the function in our case describes a runtime complexity. So let's do something unusual: let's look at all the possible algorithms for a problem, and use decision trees to try to figure something out about all their worst-case complexities. Then, the interesting question isn't what the upper bound is, because it's easy to make extremely slow sorting algorithms. Instead, we're interested in the lower bound: what's the best worst case? Which algorithm makes the best guarantee about how slow it will be in the worst case?
Any sorting algorithm must be able to handle any order of its input elements. Each leaf node represents one particular final permutation (rearrangement) of the input elements, and with input size n, there are n! permutations. So a decision tree for input size n has at least n! leaf nodes. Any algorithm that wants to have a good worst case needs a balanced tree where all the leaf nodes are on the deepest or second-deepest level. And a balanced tree with n! leaf nodes must have a height of at least log2(n!), which is Omega(n lg n). Now we know something very interesting: for any comparison-based sorting algorithm, the best possible height (which represents the worst-case runtime) grows at least as fast as n lg n. In other words, it is impossible to create a comparison-based sorting algorithm that is always faster than n lg n.
(Note: height <= log2(leaf nodes) only holds for balanced trees. In general, a tree's height might be as much as the number of nodes minus one.)
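As a quick numeric sanity check (my own sketch, not part of the answer above), one can compute ceil(log2(n!)), the minimum possible height of a decision tree with n! leaves, and compare it with n * log2(n):

    # Sketch: the minimum height of a decision tree with n! leaves is
    # ceil(log2(n!)); it grows at the same rate as n * log2(n).
    import math

    for n in (4, 8, 16, 32):
        min_height = math.ceil(math.log2(math.factorial(n)))
        print(n, min_height, round(n * math.log2(n)))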
Related
Suppose someone wants to generate a complete binary tree. This tree has h levels, where h can be any positive integer and is given as input to the algorithm. What complexity class will the generation lie in, and why?
A complete binary tree is a tree where all levels are full of nodes except possibly the last level. We can define the time complexity in terms of an upper bound.
If we know the height of the tree is h, then the maximum possible number of nodes in the tree is 2^h - 1.
Therefore, the time complexity is O(2^h - 1) = O(2^h).
To sell your algorithm in the market, you need tight upper bounds to prove that your algorithm is better than the others'.
A slightly tighter upper bound for this problem can be given once you know exactly how many nodes are in the tree. Let's say there are N.
Then, the time complexity = O(N).
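As a rough illustration of why the generation costs O(2^h), here is a small sketch (names and structure are my own, not from the question): building every node of a binary tree whose h levels are all full creates exactly 2^h - 1 nodes.

    # Sketch: build a binary tree in which all h levels are full and count
    # its nodes; the count is 2^h - 1, hence the O(2^h) generation cost.
    class Node:
        def __init__(self, left=None, right=None):
            self.left = left
            self.right = right

    def build_full_tree(h):
        if h == 0:
            return None
        return Node(build_full_tree(h - 1), build_full_tree(h - 1))

    def count_nodes(node):
        if node is None:
            return 0
        return 1 + count_nodes(node.left) + count_nodes(node.right)

    for h in (1, 3, 5, 10):
        print(h, count_nodes(build_full_tree(h)), 2 ** h - 1)   # counts match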
I have an algorithm which operates on a rooted tree. It first recursively computes results for each of the root's child subtrees. It then does some work to combine them. The amount of work at the root is K^2 where K is the number of distinct values among the sizes of the subtrees.
What's the best bound on its runtime complexity? I haven't been able to construct a case in which it does more than linear work in the size of the tree.
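If I have understood the description correctly, a minimal sketch might look like the following (all names are placeholders of mine; the real combining work is elided). It recurses into each child, then charges K^2 at the current node, where K is the number of distinct child-subtree sizes.

    # Sketch of my reading of the question: returns (subtree size, total work).
    class Node:
        def __init__(self, *children):
            self.children = list(children)

    def process(node):
        sizes, total_work = [], 0
        for child in node.children:
            child_size, child_work = process(child)
            sizes.append(child_size)
            total_work += child_work
        k = len(set(sizes))      # K = number of distinct child-subtree sizes
        total_work += k * k      # the K^2 combining work at this node
        # ... the actual combining of child results would happen here ...
        return 1 + sum(sizes), total_work

    # Tiny example: a root with three leaf children and one two-node chain.
    tree = Node(Node(), Node(), Node(), Node(Node()))
    print(process(tree))   # (6, 5): size 6, total work 5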
This is governed by the Master Theorem for divide-and-conquer algorithms. For this particular case (reading between the lines of what you have described), it is mainly determined by how much work it takes at a single node to combine the results compiled for K values in the subtrees. Specifically, if it is less than O(K) work, then the cost is dominated by the cost at the lowest level and would be O(K) in total; if the work at a given level is O(K), then the total work becomes O(K log K); for work at a level higher than O(K), the total is dominated by the work at the highest level. We therefore have that your algorithm has a runtime complexity of O(K^2).
What is the range search complexity for R-trees and R*-trees? I understand the process of range search: similar to a DFS search, it visits each node and, if a node's bounding box intersects the target range, includes the node in the result set. More precisely, we also need to consider the branch-and-bound strategy it uses: if a parent node doesn't intersect with the target, then we don't visit its child nodes. Then the complexity should be smaller than O(n), where n is the number of nodes. I really don't know how to calculate the number of nodes given the number of leaves (or data points).
Could anybody give me an explanation here? Thank you.
Obviously, the worst case must be at least O(n) if your range is [-∞;∞] in every dimension. It may be as bad as O(n log n) then because of the tree.
Assuming the answer is a single entry, the average case probably is O(log n) - only few paths through the tree need to be followed (if you have little enough overlap).
It is log to the base of your page size. So it will usually not exceed 5, because you never want trees with more than say 1000^5=10^15 objects.
For all practical purposes, assume the runtime complexity is simply the answer set size, O(s). Selecting 2% of your data takes about twice as long as selecting 1%.
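As a quick check of the "log to the base of your page size" remark (my own sketch, assuming a fanout of roughly 1000 entries per page):

    # Sketch: the depth is the number of times the capacity must be
    # multiplied by the fanout to cover all n objects.
    fanout = 1000
    for n in (10 ** 6, 10 ** 9, 10 ** 15):
        depth, capacity = 0, 1
        while capacity < n:
            capacity *= fanout
            depth += 1
        print(n, depth)   # 2, 3, 5 - the depth stays tiny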
How do we calculate the time complexity of a tree which splits in an uneven manner, say in the ratio of 1:3, unlike the typical binary tree splitting into two equal halves?
This smells of homework, but I'll give it a go anyway.
A tree doesn't inherently have time complexity, but I think I get what you mean.
Most algorithms on a nicely balanced tree have a log2(n) component in their complexity. If you split in the ratio 1:3 instead, the larger part holds 3/4 of the elements, so the base of the logarithm becomes 4/3.
So where a search in a regular balanced binary search tree is in O(log2(n)), in your scenario it would be O(log4/3(n)).
That said, the base of a logarithm can be changed by multiplying by a constant, and we don't take constant factors into account in complexity theory. So technically, while the worst case is slower in this scenario, it's in the same time complexity class.
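To make the constant-factor point concrete, here is a small sketch (my own numbers): the worst-case depth with a 1:3 split is log base 4/3 of n, which is always the same constant multiple (about 2.41) of the log base 2 depth of an even split.

    # Sketch: compare recursion depths for an even split vs. a 1:3 split.
    import math

    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        even = math.log2(n)              # depth with a 1:1 split
        skewed = math.log(n, 4 / 3)      # depth when the larger side keeps 3/4
        print(n, round(even), round(skewed), round(skewed / even, 2))
        # the last column is always ~2.41: a constant factor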
I came to know that the height of random BSTs, red-black trees, and some other trees is O(log n).
I wonder how this can be. Let's say I have a tree like this:
The height of the tree is essentially the depth of the tree, which in this case is 4 (not counting the root level). But how can people say that the height can be represented by the O(log n) notation?
I'm very new to algorithms, and this point is confusing me a lot. Where am I missing the point?
In algorithm complexity the variable n typically refers to the total number of items in a collection or involved in some calculation. In this case, n is the total number of nodes in the tree. So, in the picture you posted n=31. If the height of the tree is O(log n) that means that the height of the tree is proportional to the log of n. Since this is a binary tree, you'd use log base 2.
⌊log₂(31)⌋ = 4
Therefore, the height of the tree should be about 4—which is exactly the case in your example.
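For a quick check (my own sketch): for complete binary trees of various sizes, the height is floor(log2(n)) when the root is at depth 0.

    # Sketch: height of a complete binary tree with n nodes is floor(log2(n)).
    import math

    for n in (1, 3, 7, 15, 31, 63):
        print(n, math.floor(math.log2(n)))   # 0, 1, 2, 3, 4, 5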
As I explained in a comment, a binary tree can have multiple cases:
In the degenerate case, a binary tree is simply a chain, and its height is O(n).
In the best case (for most search algorithms), a complete binary tree has the property that for any node, the heights of its subtrees are the same. In this case the height will be the floor of log(n) (base 2, or base k for k branches). You can prove this by induction on the size of the tree (structural induction on the constructors).
In the general case you will have a mix of these: a tree where any node may have subtrees of different heights. The sketch below contrasts the two extreme cases.
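Here is that small sketch (my own example): a degenerate chain of n nodes has height n - 1, while a balanced tree over the same number of nodes has height about floor(log2(n)).

    # Sketch: heights of the degenerate case vs. the balanced case.
    def chain_height(n):
        return n - 1                     # every node has exactly one child

    def balanced_height(n):
        if n <= 0:
            return -1                    # height of an empty tree
        return 1 + balanced_height((n - 1) // 2)   # recurse into one even half

    for n in (7, 15, 31):
        print(n, chain_height(n), balanced_height(n))   # 6 vs 2, 14 vs 3, 30 vs 4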