There is a comparison-based sorting algorithm that runs in O(n*log(sqrt(n))).
Given the existence of an Omega(n*log(n)) lower bound for sorting, how can this be possible?
Basically, this problem is asking you to prove that O(n*log(n)) = O(n*log(sqrt(n))), which means that you need to find some constant c > 0 such that O(n*log(n)) = O(c*n*log(sqrt(n))). Remember that sqrt(n) = n^(1/2) and that log(n^(1/2)) = 1/2*log(n). So, now we have O(n*log(n)) = O(1/2*n*log(n)). Since asymptotic notation ignores constant multipliers, we can rewrite this as O(n*log(n)) = O(n*log(n)). Voila, proof positive that it is possible.
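Note the same identity read the other way around: since n*log(sqrt(n)) = (1/2)*n*log(n), an algorithm running in O(n*log(sqrt(n))) time is still an Omega(n*log(n)) algorithm up to the constant 1/2, so it does not actually beat the lower bound.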
For a sorting algorithm based on comparisons you can draw a decision tree. It is a binary tree representing the comparisons done by the algorithm, and every leaf of this tree is a permutation of the elements of the given set.
There are n! possible permutations, where n is the size of the set, and only one of them represents the sorted set. The path leading to each leaf represents the comparisons necessary to arrive at the permutation in that leaf.
Now let h be the height of our decision tree and l the number of leaves. Every possible permutation of the input set must appear in some leaf, so n! <= l. A binary tree of height h can have at most 2^h leaves, therefore we get n! <= l <= 2^h. Apply a logarithm to both sides and you get h >= log(n!), and log(n!) is Omega(n*log(n)).
Because the height of the decision tree is the number of comparisons necessary to reach a leaf, this proves that the lower bound for comparison-based sorting algorithms is n*log(n); it can't be done faster. So the only way this task can be correct is if Omega(n*log(n)) is also Omega(n*log(sqrt(n))). Indeed, log(sqrt(n)) = log(n^(1/2)) = (1/2)*log(n), so n*log(sqrt(n)) = n*((1/2)*log(n)) = (1/2)*n*log(n). Ignore the constant 1/2 (as we are interested in asymptotic complexity) and you get that n*log(sqrt(n)) and n*log(n) are the same in terms of complexity.
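To justify the step "log(n!) is Omega(n*log(n))" without Stirling's formula, keep only the n/2 largest factors of n!, each of which is at least n/2: log(n!) >= log((n/2)^(n/2)) = (n/2)*log(n/2), which is Omega(n*log(n)).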
Related
Rank the following functions representing running times from smallest to largest (in terms of
growth rate with respect to n) and group those functions that are in the same equivalence class
List of functions:
2n^3+12n^2+5,
8(log n)^2,
1.5^n,
n^4-12n^3,
4n^3(log n),
4n^3,
n!,
7n+6
My solution in ascending order is :
8(log n)^2 - logarithmic complexity
7n+6 - linear complexity
2n^3+12n^2+5, 4n^3(log n), 4n^3 - polynomial complexity
n^4-12n^3 - polynomial complexity
1.5^n - Factorial complexity
n! - Exponential complexity
Unsure if what I have done is correct. Any feedback will be greatly appreciated.
The complexities are as follows:
2n^3+12n^2+5 = O(n^3)
8(log n)^2 = O(log^2 n)
1.5^n = O(1.5^n)
n^4-12n^3 = O(n^4)
4n^3 log n = O(n^3 log n)
4n^3 = O(n^3)
n! = O(n!)
7n+6 = O(n)
When grouped by their complexities and put in ascending order:
O(log^2 n)
8(log n)^2
O(n)
7n+6
O(n^3)
2n^3+12n^2+5
4n^3
O(n^3 log n)
4n^3 log n
O(n^4)
n^4-12n^3
O(1.5^n)
1.5^n
O(n!)
n!
So there is essentially one difference with your output: O(n^3 log n) is not O(n^3): the first is of a greater order:
We can prove this by contradiction. If for a moment we assume that O(n^3 log n) is O(n^3), then we can find an n0 and a C such that n^3 log n <= C*n^3 for all n > n0. But this would mean that log n <= C, or n <= 2^C. This is not true for any n > 2^C, and so the proposition does not hold.
Another correction: the labels "Exponential" and "Factorial" were inverted in your output.
O(n!) is factorial and O(1.5^n) is exponential.
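Not a proof, but a quick numerical sanity check of this ordering (my own snippet; it simply evaluates every function at one moderate value of n, which for n = 100 happens to already agree with the asymptotic ranking):

    import math

    # Evaluate each running-time function at a single n and sort by value.
    # Purely illustrative: asymptotic order is about n -> infinity, not one n.
    n = 100
    funcs = {
        "8(log n)^2":   8 * math.log(n) ** 2,
        "7n+6":         7 * n + 6,
        "2n^3+12n^2+5": 2 * n**3 + 12 * n**2 + 5,
        "4n^3":         4 * n**3,
        "4n^3 log n":   4 * n**3 * math.log(n),
        "n^4-12n^3":    n**4 - 12 * n**3,
        "1.5^n":        1.5**n,
        "n!":           float(math.factorial(n)),
    }
    for name, value in sorted(funcs.items(), key=lambda kv: kv[1]):
        print(f"{name:>13}  {value:.3e}")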
I saw a very short algorithm for merging two binary search trees. I was surprised how easy and also very inefficient it is. But when I tried to guess its time complexity, I failed.
Let's take two immutable binary search trees (not balanced) that contain integers, and say you want to merge them together with the following recursive algorithm in pseudocode. The function insert is auxiliary:
function insert(Tree t, int elem) returns Tree:
    if elem < t.elem:
        return new Tree(t.elem, insert(t.leftSubtree, elem), t.rightSubtree)
    elseif elem > t.elem:
        return new Tree(t.elem, t.leftSubtree, insert(t.rightSubtree, elem))
    else:
        return t

function merge(Tree t1, Tree t2) returns Tree:
    if t1 or t2 is Empty:
        return chooseNonEmpty(t1, t2)
    else:
        return insert(merge(merge(t1.leftSubtree, t1.rightSubtree), t2), t1.elem)
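For reference, here is a direct, minimal Python transcription of that pseudocode (the Tree class and the None-for-Empty convention are my own assumptions, just to make it runnable):

    # Immutable BST nodes; None plays the role of the empty tree.
    class Tree:
        def __init__(self, elem, left=None, right=None):
            self.elem, self.left, self.right = elem, left, right

    def insert(t, elem):
        if t is None:
            return Tree(elem)
        if elem < t.elem:
            return Tree(t.elem, insert(t.left, elem), t.right)
        elif elem > t.elem:
            return Tree(t.elem, t.left, insert(t.right, elem))
        else:
            return t  # element already present, share the existing node

    def merge(t1, t2):
        if t1 is None or t2 is None:
            return t1 if t1 is not None else t2
        # merge t1's subtrees, merge the result with t2, then re-insert t1's root
        return insert(merge(merge(t1.left, t1.right), t2), t1.elem)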
I guess it's an exponential algorithm, but I cannot find an argument for that. What is the worst-case time complexity of this merge algorithm?
Let's consider the worst case:
At each stage every tree is in the maximally imbalanced state, i.e. each node has at least one sub-tree of size 1.
In this extremal case the complexity of insert is quite easily shown to be Θ(n), where n is the number of elements in the tree, as the height is ~ n/2.
Based on the above constraint, we can deduce a recurrence relation for the time complexity of merge:
T(n, m) = T(n - 2, 1) + T(n - 1, m) + Θ(n + m)
where n, m are the sizes of t1, t2. It is assumed without loss of generality that the right sub-tree always contains a single element. The terms correspond to:
T(n - 2, 1): the inner call to merge on the sub-trees of t1
T(n - 1, m): the outer call to merge on t2
Θ(n + m): the final call to insert
To solve this, let's re-substitute the first term and observe a pattern:
We can solve this sum by stripping out the first term:
Where in step (*) we used a change-in-variable substitution i -> i + 1. The recursion stops when k = n:
T(1, m) is just the insertion of an element into a tree of size m, which is obviously Θ(m) in our assumed setup.
Therefore the absolute worst-case time complexity of merge is
Notes:
The order of the parameters matters. It is thus common to insert the smaller tree into the larger tree (in a manner of speaking).
Realistically you are extremely unlikely to have maximally imbalanced trees at every stage of the procedure. The average case will naturally involve semi-balanced trees.
The optimal case (i.e. always perfectly balanced trees) is much more complex (I am unsure that an analytical solution like the above exists; see gdelab's answer).
EDIT: How to evaluate the exponential sum
Suppose we want to compute the sum:
where a, b, c, n are positive constants. In the second step we changed the base to e (the natural exponential constant). With this substitution we can treat ln c as a variable x, differentiate a geometrical progression with respect to it, then set x = ln c:
But the geometrical progression has a closed-form solution (a standard formula which is not difficult to derive):
And so we can differentiate this result with respect to x n times to obtain an expression for Sn. For the problem above we only need the first two powers:
So that troublesome term is given by:
which is exactly what Wolfram Alpha directly quoted. As you can see, the basic idea behind this was simple, although the algebra was incredibly tedious.
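As a generic illustration of that technique (a representative sum, not necessarily the exact one needed above): to evaluate sum_{i=0..n} i*c^i, write c^i = e^(i*x) with x = ln(c), use the closed form of the geometric progression, and differentiate with respect to x:

sum_{i=0..n} e^(i*x) = (e^((n+1)*x) - 1) / (e^x - 1)
sum_{i=0..n} i*e^(i*x) = d/dx [ (e^((n+1)*x) - 1) / (e^x - 1) ]

then substitute x = ln(c) back in. Higher powers of i come from differentiating more times.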
It's quite hard to compute exactly, but it looks like it's not polynomially bounded in the worst case (this is not a complete proof however, you'd need a better one):
insert has complexity O(h) at worst, where h is the height of the tree (i.e. at least log(n), possibly n).
The complexity of merge() could then be of the form: T(n1, n2) = O(h) + T(n1 / 2, n1 / 2) + T(n1 - 1, n2)
Let's consider F(n) such that F(1) = T(1, 1) and F(n+1) = log(n) + F(n/2) + F(n-1). We can probably show that F(n) is smaller than T(n, n) (since F(n+1) contains T(n, n) instead of T(n, n+1)).
We have F(n)/F(n-1) = log(n)/F(n-1) + F(n/2) / F(n-1) + 1
Assume F(n)=Theta(n^k) for some k. Then F(n/2) / F(n-1) >= a / 2^k for some a>0 (that comes from the constants in the Theta).
Which means that (beyond a certain point n0) we always have F(n) / F(n-1) >= 1 + epsilon for some fixed epsilon > 0, which is not compatible with F(n)=O(n^k), hence a contradiction.
So F(n) is not a Theta(n^k) for any k. Intuitively, you can see that the problem is probably not the Omega part but the big-O part, hence it's probably not a O(n) (but technically we used the Omega part here to get a). Since T(n, n) should be even bigger than F(n), T(n, n) should not be polynomial, and is maybe exponential...
But then again, this is not rigorous at all, so maybe I'm actually dead wrong...
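As a rough numerical illustration of that argument (my own quick experiment, with an arbitrary base case F(1) = 1 and integer division standing in for n/2, so it is only indicative):

    import math
    import sys
    from functools import lru_cache

    sys.setrecursionlimit(10000)

    @lru_cache(maxsize=None)
    def F(n):
        # F(1) = 1; F(n+1) = log(n) + F(n/2) + F(n-1), rewritten in terms of n.
        if n <= 1:
            return 1.0
        m = n - 1
        return math.log(m) + F(m // 2) + F(m - 1)

    # If F(n) were Theta(n^k) for a fixed k, F(2n)/F(n) would level off near 2^k.
    for n in (100, 200, 400, 800, 1600):
        print(n, F(2 * n) / F(n))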
Let's say we have an algorithm to find all paths between two nodes in a directed, acyclic, non-weighted graph, which may contain more than one edge between the same two vertices. (The DAG itself is just an example; I'm not discussing this case specifically, so disregard its correctness, though I think it's correct.)
We have two factors that affect the complexity:
mc: the maximum number of outgoing edges from a vertex.
ml: the length of the longest path, measured in number of edges.
Solving the problem in an iterative fashion, where "complexity" in the following stands for the number of processing operations done:
for the first iteration the complexity = mc
for the second iteration the complexity = mc*mc
for the third iteration the complexity = mc*mc*mc
for the (max length path)th iteration the complexity = mc^ml
Total worst complexity is (mc + mc*mc + ... + mc^ml).
1- Can we say it's O(mc^ml)?
2- Is this exponential complexity? As far as I know, in exponential complexity the variable only appears in the exponent, not at the base.
3- Are mc and ml both variables in my algorithm's complexity?
There's a faster way to achieve the answer in O(V + E), but it seems like your question is about calculating the complexity, not about optimizing the algorithm.
Yes, it seems like it's O(mc^ml).
Yes, they both can be variables in your algorithm's complexity.
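To back up the first point with a quick calculation: for mc >= 2 the whole sum is dominated by its last term, since
mc + mc^2 + ... + mc^ml = (mc^(ml+1) - mc) / (mc - 1) <= (mc / (mc - 1)) * mc^ml <= 2 * mc^ml,
so the total really is O(mc^ml) (and in fact Theta(mc^ml)).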
As for the complexity of your algorithm: let's do some transformation, using the fact that a^b = e^(b*ln(a)):
mc^ml = (e^ln(mc))^ml = e^(ml*ln(mc)) <= e^(ml*mc), since ln(mc) <= mc.
So, basically, your algorithm's complexity upper bound is O(e^(ml*mc)), but we can still simplify it to see whether it really is exponential complexity. Assume that ml, mc <= N, where N is, let's say, max(ml, mc). Then:
e^(ml*mc) <= e^(N^2) = (e^N)^N
This is not of the simple form C^N with a constant base C, but it grows at least as fast as C^N (take C = e and note that e^(N^2) >= e^N for N >= 1), and N grows no faster than linearly in your parameters. So, basically - yes, it is exponential complexity.
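If the O(V + E) remark refers to counting the paths rather than listing them (explicitly enumerating all paths is necessarily proportional to their number, which can be exponential), a sketch of that approach could look like this (graph representation and names are my own assumptions):

    from functools import lru_cache

    # Count paths from src to dst in a DAG in O(V + E) using memoized DFS.
    # graph is an adjacency list {node: [successors, ...]}; a parallel edge
    # simply appears twice in the successor list.
    def count_paths(graph, src, dst):
        @lru_cache(maxsize=None)
        def count(u):
            if u == dst:
                return 1
            return sum(count(v) for v in graph.get(u, ()))
        return count(src)

    # Example with a parallel edge: a -> b (twice), b -> c  gives 2 paths
    print(count_paths({"a": ["b", "b"], "b": ["c"]}, "a", "c"))  # prints 2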
My scenario is a perfectly balanced binary tree containing integers.
I've searched and found many explanations of best/worst case scenarios for binary trees. Best case is O(1) (target found in root), and worst is O(log(n)) (height of the tree).
I have found little to no information on calculating average complexity. The best answer I could find was O(log(n)) - 1, but I guess I don't quite understand (if correct) how this average case is calculated.
Also, would searching for an integer not in the tree yield the same complexity? I think it would, but any insight is appreciated.
Let's say we have a perfectly balanced binary tree containing n = 2^k integers, so the depth is log2(n) = k.
The best and worst case is, as you say, O(1) and O(log(n)).
Short way
Let's pick a random integer X (uniformly distributed) from the binary tree. The last row of the tree contains the same number of integers as the first k-1 rows together. With probability 1/2, X is in the first k-1 rows, so we need at most O(k-1) = O(log(n)-1) steps to find it. And also with probability 1/2, X is in the last row, where we need O(k) = O(log(n)) steps.
In total we get
E[X] ≤ P(row of X ≤ k-1) · O(log(n)-1) + P(row of X = k) · O(log(n))
= 1/2 · O(log(n)-1) + 1/2 · O(log(n))
= 1/2 · O(log(n)-1) + 1/2 · O(log(n)-1)
= O(log(n)-1)
Notice: This is a little ugly, but in O-notation O(x) and O(x±c) are the same for any constant value c.
Long way
Now let's try to calculate the average case for a random (uniformly distributed) integer X contained in the tree, and let's name the set of integers on the i-th "row" of the tree T_i. T_i contains 2^i elements; T_0 denotes the root.
The probability of picking an integer in the i-th row is P(X ∈ T_i) = 2^i/n = 2^(i-k).
Finding an integer on row i takes O(i) steps.
So the expected number of steps is
E[X] = Σ_{i=0,...,k-1} O(i) · 2^(i-k).
To simplify this we use
O(i) · 2^(i-k) + O(i+1) · 2^(i+1-k) ≤ O(i) · 2^(i+1-k) + O(i+1) · 2^(i+1-k) ≤ O(i+1) · 2^(i+2-k)
This leads us to
E[X] = Σ_{i=0,...,k-1} O(i) · 2^(i-k) ≤ O(k-1) · 2^0
Since k = log(n), we see that the average case is in O(log(n)-1) = O(log(n)).
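A quick numerical check of this (my own snippet, taking a perfect tree with 2^k - 1 keys, where a key on level i, with the root at level 0, costs i + 1 comparisons and the searched key is uniform over the stored keys):

    import math

    # Exact expected number of comparisons for a successful search in a
    # perfect binary search tree with k levels (2^k - 1 keys).
    for k in (5, 10, 20):
        n = 2**k - 1
        expected = sum((i + 1) * 2**i for i in range(k)) / n
        print(f"k={k:2d}  n={n:8d}  E[comparisons]={expected:.3f}  log2(n)={math.log2(n):.3f}")

The expectation comes out close to log2(n) - 1, in line with the O(log(n)-1) result above.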
Values not in the tree
If the value is not in the tree, you have to walk down the whole height of the tree. After log(n) steps you have reached a leaf. If its value equals your input, you have found what you searched for. If not, you know that the value you searched for is not contained in the tree. So if you search for a value that is not in the tree, it will take O(log(n)).
Here is an algorithm for finding the kth smallest number in an n-element array using the partition algorithm of Quicksort.
small(a,i,j,k)
{
    if(i==j) return(a[i]);
    else
    {
        m=partition(a,i,j);      /* m = final position of the pivot */
        if(m==k) return(a[m]);
        else
        {
            if(m>k) return small(a,i,m-1,k);
            else return small(a,m+1,j,k);
        }
    }
}
where i, j are the starting and ending indices of the array (j - i + 1 = n, the number of elements in the array) and k identifies the kth smallest number to be found.
I want to know the best case and the average case of the above algorithm, and briefly how they are obtained. I know we should not count the termination condition in the best case, and also that the partition algorithm takes O(n). I do not want asymptotic notation but an exact mathematical result, if possible.
First of all, I'm assuming the array is sorted - something you didn't mention - because that code wouldn't otherwise work. And, well, this looks to me like a regular binary search.
Anyway...
The best case scenario is when either the array is one element long (you return immediately because i == j), or, for large values of n, when the middle position, m, is the same as k; in that case, no recursive calls are made and it returns immediately as well. That makes it O(1) in the best case.
For the general case, consider that T(n) denotes the time taken to solve a problem of size n using your algorithm. We know that:
T(1) = c
T(n) = T(n/2) + c
Where c is a constant time operation (for example, the time to compare if i is the same as j, etc.). The general idea is that to solve a problem of size n, we consume some constant time c (to decide if m == k, if m > k, to calculate m, etc.), and then we consume the time taken to solve a problem of half the size.
Expanding the recurrence can help you derive a general formula, although it is pretty intuitive that this is O(log(n)):
T(n) = T(n/2) + c = T(n/4) + c + c = T(n/8) + c + c + c = ... = T(1) + c*log(n) = c*(log(n) + 1)
That should be the exact mathematical result. The algorithm runs in O(log(n)) time. An average case analysis is harder because you need to know the conditions in which the algorithm will be used. What is the typical size of the array? The typical size of k? What is the most likely position for k in the array? If it's in the middle, for example, the average case may be O(1). It really depends on how you use this.