What if we split the array in merge sort into 4 parts?? or eight parts? - algorithm

I came across this question in one of the slides of Stanford, that what would be the effect on the complexity of the code of merge sort if we split the array into 4 or 8 instead of 2.

It would be the same: O(n log n). You will have a shorter tree and the base of the logarithm will change, but that doesn't matter for big-oh, because a logarithm in a base a differs from a logarithm in base b by a constant:
log_a(x) = log_b(x) / log_b(a)
1 / log_b(a) = constant
And big-oh ignores constants.
You will still have to do O(n) work per tree level in order to merge the 4 or 8 or however many parts, which, combined with more recursive calls, might just make the whole thing even slower in practice.

In general, you can split your array into equal size subarrays of any size and then sort the subarrays recursively, and then use a min-heap to keep extracting the next smallest element from the collection of sorted subarrays. If the number of subarrays you break into is constant, then the execution time for each min-heap per operation is constant, so you arrive at the same O(n log n) time.

Intuitively it would be the same as there is no much difference between splitting the array into two parts and then doing it again or splitting it to 4 parts from the beginning.
A more official proof by induction based on this (I'll assume that the array is split into k):
Definitions:
Let T(N) - number of array stores to mergesort of input of size N
Then mergesort recurrence T(N) = k*T(N/k) + N (for N > 1, T(1) = 0)
Claim:
If T(N) satisfies the recurrence above then T(N) = Nlg(N)
Note - all the logarithms below are on base k
Proof:
Base case: N=1
Inductive hypothesis: T(N) = NlgN
Goal: show that T(kN) = kN(lg(kN))
T(kN) = kT(N) + kN [mergesort recurrence]
= kNlgN + kN [inductive hypothesis]
= kN(lg[(kN/k)] [algebra]
= kN(lg(kN) - lgk) [algebra]
= kN(lg(kN) - 1) + kN [algebra - for base k, lg(k )= 1]
= kNlg(kN) [QED]

Related

Big Oh Complexity of Merge Sort

I had a lecture on Big Oh for Merge Sort and I'm confused.
What was shown is:
0 Merges [<----- n -------->] = n
1 Merge [<--n/2--][-n/2--->] = (n/2 + n/2) = n
2 Merges [n/4][n/4][n/4][n/4] = 2(n/4 + n/4) = n
....
log(n) merges = n
Total = (n + n + n + ... + n) = lg n
= O(n log n)
I don't understand why (n + n + ... + n) can also be expressed as log base 2 of n and how they got for 2 merges = 2(n/4 + n/4)
In the case of 1 merge, you have two sub arrays to be sorted where each sub-array will take time proportional to n/2 to be sorted. In that sense, to sort those two sub-arrays you need a time proportional to n.
Similarly, when you are doing 2 merges, there are 4 sub arrays to be sorted where each will be taking a time proportional to n/4 which will again sum up to n.
Similarly, if you have n merges, it will take a time proportional to n to sort all the sub-arrays. In that sense, we can write the time taken by merge sort as follows.
T(n) = 2 * T(n/2) + n
You will understand that this recursive call can go deep (say to a height of h) until n/(2^h) = 1. By taking log here, we get h=log(n). That is how log(n) came to the scene. Here log is taken from base 2.
Since you have log(n) steps where each step takes a time proportional to n, total time taken can be expressed as,
n * log(n)
In big O notations, we give this as an upper bound O(nlog(n)). Hope you got the idea.
Following image of the recursion tree will enlighten you further.
The last line of the following part written in your question,
0 Merges [<----- n -------->] = n
1 Merge [<--n/2--][-n/2--->] = (n/2 + n/2) = n
2 Merges [n/4][n/4][n/4][n/4] = 2(n/4 + n/4) = n
....
n merges = n --This line is incorrect!
is wrong. You will not have total n merges of size n, but Log n merges of size n.
At every level, you divide the problem size into 2 problems of half the size. As you continue diving, the total divisions that you can do is Log n. (How? Let's say total divisions possible is x. Then n = 2x or x = Log2n.)
Since at each level you do a total work of O(n), therefore for Log n levels, the sum total of all work done will be O(n Log n).
You've got a deep of log(n) and a width of n for your tree. :)
The log portion is the result of "how many times can I split my data in two before I have only one element left?" This is the depth of your recursion tree. The multiple of n comes from the fact that for each of those levels in the tree you'll look at every element in your data set once after all merge steps at that level.
recurse downwards:
n unsorted elements
[n/2][n/2] split until singletons...
...
merge n elements at each step when recursing back up
[][][]...[][][]
[ ] ... [ ]
...
[n/2][n/2]
n sorted elements
It's very simple. Each merge takes O(n) as you demonstrated. The number of merges you need to do is log n (base 2), because each merge doubles the size of the sorted sections.

Why is the Fibonacci Sequence Big O(2^n) instead of O(logn)?

I took discrete math (in which I learned about master theorem, Big Theta/Omega/O) a while ago and I seem to have forgotten the difference between O(logn) and O(2^n) (not in the theoretical sense of Big Oh). I generally understand that algorithms like merge and quick sort are O(nlogn) because they repeatedly divide the initial input array into sub arrays until each sub array is of size 1 before recursing back up the tree, giving a recursion tree that is of height logn + 1. But if you calculate the height of a recursive tree using n/b^x = 1 (when the size of the subproblem has become 1 as was stated in an answer here) it seems that you always get that the height of the tree is log(n).
If you solve the Fibonacci sequence using recursion, I would think that you would also get a tree of size logn, but for some reason, the Big O of the algorithm is O(2^n). I was thinking that maybe the difference is because you have to remember all of the fib numbers for each subproblem to get the actual fib number meaning that the value at each node has to be recalled, but it seems that in merge sort, the value of each node has to be used (or at least sorted) as well. This is unlike binary search, however, where you only visit certain nodes based on comparisons made at each level of the tree so I think this is where the confusion is coming from.
So specifically, what causes the Fibonacci sequence to have a different time complexity than algorithms like merge/quick sort?
The other answers are correct, but don't make it clear - where does the large difference between the Fibonacci algorithm and divide-and-conquer algorithms come from? Indeed, the shape of the recursion tree for both classes of functions is the same - it's a binary tree.
The trick to understand is actually very simple: consider the size of the recursion tree as a function of the input size n.
In the Fibonacci recursion, the input size n is the height of the tree; for sorting, the input size n is the width of the tree. In the former case, the size of the tree (i.e. the complexity) is an exponent of the input size, in the latter: it is input size multiplied by the height of the tree, which is usually just a logarithm of the input size.
More formally, start by these facts about binary trees:
The number of leaves n is a binary tree is equal to the the number of non-leaf nodes plus one. The size of a binary tree is therefore 2n-1.
In a perfect binary tree, all non-leaf nodes have two children.
The height h for a perfect binary tree with n leaves is equal to log(n), for a random binary tree: h = O(log(n)), and for a degenerate binary tree h = n-1.
Intuitively:
For sorting an array of n elements with a recursive algorithm, the recursion tree has n leaves. It follows that the width of the tree is n, the height of the tree is O(log(n)) on the average and O(n) in the worst case.
For calculating a Fibonacci sequence element k with the recursive algorithm, the recursion tree has k levels (to see why, consider that fib(k) calls fib(k-1), which calls fib(k-2), and so on). It follows that height of the tree is k. To estimate a lower-bound on the width and number of nodes in the recursion tree, consider that since fib(k) also calls fib(k-2), therefore there is a perfect binary tree of height k/2 as part of the recursion tree. If extracted, that perfect subtree would have 2k/2 leaf nodes. So the width of the recursion tree is at least O(2^{k/2}) or, equivalently, 2^O(k).
The crucial difference is that:
for divide-and-conquer algorithms, the input size is the width of the binary tree.
for the Fibonnaci algorithm, the input size is it the height of the tree.
Therefore the number of nodes in the tree is O(n) in the first case, but 2^O(n) in the second. The Fibonacci tree is much larger compared to the input size.
You mention Master theorem; however, the theorem cannot be applied to analyze the complexity of Fibonacci because it only applies to algorithms where the input is actually divided at each level of recursion. Fibonacci does not divide the input; in fact, the functions at level i produce almost twice as much input for the next level i+1.
To address the core of the question, that is "why Fibonacci and not Mergesort", you should focus on this crucial difference:
The tree you get from Mergesort has N elements for each level, and there are log(n) levels.
The tree you get from Fibonacci has N levels because of the presence of F(n-1) in the formula for F(N), and the number of elements for each level can vary greatly: it can be very low (near the root, or near the lowest leaves) or very high. This, of course, is because of repeated computation of the same values.
To see what I mean by "repeated computation", look at the tree for the computation of F(6):
Fibonacci tree picture from: http://composingprograms.com/pages/28-efficiency.html
How many times do you see F(3) being computed?
Consider the following implementation
int fib(int n)
{
if(n < 2)
return n;
return fib(n-1) + fib(n-2)
}
Let's denote T(n) the number of operations that fib performs to calculate fib(n). Because fib(n) is calling fib(n-1) and fib(n-2), it means that T(n) is at least T(n-1) + T(n-2). This in turn means that T(n) > fib(n). There is a direct formula of fib(n) which is some constant to the power of n. Therefore T(n) is at least exponential. QED.
To my understanding, the mistake in your reasoning is that using a recursive implementation to evaluate f(n) where f denotes the Fibonacci sequence, the input size is reduced by a factor of 2 (or some other factor), which is not the case. Each call (except for the 'base cases' 0 and 1) uses exactly 2 recursive calls, as there is no possibility to re-use previously calculated values. In the light of the presentation of the master theorem on Wikipedia, the recurrence
f(n) = f (n-1) + f(n-2)
is a case for which the master theorem cannot be applied.
With the recursive algo, you have approximately 2^N operations (additions) for fibonacci (N). Then it is O(2^N).
With a cache (memoization), you have approximately N operations, then it is O(N).
Algorithms with complexity O(N log N) are often a conjunction of iterate over every item (O(N)) , split recurse, and merge ... Split by 2 => you do log N recursions.
Merge sort time complexity is O(n log(n)). Quick sort best case is O(n log(n)), worst case O(n^2).
The other answers explain why naive recursive Fibonacci is O(2^n).
In case you read that Fibonacci(n) can be O(log(n)), this is possible if calculated using iteration and repeated squaring either using matrix method or lucas sequence method. Example code for lucas sequence method (note that n is divided by 2 on each loop):
/* lucas sequence method */
int fib(int n) {
int a, b, p, q, qq, aq;
a = q = 1;
b = p = 0;
while(1) {
if(n & 1) {
aq = a*q;
a = b*q + aq + a*p;
b = b*p + aq;
}
n /= 2;
if(n == 0)
break;
qq = q*q;
q = 2*p*q + qq;
p = p*p + qq;
}
return b;
}
As opposed to answers master theorem can be applied. But master theorem for decreasing functions needs to be applied instead of master theorem for dividing functions. Without theorem with following recurrence relation with substitution it can be solved,
f(n) = f(n-1) + f(n-2)
f(n) = 2*f(n-1) + c
let assume c is equal 1 since it is constant and doesn't affect the complexity
f(n) = 2*f(n-1) + 1
and substitute this function k times
f(n) = 2*[2*f(n-2) +1 ] + 1
f(n) = 2^2*f(n-2) + 2 + 1
f(n) = 2^2*[2*f(n-3) + 1] +2 + 1
f(n) = 2^3*f(n-3) + 4 + 2 + 1
.
.
.
f(n) = 2^k*f(n-k) + 2^k-1 + 2^k-2 + ... + 4 + 2 + 1
now let's assume n=k
f(n) = 2^n*f(0) + 2^n-1 + 2^n-2 + ... + 4 + 2 + 1
f(n) = 2^n+1 thus complexity is O(2^n)
Check this video for master theorem for decreasing functions.

Logarithmic function in time complexity

How does a program's worst case or average case dependent on log function? How does the base of log come in play?
The log factor appears when you split your problem to k parts, of size n/k each and then "recurse" (or mimic recursion) on some of them.
A simple example is the following loop:
foo(n):
while n > 0:
n = n/2
print n
The above will print n, n/2, n/4, .... , 1 - and there are O(logn) such values.
the complexity of the above program is O(logn), since each printing requires constant amount of time, and number of values n will get along the way is O(logn)
If you are looking for "real life" examples, in quicksort (and for simplicity let's assume splitting to exactly two halves), you split the array of size n to two subarrays of size n/2, and then you recurse on both of them - and invoke the algorithm on each half.
This makes the complexity function of:
T(n) = 2T(n/2) + O(n)
From master theorem, this is in Theta(nlogn).
Similarly, on binary search - you split the problem to two parts, and recurse only on one of them:
T(n) = T(n/2) + 1
Which will be in Theta(logn)
The base is not a factor in big O complexity, because
log_k(n) = log_2(n)/log_2(k)
and log_2(k) is constant, for any constant k.

Prove 3-Way Quicksort Big-O Bound

For 3-way Quicksort (dual-pivot quicksort), how would I go about finding the Big-O bound? Could anyone show me how to derive it?
There's a subtle difference between finding the complexity of an algorithm and proving it.
To find the complexity of this algorithm, you can do as amit said in the other answer: you know that in average, you split your problem of size n into three smaller problems of size n/3, so you will get, in รจ log_3(n)` steps in average, to problems of size 1. With experience, you will start getting the feeling of this approach and be able to deduce the complexity of algorithms just by thinking about them in terms of subproblems involved.
To prove that this algorithm runs in O(nlogn) in the average case, you use the Master Theorem. To use it, you have to write the recursion formula giving the time spent sorting your array. As we said, sorting an array of size n can be decomposed into sorting three arrays of sizes n/3 plus the time spent building them. This can be written as follows:
T(n) = 3T(n/3) + f(n)
Where T(n) is a function giving the resolution "time" for an input of size n (actually the number of elementary operations needed), and f(n) gives the "time" needed to split the problem into subproblems.
For 3-Way quicksort, f(n) = c*n because you go through the array, check where to place each item and eventually make a swap. This places us in Case 2 of the Master Theorem, which states that if f(n) = O(n^(log_b(a)) log^k(n)) for some k >= 0 (in our case k = 0) then
T(n) = O(n^(log_b(a)) log^(k+1)(n)))
As a = 3 and b = 3 (we get these from the recurrence relation, T(n) = aT(n/b)), this simplifies to
T(n) = O(n log n)
And that's a proof.
Well, the same prove actually holds.
Each iteration splits the array into 3 sublists, on average the size of these sublists is n/3 each.
Thus - number of iterations needed is log_3(n) because you need to find number of times you do (((n/3) /3) /3) ... until you get to one. This gives you the formula:
n/(3^i) = 1
Which is satisfied for i = log_3(n).
Each iteration is still going over all the input (but in a different sublist) - same as quicksort, which gives you O(n*log_3(n)).
Since log_3(n) = log(n)/log(3) = log(n) * CONSTANT, you get that the run time is O(nlogn) on average.
Note, even if you take a more pessimistic approach to calculate the size of the sublists, by taking minimum of uniform distribution - it will still get you first sublist of size 1/4, 2nd sublist of size 1/2, and last sublist of size 1/4 (minimum and maximum of uniform distribution), which will again decay to log_k(n) iterations (with a different k>2) - which will yield O(nlogn) overall - again.
Formally, the proof will be something like:
Each iteration takes at most c_1* n ops to run, for each n>N_1, for some constants c_1,N_1. (Definition of big O notation, and the claim that each iteration is O(n) excluding recursion. Convince yourself why this is true. Note that in here - "iteration" means all iterations done by the algorithm in a certain "level", and not in a single recursive invokation).
As seen above, you have log_3(n) = log(n)/log(3) iterations on average case (taking the optimistic version here, same principles for pessimistic can be used)
Now, we get that the running time T(n) of the algorithm is:
for each n > N_1:
T(n) <= c_1 * n * log(n)/log(3)
T(n) <= c_1 * nlogn
By definition of big O notation, it means T(n) is in O(nlogn) with M = c_1 and N = N_1.
QED

Average Runtime of Quickselect

Wikipedia states that the average runtime of quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain to me (via recurrence relation + master method usage) as to how the average runtime is O(n)?
Because
we already know which partition our desired element lies in.
We do not need to sort (by doing partition on) all the elements, but only do operation on the partition we need.
As in quick sort, we have to do partition in halves *, and then in halves of a half, but this time, we only need to do the next round partition in one single partition (half) of the two where the element is expected to lie in.
It is like (not very accurate)
n + 1/2 n + 1/4 n + 1/8 n + ..... < 2 n
So it is O(n).
Half is used for convenience, the actual partition is not exact 50%.
To do an average case analysis of quickselect one has to consider how likely it is that two elements are compared during the algorithm for every pair of elements and assuming a random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations but it is a clean average case analysis as opposed to the current answers.
Let's assume the field we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we then always pick the next feasible pivot from this permutation therefore they are chosen uniform at random as every element has the same probability of occurring as the next feasible element in the random permutation.
There is one simple, yet very important, observation: We only compare two elements i and j (with i<j) if and only if one of them is chosen as first pivot element from the range [min(k,i), max(k,j)]. If another element from this range is chosen first then they will never be compared because we continue searching in a sub-field where at least one of the elements i,j is not contained in.
Because of the above observation and the fact that the pivots are chosen uniform at random the probability of a comparison between i and j is:
2/(max(k,j) - min(k,i) + 1)
(Two events out of max(k,j) - min(k,i) + 1 possibilities.)
We split the analysis in three parts:
max(k,j) = k, therefore i < j <= k
min(k,i) = k, therefore k <= i < j
min(k,i) = i and max(k,j) = j, therefore i < k < j
In the third case the less-equal signs are omitted because we already consider those cases in the first two cases.
Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.
Case 1
Case 2
Similar to case 1 so this remains as an exercise. ;)
Case 3
We use H_r for the r-th harmonic number which grows approximately like ln(r).
Conclusion
All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).
Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.
If there are any errors please let me know.
In quickselect, as specified, we apply recursion on only one half of the partition.
Average Case Analysis:
First Step: T(n) = cn + T(n/2)
where, cn = time to perform partition, where c is any constant(doesn't matter). T(n/2) = applying recursion on one half of the partition.Since it's an average case we assume that the partition was the median.
As we keep on doing recursion, we get the following set of equation:
T(n/2) = cn/2 + T(n/4) T(n/4) = cn/2 + T(n/8) .. . T(2) = c.2 + T(1) T(1) = c.1 + ...
Summing the equations and cross-cancelling like values produces a linear result.
c(n + n/2 + n/4 + ... + 2 + 1) = c(2n) //sum of a GP
Hence, it's O(n)
I also felt very conflicted at first when I read that the average time complexity of quickselect is O(n) while we break the list in half each time (like binary search or quicksort). It turns out that breaking the search space in half each time doesn't guarantee an O(log n) or O(n log n) runtime. What makes quicksort O(n log n) and quickselect is O(n) is that we always need to explore all branches of the recursive tree for quicksort and only a single branch for quickselect. Let's compare the time complexity recurrence relations of quicksort and quickselect to prove my point.
Quicksort:
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4))
= n + 2(n/2) + 4T(n/4)
= n + 2(n/2) + 4(n/4) + ... + n(n/n)
= 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
= n (log2(n) + 1) (since we are adding n to itself log2 + 1 times)
Quickselect:
T(n) = n + T(n/2)
= n + n/2 + T(n/4)
= n + n/2 + n/4 + ... n/n
= n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
= n (1/(1-(1/2))) = 2n (by geometric series)
I hope this convinces you why the average runtime of quickselect is O(n).

Resources