Sorting algorithm proof and running-time - algorithm

Hydrosort is a sorting algorithm. Below is the pseudocode.
*/A is arrary to sort, i = start index, j = end index */
Hydrosort(A, i, j): // Let T(n) be the time to find where n = j-1+1
n = j – i + 1 O(1)
if (n < 10) { O(1)
sort A[i…j] by insertion-sort O(n^2) //insertion sort = O(n^2) worst-case
return O(1)
}
m1 = i + 3 * n / 4 O(1)
m2 = i + n / 4 O(1)
Hydrosort(A, i, m1) T(n/2)
Hydrosort(A, m2, j) T(n/2)
Hydrosort(A, i, m1) T(n/2)
T(n) = O(n^2) + 3T(n/2), so T(n) is O(n^2). I used the 3rd case of the Master Theorem to solve this recurrence.
I have 2 questions:
Have I calculated the worst-case running time here correctly?
how would I prove that Hydrosort(A, 1, n) correctly sorts an array A of n elements?

Have I calculated the worst-case running time here correctly?
I am afraid not.
The complexity function is:
T(n) = 3T(3n/4) + CONST
This is because:
You have three recursive calls for a problem of size 3n/4
The constant modifier here is O(1), since all non recursive calls operations are bounded to a constant size (Specifically, insertion sort for n<=10 is O(1)).
If you go on and calculate it, you'll get worse than O(n^2) complexity
how would I prove that Hydrosort(A, 1, n) correctly sorts an array A
of n elements?
By induction. Assume your algorithm works for problems of size n, and let's examine a problem of size n+1. For n+1<10 it is trivial, so we ignore this case.
After first recursive call, you are guaranteed that first 3/4 of the
array is sorted, and in particular you are guaranteed that the first n/4 of the elements are the smallest ones in this part. This means, they cannot be in the last n/4 of the array, because there are at least n/2 elements bigger than them. This means, the n/4 biggest elements are somewhere between m2 and j.
After second recursive call, since it is guaranteed to be invoked on the n/4 biggest elements, it will place these elements at the end of the array. This means the part between m1 and j is now sorted properly. This also means, 3n/4 smallest elements are somewhere in between i and m1.
Third recursive call sorts the 3n/4 elements properly, and the n/4 biggest elements are already in place, so the array is now sorted.

Related

time complexity of some recursive and none recursive algorithm

I have two pseudo-code algorithms:
RandomAlgorithm(modVec[0 to n − 1])
b = 0;
for i = 1 to n do
b = 2.b + modVec[n − i];
for i = 1 to b do
modVec[i mod n] = modVec[(i + 1) mod n];
return modVec;
Second:
AnotherRecursiveAlgo(multiplyVec[1 to n])
if n ≤ 2 do
return multiplyVec[1] × multiplyVec[1];
return
multiplyVec[1] × multiplyVec[n] +
AnotherRecursiveAlgo(multiplyVec[1 to n/3]) +
AnotherRecursiveAlgo(multiplyVec[2n/3 to n]);
I need to analyse the time complexity for these algorithms:
For the first algorithm i got the first loop is in O(n),the second loop has a best case and a worst case , best case is we have O(1) the loop runs once, the worst case is we have a big n on the first loop, but i don't know how to write this idea as a time complexity cause i usually get b=sum(from 1 to n-1) of 2^n-1 . modVec[n-1] and i get stuck here.
For the second loop i just don't get how to solve the time complexity of this one, we usually have it dependant on n , so we need the formula i think.
Thanks for the help.
The first problem is a little strange, all right.
If it helps, envision modVec as an array of 1's and 0's.
In this case, the first loop converts this array to a value.
This is O(n)
For instance, (1, 1, 0, 1, 1) will yield b = 27.
Your second loop runs b times. The dominating term for the value of b is 2^(n-1), a.k.a. O(2^n). The assignment you do inside the loop is O(1).
The second loop does depend on n. Your base case is a simple multiplication, O(1). The recursion step has three terms:
simple multiplication
recur on n/3 elements
recur on n/3 elements (from 2n/3 to the end is n/3 elements)
Just as your binary partitions result in log[2] complexities, this one will result in log[3]. The base doesn't matter; the coefficient (two recursive calls) doesn't' matter. 2*O(log3) is still O(log N).
Does that push you to a solution?
First Loop
To me this boils down to the O(First-For-Loop) + O(Second-For-Loop).
O(First-For-Loop) is simple = O(n).
O(Second-For-Loop) interestingly depends on n. Therefore, to me it's can be depicted as O(f(n)), where f(n) is some function of n. Not completely sure if I understand the f(n) based on the code presented.
The answer consequently becomes O(n) + O(f(n)). This could boil down to O(n) or O(f(n)) depending upon which one is larger and more dominant (since the lower order terms don't matter in the big-O notation.
Second Loop
In this case, I see that each call to the function invokes 3 additional calls...
The first call seems to be an O(1) call. So it won't matter.
The second and the third calls seems to recurses the function.
Therefore each function call is resulting in 2 additional recursions.
Consequently , the time complexity on this would be O(2^n).

Worst case time complexity for this stupid sort?

The code looks like:
for (int i = 1; i < N; i++) {
if (a[i] < a[i-1]) {
swap(i, i-1);
i = 0;
}
}
After trying out a few things i figure the worst case is when the input array is in descending order. Then looks like the compares will be maximum and hence we will consider only compares. Then it seems it would be a sum of sums, i.e sum of ... {1+2+3+...+(n-1)}+{1+2+3+...+(n-2)}+{1+2+3+...+(n-3)}+ .... + 1 if so what would be O(n) ?
If I am not on the right path can someone point out what O(n) would be and how can it be derived? cheers!
For starters, the summation
(1 + 2 + 3 + ... + n) + (1 + 2 + 3 + ... + n - 1) + ... + 1
is not actually O(n). Instead, it's O(n3). You can see this because the sum 1 + 2 + ... + n = O(n2, and there are n copies of each of them. You can more properly show that this summation is Θ(n3) by looking at the first n / 2 of these terms. Each of those terms is at least 1 + 2 + 3 + ... + n / 2 = Θ(n2), so there are n / 2 copies of something that's Θ(n2), giving a tight bound of Θ(n3).
We can upper-bound the total runtime of this algorithm at O(n3) by noting that every swap decreases the number of inversions in the array by one (an inversion is a pair of elements out of place). There can be at most O(n2) inversions in an array and a sorted array has no inversions in it (do you see why?), so there are at most O(n2) passes over the array and each takes at most O(n) work. That collectively gives a bound of O(n3).
Therefore, the Θ(n3) worst-case runtime you've identified is asymptotically tight, so the algorithm runs in time O(n3) and has worst-case runtime Θ(n3).
Hope this helps!
It does one iteration of the list per swap. The maximum number of swaps necessary is O(n * n) for a reversed list. Doing each iteration is O(n).
Therefore the algorithm is O(n * n * n).
This is one half of the infamous Bubble Sort, which has a O(N^2). This partial sort has O(N) because the For loop goes from 1 to N. After one iteration, you will end up with the largest element at the end of the list and the rest of the list in some changed order. To be a proper Bubble Sort, it needs another loop inside this one to iterate j from 1 to N-i and do the same thing. The If goes inside the inner loop.
Now you have two loops, one inside the other, and they both go from 1 to N (sort of). You will have N * N or N^2 iterations. Thus O(N^2) for the Bubble Sort.
Now you have take your next step as a programmer: Finish writing the Bubble Sort and make it work correctly. Try it with different lengths of list a and see how long it takes. Then never use it again. ;-)

Running time of merge sort on linked lists?

i came across this piece of code to perform merge sort on a link list..
the author claims that it runs in time O(nlog n)..
here is the link for it...
http://www.geeksforgeeks.org/merge-sort-for-linked-list/
my claim is that it takes atleast O(n^2) time...and here is my argument...
look, you divide the list(be it array or linked list), log n times(refer to recursion tree), during each partition, given a list of size i=n, n/2, ..., n/2^k, we would take O(i) time to partition the original/already divided list..since sigma O(i)= O(n),we can say , we take O(n) time to partition for any given call of partition(sloppily), so given the time taken to perform a single partition, the question now arises as to how many partitions are going to happen all in all, we observe that the number of partitions at each level i is equal to 2^i , so summing 2^0+2^1+....+2^(lg n ) gives us [2(lg n)-1] as the sum which is nothing but (n-1) on simplification , implying that we call partition n-1, (let's approximate it to n), times so , the complexity is atleast big omega of n^2..
if i am wrong, please let me know where...thanks:)
and then after some retrospection , i applied master method to the recurrence relation where i replaced theta of 1 which is there for the conventional merge sort on arrays with theta of n for this type of merge sort (because the divide and combine operations take theta of n time each), the running time turned out to be theta of (n lg n)...
also i noticed that the cost at each level is n (because 2 power i * (n/(2pow i)))...is the time taken for each level...so its theta of n at each level* lg n levels..implying that its theta of (n lg n).....did i just solve my own question??pls help i am kinda confused myself..
The recursive complexity definition for an input list of size n is
T(n) = O(n) + 2 * T(n / 2)
Expanding this we get:
T(n) = O(n) + 2 * (O(n / 2) + 2 * T(n / 4))
= O(n) + O(n) + 4 * T(n / 4)
Expanding again we get:
T(n) = O(n) + O(n) + O(n) + 8 * T(n / 8)
Clearly there is a pattern here. Since we can repeat this expansion exactly O(log n) times, we have
T(n) = O(n) + O(n) + ... + O(n) (O(log n) terms)
= O(n log n)
You are performing a sum twice for some weird reason.
To split and merge a linked list of size n, it takes O(n) time. The depth of recursion is O(log n)
Your argument was that a splitting step takes O(i) time and sum of split steps become O(n) And then you call it the time taken to perform only one split.
Instead, lets consider this, a problem of size n forms two n/2 problems, four n/4 problems eight n/8 and so on until 2^log n n/2^logn subproblems are formed. Sum these up you get O(nlogn) to perform splits.
Another O(nlogn) to combine sub problems.

O(n) - the next permutation lexicographically

i'm just wondering what is efficiency (O(n)) of this algorithm:
Find the largest index k such that a[k] < a[k + 1]. If no such index exists, the permutation is the last permutation.
Find the largest index l such that a[k] < a[l]. Since k + 1 is such an index, l is well defined and satisfies k < l.
Swap a[k] with a[l].
Reverse the sequence from a[k + 1] up to and including the final element a[n].
As I understand the worst case O(n) = n (when k is the first element of previous permutation), best case O(n) = 1 (when k is last element of previous permutation).
Can I say that O(n) = n/2 ?
O(n) = n/2 makes no sense. Let f(n) = n be the running time of your algorithm. Then the right way to say it is that f(n) is in O(n). O(n) is a set of functions that are at most asymptotically linear in n.
Your optimization makes the expected running time g(n) = n/2. g(n) is also in O(n). In fact O(n) = O(n/2) so your saving of half of the time does not change the asymptotic complexity.
All steps in the algorithm takes O(n) asymptotically.
Your averaging is incorrect. Just because best case is O(1) and worst case is O(n), you can't say the algorithm takes O(n)=n/2. Big O notation is simply for the upper bound of the algorithm.
So the algorithm is still O(n) irrespective of the best case scenario.
There is no such thing as O(n) = n/2.
When you do O(n) calculations you're just trying to find the functional dependency, you don't care about coefficients. So there's no O(n)= n/2 just like there's no O(n) = 5n
Asymptotically, O(n) is the same as O(n/2). In any case, the algorithm is performed for each of the n! permutations, so the order is much greater than your estimate (on the order of n!).

Average Runtime of Quickselect

Wikipedia states that the average runtime of quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain to me (via recurrence relation + master method usage) as to how the average runtime is O(n)?
Because
we already know which partition our desired element lies in.
We do not need to sort (by doing partition on) all the elements, but only do operation on the partition we need.
As in quick sort, we have to do partition in halves *, and then in halves of a half, but this time, we only need to do the next round partition in one single partition (half) of the two where the element is expected to lie in.
It is like (not very accurate)
n + 1/2 n + 1/4 n + 1/8 n + ..... < 2 n
So it is O(n).
Half is used for convenience, the actual partition is not exact 50%.
To do an average case analysis of quickselect one has to consider how likely it is that two elements are compared during the algorithm for every pair of elements and assuming a random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations but it is a clean average case analysis as opposed to the current answers.
Let's assume the field we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we then always pick the next feasible pivot from this permutation therefore they are chosen uniform at random as every element has the same probability of occurring as the next feasible element in the random permutation.
There is one simple, yet very important, observation: We only compare two elements i and j (with i<j) if and only if one of them is chosen as first pivot element from the range [min(k,i), max(k,j)]. If another element from this range is chosen first then they will never be compared because we continue searching in a sub-field where at least one of the elements i,j is not contained in.
Because of the above observation and the fact that the pivots are chosen uniform at random the probability of a comparison between i and j is:
2/(max(k,j) - min(k,i) + 1)
(Two events out of max(k,j) - min(k,i) + 1 possibilities.)
We split the analysis in three parts:
max(k,j) = k, therefore i < j <= k
min(k,i) = k, therefore k <= i < j
min(k,i) = i and max(k,j) = j, therefore i < k < j
In the third case the less-equal signs are omitted because we already consider those cases in the first two cases.
Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.
Case 1
Case 2
Similar to case 1 so this remains as an exercise. ;)
Case 3
We use H_r for the r-th harmonic number which grows approximately like ln(r).
Conclusion
All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).
Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.
If there are any errors please let me know.
In quickselect, as specified, we apply recursion on only one half of the partition.
Average Case Analysis:
First Step: T(n) = cn + T(n/2)
where, cn = time to perform partition, where c is any constant(doesn't matter). T(n/2) = applying recursion on one half of the partition.Since it's an average case we assume that the partition was the median.
As we keep on doing recursion, we get the following set of equation:
T(n/2) = cn/2 + T(n/4) T(n/4) = cn/2 + T(n/8) .. . T(2) = c.2 + T(1) T(1) = c.1 + ...
Summing the equations and cross-cancelling like values produces a linear result.
c(n + n/2 + n/4 + ... + 2 + 1) = c(2n) //sum of a GP
Hence, it's O(n)
I also felt very conflicted at first when I read that the average time complexity of quickselect is O(n) while we break the list in half each time (like binary search or quicksort). It turns out that breaking the search space in half each time doesn't guarantee an O(log n) or O(n log n) runtime. What makes quicksort O(n log n) and quickselect is O(n) is that we always need to explore all branches of the recursive tree for quicksort and only a single branch for quickselect. Let's compare the time complexity recurrence relations of quicksort and quickselect to prove my point.
Quicksort:
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4))
= n + 2(n/2) + 4T(n/4)
= n + 2(n/2) + 4(n/4) + ... + n(n/n)
= 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
= n (log2(n) + 1) (since we are adding n to itself log2 + 1 times)
Quickselect:
T(n) = n + T(n/2)
= n + n/2 + T(n/4)
= n + n/2 + n/4 + ... n/n
= n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
= n (1/(1-(1/2))) = 2n (by geometric series)
I hope this convinces you why the average runtime of quickselect is O(n).

Resources