Worst case time complexity for this stupid sort?

The code looks like:
for (int i = 1; i < N; i++) {
    if (a[i] < a[i-1]) {
        swap(i, i-1);
        i = 0;
    }
}
After trying out a few things I figure the worst case is when the input array is in descending order. Then it looks like the number of compares will be largest, and hence we will consider only compares. Then it seems it would be a sum of sums, i.e. {1+2+3+...+(n-1)} + {1+2+3+...+(n-2)} + {1+2+3+...+(n-3)} + .... + 1. If so, what would the big-O be?
If I am not on the right path, can someone point out what the big-O would be and how it can be derived? cheers!

For starters, the summation
(1 + 2 + 3 + ... + n) + (1 + 2 + 3 + ... + n - 1) + ... + 1
is not actually O(n). Instead, it's O(n^3). You can see this because the sum 1 + 2 + ... + n = O(n^2), and there are n of these sums. You can more properly show that this summation is Θ(n^3) by looking at the first n / 2 of these terms. Each of those terms is at least 1 + 2 + 3 + ... + n / 2 = Θ(n^2), so there are n / 2 copies of something that's Θ(n^2), giving a tight bound of Θ(n^3).
We can upper-bound the total runtime of this algorithm at O(n^3) by noting that every swap decreases the number of inversions in the array by one (an inversion is a pair of elements out of place). There can be at most O(n^2) inversions in an array and a sorted array has no inversions in it (do you see why?), so there are at most O(n^2) passes over the array and each takes at most O(n) work. That collectively gives a bound of O(n^3).
Therefore, the Θ(n^3) worst-case runtime you've identified is asymptotically tight, so the algorithm runs in time O(n^3) and has worst-case runtime Θ(n^3).
Hope this helps!
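If you want to see the cubic growth for yourself, here is a small C++ sketch (my own, not from the original post) that runs the exact loop above on a descending array and counts the comparisons:

#include <iostream>
#include <utility>
#include <vector>

long long countComparisons(int N) {
    std::vector<int> a(N);
    for (int k = 0; k < N; ++k) a[k] = N - k;        // worst case: descending order
    long long comparisons = 0;
    for (int i = 1; i < N; i++) {
        ++comparisons;
        if (a[i] < a[i - 1]) {
            std::swap(a[i], a[i - 1]);
            i = 0;                                   // restart the scan from the front
        }
    }
    return comparisons;
}

int main() {
    for (int N : {100, 200, 400, 800})
        std::cout << N << " elements -> " << countComparisons(N) << " comparisons\n";
    // Doubling N multiplies the count by roughly 8, as you would expect for Θ(n^3).
}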

Because of the i = 0 reset, it does one (re)scan of the list per swap. The maximum number of swaps necessary is O(n * n) for a reversed list, and doing each scan is O(n).
Therefore the algorithm is O(n * n * n).

This is one half of the infamous Bubble Sort, which is O(N^2). This partial sort is O(N) because the for loop goes from 1 to N. After one pass, you will end up with the largest element at the end of the list and the rest of the list in some changed order. To be a proper Bubble Sort, it needs another loop inside this one that iterates j from 1 to N-i and does the same thing; the if goes inside the inner loop.
Now you have two loops, one inside the other, and they both go from 1 to N (more or less). You will have N * N, or N^2, iterations. Thus O(N^2) for the Bubble Sort.
Now take your next step as a programmer: finish writing the Bubble Sort and make it work correctly. Try it with different lengths of list a and see how long it takes. Then never use it again. ;-)
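One possible way to finish it, as a C++ sketch (0-based indices rather than the 1-based description above):

#include <iostream>
#include <utility>
#include <vector>

void bubbleSort(std::vector<int>& a) {
    int N = (int)a.size();
    for (int i = 0; i < N - 1; i++) {               // outer pass counter
        for (int j = 1; j < N - i; j++) {           // inner loop shrinks by one each pass
            if (a[j] < a[j - 1])
                std::swap(a[j], a[j - 1]);          // the largest remaining element bubbles right
        }
    }
}

int main() {
    std::vector<int> v{5, 1, 4, 2, 8};
    bubbleSort(v);
    for (int x : v) std::cout << x << ' ';          // prints 1 2 4 5 8
    std::cout << '\n';
}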

Related

Sorting algorithm proof and running-time

Hydrosort is a sorting algorithm. Below is the pseudocode.
/* A is the array to sort, i = start index, j = end index */
Hydrosort(A, i, j):                       // let T(n) be the running time, where n = j - i + 1
    n = j - i + 1                         O(1)
    if (n < 10) {                         O(1)
        sort A[i..j] by insertion-sort    O(n^2)   // insertion sort is O(n^2) worst-case
        return                            O(1)
    }
    m1 = i + 3 * n / 4                    O(1)
    m2 = i + n / 4                        O(1)
    Hydrosort(A, i, m1)                   T(n/2)
    Hydrosort(A, m2, j)                   T(n/2)
    Hydrosort(A, i, m1)                   T(n/2)
T(n) = O(n^2) + 3T(n/2), so T(n) is O(n^2). I used the 3rd case of the Master Theorem to solve this recurrence.
I have 2 questions:
Have I calculated the worst-case running time here correctly?
how would I prove that Hydrosort(A, 1, n) correctly sorts an array A of n elements?
Have I calculated the worst-case running time here correctly?
I am afraid not.
The complexity function is:
T(n) = 3T(3n/4) + CONST
This is because:
You make three recursive calls, each on a problem of size 3n/4.
The additive term is O(1), since all non-recursive operations take constant time (specifically, insertion sort on fewer than 10 elements is O(1)).
If you go on and calculate it, you'll get a complexity worse than O(n^2): by case 1 of the Master Theorem, T(n) = Θ(n^(log(3)/log(4/3))) ≈ Θ(n^3.82).
how would I prove that Hydrosort(A, 1, n) correctly sorts an array A
of n elements?
By induction. Assume the algorithm correctly sorts every problem of size smaller than n, and let's examine a problem of size n. For n < 10 it is trivial (the insertion-sort base case), so we ignore this case.
After the first recursive call, you are guaranteed that the first 3/4 of the array is sorted, and in particular that the first n/4 positions hold the smallest elements of that part. None of these can be among the n/4 biggest elements of the array, because each of them has at least n/2 elements bigger than it. This means the n/4 biggest elements are somewhere between m2 and j.
After the second recursive call, since the range it sorts is guaranteed to contain the n/4 biggest elements, it places them at the end of the array. This means the part between m1 and j is now sorted properly, and the 3n/4 smallest elements are somewhere between i and m1.
The third recursive call sorts those 3n/4 smallest elements properly, and since the n/4 biggest elements are already in place, the array is now sorted.
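For concreteness, here is one way the pseudocode might be rendered in C++ (my own sketch, using 0-based inclusive indices; the names mirror the pseudocode):

#include <iostream>
#include <vector>

// Hydrosort on the inclusive index range [i, j].
void hydrosort(std::vector<int>& A, int i, int j) {
    int n = j - i + 1;
    if (n < 10) {                                   // base case: insertion-sort the small range
        for (int p = i + 1; p <= j; ++p) {
            int key = A[p], q = p - 1;
            while (q >= i && A[q] > key) { A[q + 1] = A[q]; --q; }
            A[q + 1] = key;
        }
        return;
    }
    int m1 = i + 3 * n / 4;                         // end of the "first 3/4" range
    int m2 = i + n / 4;                             // start of the "last 3/4" range
    hydrosort(A, i, m1);                            // sort the first 3/4
    hydrosort(A, m2, j);                            // sort the last 3/4 (now holds the n/4 biggest)
    hydrosort(A, i, m1);                            // sort the first 3/4 again
}

int main() {
    std::vector<int> v{9, 3, 7, 1, 8, 2, 12, 5, 11, 4, 10, 6};
    hydrosort(v, 0, (int)v.size() - 1);
    for (int x : v) std::cout << x << ' ';          // prints 1 2 ... 12 in ascending order
    std::cout << '\n';
}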

Complexity Analysis of the following loops

I have some exercises of complexity analysis of double loops, and I don't know if I'm doing them correctly.
for i = 1 to n do
    j = i
    while j < n do
        j = 2*j
    end while
end for
My answer to this is O(n^2), because the first loop runs O(n) times and the inner one does O(n/2) iterations for the "worst" iteration of the outer loop. So O(n) * O(n/2) = O(n^2).
Also, looking a bit further, I think I can say that the inner loop is doing a partial sum that is O(n/2) + O(n-1) + ... + O(1), and this is also O(n).
for i = 1 to n do
    j = n
    while i*i < j do
        j = j - 1
    end while
end for
Again the outer loop is O(n), and the inner loop is doing O(sqrt(n)) in the worst iteration, so here I think it's O(n*sqrt(n)) but I'm unsure about this one.
for i = 1 to n do
    j = 2
    while j < i do
        j = j*j
    end while
end for
Here the outer loop is O(n) and the inner loop is doing O(logn) work for the worst case. Hence I think this is O(nlogn)
i = 2
while (i*i < n) and (n mod i != 0) do
    i = i + 1
end while
Finally, I don't know how to make sense of this one, because of the modulus operator.
My questions are:
Did I do anything wrong in the first 3 examples?
Is the "worst-case approach" for the inner loops I'm doing correct?
How should I approach the last exercise?
First Question:
The inner loop takes about log(n/i) time on the i-th iteration of the outer loop. An upper bound of O(log(n)) per iteration gives a total time of O(n*log(n)), but this bound is not tight: the exact total is the sum of log(n/i) for i = 1..n, which is log(n^n / n!) = Θ(n) by Stirling's approximation. Adding the Θ(n) work of the outer loop itself, the tight bound is Θ(n); O(n*log(n)) is correct only as a loose upper bound.
Second Question:
The inner loop takes n - i^2 time (and O(1) if i^2 >= n). Notice that for i >= sqrt(n) the inner loop takes O(1) time so we can run the outer loop only for i in 1:sqrt(n) and add O(n) to the result. An upper bound is n for the inner loop, giving a total time of O(n * sqrt(n) + n) = O(n ^ (3/2)). A lower bound is 3/4 * n for the inner loop and summing only for i's up to sqrt(n) / 2 (so that i^2 < n / 4 and n - i ^ 2 > 3/4 * n ) and we get a total time of Ω(sqrt(n) / 2 * n * 3/4 + n) = Ω(n^(3/2)) thus the bound O(n * sqrt(n)) is indeed tight.
Third Question:
In this one j starts at 2 and we square it until it reaches i. After t steps of the inner loop, j is equal to 2^(2^t). We reach i when j = 2^(log(i)) = 2^(2^(log(log(i)))), i.e., after t = log(log(i)) steps. We can again give an upper bound and a lower bound as in the previous questions, and get the tight bound Θ(n * log(log(n))).
Fourth Question:
The complexity can vary between 2 = O(1) and sqrt(n), depending on the factorization of n. In the worst case no i below sqrt(n) divides n (for instance, when n is prime), giving a complexity of O(sqrt(n)).
To answer your questions at the end:
1. Yes, you have done some things wrong. You have reached wrong answers in 1 and 3 and in 2 your result is right but the reasoning is flawed; the inner loop is not O(sqrt(n)), as you have already seen in my analysis.
2. Considering the "worst case" for the inner loop is good, as it gives you an upper bound (which is usually accurate in this kind of question), but to establish a tight bound you must also show a lower bound, usually by keeping only the larger half of the terms and lowering each of them to the smallest of those kept, as I did in some of the examples. Another way to prove tight bounds is to use formulas for known series such as 1 + ... + n = n * (n + 1) / 2, giving an immediate bound of O(n^2) instead of getting the lower bound from 1 + ... + n >= n/2 + ... + n >= n/2 + ... + n/2 = n/2 * n/2 = n^2 / 4 = Ω(n^2).
3. Answered above.
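If you want to sanity-check these bounds empirically, here is a small C++ sketch (my own, not part of the answer) that simply counts the inner-loop iterations of the four loops for a chosen n; increasing n lets you compare the growth of each count against the bounds above:

#include <iostream>

int main() {
    long long n = 65537;            // a prime, so the fourth loop really runs ~sqrt(n) times
    long long c1 = 0, c2 = 0, c3 = 0, c4 = 0;

    for (long long i = 1; i <= n; ++i)              // first loop
        for (long long j = i; j < n; j *= 2) ++c1;

    for (long long i = 1; i <= n; ++i)              // second loop
        for (long long j = n; i * i < j; --j) ++c2;

    for (long long i = 1; i <= n; ++i)              // third loop
        for (long long j = 2; j < i; j *= j) ++c3;

    for (long long i = 2; i * i < n && n % i != 0; ++i) ++c4;   // fourth loop

    std::cout << c1 << ' ' << c2 << ' ' << c3 << ' ' << c4 << '\n';
}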
For the first one in the inner loop we have:
i, 2*i, 4*i, ... , (2^k)*i where (2^k)*i < n. So k < log(n) - log(i). The outer loop, as you said, repeats n times. In total we have this sum:
(log(n) - log(1)) + (log(n) - log(2)) + ... + (log(n) - log(n))
which equals
n*log(n) - log(n!)
Therefore I think the complexity should be O(nlogn).
For the second one we have:
For the third one:
So I think it should be O(log(n!))
For the last one, if n is even, it will be O(1) because we never enter the loop. But the worst case is when n is odd and not divisible by any number below sqrt(n) (for example, when n is prime); then I think it should be O(sqrt(n)).

Time complexity of the following algorithm?

I'm learning Big-O notation right now and stumbled across this small algorithm in another thread:
i = n
while (i >= 1)
{
    for j = 1 to i   // NOTE: i instead of n here!
    {
        x = x + 1
    }
    i = i/2
}
According to the author of the post, the complexity is Θ(n), but I can't figure out how. I think the while loop's complexity is Θ(log(n)). From what I was thinking, the for loop's complexity would also be Θ(log(n)), because the number of iterations is halved each time.
So, wouldn't the complexity of the whole thing be Θ(log(n) * log(n)), or am I doing something wrong?
Edit: the segment is in the best answer of this question: https://stackoverflow.com/questions/9556782/find-theta-notation-of-the-following-while-loop#=
Imagine for simplicity that n = 2^k. How many times does x get incremented? It is easy to see that this is a geometric series:
2^k + 2^(k - 1) + 2^(k - 2) + ... + 1 = 2^(k + 1) - 1 = 2 * n - 1
So this part is Θ(n). Also, i gets halved k = log(n) times, which has no effect on the Θ(n) bound.
The value of i on each iteration of the while loop, which is also the number of iterations the for loop performs, is n, n/2, n/4, ..., and the overall complexity is the sum of those. That puts it at roughly 2n, which gets you your Θ(n).
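Here is a tiny C++ sketch (mine, not from the linked answer) that counts how many times x would be incremented and compares the count to 2n:

#include <iostream>

int main() {
    for (long long n : {1000LL, 10000LL, 100000LL, 1000000LL}) {
        long long count = 0;
        long long i = n;
        while (i >= 1) {
            for (long long j = 1; j <= i; ++j) ++count;   // stands in for x = x + 1
            i = i / 2;
        }
        std::cout << "n = " << n << "  increments = " << count
                  << "  2n = " << 2 * n << '\n';
    }
}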

What is the complexity of an arithmetic progression?

I don't really understand how to calculate the complexity of a piece of code. I was told that I need to look at the number of actions that are done on each item in my code. So when I have a loop that runs over an array based on the idea of an arithmetic progression (I want to calculate the sum from every index till the end of the array), meaning that at first I pass over n cells, the second time n-1 cells, and so on... why is the complexity considered O(N^2) and not O(n)?
As I see it, n + (n-1) + (n-2) + ... + (n-c) ... is x*n - c, in other words O(n). So why am I wrong?
As I see it, n + (n-1) + (n-2) + ... + (n-c) ... is x*n - c, in other words O(n). So why am I wrong?
Actually, it is not true. The sum of this arithmetic progression is n*(n-1)/2 = O(n^2)
P.S. I have read your task: you need only one loop over the array, reusing the previous results, so you can solve this with O(n) complexity:
for i=1 to n
    result[i] = a[i] + result[i-1]
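In C++, the same one-pass idea, written for the task as described in the question (the sum from every index to the end of the array), might look like this sketch; the array contents are just for illustration:

#include <iostream>
#include <vector>

int main() {
    std::vector<long long> a{3, 1, 4, 1, 5, 9};
    std::vector<long long> result(a.size());

    // One pass from the back: result[i] = a[i] + (sum of everything after i).
    long long running = 0;
    for (int i = (int)a.size() - 1; i >= 0; --i) {
        running += a[i];
        result[i] = running;
    }

    for (long long s : result) std::cout << s << ' ';   // prints 23 20 19 15 14 9
    std::cout << '\n';
}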
What your code is doing is the following:
traverse array from 1 to n
traverse array from 2 to n
... and so on, until after a total of n-1 iterations
traverse only the array's nth element
As you can see, the number of cells traversed decreases by 1 with each pass. Each traversal is driven by an inner loop that starts at i, and the whole thing is wrapped in an outer loop over n. Concretely, the number of actions performed on the items of the array is given by:
for ( i = 1 to n )
    for ( j = i to n )
        traverse array[j] ;
Hence, the complexity of your code is O(n^2), and the work clearly forms an arithmetic progression, since it is the series n + (n-1) + ... + 1 with a common difference of 1.
I hope it is clear...
The time complexity is: 1 + 2 + ... + n.
This is equal to n(n+1)/2.
For example, for n = 3: 1 + 2 + 3 = 6
and 3(4)/2 = 12/2 = 6
n(n+1)/2 = (n^2 + n) / 2 which is O(n^2) because we can remove constant factors and lower order terms.
As an arithmetic progression has a closed-form solution, its efficient computation is O(1): that is, its computation time does not depend on the number of elements.
If you were to use a loop then it would be O(n), as the execution time would be linear in the number of elements.
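A small illustration of that point in C++ (my own example): the closed form gives the same sum as the loop, but in constant time instead of linear time.

#include <iostream>

int main() {
    long long n = 1000000;
    long long bySum = 0;
    for (long long i = 1; i <= n; ++i) bySum += i;      // O(n): n additions
    long long byFormula = n * (n + 1) / 2;              // O(1): one expression
    std::cout << bySum << ' ' << byFormula << '\n';     // both print 500000500000
}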
You're adding up n numbers whose average value is (n/2) because they range from 1 to n. Thus n times (n/2) = n^2 / 2. We don't care about the constant multiple, so O(n^2).
You are getting it wrong somewhere! The sum of an arithmetic progression is of the order of n^2.
To clear your doubts on arithmetic progression, visit this link: http://www.mathsisfun.com/algebra/sequences-sums-arithmetic.html
And as you said, you face difficulty in finding the complexity of any code, you can read from these two links:
http://discrete.gr/complexity/
http://www.cs.cmu.edu/~adamchik/15-121/lectures/Algorithmic%20Complexity/complexity.html
Good enough to get you going and help you understand how to find the complexity of most of the algorithms.

Average Runtime of Quickselect

Wikipedia states that the average runtime of the quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain to me (via a recurrence relation + master method usage) how the average runtime is O(n)?
Because:
we already know which partition our desired element lies in, and
we do not need to sort (by partitioning) all the elements, but only operate on the partition we need.
As in quicksort, we have to partition into halves*, and then into halves of a half, but this time we only need to do the next round of partitioning on the single partition (half) of the two in which the element is expected to lie.
It is like (not very accurately)
n + 1/2 n + 1/4 n + 1/8 n + ..... < 2 n
So it is O(n).
* "Half" is used for convenience; the actual partition is not exactly 50%.
To do an average case analysis of quickselect one has to consider, for every pair of elements, how likely it is that the two elements are compared during the algorithm, assuming random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations, but it is a clean average case analysis, as opposed to the current answers.
Let's assume the field we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we then always pick the next feasible pivot from this permutation therefore they are chosen uniform at random as every element has the same probability of occurring as the next feasible element in the random permutation.
There is one simple, yet very important, observation: two elements i and j (with i < j) are compared if and only if one of them is chosen as the first pivot element from the range [min(k,i), max(k,j)]. If another element from this range is chosen first, then they will never be compared, because we continue searching in a sub-field that does not contain at least one of the elements i, j.
Because of the above observation and the fact that the pivots are chosen uniform at random the probability of a comparison between i and j is:
2/(max(k,j) - min(k,i) + 1)
(Two events out of max(k,j) - min(k,i) + 1 possibilities.)
We split the analysis in three parts:
max(k,j) = k, therefore i < j <= k
min(k,i) = k, therefore k <= i < j
min(k,i) = i and max(k,j) = j, therefore i < k < j
In the third case the less-equal signs are omitted because we already consider those cases in the first two cases.
Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.
Case 1
Here i < j <= k, so the probability of a comparison is 2/(k - i + 1). Summing over all such pairs gives, for each i < k, a contribution of (k - i) * 2/(k - i + 1) < 2, so the total is at most 2(k - 1) = O(n).
Case 2
Similar to case 1 so this remains as an exercise. ;)
Case 3
We use H_r for the r-th harmonic number which grows approximately like ln(r).
Conclusion
All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).
Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.
If there are any errors please let me know.
In quickselect, as specified, we apply recursion on only one half of the partition.
Average Case Analysis:
First Step: T(n) = cn + T(n/2)
where cn is the time to perform the partition (c is a constant; its exact value doesn't matter), and T(n/2) is the recursion applied to one half of the partition. Since it's the average case, we assume the partition falls at the median.
As we keep on recursing, we get the following set of equations:
T(n/2) = cn/2 + T(n/4)
T(n/4) = cn/4 + T(n/8)
...
T(2) = 2c + T(1)
T(1) = c
Summing the equations and cross-cancelling like values produces a linear result.
c(n + n/2 + n/4 + ... + 2 + 1) = c(2n)   (sum of a GP)
Hence, it's O(n)
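To make the discussion concrete, here is a minimal quickselect sketch in C++ (random pivot, Lomuto partition; the 0-based k convention and all names are mine, not from the answers above):

#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

// Returns the k-th smallest element of a (k counted from 0) in the range [lo, hi].
int quickselect(std::vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        int pivotIdx = lo + std::rand() % (hi - lo + 1);
        std::swap(a[pivotIdx], a[hi]);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; ++i)                // partition around the pivot
            if (a[i] < pivot) std::swap(a[i], a[store++]);
        std::swap(a[store], a[hi]);                  // pivot lands in its final position
        if (k == store) return a[store];
        if (k < store) hi = store - 1;               // continue in one side only
        else lo = store + 1;
    }
    return a[lo];
}

int main() {
    std::vector<int> v{7, 2, 9, 4, 1, 8, 3};
    std::cout << quickselect(v, 0, (int)v.size() - 1, 3) << '\n';   // 4th smallest: 4
}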
I also felt very conflicted at first when I read that the average time complexity of quickselect is O(n) while we break the list in half each time (like binary search or quicksort). It turns out that breaking the search space in half each time doesn't guarantee an O(log n) or O(n log n) runtime. What makes quicksort O(n log n) and quickselect O(n) is that we always need to explore all branches of the recursive tree for quicksort but only a single branch for quickselect. Let's compare the time complexity recurrence relations of quicksort and quickselect to prove my point.
Quicksort:
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4))
= n + 2(n/2) + 4T(n/4)
= n + 2(n/2) + 4(n/4) + ... + n(n/n)
= 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
= n (log2(n) + 1) (since we are adding n to itself log2(n) + 1 times)
Quickselect:
T(n) = n + T(n/2)
= n + n/2 + T(n/4)
= n + n/2 + n/4 + ... n/n
= n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
= n (1/(1-(1/2))) = 2n (by geometric series)
I hope this convinces you why the average runtime of quickselect is O(n).
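If it helps, here is a small sketch (my own) that unrolls both recurrences numerically, so you can watch quicksort's total grow like n*log(n) while quickselect's stays below 2n:

#include <iostream>

long long quicksortWork(long long n)   { return n <= 1 ? 0 : n + 2 * quicksortWork(n / 2); }
long long quickselectWork(long long n) { return n <= 1 ? 0 : n + quickselectWork(n / 2); }

int main() {
    for (long long n : {1024LL, 2048LL, 4096LL, 8192LL})
        std::cout << "n = " << n
                  << "  quicksort work = " << quicksortWork(n)
                  << "  quickselect work = " << quickselectWork(n) << '\n';
}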
