Time complexity for Shell sort? - algorithm

First, here's my Shell sort code (using Java):
public char[] shellSort(char[] chars) {
    int n = chars.length;
    int increment = n / 2;
    while (increment > 0) {
        int last = increment;
        while (last < n) {
            int current = last - increment;
            while (current >= 0) {
                if (chars[current] > chars[current + increment]) {
                    // swap
                    char tmp = chars[current];
                    chars[current] = chars[current + increment];
                    chars[current + increment] = tmp;
                    current -= increment;
                } else {
                    break;
                }
            }
            last++;
        }
        increment /= 2;
    }
    return chars;
}
Is this a correct implementation of Shell sort (forgetting for now about the most efficient gap sequence - e.g., 1,3,7,21...)? I ask because I've heard that the best-case time complexity for Shell Sort is O(n). (See http://en.wikipedia.org/wiki/Sorting_algorithm). I can't see this level of efficiency being realized by my code. If I added heuristics to it, then yeah, but as it stands, no.
That said, on to my main question: I'm having difficulty calculating the Big O time complexity for my Shell sort implementation. I identified the outer-most loop as O(log n), the middle loop as O(n), and the inner-most loop also as O(n), but I realize the inner two loops would not actually be O(n) - they would be much less than this - so what should they be? Because obviously this algorithm runs much more efficiently than O((log n) n^2).
Any guidance is much appreciated as I'm very lost! :P

The worst case of your implementation is Θ(n^2) and the best case is O(n log n), which is reasonable for Shell sort.
The best case ∊ O(n log n):
The best case is when the array is already sorted. That would mean that the inner if statement will never be true, making the inner while loop a constant-time operation. Using the bounds you've used for the other loops gives O(n log n). The best case of O(n) is reached by using a constant number of increments.
The worst case ∊ O(n^2):
Given your upper bound for each loop you get O((log n)n^2) for the worst case. But add another variable for the gap size g. The number of compare/exchanges needed in the inner while loop is now <= n/g. The number of compare/exchanges in the middle while loop is <= n^2/g. Adding the upper bound on the number of compare/exchanges for each gap gives: n^2 + n^2/2 + n^2/4 + ... <= 2n^2 ∊ O(n^2). This matches the known worst-case complexity for the gaps you've used.
The worst case ∊ Ω(n^2):
Consider an array (with n a power of two, so every increment before the last is even) where all the even-positioned elements are greater than the median. The odd- and even-positioned elements are then never compared until we reach the last increment of 1, and the number of compare/exchanges needed for that last pass is Ω(n^2).
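To see that worst case concretely, here is a hedged sketch (the class name, counter, and input construction are mine, not part of the question): it instruments the comparison in the implementation above and feeds it power-of-two-sized arrays whose even positions hold the larger half of the values. The printed comparison count should grow roughly 16x each time n grows 4x, i.e. quadratically.
class ShellSortWorstCase {
    static long compareCount; // hypothetical counter added for illustration

    static char[] shellSort(char[] chars) {
        int n = chars.length;
        int increment = n / 2;
        while (increment > 0) {
            int last = increment;
            while (last < n) {
                int current = last - increment;
                while (current >= 0) {
                    compareCount++; // count each chars[current] > chars[current + increment] test
                    if (chars[current] > chars[current + increment]) {
                        char tmp = chars[current];
                        chars[current] = chars[current + increment];
                        chars[current + increment] = tmp;
                        current -= increment;
                    } else {
                        break;
                    }
                }
                last++;
            }
            increment /= 2;
        }
        return chars;
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 1024; n *= 4) {
            // even positions get the values n/2 .. n-1, odd positions get 0 .. n/2-1
            char[] a = new char[n];
            for (int i = 0; i < n; i++) {
                a[i] = (char) ((i % 2 == 0) ? n / 2 + i / 2 : i / 2);
            }
            compareCount = 0;
            shellSort(a);
            System.out.println("n=" + n + ", comparisons=" + compareCount);
        }
    }
}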

Insertion Sort
If we analyse
static void sort(int[] ary) {
    int i, j, insertVal;
    int aryLen = ary.length;
    for (i = 1; i < aryLen; i++) {
        insertVal = ary[i];
        j = i;
        /*
         * while loop exits as soon as it finds a left-hand side element less than insertVal
         */
        while (j >= 1 && ary[j - 1] > insertVal) {
            ary[j] = ary[j - 1];
            j--;
        }
        ary[j] = insertVal;
    }
}
Hence, in the average case, the while loop exits about half-way through,
i.e. 1/2 + 2/2 + 3/2 + ... + (n-1)/2 = n(n-1)/4 = Theta(n^2).
Note that we still get roughly (n^2)/2; the division by two makes no difference to the order of growth.
Shell Sort is nothing but insertion sort using gaps like n/2, n/4, n/8, ..., 2, 1,
meaning it takes advantage of the best-case behaviour of insertion sort: the early exit from the while loop starts happening very quickly, as soon as a smaller element is found just to the left (gap-wise) of the element being inserted, which keeps each pass cheap.
There are about log n passes (one per gap), and each pass touches all n elements at least once, so the passes alone already cost on the order of n + n + ... + n = n log n.
Hence its time complexity ends up somewhere close to n(log n)^2 for this gap sequence.
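To make the "insertion sort with a gap" idea concrete, here is a hedged sketch (the method names are mine): a single gapped insertion pass, which reduces to the sort above when gap = 1, and which reproduces Shell sort when run over the gaps n/2, n/4, ..., 1.
// Hedged sketch: insertion sort generalized with a gap. With gap = 1 this is
// exactly the insertion sort analysed above; looping over shrinking gaps
// turns it into Shell sort.
static void gapInsertionSort(int[] ary, int gap) {
    for (int i = gap; i < ary.length; i++) {
        int insertVal = ary[i];
        int j = i;
        // shift elements that are 'gap' apart, just like the while loop above
        while (j >= gap && ary[j - gap] > insertVal) {
            ary[j] = ary[j - gap];
            j -= gap;
        }
        ary[j] = insertVal;
    }
}

static void shellSort(int[] ary) {
    for (int gap = ary.length / 2; gap > 0; gap /= 2) {
        gapInsertionSort(ary, gap);
    }
}
Each pass leaves the array "gap-sorted", so by the time the final gap of 1 runs, most elements are already close to their final positions and the early exit fires quickly.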

Time Complexity for Sieve of Eratosthenes: why is it not linear? [duplicate]

From Wikipedia:
The complexity of the algorithm is O(n(log n)(log log n)) bit operations.
How do you arrive at that?
That the complexity includes the loglogn term tells me that there is a sqrt(n) somewhere.
Suppose I am running the sieve on the first 100 numbers (n = 100), assuming that marking the numbers as composite takes constant time (array implementation), the number of times we use mark_composite() would be something like
n/2 + n/3 + n/5 + n/7 + ... + n/97 = O(n^2)
And to find the next prime number (for example to jump to 7 after crossing out all the numbers that are multiples of 5), the number of operations would be O(n).
So, the complexity would be O(n^3). Do you agree?
Your n/2 + n/3 + n/5 + … + n/97 is not O(n), because the number of terms is not constant. [Edit after your edit: O(n^2) is too loose an upper bound.] A loose upper bound is n(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + … + 1/n) (the sum of the reciprocals of all numbers up to n), which is O(n log n): see Harmonic number. A tighter upper bound is n(1/2 + 1/3 + 1/5 + 1/7 + …), that is, the sum of the reciprocals of the primes up to n, which is O(n log log n).
The "find the next prime number" bit is only O(n) overall, amortized — you will move ahead to find the next number only n times in total, not per step. So this whole part of the algorithm takes only O(n).
So using these two you get an upper bound of O(n log log n) + O(n) = O(n log log n) arithmetic operations. If you count bit operations, since you're dealing with numbers up to n, they have about log n bits, which is where the factor of log n comes in, giving O(n log n log log n) bit operations.
That the complexity includes the loglogn term tells me that there is a sqrt(n) somewhere.
Keep in mind that when you find a prime number P while sieving, you don't start crossing off numbers at your current position + P; you actually start crossing off numbers at P^2. All multiples of P less than P^2 will have been crossed off by previous prime numbers.
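Here is a hedged Java sketch (the method and variable names are mine) of a sieve that starts crossing off at p*p, as just described, and counts the mark operations. The count tracks n times the sum of 1/p over the primes used, i.e. roughly n * ln(ln(n)).
// Hedged sketch: sieve that starts marking at p*p and counts the markings.
static void sieveMarkCount(int n) {
    boolean[] composite = new boolean[n + 1];
    long marks = 0;
    for (int p = 2; (long) p * p <= n; p++) {
        if (!composite[p]) {
            // multiples of p below p*p were already marked by smaller primes
            for (int m = p * p; m <= n; m += p) {
                composite[m] = true;
                marks++;
            }
        }
    }
    System.out.println("n=" + n + ", marks=" + marks
            + ", n*ln(ln(n)) ~= " + Math.round(n * Math.log(Math.log(n))));
}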
The inner loop does n/i steps, where i is prime, so the whole complexity is sum(n/i) = n * sum(1/i). By the prime harmonic series, the sum of 1/i over primes i up to n is log(log n). In total, O(n*log(log n)).
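As a hedged numeric check of that prime harmonic sum (the method is mine, not from the answer): the sum of 1/p over primes p up to n stays within a constant of ln(ln(n)), which is the content of Mertens' second theorem.
// Hedged sketch: compare sum(1/p) over primes p <= n with ln(ln(n)).
static void primeReciprocalSum(int n) {
    boolean[] composite = new boolean[n + 1];
    double sum = 0.0;
    for (int p = 2; p <= n; p++) {
        if (!composite[p]) {
            sum += 1.0 / p;
            for (int m = 2 * p; m <= n; m += p) {
                composite[m] = true;
            }
        }
    }
    System.out.println("n=" + n + ", sum(1/p)=" + sum + ", ln(ln(n))=" + Math.log(Math.log(n)));
}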
I think the outer loop can be optimized by replacing n with sqrt(n), so the overall time complexity will be O(sqrt(n)*loglog(n)):
#include <stdio.h>

/* prints all primes up to and including n */
void isPrime(int n){
    int prime[n + 1], i, j;
    for(i = 0; i <= n; i++){
        prime[i] = 1;
    }
    prime[0] = prime[1] = 0;
    for(i = 2; i <= n; i++){
        if(prime[i] == 1){
            printf("%d ", i);
            for(j = 2; (i * j) <= n; j++)
                prime[i * j] = 0;
        }
    }
}
int n = 100;
int[] arr = new int[n + 1];
for (int i = 2; i < Math.sqrt(n) + 1; i++) {
    if (arr[i] == 0) {
        int maxJ = (n / i) + 1;
        for (int j = 2; j < maxJ; j++) {
            arr[i * j] = 1;
        }
    }
}
for (int i = 2; i <= n; i++) {
    if (arr[i] == 0) {
        System.out.println(i);
    }
}
For all i > 2, T_i = sqrt(i) * (n/i), so T_k = sqrt(k) * (n/k) = n/sqrt(k).
The loop stops when k = sqrt(n), so n[1/sqrt(2) + 1/sqrt(3) + ...] = n * log(log(n)), i.e. O(n log log n).
Following the explanation above, the inner loop amounts to the harmonic sum over all prime numbers up to sqrt(n). So the actual complexity is O(sqrt(n)*log(log(sqrt(n)))).

Complexity Algorithm Analysis with if

I have the following code. What time complexity does it have?
I have tried to write a recurrence relation for it, but I can't understand when the algorithm will add 1 to n and when it will divide n by 4.
void T(int n) {
    for (int i = 0; i < n; i++);
    if (n == 1 || n == 0)
        return;
    else if (n % 2 == 1)
        T(n + 1);
    else if (n % 2 == 0)
        T(n / 4);
}
You can view it like this: you always divide by four; only when n is odd do you add 1 to n before the division. So you should count how many times 1 is added. If there are no increments, you have log_4(n) recursive calls. Let's assume that you always have to add 1 before dividing. Then you can rewrite it like this:
void T(int n) {
    for (int i = 0; i < n; i++);
    if (n == 1 || n == 0)
        return;
    else if (n % 2 == 0)
        T(n / 4 + 1);
}
But n/4 + 1 < n/2 for large enough n, and with a recursive call like T(n/2) the number of calls is logarithmic in n; the base of the logarithm doesn't affect the running time in big-O notation, because it is just a constant factor. So the running time is O(log(n)).
EDIT:
As ALB pointed out in a comment, there is a loop of length n in each call. So, in accordance with the master theorem, the running time is Theta(n). You can also see it as the sum n * (1 + 1/2 + 1/4 + 1/8 + ...) = 2 * n.
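An easy way to convince yourself of that Theta(n) total is to count the loop iterations across all recursive calls. This is a hedged sketch (the work counter and the driver loop are mine):
class TWorkCount {
    static long work; // total iterations of the (empty) for loop, summed over all calls

    static void T(int n) {
        for (int i = 0; i < n; i++) {
            work++; // stands in for one pass through the empty loop
        }
        if (n == 1 || n == 0)
            return;
        else if (n % 2 == 1)
            T(n + 1);
        else
            T(n / 4);
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 1000000; n *= 10) {
            work = 0;
            T(n);
            // work/n stays bounded by a small constant, i.e. the total work is linear in n
            System.out.println("n=" + n + ", work=" + work + ", work/n=" + (double) work / n);
        }
    }
}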
Interesting question. Be aware that even though your for loop does nothing, it is not optimized away (see Dukeling's comment), so it still counts toward your time complexity as if computing time were spent iterating through it.
First part
The first section is definitely O(n).
Second part
Let's suppose, for the sake of simplicity, that half the time n will be odd and the other half of the time it will be even. Hence, the recursion receives (n+1) half the time and (n/4) the other half.
Conclusion
Each time T(n) is called, the recursion will implicitly loop n times. Hence, the first half of the time we will have a complexity of n * (n+1) = n^2 + n, and the other half of the time we will deal with n * (n/4) = (1/4)n^2.
For Big O notation, we care more about the upper bound than its precise behavior. Hence, your algorithm would be bound by O(n^2).

Is my understanding of big-O for these Java functions correct?

My approach (which might be incorrect) is formulaic: if there is a loop, then (n+1); if there is a nested loop, (n^2); if a statement, then O(1); if division, then log(n).
Here are some examples and my reasoning in solving them. I'm not at all sure whether this approach is problematic or whether any of them are correct. I need help with this.
Example1:
i = n; // I think O(1) because it's a statement
while (i > 1) // I think O(n) because it's a loop
    i = i / 4; // O(n) because it's in a loop and log_4(n) b/c division
// I think overall if we combine the O(n) from earlier and the log_4(n)
// Therefore, I think overall O(nlog(n))
Example2:
for (i = 1; i < n; i = i + i) // I think this is O(n+1) thus, O(n)
    System.out.println("Hello World"); // O(n) because it's in a loop
// Therefore, overall I think O(n)
Example3:
for (i = 0; i < n; i = i + 1) // I think O(n+1), thus O(n)
    for (j = 1; j < n; j++) // I think O(n^2) because in a nested loop
        System.out.println("Hello Universe!"); // O(n^2) because in a nested loop
// Therefore, overall I think O(n^2)
Example4:
for (i = 1; i < (n * n + 3 * n + 17) / 4; i = i + 1) // O((20+n^3)/4) thus, O(n^3)
    System.out.println("Hello Again!"); // O(n) because it's in a loop
// Therefore, overall I think O(n^3) because largest Big-O in the code
Thank you
Example 1: Your result is wrong, because the loop runs log_4(n) times and the division takes O(1) (division by 4 just needs a bitwise shift). Thus the overall time is only O(log(n)).
Example 2: It's wrong too. In each iteration you double the loop variable, so the loop runs O(log(n)) times. The print command takes O(1) and the total time is O(log(n)).
Example 3: Your answer is correct, because you have two nested O(n) loops. Note that this loop is different from the two previous examples.
Example 4: I think you made a writing mistake. Does ((n * n + 3 * n + 17) / 4) equal O((20+n^3)/4)? It is O(n^2). Thus, according to my previous explanations, the overall time is O(n^2).
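If the logarithmic counts in Examples 1 and 2 are hard to see, a hedged sketch like the following (the counters and the chosen n are mine; paste it into any main method) makes them visible:
int n = 1000000;

int count1 = 0;
for (long i = n; i > 1; i /= 4) {    // Example 1: i shrinks by a factor of 4 each pass
    count1++;
}

int count2 = 0;
for (long i = 1; i < n; i = i + i) { // Example 2: i doubles each pass
    count2++;
}

System.out.println("Example 1 iterations: " + count1); // about log_4(n), i.e. 10 here
System.out.println("Example 2 iterations: " + count2); // about log_2(n), i.e. 20 here
Doubling n adds only one more iteration to Example 2 and at most one to Example 1, which is exactly the behaviour O(log n) describes.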

Why Bubble sort complexity is O(n^2)?

As I understand it, the complexity of an algorithm is the maximum number of operations performed while sorting. So the complexity of Bubble sort should be the sum of an arithmetic progression (from 1 to n-1), not n^2.
The following implementation counts the number of comparisons:
public int[] sort(int[] a) {
    int operationsCount = 0;
    for (int i = 0; i < a.length; i++) {
        for (int j = i + 1; j < a.length; j++) {
            operationsCount++;
            if (a[i] > a[j]) {
                int temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
        }
    }
    System.out.println(operationsCount);
    return a;
}
The output for an array with 10 elements is 45, so it's the sum of the arithmetic progression from 1 to 9.
So why is Bubble sort's complexity n^2, and not S(n-1)?
This is because big-O notation describes the nature of the algorithm. The major term in the expansion of n(n-1)/2 is n^2, and so as n increases all the other terms become insignificant.
You are welcome to describe it more precisely, but for all intents and purposes the algorithm exhibits behaviour that is of the order n^2. That means that if you graph the time complexity against n, you will see a parabolic growth curve.
Let's do a worst case analysis.
In the worst case, the if (a[i] > a[j]) test will always be true, so the next 3 lines of code will be executed in each loop step. The inner loop goes from j=i+1 to n-1, so it will execute Sum_{j=i+1}^{n-1}{k} elementary operations (where k is a constant number of operations that involve the creation of the temp variable, array indexing, and value copying). If you solve the summation, it gives a number of elementary operations that is equal to k(n-i-1). The external loop will repeat this k(n-i-1) elementary operations from i=0 to i=n-1 (ie. Sum_{i=0}^{n-1}{k(n-i-1)}). So, again, if you solve the summation you see that the final number of elementary operations is proportional to n^2. The algorithm is quadratic in the worst case.
As you are incrementing the variable operationsCount before running any code in the inner loop, we can say that k (the number of elementary operations executed inside the inner loop) in our previous analysis is 1. So, solving Sum_{i=0}^{n-1}{n-i-1} gives n^2/2 - n/2, and substituting n with 10 gives a final result of 45, just the same result that you got by running the code.
Worst case scenario:
This indicates the longest running time performed by the algorithm over any input of size n, so we will consider a completely reversed list for this worst-case scenario:
int[] arr= new int[]{9,6,5,3,2};
Number of passes required to completely sort it = n - 1 (where n is the number of elements in the list).
The 1st pass requires (n-1) swaps, the 2nd pass requires (n-2) swaps, ..., and the (n-1)th pass requires (n-(n-1)) = 1 swap,
i.e. (n-1) + (n-2) + ... + 1 = (number of terms)/2 * (a + l) // sum of an AP
= (n-1)/2 * ((n-1) + 1) = n(n-1)/2, which is roughly n^2/2,
so in big O notation this is O(n^2).
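Here is a hedged sketch (the swap counter is mine) that runs a plain adjacent-swap bubble sort on the reversed list above and counts the swaps:
int[] arr = new int[]{9, 6, 5, 3, 2};
int swaps = 0;
for (int i = 0; i < arr.length - 1; i++) {
    for (int j = 0; j < arr.length - 1 - i; j++) {
        if (arr[j] > arr[j + 1]) {        // adjacent pair out of order
            int temp = arr[j];
            arr[j] = arr[j + 1];
            arr[j + 1] = temp;
            swaps++;
        }
    }
}
System.out.println("swaps = " + swaps);   // 4 + 3 + 2 + 1 = 10 for this input
For n = 5 elements that is n(n-1)/2 = 10 swaps, matching the arithmetic-progression sum above.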

Complexity of algorithm

The complexity given for the following problem is O(n). Shouldn't it be O(n^2)? After all, the outer loop is O(n) and the inner loop is also O(n), therefore n*n = O(n^2).
The answer sheet for this question states that the answer is O(n). How is that possible?
public static void q1d(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
        for (int j = 0; j < n; j++) {
            count++;
        }
    }
}
The complexity given for the following problem is O(n^2); how do you obtain that? Can someone please elaborate?
public static void q1E(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
        for (int j = 0; j < n / 2; j++) {
            count++;
        }
    }
}
Thanks
The first example is O(n^2), so it seems they've made a mistake. To calculate (informally) the second example, we can do n * (n/2) = (n^2)/2 = O(n^2). If this doesn't make sense, you need to go and brush up on what it means for something to be O(n^k).
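As a quick sanity check, here is a hedged variant of the two methods (the return statements are my modification, so the counters can be inspected); doubling n roughly quadruples both results, which is exactly the growth O(n^2) describes.
// Hedged sketch: same loops as q1d and q1E, but the counter is returned.
static int q1dCount(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
        for (int j = 0; j < n; j++) {
            count++;
        }
    }
    return count;              // n + n*n, dominated by n^2
}

static int q1ECount(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
        for (int j = 0; j < n / 2; j++) {
            count++;
        }
    }
    return count;              // n + n*(n/2), still quadratic
}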
The complexity of both pieces of code is O(n*n).
FIRST
The outer loop runs n times, and on every pass the inner loop runs n times,
so
total = n + n + n + ... + n (n times)
which adds up to n * n, i.e. O(n*n).
SECOND
The outer loop runs n times, and on every pass the inner loop runs n/2 times,
so
total = n/2 + n/2 + ... + n/2 (n times)
which adds up to n * (n/2) = n^2/2, which is also O(n*n).
The first case is definitely O(n^2).
The second is O(n^2) as well, because constants are omitted when calculating big O.
Your answer sheet is wrong, the first algorithm is clearly O(n^2).
Big-Oh notation is a "worst case" measure, so when calculating the Big-Oh value we generally ignore multiplications/divisions by constants.
That being said, your second example is also O(n^2) in the worst case because, although the inner loop is "only" n/2, n is the clear bounding factor. In practice the second algorithm will perform fewer operations, but Big-Oh is intended to be a "worst case" (i.e. maximal bounding) measurement, so the exact number of operations is ignored in favour of focusing on how the algorithm behaves as n approaches infinity.
Both are O(n^2). Your answer is wrong. Or you may have written the question incorrectly.
