Time complexity of Sieve of Eratosthenes algorithm

From Wikipedia:
The complexity of the algorithm is
O(n(logn)(loglogn)) bit operations.
How do you arrive at that?
That the complexity includes the loglogn term tells me that there is a sqrt(n) somewhere.
Suppose I am running the sieve on the first 100 numbers (n = 100), assuming that marking the numbers as composite takes constant time (array implementation), the number of times we use mark_composite() would be something like
n/2 + n/3 + n/5 + n/7 + ... + n/97 = O(n^2)
And to find the next prime number (for example to jump to 7 after crossing out all the numbers that are multiples of 5), the number of operations would be O(n).
So, the complexity would be O(n^3). Do you agree?

Your n/2 + n/3 + n/5 + … n/97 is not O(n), because the number of terms is not constant. [Edit after your edit: O(n^2) is too loose an upper bound.] A loose upper bound is n(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + … + 1/n) (n times the sum of reciprocals of all numbers up to n), which is O(n log n): see Harmonic number. A tighter upper bound is n(1/2 + 1/3 + 1/5 + 1/7 + …), that is, n times the sum of reciprocals of primes up to n, which is O(n log log n).
The "find the next prime number" bit is only O(n) overall, amortized — you will move ahead to find the next number only n times in total, not per step. So this whole part of the algorithm takes only O(n).
So using these two you get an upper bound of O(n log log n) + O(n) = O(n log log n) arithmetic operations. If you count bit operations, since you're dealing with numbers up to n, they have about log n bits, which is where the factor of log n comes in, giving O(n log n log log n) bit operations.
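As a rough empirical check of the bound above, the following sketch counts how many times the sieve marks a cell as composite and prints that count next to n * ln(ln n). The class and method names are hypothetical (they do not come from the original posts), and the sketch assumes the plain array sieve that starts crossing off at 2p. The two numbers should agree only up to a constant factor, not exactly.

```java
// Hypothetical helper for illustration; not taken from the original posts.
final class SieveOpCount {
    /** Runs the plain sieve up to n and returns how many mark operations it performs. */
    static long countMarks(int n) {
        boolean[] composite = new boolean[n + 1];
        long marks = 0;
        for (int p = 2; p <= n; p++) {
            if (composite[p]) continue;               // p was crossed off, so it is not prime
            for (long m = 2L * p; m <= n; m += p) {   // cross off 2p, 3p, ... up to n
                composite[(int) m] = true;
                marks++;
            }
        }
        return marks;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 10_000_000; n *= 10) {
            double reference = n * Math.log(Math.log(n));
            System.out.printf("n=%d marks=%d n*ln(ln n)=%.0f%n", n, countMarks(n), reference);
        }
    }
}
```

The ratio between the two printed columns should stay roughly constant as n grows, which is what O(n log log n) predicts.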

That the complexity includes the loglogn term tells me that there is a sqrt(n) somewhere.
Keep in mind that when you find a prime number P while sieving, you don't start crossing off numbers at your current position + P; you actually start crossing off numbers at P^2. All multiples of P less than P^2 will have been crossed off by previous prime numbers.
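A minimal sketch of that optimization, assuming a boolean array sieve: crossing off starts at p*p, and the outer loop can stop once p*p exceeds n. The class name and the primesUpTo helper are illustrative choices, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the original posts.
final class SquareStartSieve {
    /** Returns all primes <= n, crossing off multiples of each prime p starting at p*p. */
    static List<Integer> primesUpTo(int n) {
        List<Integer> primes = new ArrayList<>();
        if (n < 2) return primes;
        boolean[] composite = new boolean[n + 1];
        for (int p = 2; (long) p * p <= n; p++) {
            if (composite[p]) continue;
            // Multiples of p below p*p have a smaller prime factor and are already marked.
            for (long m = (long) p * p; m <= n; m += p) {
                composite[(int) m] = true;
            }
        }
        for (int i = 2; i <= n; i++) {
            if (!composite[i]) primes.add(i);
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(50));
    }
}
```

Starting at p*p reduces the constant factor; the asymptotic cost is still O(n log log n).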

The inner loop does n/i steps, where i is prime => the whole
complexity is sum(n/i) = n * sum(1/i). According to prime harmonic
series, the sum (1/i) where i is prime is log (log n). In
total, O(n*log(log n)).
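I think the outer loop below can be optimized by replacing n with sqrt(n) as its bound. That saves work but not asymptotically: the inner loop still does about n/i steps for each prime i up to sqrt(n), so the overall time complexity remains O(n log log n), not O(sqrt(n) log log n):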
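#include <stdio.h>

/* Prints every prime up to and including n (requires n >= 1). */
void isPrime(int n){
    int prime[n + 1], i, j;   /* n + 1 slots so that prime[n] is a valid index */
    for(i = 0; i <= n; i++){
        prime[i] = 1;
    }
    prime[0] = prime[1] = 0;
    for(i = 2; i <= n; i++){
        if(prime[i] == 1){
            printf("%d ", i);
            /* cross off 2i, 3i, ... up to n */
            for(j = 2; (i * j) <= n; j++)
                prime[i * j] = 0;
        }
    }
}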

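int n = 100;
int[] arr = new int[n+1];                // arr[x] == 1 marks x as composite
for(int i=2;i<Math.sqrt(n)+1;i++) {      // only primes up to sqrt(n) need to sieve
    if(arr[i] == 0) {
        int maxJ = (n/i) + 1;
        for(int j=2;j<maxJ;j++)          // cross off 2i, 3i, ... up to n
        {
            arr[i*j]= 1;
        }
    }
}
for(int i=2;i<=n;i++) {                  // whatever is still unmarked is prime
    if(arr[i]==0) {
        System.out.println(i);
    }
}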
For each prime i, the inner loop runs about n/i times => Ti = n/i
The outer loop stops when i = sqrt(n) => n[ 1/2 + 1/3 + 1/5 + ...] = n * log(log(sqrt(n))) => O(nloglogn)

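Following the explanation above, the inner loop contributes n times the harmonic sum over all primes up to sqrt(n). So the actual complexity is O(n*log(log(sqrt(n)))), which is the same as O(n log log n), because log(log(sqrt(n))) differs from log(log(n)) only by a constant.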

Related

Recursion, inner loop and time complexity

Consider the following function:
int testFunc(int n){
if(n < 3) return 0;
int num = 7;
for(int j = 1; j <= n; j *= 2) num++;
for(int k = n; k > 1; k--) num++;
return testFunc(n/3) + num;
}
I get that the first loop is O(log n) while the second loop is O(n), which gives a time complexity of O(n) per call. Because of the recursive calls I thought the total time complexity would be O(n log n), but apparently it is only O(n). Can anyone explain why?
The recursive call pretty much gives the following for the complexity(denoting the complexity for input n by T(n)):
T(n) = log(n) + n + T(n/3)
The first observation, as you correctly noted, is that you can ignore the logarithm because it is dominated by n. That leaves T(n) = n + T(n/3). Unrolling this recurrence down to the base case gives:
T(n) = n + n/3 + n/9+....
You can easily prove that the above sum is always less than 2*n. In fact the geometric series sums to at most 3n/2, but the looser bound is enough to state that the overall complexity is O(n).
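To see the bound numerically, here is a small sketch that counts the loop iterations performed by a function with the same shape as testFunc. The class and the steps method are hypothetical names introduced for illustration, and the loop counter j is a long here only to avoid overflow for very large n.

```java
// Hypothetical instrumented version of testFunc; it counts loop iterations instead of computing num.
final class RecursionSteps {
    static long steps(int n) {
        if (n < 3) return 0;
        long work = 0;
        for (long j = 1; j <= n; j *= 2) work++;   // O(log n) doubling loop
        for (int k = n; k > 1; k--) work++;        // O(n) countdown loop
        return work + steps(n / 3);                // one recursive call on a third of the input
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 100_000_000; n *= 10) {
            System.out.printf("n=%d steps=%d steps/n=%.3f%n", n, steps(n), (double) steps(n) / n);
        }
    }
}
```

The printed ratio steps/n should settle near 1.5 rather than growing with log n, which matches the O(n) result.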
For procedures using a recursive algorithm such as the following:
procedure T( n : size of problem ) defined as:
if n < base_case then exit
Do work of amount f(n) // In this case, the O(n) for loop
T(n/b)
T(n/b)
... a times... // In this case, b = 3, and a = 1
T(n/b)
end procedure
Applying the Master theorem to find the time complexity, the f(n) in this case is O(n) (due to the second for loop, like you said). This makes c = 1.
Now, log_b(a) = log_3(1) = 0 < c, making this the 3rd case of the theorem, according to which the time complexity T(n) = Θ(f(n)) = Θ(n).

Complexity of algorithm

The complexity given for the following problem is O(n). Shouldn't it be O(n^2)? The outer loop is O(n) and the inner loop is also O(n), so n*n = O(n^2).
The answer sheet of this question states that the answer is O(n). How is that possible?
public static void q1d(int n) {
int count = 0;
for (int i = 0; i < n; i++) {
count++;
for (int j = 0; j < n; j++) {
count++;
}
}
}
The complexity for the following problem is O(n^2), how can you obtain that? Can someone please elaborate?
public static void q1E(int n) {
int count = 0;
for (int i = 0; i < n; i++) {
count++;
for (int j = 0; j < n/2; j++) {
count++;
}
}
}
Thanks
The first example is O(n^2), so it seems they've made a mistake. To calculate (informally) the second example, we can do n * (n/2) = (n^2)/2 = O(n^2). If this doesn't make sense, you need to go and brush up on what it means for something to be O(n^k).
The complexity of both pieces of code is O(n*n).
FIRST
The outer loop runs n times, and on each pass the inner loop runs n times,
so
total = n + n + n + ... + n (n terms) = n * n
which, together with the n increments in the outer loop, gives n * ( n + 1 ), which is O(n*n)
SECOND
The outer loop runs n times, and on each pass the inner loop runs n/2 times,
so
total = n/2 + n/2 + n/2 + ... + n/2 (n terms) = n * n / 2
which, together with the n increments in the outer loop, gives about n * ( n + 2 ) / 2, which is also O(n*n)
First case is definitely O(n^2)
The second is O(n^2) as well because you omit constants when calculate big O
Your answer sheet is wrong, the first algorithm is clearly O(n^2).
Big-Oh notation gives an upper bound on growth, so when calculating the Big-Oh value we ignore multiplications / divisions by constants.
That being said, your second example is also O(n^2) because, although the inner loop runs "only" n/2 times, the constant factor 1/2 does not change the growth rate. In practice the second algorithm performs about half as many operations as the first, but Big-Oh ignores the exact number of operations in favor of focusing on how the algorithm behaves as n approaches infinity.
Both are O(n^2). Your answer is wrong. Or you may have written the question incorrectly.
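A quick way to check these counts is to return the counter from each method and compare it with the closed forms n + n*n and n + n*(n/2). The sketch below does that; the class name and the returned values are additions for illustration, since the original methods are void.

```java
// Hypothetical variants of q1d and q1E that return their counters.
final class LoopCounts {
    static long q1dCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            count++;
            for (int j = 0; j < n; j++) count++;       // n inner iterations per pass
        }
        return count;                                   // n + n*n
    }

    static long q1eCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            count++;
            for (int j = 0; j < n / 2; j++) count++;   // n/2 inner iterations per pass
        }
        return count;                                   // n + n*(n/2), with integer division
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 10_000; n *= 10) {
            System.out.printf("n=%d q1d=%d q1E=%d%n", n, q1dCount(n), q1eCount(n));
        }
    }
}
```

Multiplying n by 10 multiplies both counts by roughly 100, which is the quadratic growth both answers describe.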

O(n log log n) time complexity

I have a short program here:
Given any n:
i = 0;
while (i < n) {
k = 2;
while (k < n) {
sum += a[j] * b[k]
k = k * k;
}
i++;
}
The asymptotic running time of this is O(n log log n). Why is this the case? I get that the entire program will at least run n times. But I'm not sure how to find log log n. The inner loop is depending on k * k, so it's obviously going to be less than n. And it would just be n log n if it was k / 2 each time. But how would you figure out the answer to be log log n?
For a mathematical proof, the inner loop can be written as:
T(n) = T(sqrt(n)) + 1
W.l.o.g. assume 2^(2^(t-1)) <= n <= 2^(2^t).
We know 2^(2^t) = 2^(2^(t-1)) * 2^(2^(t-1)), so
T(2^(2^t)) = T(2^(2^(t-1))) + 1 = T(2^(2^(t-2))) + 2 = ... = T(2^(2^0)) + t
T(2^(2^(t-1))) <= T(n) <= T(2^(2^t)) = T(2^(2^0)) + log log 2^(2^t) = O(1) + t
==> O(1) + (log log n) - 1 <= T(n) <= O(1) + (log log n) + 1 => T(n) = Theta(log log n),
and then the total time is O(n log log n).
Why is the inner loop T(n) = T(sqrt(n)) + 1?
First, look at when the inner loop stops: when k >= n. One iteration before that, k was at least sqrt(n), and two iterations before that it was at most sqrt(n), so the running time satisfies T(sqrt(n)) + 2 ≥ T(n) ≥ T(sqrt(n)) + 1.
The time complexity of a loop is O(log log n) if the loop variable is raised to a constant power (or reduced by a constant root) on each iteration. If the loop variable is divided / multiplied by a constant amount, then the complexity is O(log n).
E.g., in your case the value of k is as follows, with the number of completed iterations in parentheses:
2 (0), 2^2 (1), 2^4 (2), 2^8 (3), 2^16 (4), 2^32 (5), 2^64 (6), ... until n is reached.
The number of iterations needed to reach n is O(log log n).
For example, assume that n is 2^64. Now log (2^64) = 64 and log 64 = log (2^6) = 6. Hence the inner loop ran 6 times when n is 2^64.
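The 2^64 example can be reproduced with a short sketch that counts iterations of the squaring loop. BigInteger is used here only so that n = 2^64 fits; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper that counts iterations of the inner "k = k * k" loop.
final class SquaringLoop {
    static int iterations(BigInteger n) {
        BigInteger k = BigInteger.valueOf(2);
        int count = 0;
        while (k.compareTo(n) < 0) {   // same condition as "while (k < n)"
            k = k.multiply(k);         // k = k * k
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int bits = 2; bits <= 64; bits *= 2) {
            BigInteger n = BigInteger.ONE.shiftLeft(bits);   // n = 2^bits
            System.out.printf("n=2^%d iterations=%d%n", bits, iterations(n));
        }
    }
}
```

Each doubling of the exponent adds one iteration, which is the log log n behaviour described above.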
I think if the code were like this, it would be n*log n:
i = 0;
while (i < n) {
k = 2;
while (k < n) {
sum += a[j] * b[k]
k *= c;// c is a constant bigger than 1 and less than k;
}
i++;
}
Okay, So let's break this down first -
Given any n:
i = 0;
while (i < n) {
k = 2;
while (k < n) {
sum += a[j] * b[k]
k = k * k;
}
i++;
}
The condition i < n is evaluated n+1 times, so the outer loop body runs n times.
Now here comes the fun part: the inner loop does not run n times. It runs about log log n times, because instead of incrementing k by 1 we square it on each iteration, so after t iterations k = 2^(2^t), which reaches n when t = log log n.
Combining the two, we get a time complexity of n * log log n.

Tricky Big-O complexity

public void foo(int n, int m) {
int i = m;
while (i > 100) {
i = i / 3;
}
for (int k = i ; k >= 0; k--) {
for (int j = 1; j < n; j *= 2) {
System.out.print(k + "\t" + j);
}
System.out.println();
}
}
I figured the complexity would be O(logn).
That is the cost of the inner loop; the outer loop will never be executed more than about 100 times, so it can be omitted.
What I'm not sure about is the while loop: should it be incorporated into the Big-O complexity? For very large i values it could make an impact. Or do arithmetic operations, regardless of scale, count as basic operations that can be omitted?
The while loop is O(log m) because you keep dividing m by 3 until it is below or equal to 100.
Since 100 is a constant in your case, it can be ignored, yes.
The inner loop is O(log n) as you said, because you multiply j by 2 until it exceeds n.
Therefore the total complexity is O(log n + log m).
or arithmetic operations, doesn't matter on what scale, count as basic operations and can be omitted?
Arithmetic operations can usually be omitted, yes. However, it also depends on the language. This looks like Java and it looks like you're using primitive types. In this case it's ok to consider arithmetic operations O(1), yes. But if you use big integers for example, that's not really ok anymore, as addition and multiplication are no longer O(1).
The complexity is O(log m + log n).
The while loop executes about log3(m) - log3(100) times, and log3(100) is a constant. The outer for loop executes a constant number of times (at most about 100), and the inner loop executes about log2(n) times.
The while loop divides the value of m by 3 on each iteration, so the number of such operations is log (base 3) m.
For the for loops you could think of the number of operations as 2 summations -
summation (k = 0 to i) [ summation (j = 0 to lg n) (1)]
summation (k = 0 to i) [lg n + 1]
(lg n + 1) ( i + 1) will be total number of operations, of which the log term dominates.
That's why the complexity is O(log (base3) m + lg n)
Here the lg refers to log to base 2
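The two logarithmic parts can be checked separately with a small counting sketch. The class and method names below are hypothetical; they mirror the loops in foo but count iterations instead of printing.

```java
// Hypothetical counters that mirror the loops of foo(n, m).
final class FooCounts {
    /** Iterations of "while (i > 100) i = i / 3", starting from i = m. */
    static int whileIterations(int m) {
        int i = m, count = 0;
        while (i > 100) {
            i = i / 3;
            count++;
        }
        return count;
    }

    /** Iterations of "for (j = 1; j < n; j *= 2)" for a single pass of the outer loop. */
    static int innerIterations(int n) {
        int count = 0;
        for (long j = 1; j < n; j *= 2) count++;   // long avoids overflow for large n
        return count;
    }

    public static void main(String[] args) {
        System.out.println(whileIterations(24_300));   // m = 100 * 3^5
        System.out.println(innerIterations(1_024));    // n = 2^10
    }
}
```

The first count grows with log3(m) and the second with log2(n), while the outer for loop adds only a bounded constant factor, consistent with O(log m + log n).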
