Algorithm Analysis: Big-O explanation - algorithm

I'm currently taking a class in algorithms. The following is a question I got wrong from a quiz: Basically, we have to indicate the worst case running time in Big O notation:
int foo(int n)
{
m = 0;
while (n >=2)
{
n = n/4;
m = m + 1;
}
return m;
}
I don't understand how the worst case time for this just isn't O(n). Would appreciate an explanation. Thanks.

foo calculates log4(n) by dividing n by 4 and counting number of 4's in n using m as a counter. At the end, m is going to be the number of 4's in n. So it is linear in the final value of m, which is equal to log base 4 of n. The algorithm is then O(logn), which is also O(n).

Let's suppose that the worst case is O(n). That implies that the function takes at least n steps.
Now let's see the loop, n is being divided by 4 (or 2²) at each step. So, in the first iteration n is reduced to n/4, in the second, to n/8. It isn't being reduced linearly. It's being reduced by a power of two so, in the worst case, it's running time is O(log n).

The computation can be expressed as a recurrence formula:
f(r) = 4*f(r+1)
The solution is
f(r) = k * 4 ^(1-r)
Where ^ means exponent. In our case we can say f(0) = n
So f(r) = n * 4^(-r)
Solving for r on the end condition we have: 2 = n * 4^(-r)
Using log in both sides, log(2) = log(n) - r* log(4) we can see
r = P * log(n);
Not having more branches or inner loops, and assuming division and addition are O(1) we can confidently say the algorithm, runs P * log(n) steps therefore is a O((log(n)).
http://www.wolframalpha.com/input/?i=f%28r%2B1%29+%3D+f%28r%29%2F4%2C+f%280%29+%3D+n
Nitpickers corner: A C int usually means the largest value is 2^32 - 1 so in practice it means only max 15 iterations, which is of course O(1). But I think your teacher really means O(log(n)).

Related

Time complexity of an algorithm that runs 1+2+...+n times;

To start off I found this stackoverflow question that references the time complexity to be O(n^2), but it doesn't answer the question of why O(n^2) is the time complexity but instead asks for an example of such an algorithm. From my understanding an algorithm that runs 1+2+3+...+n times would be
less than O(n^2). For example, take this function
function(n: number) {
let sum = 0;
for(let i = 0; i < n; i++) {
for(let j = 0; j < i+1; j++) {
sum += 1;
}
}
return sum;
}
Here are some input and return values
num
sum
1
1
2
3
3
6
4
10
5
15
6
21
7
28
From this table you can see that this algorithm runs in less than O(n^2) but more than O(n). I also realize than algorithm that runs 1+(1+2)+(1+2+3)+...+(1+2+3+...+n) is true O(n^2) time complexity. For the algorithm stated in the problem, do we just say it runs in O(n^2) because it runs more than O(log n) times?
It's known that 1 + 2 + ... + n has a short form of n * (n + 1) / 2. Even if you didn't know that, you have to consider that, when i gets to n, the inner loop runs at most n times. So you have exactly n times (for outer loop i), each running at most n times (for inner loop j), so the O(n^2) becomes more apparent.
I agree that the complexity would be exactly n^2 if the inner loop also ran from 0 to n, so you have your reasons to think that a loop i from 0 to n and another loop j from 0 to i has to perform better and that's true, but with big Oh notation you're actually measuring the degree of algorithm's complexity, not the exact number of operations.
p.s. O(log n) is usually achieved when you split the main problem into sub-problems.
I think you should interpret the table differently. The O(N^2) complexity says that if you double the input N, the runtime should quadruple (take 4 times as long). In this case, the function(n: number) returns a number mirroring its runtime. I use f(N) as a short for it.
So say N goes from 1 to 2, which means the input has doubled (2/1 = 2). The runtime then has gone from f(1) to f(2), which means it has increased f(2)/f(1) = 3/1 = 3 times. That is not 4 times, but the Big-O complexity measure is asymptotic, dealing with the situation where N approaches infinity. If we test another input doubling from the table, we have f(6)/f(3) = 21/6 = 3.5. It is already closer to 4.
Let us now stray outside the table and try more doublings with bigger N. For example we have f(200)/f(100) = 20100/5050 = 3.980 and f(5000)/f(2500) = 12502500/3126250 = 3.999. The trend is clear. As N approaches infinity, a doubled input tends toward a quadrupled runtime. And that is the hallmark of O(N^2).

Understanding Big O complexity

I'm having tough time in understanding Big O time complexity.
Formal definition of Big O :
f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
And worst time complexity of insertion sort is O(n^2).
I want to understand what is f(n), g(n), c and k here in case of insertion sort.
Explanation
It is not that easy to formalize an algorithm such that you can apply Big-O formally, it is a mathematical concept and does not easily translate to algorithms. Typically you would measure the amount of "computation steps" needed to perform the operation based on the size of the input.
So f is the function that measures how many computation steps the algorithm performs.
n is the size of the input, for example 5 for a list like [4, 2, 9, 8, 2].
g is the function you measure against, so g = n^2 if you check for O(n^2).
c and k heavily depend on the specific algorithm and how exactly f looks like.
Example
The biggest issue with formalizing an algorithm is that you can not really tell exactly how many computation steps are performed. Let's say we have the following Java code:
public static void printAllEven(int n) {
for (int i = 0; i < n; i++) {
if (i % 2 == 0) {
System.out.println(i);
}
}
}
How many steps does it perform? How deep should we go? What about for (int i = 0; i < count; i++)? Those are multiple statements which are executed during the loop. What about i % 2? Can we assume this is a "single operation"? On which level, one CPU cycle? One assembly line? What about the println(i), how many computation steps does it need, 1 or 5 or maybe 200?
This is not practical. We do not know the exact amount, we have to abstract and say it is a constant A, B and C amount of steps, which is okay since it runs in constant time.
After simplifying the analysis, we can say that we are effectively only interested in how often println(i) is called.
This leads to the observation that we call it precisely n / 2 times (since we have so many even numbers between 0 and n.
The exact formula for f using aboves constants would yield something like
n * A + n * B + n/2 * C
But since constants do not really play any role (they vanish in c), we could also just ignore this and simplify.
Now you are left with proving that n / 2 is in O(n^2), for example. By doing that, you will also get concrete numbers for c and k. Example:
n / 2 <= n <= 1 * n^2 // for all n >= 0
So by choosing c = 1 and k = 0 you have proven the claim. Other values for c and k work as well, example:
n / 2 <= 100 * n <= 5 * n^2 // for all n >= 20
Here we have choosen c = 5 and k = 20.
You could play the same game with the full formula as well and get something like
n * A + n * B + n/2 * C
<= n * (A + B + C)
= D * n
<= D * n^2 // for all n > 0
with c = D and k = 0.
As you see, it does not really play any role, the constants just vanish in c.
In case of insertion sort f(n) is the actual number of operations your processor will do to perform a sort. g(n)=n2. Minimal values for c and k will be implementation-defined, but they are not as important. The main idea this Big-O notation gives is that if you double the size of the array the time it takes for insertion sort to work will grow approximately by a factor of 4 (22). (For insertion sort it can be smaller, but Big-O only gives upper bound)

what is the time complexity of this method which checks if a number k can be represented as n^p

Time complexity of below method? I'm calculating it as log(n)*log(n)= log(n)
public int isPower(int A) {
if (A == 1)
return 1;
for (int i = (int)Math.sqrt(A); i > 1; i--){
int p = A;
while (p % i == 0) {
p = p / i;
}
if (p == 1)
return 1;
}
return 0;
}
Worst-case complexity:
for(..) runs sqrt(A) times
Then while(..) depends on prime factorization of A=p_1^e1*p_2^e_2*..*p_n^e_n, so it is Max(e_1,e_2,..,e_n) worst-case, or roughly Max(log_p_1(A),log_p_2(A),..)
At most while(..) will execute log(A) times roughly.
so total rough worst-case complexity = sqrt(A)*log(A) leaving out constant factors
Worst-case complexity happens for numbers A which are products of different integers ie A = n_1^e_1*n_2^e_2*..
Average-case complexity:
Given than numbers which are products of different integers are more numerous than numbers which are simply powers of a single integer, in a given range, then choosing a number at random, is more likely to be product of different integers, ie A = n_1^e_1*n_2^e_2... Thus average-case complexity is roughly the same as worst-case complexity ie sqrt(A)*log(A)
Best-case complexity:
Best-case complexity happens when the number A is indeed a power of a single integer/prime ie A = n^e. Then the algorithm in this case takes less time. I leave it as an exercise to compute best-case complexity.
PS. Another way to see this is to understand that checking if a number is a power of a prime/integer, effectively one has to factor the number to its prime factorisation (which is what is done in this algorithm), which is effectively of the same complexity (see for example complexity of factoring by trial division).
SO should have mathjax support as cs.stackexchange has :p !
You iterate from sqrt(A) to 2. Then u tried to factorize. For prime number your code iterate sqrt(A) times . its best case. if number is 2^30 then ur code execute
sqrt(2^30) * 30 means sqrt(n) * log(n) times.
So your code complexity: sqrt(n) * log(n)

Is complexity O(log(n)) equivalent to O(sqrt(n))?

My professor just taught us that any operation that halves the length of the input has an O(log(n)) complexity as a thumb rule. Why is it not O(sqrt(n)), aren't both of them equivalent?
They are not equivalent: sqrt(N) will increase a lot more quickly than log2(N). There is no constant C so that you would have sqrt(N) < C.log(N) for all values of N greater than some minimum value.
An easy way to grasp this, is that log2(N) will be a value close to the number of (binary) digits of N, while sqrt(N) will be a number that has itself half the number of digits that N has. Or, to state that with an equality:
        log2(N) = 2log2(sqrt(N))
So you need to take the logarithm(!) of sqrt(N) to bring it down to the same order of complexity as log2(N).
For example, for a binary number with 11 digits, 0b10000000000 (=210), the square root is 0b100000, but the logarithm is only 10.
Assuming natural logarithms (otherwise just multiply by a constant), we have
lim {n->inf} log n / sqrt(n) = (inf / inf)
= lim {n->inf} 1/n / 1/(2*sqrt(n)) (by L'Hospital)
= lim {n->inf} 2*sqrt(n)/n
= lim {n->inf} 2/sqrt(n)
= 0 < inf
Refer to https://en.wikipedia.org/wiki/Big_O_notation for alternative defination of O(.) and thereby from above we can say log n = O(sqrt(n)),
Also compare the growth of the functions below, log n is always upper bounded by sqrt(n) for all n > 0.
Just compare the two functions:
sqrt(n) ---------- log(n)
n^(1/2) ---------- log(n)
Plug in Log
log( n^(1/2) ) --- log( log(n) )
(1/2) log(n) ----- log( log(n) )
It is clear that: const . log(n) > log(log(n))
No, It's not equivalent.
#trincot gave one excellent explanation with example in his answer. I'm adding one more point. Your professor taught you that
any operation that halves the length of the input has an O(log(n)) complexity
It's also true that,
any operation that reduces the length of the input by 2/3rd, has a O(log3(n)) complexity
any operation that reduces the length of the input by 3/4th, has a O(log4(n)) complexity
any operation that reduces the length of the input by 4/5th, has a O(log5(n)) complexity
So on ...
It's even true for all reduction of lengths of the input by (B-1)/Bth. It then has a complexity of O(logB(n))
N:B: O(logB(n)) means B based logarithm of n
One way to approach the problem can be to compare the rate of growth of O()
and O( )
As n increases we see that (2) is less than (1). When n = 10,000 eq--1 equals 0.005 while eq--2 equals 0.0001
Hence is better as n increases.
No, they are not equivalent; you can even prove that
O(n**k) > O(log(n, base))
for any k > 0 and base > 1 (k = 1/2 in case of sqrt).
When talking on O(f(n)) we want to investigate the behaviour for large n,
limits is good means for that. Suppose that both big O are equivalent:
O(n**k) = O(log(n, base))
which means there's a some finite constant C such that
O(n**k) <= C * O(log(n, base))
starting from some large enough n; put it in other terms (log(n, base) is not 0 starting from large n, both functions are continuously differentiable):
lim(n**k/log(n, base)) = C
n->+inf
To find out the limit's value we can use L'Hospital's Rule, i.e. take derivatives for numerator and denominator and divide them:
lim(n**k/log(n)) =
lim([k*n**(k-1)]/[ln(base)/n]) =
ln(base) * k * lim(n**k) = +infinity
so we can conclude that there's no constant C such that O(n**k) < C*log(n, base) or in other words
O(n**k) > O(log(n, base))
No, it isn't.
When we are dealing with time complexity, we think of input as a very large number. So let's take n = 2^18. Now for sqrt(n) number of operation will be 2^9 and for log(n) it will be equal to 18 (we consider log with base 2 here). Clearly 2^9 much much greater than 18.
So, we can say that O(log n) is smaller than O(sqrt n).
To prove that sqrt(n) grows faster than lgn(base2) you can take the limit of the 2nd over the 1st and proves it approaches 0 as n approaches infinity.
lim(n—>inf) of (lgn/sqrt(n))
Applying L’Hopitals Rule:
= lim(n—>inf) of (2/(sqrt(n)*ln2))
Since sqrt(n) and ln2 will increase infinitely as n increases, and 2 is a constant, this proves
lim(n—>inf) of (2/(sqrt(n)*ln2)) = 0

Time and Space Complexity of an Algorithm - Big O Notation

I am trying to analyze the Big-O-Notation of a simple algorithm and it has been a while I've worked with it. So I've come with an analysis and trying to figure out if this is correct one according to rules for the following code:
public int Add()
{
int total = 0; //Step 1
foreach(var item in list) //Step 2
{
if(item.value == 1) //Step 3
{
total += 1; //Step 4
}
}
return total;
}
If you assign a variable or set, in this case the complexity is determined according to the rules of Big O is O(1). So the first phase will be O(1) - This means whatever the input size is, the program will execute for the same time and memory space.
The second step comes up with foreach loop. One thing is pretty clear in the loop. According to the input, the loop iterates or runs. As an example, for input 10, loop iterates 10 times and for 20, 20 times. Totally depends on the input. In accordance with the rules of the Big O, the complexity would be O(n) - n is the number of inputs. So in the above code, the loop iterates depending upon the number of items in the list.
In this step, we define a variable that determines a condition check (See Step 3 in the coding). In that case, the complexity is O(1) according to the Big O rule.
In the same way, in step 4, there is also no change (See Step 4 in the coding). If the condition check is true, then total variable increments a value by 1. So we write - complexity O(1).
So if the above calculations are perfect, then the final complexity stands as the following:
O(1) + O(n) + O(1) + O(1) or (O(1) + O(n) * O(1) + O(1))
I am not sure if this is correct. But I guess, I would expect some clarification on this if this isn't the perfect one. Thanks.
Big O notation to describe the asymptotic behavior of functions. Basically, it tells you how fast a function grows or declines
For example, when analyzing some algorithm, one might find that the time (or the number of steps) it takes to complete a problem of size n is given by
T(n) = 4 n^2 - 2 n + 2
If we ignore constants (which makes sense because those depend on the particular hardware the program is run on) and slower growing terms, we could say "T(n)" grows at the order of n^2 " and write:T(n) = O(n^2)
For the formal definition, suppose f(x) and g(x) are two functions defined on some subset of the real numbers. We write
f(x) = O(g(x))
(or f(x) = O(g(x)) for x -> infinity to be more precise) if and only if there exist constants N and C such that
|f(x)| <= C|g(x)| for all x>N
Intuitively, this means that f does not grow faster than g
If a is some real number, we write
f(x) = O(g(x)) for x->a
if and only if there exist constants d > 0 and C such that
|f(x)| <= C|g(x)| for all x with |x-a| < d
So for your case it would be
O(n) as |f(x)| > C|g(x)|
Reference from http://web.mit.edu/16.070/www/lecture/big_o.pdf
int total = 0;
for (int i = n; i < n - 1; i++) { // --> n loop
for (int j = 0; j < n; j++) { // --> n loop
total = total + 1; // -- 1 time
}
}
}
Big O Notation gives an assumption when value is very big outer loop will run n times and inner loop is running n times
Assume n -> 100 than total n^2 10000 run times
Your analysis is not exactly correct.
Step 1 indeed takes O(1) operations
Step 2 indeed takes O(n) operations
Step 3 takes O(1) operations, but it is executed n times, so its whole contribution to complexity is O(1*n)=O(n)
Step 4 takes O(1) operations, but it is executed up to n times, so its whole contribution to complexity is also O(1*n)=O(n)
The whole complexity is O(1)+O(n)+O(n)+O(n) = O(n).
Your calculation for step 3 and 4 are incorrect as both these steps are inside the for loop.
so step 2,3 and 4 complexity will be O(n)*(O(1) +O(1))=O(n)
and when clubbed with step 1 it will be O(1)+O(n)=O(n).

Resources