Find the efficiency in Big-O notation - algorithm

I was having a problem with the following question:
Consider the following nested loop construct. Categorize its efficiency in terms of the
variable n using "big-o" notation. Suppose the statements represented by the ellipsis
(...) require four main memory accesses (each requiring one microsecond) and two
disk file accesses (each requiring one millisecond). Express in milliseconds the amount
of time this construct would require to execute if n were 1000.
x = 1;
do
{
    y = n;
    while (y > 0)
    {
        ...
        y--;
    }
    x *= 2;
} while (x < n*n);

The inner loop over y is O(n).
The outer loop runs with x = 1, 2, 2^2, 2^3, ..., 2^k while 2^k < n * n. Hence it runs O(log(n*n)) = O(2 * log(n)) times.
Hence the complexity is O(n * log(n)).

Just to add some more explanation to the other answer: a notable part of the code is x *= 2;, i.e. a doubling, so this part is not linear. You should be thinking log base 2.
Therefore, x will reach n*n after log2(n*n) = log2(n^2) = 2 * log2(n) doublings.
The y countdown is linear, so that part is O(n).
There is a loop within a loop, so you multiply both operations, as in:
n * 2 * log2(n) = O(n * 2 * log2(n)). Then you take out constant factors to get:
O(n * log2(n))
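Neither answer works out the timing part of the question, so here is a minimal sketch (the class name and counter are made up for illustration) that counts how often the loop body runs for n = 1000 and converts that count into milliseconds, assuming each execution of the "..." statements costs 4 * 1 microsecond + 2 * 1 millisecond as stated:

public class LoopCost {
    public static void main(String[] args) {
        long n = 1000;
        long bodyExecutions = 0;                 // stands in for the "..." statements
        long x = 1;
        do {
            long y = n;
            while (y > 0) {
                bodyExecutions++;
                y--;
            }
            x *= 2;
        } while (x < n * n);
        // per execution: 4 memory accesses at 0.001 ms + 2 disk accesses at 1 ms
        double msPerBody = 4 * 0.001 + 2 * 1.0;
        System.out.println("body executions: " + bodyExecutions);                  // 20 outer * 1000 inner = 20000
        System.out.println("total time: " + bodyExecutions * msPerBody + " ms");   // about 40080 ms
    }
}

Under those cost assumptions the outer loop runs 20 times (x = 2^0 through 2^19, since 2^20 >= 1000^2), so the "..." statements execute 20,000 times, which works out to roughly 40,080 ms.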

Understanding Big O complexity

I'm having tough time in understanding Big O time complexity.
Formal definition of Big O :
f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
And worst time complexity of insertion sort is O(n^2).
I want to understand what is f(n), g(n), c and k here in case of insertion sort.
Explanation
It is not that easy to formalize an algorithm so that you can apply Big-O to it formally; Big-O is a mathematical concept and does not translate directly to algorithms. Typically you measure the number of "computation steps" the algorithm needs as a function of the size of the input.
So f is the function that measures how many computation steps the algorithm performs.
n is the size of the input, for example 5 for a list like [4, 2, 9, 8, 2].
g is the function you measure against, so g = n^2 if you check for O(n^2).
c and k heavily depend on the specific algorithm and on what exactly f looks like.
Example
The biggest issue with formalizing an algorithm is that you cannot really tell exactly how many computation steps are performed. Let's say we have the following Java code:
public static void printAllEven(int n) {
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            System.out.println(i);
        }
    }
}
How many steps does it perform? How deep should we go? What about for (int i = 0; i < n; i++)? Those are multiple statements that are executed during the loop. What about i % 2? Can we assume this is a "single operation"? On which level, one CPU cycle? One assembly instruction? What about println(i), how many computation steps does it need, 1 or 5 or maybe 200?
This is not practical. We do not know the exact amounts, so we have to abstract and say each of these takes some constant number of steps A, B and C, which is okay since each runs in constant time.
After simplifying the analysis, we can say that we are effectively only interested in how often println(i) is called.
This leads to the observation that we call it roughly n / 2 times (since about half of the numbers between 0 and n are even).
The exact formula for f using the above constants would yield something like
n * A + n * B + n/2 * C
But since constants do not really play any role (they vanish in c), we could also just ignore this and simplify.
Now you are left with proving that n / 2 is in O(n^2), for example. By doing that, you will also get concrete numbers for c and k. Example:
n / 2 <= n <= 1 * n^2 // for all n >= 0
So by choosing c = 1 and k = 0 you have proven the claim. Other values for c and k work as well, example:
n / 2 <= 100 * n <= 5 * n^2 // for all n >= 20
Here we have chosen c = 5 and k = 20.
You could play the same game with the full formula as well and get something like
n * A + n * B + n/2 * C
<= n * (A + B + C)
= D * n
<= D * n^2 // for all n > 0
with c = D and k = 0.
As you see, it does not really play any role, the constants just vanish in c.
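To make this concrete, here is a small sketch (the class and method names are hypothetical) that instruments the example above: it counts how often println would be called, which plays the role of f(n), and checks the inequality f(n) <= c * g(n) with g(n) = n^2, c = 1 and k = 0:

public class BigOCheck {
    // f(n): number of println calls made by printAllEven(n)
    static long countPrintlnCalls(int n) {
        long calls = 0;
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                calls++;                        // count instead of actually printing
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        int c = 1, k = 0;
        for (int n = k; n <= 1000; n++) {
            long f = countPrintlnCalls(n);      // roughly n / 2
            long g = (long) n * n;              // g(n) = n^2
            if (f > (long) c * g) {
                throw new AssertionError("f(n) <= c * g(n) failed at n = " + n);
            }
        }
        System.out.println("0 <= f(n) <= 1 * n^2 held for all tested n >= 0");
    }
}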
In the case of insertion sort, f(n) is the actual number of operations your processor will perform to do the sort, and g(n) = n^2. The minimal values for c and k will be implementation-defined, but they are not that important. The main idea this Big-O notation gives you is that if you double the size of the array, the time insertion sort takes will grow approximately by a factor of 4 (2^2). (For insertion sort it can be smaller, but Big-O only gives an upper bound.)
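A quick way to see that factor-of-4 behaviour is to count the inner-loop shifts of insertion sort on a worst-case (reverse-sorted) input for n and 2n. This is only an illustrative sketch with made-up names:

public class InsertionSortGrowth {
    // Worst-case step count of insertion sort on a reverse-sorted array of length n
    static long worstCaseSteps(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = n - i;    // n, n-1, ..., 1
        long steps = 0;
        for (int i = 1; i < n; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
                steps++;                             // one shift per inner iteration
            }
            a[j + 1] = key;
        }
        return steps;
    }

    public static void main(String[] args) {
        long t1 = worstCaseSteps(1000);
        long t2 = worstCaseSteps(2000);
        System.out.println((double) t2 / t1);        // close to 4, as expected for O(n^2)
    }
}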

How to analyze the complexity of this algorithm? In terms of T(n) [closed]

Analyze the complexity of the following algorithm. Let T(n) be the running time of the algorithm; determine a function f(n) such that T(n) = O(f(n)). Also say whether T(n) = Θ(f(n)) holds. The answers must be justified.
I have never done this kind of exercise.
Could someone explain what I have to analyze and how I can do it?
j=1,k=0;
while j<=n do
    for l=1 to n-j do
        k=k+j;
    end for
    j=j*4;
end while
Thank you.
Step 1
Following on from the comments, the value of j can be written as a power of 4. Therefore the code can be re-written in the following way:
i=0,k=0;                      // new loop variable i
while (j=pow(4,i)) <= n do    // equivalent loop condition which calculates j
    for l=1 to n-j do
        k=k+j;
    end for
    i=i+1;                    // equivalent to j=j*4
end while
The value of i increases as 0, 1, 2, 3, 4 ..., and the value of j as 1, 4, 16, 64, 256 ... (i.e. powers of 4).
Step 2
What is the maximum value of i, i.e. how many times does the outer loop run? Inverting the equivalent loop condition:
pow(4,i) <= n // loop condition inequality
--> i <= log4(n) // take logarithm base-4 of both sides
--> max(i) = floor(log4(n)) // round down
Now that the maximum value of i is known, it's time to re-write the code again:
i=0,k=0;
m=floor(log4(n))              // maximum value of i
while i<=m do                 // equivalent loop condition in terms of i only
    j=pow(4,i)                // value of j for each i
    for l=1 to n-j do
        k=k+j;
    end for
    i=i+1;
end while
Step 3
You have correctly deduced that the inner loop runs for n - j times for every outer loop. This can be summed over all values of j to give the total time complexity:
T(n) = ∑ (n - j)                           // summed over j = 1, 4, 16, ..., j ≤ n
     = ∑_{i=0}^{m} (n - pow(4,i))          // using the results of steps 1) and 2)
     = (m+1)*n - ∑_{i=0}^{m} pow(4,i)      // separate the sum into two parts
     =    A    -        B
where A = (m+1)*n and B = ∑_{i=0}^{m} pow(4,i).
The term A is obviously O(n log n), because m=floor(log4(n)). What about B?
Step 4
B is a geometric series, for which there is a standard formula (source – Wikipedia):
a + a*r + a*r^2 + ... + a*r^(N-1) = a * (pow(r,N) - 1) / (r - 1)
Substituting the relevant numbers "a" = 1, "N" = m+1, "r" = 4:
B = (pow(4,m+1) - 1) / (4 - 1)
  = (pow(4, floor(log4(n))+1) - 1) / 3
If a number is rounded down (floor), the result is always greater than the original value minus 1. Therefore m can be asymptotically written as:
m = log4(n) + O(1)
--> B = (pow(4, log4(n) + O(1)) - 1) / 3
      = (pow(4, O(1)) * n - 1) / 3     // pow(4, log4(n)) = n, and pow(4, O(1)) is a constant
      = O(n)
Step 5
A = O(n log n), B = O(n), so asymptotically A overshadows B.
The total time complexity is O(n log n).
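If you want to sanity-check this result numerically, here is a small sketch (the class and method names are made up) that simply runs the loops, counts the executions of k = k + j and prints the count next to n * log4(n):

public class InnerLoopCount {
    // Count how many times "k = k + j" runs for a given n
    static long countInnerSteps(long n) {
        long count = 0;
        long j = 1, k = 0;
        while (j <= n) {
            for (long l = 1; l <= n - j; l++) {
                k = k + j;
                count++;
            }
            j = j * 4;
        }
        return count;
    }

    public static void main(String[] args) {
        for (long n : new long[]{1_000, 10_000, 100_000}) {
            long nLog4n = (long) (n * Math.log(n) / Math.log(4));
            System.out.println(n + ": " + countInnerSteps(n) + " steps, n*log4(n) ~ " + nLog4n);
        }
    }
}

The counts grow in proportion to n * log4(n), consistent with the bound above. In fact T(n) = Θ(n log n), since the dominant term (m+1)*n is itself on the order of n log n and B only subtracts an O(n) amount.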
Consider the number of times each instruction is executed depending on n (the variable input). Let's call that the cost of each instruction. Typically, some parts of the algorithm run significantly more often than other parts. Also typically, this larger count asymptotically dominates all the others, meaning that as n grows, the cost of all other instructions becomes negligible. Once you understand that, you simply have to figure out the cost of the dominant instruction, or at least what it is proportional to.
In your example, two instructions are potentially costly; say k=k+j; runs x times and j=j*4; runs y times.
j=1,k=0;              // Negligible
while j<=n do
    for l=1 to n-j do
        k=k+j;        // Run x times
    end for
    j=j*4;            // Run y times
end while
Being tied to only one loop, y is easier to determine. The outer loop runs for j from 1 to n, with j growing exponentially: its value follows the sequence [1, 4, 16, 64, ...] (the i-th term is 4^i, starting at 0). That simply means that y is proportional to the logarithm of n (of any base, because all logarithms are proportional). So y = O(log n).
Now for x: it accumulates across the y iterations of the outer loop, since it is tied to the inner loop. Each time the outer loop runs, the inner loop runs for l from 1 to n-j, with l growing linearly (it's a for loop). That means it runs n - j times, or n - 4^i, with i being the index of the current outer-loop iteration, starting at 0.
Since y = O(log n), x is proportional to the sum of n - 4^i, for i from 0 to log4(n), or
(n - 4^0) + (n - 4^1) + (n - 4^2) + ... =
((log4 n) + 1) * n - (4^((log4 n) + 1) - 1)/(4 - 1) =
O(n log n) - O(n) =
O(n log n)
And here is your answer: x = O(n log n), which dominates all other costs, so the total complexity of the algorithm is O(n log n).
You need to calculate how many times each line will execute.
j=1,k=0;                  // 1
while j<=n do             // floor(log4(n)) + 2  (the outer loop runs for j = 1, 4, 16, ..., plus one failing test)
    for l=1 to n-j do     // ∑ (n - j + 1) over the outer iterations, about n*log4(n)
        k=k+j;            // ∑ (n - j) = ∑ (n - pow(4,i)), about n*log4(n)
    end for
    j=j*4;                // floor(log4(n)) + 1
end while
total complexity [add the execution counts of all lines]
= 1 + (log4(n) + 2) + (n*log4(n) + log4(n)) + n*log4(n) + (log4(n) + 1)
Take the dominant term of the above and drop constant factors, and you are left with n*log(n).
The total runtime complexity will be O(n log n).
Looks like a homework question, but to give you a hint: the complexity can be estimated from the number of loops. One loop means O(n), two loops O(n^2) and three loops O(n^3).
This only goes for nested loops:
while () {
    while () {
        while () {
        }
    }
}
this is O(n^3)
But...
while () {
}
while() {
}
This is still O(n), because the loops do not run inside each other and each will stop after n iterations.
EDIT
The correct answer should be O(n*log(n)): because of the inner for loop, the number of iterations depends on the value of j, which can be different on every iteration of the outer loop.

How do I evaluate and explain the running time for this algorithm?

Pseudo code :
s <- 0
for i=1 to n do
    if A[i]=1 then
        for j=1 to n do
            {constant number of elementary operations}
        endfor
    else
        s <- s + A[i]
    endif
endfor
where A is an array of n integers, each of which is a random value between 1 and 6.
I'm at a loss here... picking from my notes and some other sources online, I get
T(n) = C1(N) + C2 + C3(N) + C4(N) + C5
where C1(N) and C3(N) are the for loops, and C4(N) is the constant number of elementary operations. Though I have a strong feeling that I'm wrong.
You are looping from 1..n, and each loop ALSO loops from 1..n (in the worst case). O(n^2) right?
Put another way: You shouldn't be adding C3(N). That would be the case if you had two independent for loops from 1..n. But you have a nested loop. You will run the inner loop N times, and the outer loop N times. N*N = O(n^2).
Let's think for a second about what this algorithm does.
You are basically looping through every element in the array once (the outer for, the one with i). Sometimes, if the value A[i] is 1, you also loop over the whole range again with the inner for over j.
In your worst case scenario, you are running against an array of all 1's.
In that case, your complexity is:
time to init s
n * (time to test a[i] == 1 + n * time of {constant ...})
Which means: T = T(s) + n * (T(test) + n * T(const)) = n^2 * T(const) + n*T(test) + T(s).
Asymptotically, this is a O(n^2).
But this was a worst-case analysis (the one you should perform most of the time). What if we want an average-case analysis?
In that case, assuming a uniform distribution of values in A, you are going to enter the inner for loop over j, on average, 1/6 of the time.
So you'd get:
- time to init s
- n * (time to test A[i] == 1 + 1/6 * n * time of {constant ...} + 5/6 * time to increment s)
Which means: T = T(s) + n * (T(test) + 1/6 * n * T(const) + 5/6 * T(inc s)) = 1/6* n^2 * T(const) + n * (5/6 * T(inc s) + T(test)) + T(s).
Again, asymptotically this is still O(n^2), but depending on the value of T(inc s) the actual running time could be larger or smaller than in the other case.
Fun exercise: can you estimate the expected average running time for a generic distribution of values in A, instead of a uniform one?
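To make the average-case reasoning tangible, here is a small hypothetical Java sketch (names made up) that fills A with uniform random values from 1 to 6, runs the algorithm while counting the inner-loop operations, and compares the count with n^2 / 6:

import java.util.Random;

public class AverageCaseSketch {
    public static void main(String[] args) {
        int n = 10_000;
        int[] a = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            a[i] = 1 + rnd.nextInt(6);           // random value between 1 and 6
        }

        long s = 0;
        long innerOps = 0;                        // counts the {constant ...} block
        for (int i = 0; i < n; i++) {
            if (a[i] == 1) {
                for (int j = 0; j < n; j++) {
                    innerOps++;                   // stands in for the elementary operations
                }
            } else {
                s += a[i];
            }
        }
        System.out.println("inner operations: " + innerOps);
        System.out.println("n^2 / 6 ~ " + (long) n * n / 6);   // same order of magnitude
    }
}

Roughly 1/6 of the n outer iterations hit the inner loop, so the count comes out near n^2 / 6, in line with the average-case estimate above.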

derivation of algorithm complexity

Refreshing up on algorithm complexity, I was looking at this example:
int x = 0;
for ( int j = 1; j <= n; j++ )
    for ( int k = 1; k < 3*j; k++ )
        x = x + j;
I know this loop ends up being O(n^2). I believe the inner loop is executed at most 3*n times per outer iteration (3*(1+2+...+n) in total), and the outer loop executes n times. So, O(3n*n) = O(3n^2) = O(n^2).
However, the source I'm looking at expands the execution of the inner loop to: 3(1+2+3+...+n) = 3n^2/2 + 3n/2. Can anyone explain the 3n^2/2 + 3n/2 execution times?
For each j you have to execute about j * 3 iterations of the internal loop, so the command x=x+j will be executed 3 * (1 + 2 + 3 + ... + n) times. The sum of the arithmetic progression is n*(n+1)/2, so the command will be executed:
3 * n * (n+1)/2 times, which equals (3*n^2)/2 + (3*n)/2
But Big O is not about the exact number of iterations; it is an asymptotic measure, so in the expression 3*n*(n+1)/2 you drop the constants (replace them by 0 or 1), leaving 1*n*(n+0)/1 = n^2.
A small update about the Big O calculation for this case: to get Big O from 3n(n+1)/2, you can informally imagine n going to infinity, so:
infinity + 1 = infinity
3*infinity = infinity
infinity/2 = infinity
infinity*infinity = infinity^2
so after this you are left with n^2.
The sum of integers from 1 to m is m*(m+1)/2. In the given problem, j goes from 1 to n, and k goes from 1 to 3*j. So the inner loop on k is executed 3*(1+2+3+4+5+...+n) times, with each term in that series representing one value of j. That gives 3n(n+1)/2. If you expand that, you get 3n^2/2+3n/2. The whole thing is still O(n^2), though. You don't care if your execution time is going up both quadratically and linearly, since the linear gets swamped by the quadratic.
Big O notation gives an upper bound on the asymptotic running time of an algorithm. It does not take into account the lower-order terms or the constant factors. Therefore O(10n^2) and O(1000n^2 + 4n + 56) are both still O(n^2).
What you are doing is trying to count the number of operations in your algorithm. However, Big O does not say anything about the exact number of operations. It simply provides an upper bound on the worst-case running time that may occur with an unfavorable input.
The exact number of executions of x = x + j can be found using sigma notation:
summation (j = 1 to n) [ summation (k = 1 to 3j-1) (1) ] = summation (j = 1 to n) [ 3j - 1 ] = 3*n*(n+1)/2 - n
It's been empirically verified.
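As a quick check of that closed form, here is a hypothetical Java sketch (names made up) that counts how often x = x + j actually executes and compares it with 3*n*(n+1)/2 - n:

public class CountInnerLoop {
    public static void main(String[] args) {
        int n = 1000;
        int x = 0;
        long executions = 0;
        for (int j = 1; j <= n; j++) {
            for (int k = 1; k < 3 * j; k++) {
                x = x + j;
                executions++;
            }
        }
        long closedForm = 3L * n * (n + 1) / 2 - n;          // sum of (3j - 1) for j = 1..n
        System.out.println("counted:     " + executions);
        System.out.println("closed form: " + closedForm);    // both should print 1500500
    }
}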

Tricky Big-O complexity

public void foo(int n, int m) {
    int i = m;
    while (i > 100) {
        i = i / 3;
    }
    for (int k = i; k >= 0; k--) {
        for (int j = 1; j < n; j *= 2) {
            System.out.print(k + "\t" + j);
        }
        System.out.println();
    }
}
I figured the complexity would be O(log n).
That comes from the inner loop; the outer loop will never be executed more than about 100 times, so it can be omitted.
What I'm not sure about is the while clause: should it be incorporated into the Big-O complexity? For very large i values it could make an impact. Or do arithmetic operations, no matter on what scale, count as basic operations and can be omitted?
The while loop is O(log m) because you keep dividing m by 3 until it is below or equal to 100.
Since 100 is a constant in your case, it can be ignored, yes.
The inner loop is O(log n) as you said, because you multiply j by 2 until it exceeds n.
Therefore the total complexity is O(log n + log m).
Or do arithmetic operations, no matter on what scale, count as basic operations and can be omitted?
Arithmetic operations can usually be omitted, yes. However, it also depends on the language. This looks like Java and it looks like you're using primitive types. In this case it's ok to consider arithmetic operations O(1), yes. But if you use big integers for example, that's not really ok anymore, as addition and multiplication are no longer O(1).
The complexity is O(log m + log n).
The while loop executes about log3(m) - log3(100) times, i.e. log3(m) minus a constant. The outer for loop executes a constant number of times (at most around 100), and the inner loop executes about log2(n) times.
The while loop repeatedly divides the value of m by 3, therefore the number of such operations will be about log3(m).
For the for loops you could think of the number of operations as two summations:
summation (k = 0 to i) [ summation (j = 0 to lg n) (1) ]
= summation (k = 0 to i) [ lg n + 1 ]
= (lg n + 1) * (i + 1)
This is the total number of operations, of which the log term dominates (i is at most about 100 here).
That's why the complexity is O(log3(m) + lg n).
Here lg refers to log base 2.
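To see those counts in practice, here is a small sketch (made-up names) of an instrumented version of foo that counts the while-loop iterations and the print calls instead of actually printing:

public class FooCount {
    // Instrumented version of foo that counts loop iterations instead of printing
    static long[] count(int n, int m) {
        long whileIters = 0, printCalls = 0;
        int i = m;
        while (i > 100) {
            i = i / 3;
            whileIters++;
        }
        for (int k = i; k >= 0; k--) {
            for (int j = 1; j < n; j *= 2) {
                printCalls++;                    // stands in for System.out.print
            }
        }
        return new long[]{whileIters, printCalls};
    }

    public static void main(String[] args) {
        long[] c = count(1_000_000, 1_000_000_000);
        System.out.println("while iterations: " + c[0]);   // roughly log3(m / 100)
        System.out.println("print calls:      " + c[1]);   // at most about (100 + 1) * log2(n)
    }
}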
