Do these two pseudo-algorithms have different execution times? - algorithm

// -- Algorithm A
int a = 1, b = 2;
for (int n = 0; n < 100; n++)
{
int c = a + b + n;
int d = a - b - n;
}
// -- Algorithm B
int a = 1, b = 2;
for (int n = 0; n < 100; n++)
{
int c = a + b + n;
}
for (int n = 0; n < 100; n++)
{
int d = a - b - n;
}
Should I try to use existing loops to make necessary operations? Or in the end the result is the same?

In big-O notation they will be the same. According to this:
you will first have a sum:
O(n) + O(n) = O(2n)
and then multiplication by a constant:
O(2n) = O(n)
so in the end it will be O(n).

Complexity-wise, both algorithms are O(n). Even if you consider multiplicative constants, you could say that one is n * 2 and the other one n + n, which is exactly the same.
In reality, though, it depends. One could argue that, since the second one performs twice as many branches, the performance will probably be worse (see this famous question), but ultimately it depends on the compiler, the particular input, the OS, etc.

In your current implementation
int a = 1, b = 2;
for (int n = 0; n < 100; n++)
{
int c = a + b + n;
int d = a - b - n;
}
you're doing nothing: both c and d are local variables, which exist
within the for loop scope only; if the optimizer is smart enough to find out that
there's no possibility of integer overflow (both 1 + 2 + 100 and
1 - 2 - 100 are within [int.MinValue..int.MaxValue]), it may well
eliminate the entire loop(s), with a warning to the developer.
Real world example is
for (int n = 0; n < N; n++)
{
f(n);
g(n);
}
Versus
for (int n = 0; n < N; n++)
f(n);
for (int n = 0; n < N; n++)
g(n);
where both f(n) and g(n) don't have side effects and N is large enough.
So far so good, in the 1st case the execution time is
T = f(0) + g(0) +
f(1) + g(1) +
...
f(N - 2) + g(N - 2) +
f(N - 1) + g(N - 1)
In the 2nd case
T = f(0) + f(1) + ... f(N - 2) + f(N - 1) +
g(0) + g(1) + ... g(N - 2) + g(N - 1)
As you can see, the execution times are the same (not only O(...)).
In real life, there can be a minuscule difference between the two implementations:
loop initialization and implementation details, CPU register utilization, etc.
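To make the term-by-term argument above concrete, here is a minimal Java sketch that counts calls instead of measuring time. The class name, the counters and the empty bodies of f and g are assumptions made for illustration; it only shows that both loop shapes perform the same 2N calls, not how a real JIT or CPU would schedule them.

```java
public class FusedVsSplit {
    // Hypothetical call counters standing in for the cost of f(n) and g(n).
    static long fCalls = 0;
    static long gCalls = 0;

    static void f(int n) { fCalls++; }
    static void g(int n) { gCalls++; }

    // One loop that performs both operations per iteration.
    static long fused(int N) {
        fCalls = 0;
        gCalls = 0;
        for (int n = 0; n < N; n++) {
            f(n);
            g(n);
        }
        return fCalls + gCalls;
    }

    // Two separate loops, one operation each.
    static long split(int N) {
        fCalls = 0;
        gCalls = 0;
        for (int n = 0; n < N; n++) {
            f(n);
        }
        for (int n = 0; n < N; n++) {
            g(n);
        }
        return fCalls + gCalls;
    }

    public static void main(String[] args) {
        int N = 1000;
        System.out.println("fused: " + fused(N) + " calls, split: " + split(N) + " calls");
    }
}
```

Both variants report 2N calls; what the sketch cannot show is the loop overhead itself (N extra increments and comparisons in the split version), which is the small difference mentioned above.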

The first algorithm will definitely be faster, but since the complexity only grows linearly, the second one is not bad. As long as you don't go quadratic, both are good.
But if you end up writing n such loops, then you have n^2 complexity, which is bad.

Related

How to find the time complexity of these two programs? [duplicate]

int sum = 0;
for(int i = 1; i < n; i++) {
for(int j = 1; j < i * i; j++) {
if(j % i == 0) {
for(int k = 0; k < j; k++) {
sum++;
}
}
}
}
I don't understand how when j = i, 2i, 3i... the last for loop runs n times. I guess I just don't understand how we came to that conclusion based on the if statement.
Edit: I know how to compute the complexity for all the loops except for why the last loop executes i times based on the mod operator... I just don't see how it's i. Basically, why can't j % i go up to i * i rather than i?
Let's label the loops A, B and C:
int sum = 0;
// loop A
for(int i = 1; i < n; i++) {
// loop B
for(int j = 1; j < i * i; j++) {
if(j % i == 0) {
// loop C
for(int k = 0; k < j; k++) {
sum++;
}
}
}
}
Loop A iterates O(n) times.
Loop B iterates O(i²) times per iteration of A. For each of these iterations:
j % i == 0 is evaluated, which takes O(1) time.
On 1/i of these iterations, loop C iterates j times, doing O(1) work per iteration. Since j is O(i²) on average, and this is only done for 1/i iterations of loop B, the average cost is O(i² / i) = O(i).
Multiplying all of this together, we get O(n × i² × (1 + i)) = O(n × i³). Since i is on average O(n), this is O(n⁴).
The tricky part of this is saying that the if condition is only true 1/i of the time:
Basically, why can't j % i go up to i * i rather than i?
In fact, j does go up to j < i * i, not just up to j < i. But the condition j % i == 0 is true if and only if j is a multiple of i.
The multiples of i within the range are i, 2*i, 3*i, ..., (i-1) * i. There are i - 1 of these, so loop C is reached i - 1 times despite loop B iterating i * i - 1 times.
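The count of multiples can also be checked by brute force. The following Java sketch (the class and method names are made up for this illustration) counts, for a fixed i, how many values of j in loop B pass the j % i == 0 test:

```java
public class MultiplesCount {
    // Counts how many j in [1, i*i) satisfy j % i == 0,
    // i.e. how often loop C is entered for a fixed i.
    static int timesLoopCReached(int i) {
        int reached = 0;
        for (int j = 1; j < i * i; j++) {
            if (j % i == 0) {
                reached++;
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 6; i++) {
            System.out.println("i = " + i + ": loop B runs " + (i * i - 1)
                    + " times, loop C is reached " + timesLoopCReached(i) + " times");
        }
    }
}
```

For every i the second number is i - 1, matching the list of multiples i, 2*i, ..., (i-1)*i.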
The first loop consumes n iterations.
The second loop consumes n*n iterations. Imagine the case when i=n, then j=n*n.
The third loop consumes n iterations because it's executed only i times, where i is bounded to n in the worst case.
Thus, the code complexity is O(n×n×n×n).
I hope this helps you understand.
All the other answers are correct, I just want to amend the following.
I wanted to see, if the reduction of executions of the inner k-loop was sufficient to reduce the actual complexity below O(n⁴). So I wrote the following:
for (int n = 1; n < 363; ++n) {
int sum = 0;
for(int i = 1; i < n; ++i) {
for(int j = 1; j < i * i; ++j) {
if(j % i == 0) {
for(int k = 0; k < j; ++k) {
sum++;
}
}
}
}
long cubic = (long) Math.pow(n, 3);
long hypCubic = (long) Math.pow(n, 4);
double relative = (double) (sum / (double) hypCubic);
System.out.println("n = " + n + ": iterations = " + sum +
", n³ = " + cubic + ", n⁴ = " + hypCubic + ", rel = " + relative);
}
After executing this, it becomes obvious, that the complexity is in fact n⁴. The last lines of output look like this:
n = 356: iterations = 1989000035, n³ = 45118016, n⁴ = 16062013696, rel = 0.12383254507467704
n = 357: iterations = 2011495675, n³ = 45499293, n⁴ = 16243247601, rel = 0.12383580700180696
n = 358: iterations = 2034181597, n³ = 45882712, n⁴ = 16426010896, rel = 0.12383905075183874
n = 359: iterations = 2057058871, n³ = 46268279, n⁴ = 16610312161, rel = 0.12384227647628734
n = 360: iterations = 2080128570, n³ = 46656000, n⁴ = 16796160000, rel = 0.12384548432498857
n = 361: iterations = 2103391770, n³ = 47045881, n⁴ = 16983563041, rel = 0.12384867444612208
n = 362: iterations = 2126849550, n³ = 47437928, n⁴ = 17172529936, rel = 0.1238518469862343
What this shows is that the ratio between the iteration count of this code segment and n⁴ approaches a value around 0.124... (actually 0.125). While it does not give us the exact value, we can deduce the following:
Time complexity is n⁴/8 ~ f(n), where f is your function/method.
The Wikipedia page on Big O notation states in the table 'Family of Bachmann–Landau notations' that ~ means the limit of the ratio of the two sides is 1. Or:
f is equal to g asymptotically
(I chose 363 as the excluded upper bound because n = 362 is the last value for which we get a sensible result. After that, sum overflows the int range and the relative value becomes negative.)
User kaya3 figured out the following:
The asymptotic constant is exactly 1/8 = 0.125, by the way; here's the exact formula via Wolfram Alpha.
Remove if and modulo without changing the complexity
Here's the original method:
public static long f(int n) {
long sum = 0;
for (int i = 1; i < n; i++) {
for (int j = 1; j < i * i; j++) {
if (j % i == 0) {
for (int k = 0; k < j; k++) {
sum++;
}
}
}
}
return sum;
}
If you're confused by the if and modulo, you can just refactor them away, with j jumping directly from i to 2*i to 3*i ... :
public static long f2(int n) {
long sum = 0;
for (int i = 1; i < n; i++) {
for (int j = i; j < i * i; j = j + i) {
for (int k = 0; k < j; k++) {
sum++;
}
}
}
return sum;
}
To make it even easier to calculate the complexity, you can introduce an intermediary j2 variable, so that every loop variable is incremented by 1 at each iteration:
public static long f3(int n) {
long sum = 0;
for (int i = 1; i < n; i++) {
for (int j2 = 1; j2 < i; j2++) {
int j = j2 * i;
for (int k = 0; k < j; k++) {
sum++;
}
}
}
return sum;
}
You can use debugging or old-school System.out.println in order to check that i, j, k triplet is always the same in each method.
Closed form expression
As mentioned by others, you can use the fact that the sum of the first n integers is equal to n * (n+1) / 2 (see triangular numbers). If you use this simplification for every loop, you get :
public static long f4(int n) {
return (long) (n - 1) * n * (n - 2) * (3 * n - 1) / 24;
}
It is obviously not the same complexity as the original code but it does return the same values.
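As a sanity check, the closed form can be compared against a loop-based count. The sketch below is an assumption-laden illustration rather than the answer's own code: the class name is invented, the innermost k loop is collapsed into adding j directly, and all arithmetic is done in long to sidestep int overflow.

```java
public class ClosedFormCheck {
    // Loop-based count: for each i, j runs over the multiples of i below i*i,
    // and the innermost loop would add j to the sum.
    static long bruteForce(int n) {
        long sum = 0;
        for (int i = 1; i < n; i++) {
            for (int j = i; j < i * i; j += i) {
                sum += j;
            }
        }
        return sum;
    }

    // Closed form (n - 1) * n * (n - 2) * (3n - 1) / 24, evaluated in long.
    static long closedForm(int n) {
        long m = n;
        return (m - 1) * m * (m - 2) * (3 * m - 1) / 24;
    }

    public static void main(String[] args) {
        boolean allMatch = true;
        for (int n = 0; n <= 500; n++) {
            if (bruteForce(n) != closedForm(n)) {
                allMatch = false;
                System.out.println("mismatch at n = " + n);
            }
        }
        System.out.println("all values match: " + allMatch);
    }
}
```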
If you google the first terms, you can notice that 0 0 0 2 11 35 85 175 322 546 870 1320 1925 2717 3731 appear in "Stirling numbers of the first kind: s(n+2, n).", with two 0s added at the beginning. It means that sum is the Stirling number of the first kind s(n, n-2).
Let's have a look at the first two loops.
The first one is simple, it's looping from 1 to n. The second one is more interesting. It goes from 1 to i squared. Let's see some examples:
e.g. n = 4
i = 1
j loops from 1 to 1^2
i = 2
j loops from 1 to 2^2
i = 3
j loops from 1 to 3^2
In total, the i and j loops combined run 1^2 + 2^2 + 3^2 iterations.
There is a formula for the sum of the first n squares, n * (n+1) * (2n + 1) / 6, which is roughly O(n^3).
You have one last k loop, which loops from 0 to j if and only if j % i == 0. Since j goes from 1 to i^2, j % i == 0 is true about i times. Since i itself is bounded by n, this contributes one extra O(n) factor.
So you have O(n^3) from the i and j loops and another O(n) from the k loop, for a grand total of O(n^4).

Time Complexity log(n) vs Big O (root n)

I am trying to analyze the code snippet below.
Can the time complexity of this code be O(log n)? I am new to asymptotic analysis. The tutorial says it is O(sqrt(n)).
int p = 0;
for(int i =1;p<=n;i++){
p = p +i;
}
Variable p is going to take the successive values 1, 1+2, 1+2+3, etc.
This sequence is called the sequence of triangular numbers; you can read more about it on Wikipedia or OEIS.
One thing to be noted is the formula:
1 + 2 + ... + i = i*(i+1)/2
Hence your code could be rewritten under the somewhat equivalent form:
int p = 0;
for (int i = 1; p <= n; i++)
{
p = i * (i + 1) / 2;
}
Or, getting rid of p entirely:
for (int i = 1; (i - 1) * i / 2 <= n; i++)
{
}
Hence your code runs while (i-1)*i <= 2n. You can make the approximation (i-1)*i ≈ i^2 to see that the loop runs for about sqrt(2n) operations.
If you are not satisfied with this approximation, you can solve for i the quadratic equation:
i^2 - i - 2n == 0
You will find that the loop runs while:
i <= (1 + sqrt(1 + 8n)) / 2 == 0.5 + sqrt(2n + 0.25)
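A quick way to check this bound is to count the iterations directly and compare them with floor((1 + sqrt(1 + 8n)) / 2). The Java sketch below does that; the class and method names are assumptions for illustration, and the comparison relies on double arithmetic, so it is only meant for moderate n.

```java
public class TriangularLoop {
    // Runs the loop from the question and counts how often its body executes.
    static int iterations(int n) {
        int count = 0;
        int p = 0;
        for (int i = 1; p <= n; i++) {
            p = p + i;
            count++;
        }
        return count;
    }

    // Iteration count predicted by solving i^2 - i - 2n = 0 for i.
    static int predicted(int n) {
        return (int) Math.floor((1 + Math.sqrt(1 + 8.0 * n)) / 2);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1_000_000; n *= 10) {
            System.out.println("n = " + n + ": iterations = " + iterations(n)
                    + ", predicted = " + predicted(n)
                    + ", sqrt(2n) = " + Math.sqrt(2.0 * n));
        }
    }
}
```

The iteration count stays close to sqrt(2n), which is why the loop is O(sqrt(n)) rather than O(log n).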

Big-O analysis for a loop

I've got to analyze this loop, among others, and determine its running time using Big-O notation.
for ( int i = 0; i < n; i += 4 )
for ( int j = 0; j < n; j++ )
for ( int k = 1; k < j*j; k *= 2 )
Here's what I have so far:
for ( int i = 0; i < n; i += 4 ) = n
for ( int j = 0; j < n; j++ ) = n
for ( int k = 1; k < j*j; k *= 2 ) = log^2 n
Now the problem I'm coming to is the final running time of the loop. My best guess is O(n^2), however I am uncertain if this is correct. Can anyone help?
Edit: sorry about the Oh -> O thing. My textbook uses "Big-Oh"
First note that the outer loop is independent from the remaining two - it simply adds a (n/4)* multiplier. We will consider that later.
Now let's consider the complexity of
for ( int j = 0; j < n; j++ )
for ( int k = 1; k < j*j; k *= 2 )
We have the following sum:
0 + log2(1) + log2(2 * 2) + ... + log2(n*n)
It is good to note that log2(n^2) = 2 * log2(n). Thus we re-factor the sum to:
2 * (0 + log2(1) + log2(2) + ... + log2(n))
This sum is not very easy to analyze, but take a look at this post. Using Stirling's approximation one can show that it belongs to O(n*log(n)). Thus the overall complexity is O((n/4)*2*n*log(n)) = O(n^2*log(n)).
In terms of j, the inner loop takes O(log_2(j^2)) time, but since
log_2(j^2) = 2*log_2(j), it is actually O(log(j)).
Each iteration of the middle loop takes O(log(j)) time (to do the
inner loop), so we need to sum:
sum { log(j) | j=1,..., n-1 } = log(1) + log(2) + ... + log(n-1) = log((n-1)!)
And since log((n-1)!) is in O((n-1)log(n-1)) = O(nlogn), we can conclude that the middle loop takes O(nlogn) operations.
Note that both middle and inner loop are independent of i, so to
get the total complexity, we can just multiply n/4 (number of
repeats of outer loop) with complexity of middle loop, and get:
O(n/4 * nlogn) = O(n^2logn)
So, total complexity of this code is O(n^2 * log(n))
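The bound can be checked empirically by counting how often the innermost loop body would run. The following Java sketch is only an illustration: the class name and the counter are assumptions, and k is widened to long so that doubling it cannot overflow.

```java
public class TripleLoopCount {
    // Counts the iterations of the innermost loop of the snippet in the question.
    static long innerIterations(int n) {
        long count = 0;
        for (int i = 0; i < n; i += 4) {
            for (int j = 0; j < n; j++) {
                for (long k = 1; k < (long) j * j; k *= 2) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 64; n <= 1024; n *= 2) {
            long count = innerIterations(n);
            double log2n = Math.log(n) / Math.log(2);
            // Ratio of the measured count to n^2 * log2(n).
            double ratio = count / ((double) n * n * log2n);
            System.out.println("n = " + n + ": iterations = " + count + ", ratio = " + ratio);
        }
    }
}
```

If the analysis is right, the printed ratio should change only slowly as n doubles, instead of growing like n or shrinking like 1/log(n).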
The time complexity of a loop is considered O(n) if the loop variable is incremented / decremented by a constant amount (which is c in the examples below):
for (int i = 1; i <= n; i += c) {
// some O(1) expressions
}
for (int i = n; i > 0; i -= c) {
// some O(1) expressions
}
Time complexity of nested loops is equal to the number of times the innermost statement is executed. For example the following sample loops have O(n²) time complexity:
for (int i = 1; i <=n; i += c) {
for (int j = 1; j <=n; j += c) {
// some O(1) expressions
}
}
for (int i = n; i > 0; i -= c) {
for (int j = i+1; j <=n; j += c) {
// some O(1) expressions
}
}
The time complexity of a loop is considered O(logn) if the loop variable is divided / multiplied by a constant amount:
for (int i = 1; i <=n; i *= c) {
// some O(1) expressions
}
for (int i = n; i > 0; i /= c) {
// some O(1) expressions
}
Now we have:
for ( int i = 0; i < n; i += 4 ) <----- runs n/4 times
for ( int j = 0; j < n; j++ ) <----- for every i, runs n times
for ( int k = 1; k < j*j; k *= 2 ) <--- for every j, runs a logarithmic number of times
So the complexity is O(n²logm), where m is n², which can be simplified to O(n²logn) because n²logm = n²logn² = n² * 2logn ~ n²logn.

running time of algorithm does not match the reality

I have the following algorithm:
I analyzed this algorithm as follows:
Since the outer for loop goes from i to n it iterates at most n times,
and the loop on j iterates again from i to n which we can say at most n times,
if we do the same with the whole algorithm we have 4 nested for loop so the running time would be O(n^4).
But when I run this code for different input size I get the following result:
As you can see, the result is much closer to n^3. Can anyone explain why this happens, or what is wrong with my analysis that I get such a loose bound?
Formally, you may proceed like the following, using Sigma Notation, to obtain the order of growth complexity of your algorithm:
Moreover, the equation obtained tells the exact number of iterations executed inside the innermost loop:
int sum = 0;
for( i=0 ; i<n ; i++ )
for( j=i ; j<n ; j++ )
for( k=0 ; k<j ; k++ )
for( h=0 ; h<i ; h++ )
sum ++;
printf("\nsum = %d", sum);
When T(10) = 1155, sum = 1155 also.
I'm sure there's a conceptual way to see why, but you can prove by induction the above has (n + 2) * (n + 1) * n * (n - 1) / 24 loops. Proof left to the reader.
In other words, it is indeed O(n^4).
Edit: Your count increases too frequently. Simply try this code to count the number of loops:
for (int n = 0; n < 30; n++) {
int sum = 0;
for (int i = 0; i < n; i++) {
for (int j = i; j < n; j++) {
for(int k = 0; k < j; k++) {
for (int h = k; h < i; h++) {
sum++;
}
}
}
}
System.out.println(n + ": " + sum + " = " + (n + 2) * (n + 1) * n * (n - 1) / 24);
}
You have a rather complex algorithm. The number of operations is clearly less than n^4, but it isn't at all obvious how much less than n^4, and whether it is O(n^3) or not.
Checking the values n = 1 to 9 and making a guess based on the results is rather pointless.
To get a slightly better idea, assume that the number of steps is either c * n^3 or d * n^4, and make a table of the values c and d for 1 <= n <= 1,000. That might give you a better idea. It's not a foolproof method; there are algorithms changing their behaviour dramatically much later than at n = 1,000.
Best method is of course a proof. Just remember that O (n^4) doesn't mean "approximately n^4 operations", it means "at most c * n^4 operations, for some c". Sometimes c is small.
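The table suggested above could be produced along these lines. This Java sketch assumes the four nested loops quoted in the earlier answer (the one with T(10) = 1155) as the algorithm under test; the class name and the choice of n values are made up for illustration.

```java
public class GrowthTable {
    // Counts innermost iterations of the four nested loops quoted earlier.
    static long count(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                for (int k = 0; k < j; k++) {
                    for (int h = 0; h < i; h++) {
                        sum++;
                    }
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 25; n <= 200; n *= 2) {
            long steps = count(n);
            double c = steps / Math.pow(n, 3); // candidate constant if steps = c * n^3
            double d = steps / Math.pow(n, 4); // candidate constant if steps = d * n^4
            System.out.println("n = " + n + ": steps = " + steps + ", c = " + c + ", d = " + d);
        }
    }
}
```

Whichever of c and d stays roughly constant as n doubles hints at the true order of growth; as noted above, this is evidence and not a proof.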

Number of assignments necessary to find the minimum value in an array?

Someone asked me a brainteaser, and I don't know; my knowledge slows down after amortized analysis, and in this case, this is O(n).
public int findMax(int[] array) {
int count = 0;
int max = array[0];
for (int i=0; i<array.length; i++) {
if (array[i] > max) {
count++;
max = array[i];
}
}
return count;
}
What's the expected value of count for an array of size n?
Numbers are randomly picked from a uniform distribution.
Let f(n) be the average number of assignments.
Then if the last element is not the largest, f(n) = f(n-1).
If the last element is the largest, then f(n) = f(n-1) + 1.
Since the last number is largest with probability 1/n, and not the largest with probability (n-1)/n, we have:
f(n) = (n-1)/n*f(n-1) + 1/n*(f(n-1) + 1)
Expand and collect terms to get:
f(n) = f(n-1) + 1/n
And f(1) = 0. So:
f(1) = 0
f(2) = 0 + 1/2
f(3) = 0 + 1/2 + 1/3
f(4) = 0 + 1/2 + 1/3 + 1/4
That is, f(n) is the n_th "Harmonic number", which you can get in closed form only approximately. (Well, one less than the n_th Harmonic number. The problem would be prettier if you initialized max to INT_MIN and just let the loop run, so that f(1) = 1.)
The above is not a rigorous proof, since I was sloppy about expected values versus actual values. But I believe the answer is right anyway :-).
I would like to comment on Nemo's answer, but I don't have the reputation to comment. His correct answer can be simplified:
The chance that the second number is larger than the first is 1/2. Regardless of that, the chance that the 3rd number is larger than the two before it is 1/3. These are all independent chances and the total expectation is therefore
1/2 + 1/3 + 1/4 + .. + 1/n
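For small n this expectation can be checked exactly by averaging over every ordering of n distinct values. The Java sketch below is one way to do that; the class and helper names are hypothetical, and distinct values are assumed so that ties never occur.

```java
public class MaxUpdates {
    // Same counting as the question's method: assignments inside the loop only.
    static int countUpdates(int[] array) {
        int count = 0;
        int max = array[0];
        for (int i = 0; i < array.length; i++) {
            if (array[i] > max) {
                count++;
                max = array[i];
            }
        }
        return count;
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    // Sums countUpdates over all permutations of a[start..], keeping a[0..start) fixed.
    static long totalOverPermutations(int[] a, int start) {
        if (start == a.length) {
            return countUpdates(a);
        }
        long total = 0;
        for (int i = start; i < a.length; i++) {
            swap(a, start, i);
            total += totalOverPermutations(a, start + 1);
            swap(a, start, i); // restore the order before trying the next choice
        }
        return total;
    }

    // Exact average of count over all n! orderings of the values 1..n.
    static double exactAverage(int n) {
        int[] a = new int[n];
        long factorial = 1;
        for (int i = 0; i < n; i++) {
            a[i] = i + 1;
            factorial *= (i + 1);
        }
        return (double) totalOverPermutations(a, 0) / factorial;
    }

    // 1/2 + 1/3 + ... + 1/n
    static double harmonicMinusOne(int n) {
        double h = 0;
        for (int k = 2; k <= n; k++) {
            h += 1.0 / k;
        }
        return h;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println("n = " + n + ": exact average = " + exactAverage(n)
                    + ", 1/2 + ... + 1/n = " + harmonicMinusOne(n));
        }
    }
}
```

Enumeration costs n! orderings, so it is only practical for small n; for larger arrays a random simulation would be the usual substitute.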
You can actually take this analysis a step further when the value of each item comes from a finite set. Let E(N, M) be the expected number of assignments when finding the max of N elements that come uniformly from an alphabet of size M. Then we can say...
E(0, M) = E(N, 0) = 0
E(N, M) = 1 + SUM[SUM[E(j, i) * (N - 1 Choose j) * ((M - i) / M)^(N-j-1) * (i / M) ^ j : j from 0 to N - 1] : i from 0 to M - 1]
It is a bit hard to come up with a closed form for this, but we can be sure that E(N, M) is in O(log(min(N, M))). This is because E(N, INF) is in THETA(log(N)), as the harmonic series sum grows proportionally to the log function, and E(N, M) < E(N, M + 1). Likewise, when M < N we have E(N, M) < E(M, INF), as there are at most M unique values.
And here's some code to compute E(N, M) yourself. I wonder if anyone can get this to a closed form?
#include <cmath>
#include <iostream>
using namespace std;
#define N 100
#define M 100
double NCR[N + 1][M + 1];
double E[N + 1][M + 1];
int main() {
NCR[0][0] = 1;
for(int i = 1; i <= N; i++) {
NCR[i][0] = NCR[i][i] = 1;
for(int j = 1; j < i; j++) {
NCR[i][j] = NCR[i - 1][j - 1] + NCR[i - 1][j];
}
}
for(int n = 1; n <= N; n++) {
for(int m = 1; m <= M; m++) {
E[n][m] = 1;
for(int i = 1; i < m; i++) {
for(int j = 1; j < n; j++) {
E[n][m] += NCR[n - 1][j] *
pow(1.0 * (m - i) / m, n - j - 1) *
pow(1.0 * i / m, j) * E[j][i] / m;
}
}
}
}
cout << E[N][M] << endl;
}
I am assuming all elements are distinct and counting the initial assignment to max outside the for loop.
If the array is sorted in increasing order, the variable max gets assigned to exactly n times (each time it gets a greater value).
If the array is sorted in decreasing order, the variable max gets assigned to exactly once (it gets assigned the first time and all subsequent values are smaller).
Edit:
My formulation for a randomly permuted array was actually wrong, as pointed out in the comments. I think @Nemo posts the correct answer to this.
I think just counting the number of assignments is not really a true measure of the cost of this function. Whether or not we actually update the value of max, we still compare against it exactly n times. So fewer assignments does not really imply less work done.
Also observe that there are actually no swaps being done. Only assignments and comparisons.

Resources