Algorithm bound analysis using integrals

I am supposed to get a lower and an upper bound of an algorithm using integrals, but I don't understand how to do that. I know basic integration principles, but I don't know how to figure out the integral from the algorithm.
Problem:
My first for loop starts at i = 5n and goes to 6n^3;
the next one inside starts at j = 5 and goes to i;
then the final, innermost for loop starts at k = j and goes to i.
Naturally, my first step was to turn this into 3 summations. I have my 3 summations set up, and what I want to do is simplify them to just one summation if possible. That way, once I have a single expression to the right of my summation, I can take the integral.
For the bounds of my integral, I'm using the approximation-by-integrals technique from Introduction to Algorithms by Cormen, Leiserson, et al.
Nature of the integrals:
For the upper bound, the limits of the integral may be: upper limit n+1, lower limit m.
For the lower bound, the limits of the integral may be: upper limit n, lower limit m-1.
I want to know how to simplify my three summations into one if possible. Once things are down to one summation, I can start to take the integral and go from there myself.
This is very rough pseudocode, but I tried my best to make it look similar to actual code.
for (i = 5n; i < 6n^3; i++)
{
    for (j = 5; j < i; j++)
    {
        for (k = j; k < i; k++)
        {
            i - j + k;
        }
    }
}
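For what it's worth, the three summations collapse into a single summation as follows (a sketch of the algebra, counting each execution of the innermost statement as one unit of work):

\sum_{i=5n}^{6n^3-1} \sum_{j=5}^{i-1} \sum_{k=j}^{i-1} 1
  = \sum_{i=5n}^{6n^3-1} \sum_{j=5}^{i-1} (i - j)
  = \sum_{i=5n}^{6n^3-1} \frac{(i-5)(i-4)}{2}

The inner sum over k contributes i - j, and substituting t = i - j turns the middle sum into 1 + 2 + ... + (i - 5) = (i-5)(i-4)/2. The resulting single summation over i is the quantity that the answer below bounds with integrals.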

Let any of int(i,j,f) or int(x=i,j,f(x)) or ∫(x=i,j,f(x)) denote the definite integral of f(x) as x ranges from i to j. If f(x) is the amount of work done (by a program) when x has a particular value, and if f is a monotonically increasing function, then as you point out in the question, int(m,n+1,f) is an upper bound, and int(m-1,n,f) a lower bound, on the work done as x takes the values m...n. In the following, I will say that int(m,n,f) approximates the work, and you can shift the limits by 1 where appropriate (up to n+1, or down to m-1) to get the upper and lower bounds. Note, 6n^3-1 stands for 6*(n^3)-1, 5n for 5*n, etc.
Approximate work is:
int(i=5n, 6n^3-1, u(i))
where u(i) is
int(j=5, i-1, v(i,j))
where v(i,j) is
int(k=j, i-1, w(k))
where w(k) is 1. In the following we use functions p, q, r to stand for indefinite integrals, and C for constants of integration that cancel out for definite integrals.
Let r(x) = ∫1dx = x + C.
Now v(i,j) = ∫(k=j, i-1, 1) = r(i-1)-r(j) = i-1-j.
Let q(x,i) = ∫(i-1-x)dx = x*(i-1)-x*x/2 + C.
Now u(i) = ∫(j=5, i-1, i-1-j) = q(i-1,i)-q(5,i)
which is some quadratic in i. You will need to work out the details for the upper and lower bound cases.
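For instance, carrying the algebra through for the approximating integral (a sketch; the ±1 limit shifts for the actual upper and lower bound versions are omitted here):

q(i-1, i) = (i-1)(i-1) - \frac{(i-1)^2}{2} = \frac{(i-1)^2}{2}
q(5, i) = 5(i-1) - \frac{25}{2}
u(i) \approx \frac{(i-1)^2}{2} - 5(i-1) + \frac{25}{2} = \frac{((i-1)-5)^2}{2} = \frac{(i-6)^2}{2}

which agrees to leading order with the exact double summation (i-5)(i-4)/2.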
Let p(x) = ∫u(x)dx = ∫(q(x-1,x)-q(5,x))dx,
which is some cubic in x. The overall result is
p(6n^3-1)-p(5n)
and again you will need to work out the details. But note that when 6n^3-1 is substituted for x in p(x), the order is going to be (6n^3-1)^3, that is, O(n^9), so you should expect upper and lower bound expressions that are O(n^9). Note, you can also see the O(n^9) order by inspecting your loops: In for(i=5n; i<6n^3; i++), i will average about 3n^3. In for(j =5; j<i; j++), j will average about i/2, or some small multiple of n^3. In for(k=j; k < i; k++), k-j also will average a small multiple of n^3. Thus, expression i-j+k will be evaluated some small multiple of n^3*n^3*n^3, or n^9, times.
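As a sanity check on that Θ(n^9) estimate, here is a small brute-force sketch (not part of the original answer) that counts how many times the innermost statement would execute for a few small values of n:

public class LoopCountCheck {
    // Counts how many times the innermost statement (i - j + k) runs for
    // the loops in the question: i from 5n to 6n^3 - 1, j from 5 to i - 1,
    // k from j to i - 1.
    static long count(long n) {
        long total = 0;
        for (long i = 5 * n; i < 6 * n * n * n; i++) {
            for (long j = 5; j < i; j++) {
                total += (i - j);   // the k loop runs exactly i - j times
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 5; n++) {
            long c = count(n);
            System.out.printf("n=%d  count=%d  count/n^9=%.2f%n", n, c, c / Math.pow(n, 9));
        }
    }
}

The ratio count/n^9 should settle toward a constant (about 36, since the leading term of the count is (6n^3)^3/6 = 36n^9), consistent with the O(n^9) order derived above.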

Related

How do you determine the average-case complexity of this algorithm?

It's usually easy to calculate the time complexity for the best case and the worst case, but when it comes to the average case, especially when a probability p is given, I don't know where to start.
Let's look at the following algorithm to compute the product of all the elements in a matrix:
int computeProduct(int[][] A, int m, int n) {
    int product = 1;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            if (A[i][j] == 0) return 0;
            product = product * A[i][j];
        }
    }
    return product;
}
Suppose p is the probability of A[i][j] being 0 (i.e., the algorithm terminates there and returns 0); how do we derive the average-case time complexity of this algorithm?
Let’s consider a related problem. Imagine you have a coin that flips heads with probability p. How many times, on expectation, do you need to flip the coin before it comes up heads? The answer is 1/p, since
There’s a p chance that you need one flip.
There’s a p(1-p) chance that you need two flips (the first flip has to go tails and the second has to go heads).
There’s a p(1-p)^2 chance that you need three flips (the first two flips need to go tails and the third has to go heads).
...
There’s a p(1-p)^(k-1) chance that you need k flips (the first k-1 flips need to go tails and the kth needs to go heads.)
So this means the expected value of the number of flips is
p + 2p(1 - p) + 3p(1 - p)^2 + 4p(1 - p)^3 + ...
= p(1(1 - p)^0 + 2(1 - p)^1 + 3(1 - p)^2 + ...)
So now we need to work out what this summation is. The general form is
p sum from k = 1 to infinity (k(1 - p)^(k-1)).
Rather than solving this particular summation, let's make this more general. Let x be some variable that, later, we'll set equal to 1 - p, but which for now we'll treat as a free value. Then we can rewrite the above summation as
p sum from k = 1 to infinity (kx^(k-1)).
Now for a cute trick: notice that the inside of this expression is the derivative of x^k with respect to x. Therefore, this sum is
p sum from k = 1 to infinity (d/dx x^k).
The derivative is a linear operator, so we can move it out to the front:
p d/dx sum from k = 1 to infinity (x^k)
That inner sum (x + x^2 + x^3 + ...) is the Taylor series for 1 / (1 - x) - 1, so we can simplify this to get
p d/dx (1 / (1 - x) - 1)
= p / (1 - x)^2
And since we picked x = 1 - p, this simplifies to
p / (1 - (1 - p))^2
= p / p^2
= 1 / p
Whew! That was a long derivation. But it shows that the expected number of coin tosses needed is 1/p.
Now, in your case, your algorithm can be thought of as tossing mn coins that come up heads with probability p and stopping if any of them come up heads. Surely, the expected number of coins you’d need to toss won’t be more than the case where you’re allowed to flip infinitely often, so your expected runtime is at most O(1 / p) (assuming p > 0).
If we assume that p is independent of m and n, then we can notice that after some initial growth, each term added to the summation as we increase the number of flips is exponentially smaller than the previous ones. More specifically, after adding roughly logarithmically many terms to the sum, we are already within a constant factor of the total of the infinite summation. Therefore, provided that mn is at least roughly Θ(log(1/p)), the sum ends up being Θ(1 / p). So in a big-O sense, if mn is independent of p, the runtime is Θ(1 / p).
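To make that concrete, here is a small Monte Carlo sketch (not from the original answer) that estimates how many cells computeProduct examines when each entry of an m x n matrix is 0 independently with probability p:

import java.util.Random;

public class AverageCaseSim {
    // Estimates the expected number of matrix cells examined before a 0 is
    // found (or all m*n cells are read), averaged over many random trials.
    static double expectedCellsExamined(int m, int n, double p, int trials, Random rng) {
        long totalExamined = 0;
        for (int t = 0; t < trials; t++) {
            int examined = 0;
            for (int cell = 0; cell < m * n; cell++) {
                examined++;
                if (rng.nextDouble() < p) break;   // computeProduct would return 0 here
            }
            totalExamined += examined;
        }
        return (double) totalExamined / trials;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int m = 1000, n = 1000;   // mn is much larger than 1/p for the values below
        for (double p : new double[] {0.5, 0.1, 0.01}) {
            double avg = expectedCellsExamined(m, n, p, 10000, rng);
            System.out.printf("p=%.2f  average cells examined=%.1f  1/p=%.1f%n", p, avg, 1 / p);
        }
    }
}

As long as mn is much larger than 1/p, the observed averages track 1/p closely, matching the Θ(1/p) analysis above.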

Big-oh notation of while loop with multiplication of 0.8

I came across a lot of examples teaching the big-O notation of a while loop where the variable is multiplied inside the loop, but I still can't get the understanding right.
Code like this
for(int i = 1; i <= n; i = i*2) is considered O(lg n) because it multiplies the loop variable by 2 each iteration.
I also have code like this
while (i > N)
{
    i /= 2;
}
which is also considered O(lg n), as the variable is again being scaled by a factor of 2. However, does it mean the same if I change the code to something like
while (x > 0.01) {
    x = x * 0.8;
    y = y + x;
}
The main concern is: can I safely say that the runtime complexity of this loop is log base 0.8?
Or is it supposed to be log base 1.25?
I do understand that log base 0.8 and log base 1.25 are not defined, and therefore the runtime complexity of the while loop technically should be O(n).
The number of loop iterations n is given by

n = ceil( log(x / 0.01) / log(1.25) ) = ceil( log_1.25(x / 0.01) )

where x is the starting value, since the loop stops at the smallest n for which x * 0.8^n <= 0.01. Thus the base is indeed 1.25. However, a change of base only contributes a constant multiplicative factor overall, which does not affect the complexity of the algorithm, so it is still O(log x).
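As a quick check of that formula, here is a small sketch (y from the question's loop is omitted since it does not affect the iteration count; floating-point rounding could shift the count by one in borderline cases):

public class WhileLoopCount {
    public static void main(String[] args) {
        for (double start : new double[] {1.0, 100.0, 1e6}) {
            // Count the iterations of the loop from the question.
            double x = start;
            int iterations = 0;
            while (x > 0.01) {
                x = x * 0.8;
                iterations++;
            }
            // Closed form: the smallest n with start * 0.8^n <= 0.01.
            int predicted = (int) Math.ceil(Math.log(start / 0.01) / Math.log(1.25));
            System.out.printf("start=%.2f  loop count=%d  formula=%d%n", start, iterations, predicted);
        }
    }
}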

When we do the sum of n numbers using a for loop, e.g. for(i=1;i<=n;i++)

When we do the sum of n numbers using a for loop for(i=1;i<=n;i++), the complexity of this is O(n), but if we do the same computation using the arithmetic progression formula n(n-1)/2 and compute its time complexity, it's O(n^2). How? Please resolve my doubt.
You are confused about what the numbers represent.
Basically, we are counting the number of steps when we talk about complexity.
n(n+1)/2 is the value of Summation(1..n), that's correct, but different ways of computing it take different numbers of steps, and we are counting those steps.
Compare the following:
int ans = 0;
for (int i = 1; i <= n; i++) ans += i;
// this uses n steps only
int ans2 = 0;
ans2 = n*(n+1)/2;
// this uses 1 step!!
int ans3 = 0;
for (int i = 1, mx = n*(n+1)/2; i <= mx; i++) ans3++;
// this takes n*(n+1)/2 steps
// You were thinking the formula would look like this when translated into code!
All three answers give the same value!
So, you can see that only the first method and the third method (which is of course not practical at all) are affected by n: different n will cause them to take different numbers of steps, while the second method, which uses the formula, always takes 1 step no matter what n is.
That being said, if you know the formula beforehand, it is always best to just compute the answer directly with the formula.
Your second formula has O(1) complexity, that is, it runs in constant time, independent of n.
There's no contradiction. The complexity is a measure of how long the algorithm takes to run. Different algorithms can compute the same result at different speeds.
[BTW the correct formula is n*(n+1)/2.]
Edit: Perhaps your confusion has to do with an algorithm that takes n*(n+1)/2 steps, which is (n^2 + n)/2 steps. We call that O(n^2) because it grows essentially (asymptotically) as n^2 when n gets large. That is, it grows on the order of n^2, the high order term of the polynomial.

General method to fit a number into a sequence

The general problem is as follows. Given an increasing sequence of positive integers 0 < s_1 < s_2 < s_3 < ... and a positive integer n, is there an efficient algorithm to find the (unique) index k such that s_k <= n < s_(k+1)?
A concrete example of this problem with a particularly nice solution is finding the position of the highest nonzero digit of a binary expansion, i.e. take s_i = 2^(i-1), and then k = floor(log_2(n)) + 1.
A slightly harder example is to find the largest nonzero digit in the factorial expansion, i.e. take s_i = i!.
The example that I have in mind that brings up this question is the following:
s_i = ith triangular number = 1 + 2 + ... + i = i(i+1)/2
I'd like a nice solution to this, meaning something better than the following
int i;
for (i = 1; ; ++i) {
    if (triangle[i] > n)
        break;
}
return i - 1; // i is the first index with triangle[i] > n, so the answer is i - 1
NOTE: One cannot use a binary search here since the sequence is infinite. Of course, there is the obvious constraint that k <= n, but this is a horrible bound in general. For example, if s_i = i!, then using a binary search on n=20 requires computing 20! when the answer is k=3, so one shouldn't need to compute beyond 4!.
A general approach: try solving the equation n = s(x) over the reals, and then set k = floor(x).
For s_i=2^(i-1) you get x=log2(n)+1. For s_i=i*(i+1)/2 you get x=(sqrt(1+8n)-1)/2.
In case the equation is not solvable analytically, try an approximation (e.g. Newton's method), or use a binary search on the sequence after first bracketing k, e.g. by doubling the index until s_i > n.
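For the triangular-number case, that closed form translates directly into code. Here is a sketch (triangularIndex is an illustrative name; the floating-point result is nudged with exact integer arithmetic to guard against sqrt rounding the wrong way):

public class FitIntoSequence {
    // s_k = k*(k+1)/2 (triangular numbers). Returns the unique k >= 1 with
    // s_k <= n < s_(k+1), by solving x*(x+1)/2 = n for x and taking the floor.
    static long triangularIndex(long n) {
        long k = (long) Math.floor((Math.sqrt(1.0 + 8.0 * n) - 1.0) / 2.0);
        while (k * (k + 1) / 2 > n) k--;          // fix any overestimate from rounding
        while ((k + 1) * (k + 2) / 2 <= n) k++;   // fix any underestimate from rounding
        return k;
    }

    public static void main(String[] args) {
        // s_1=1, s_2=3, s_3=6, s_4=10, ...
        for (long n : new long[] {1, 2, 3, 5, 6, 9, 10, 1000000}) {
            System.out.println("n=" + n + "  k=" + triangularIndex(n));
        }
    }
}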

Finding time complexity of partition by quicksort method

Here is an algorithm for finding the kth smallest number in an n-element array, using the partition procedure of Quicksort.
small(a, i, j, k)
{
    if (i == j) return a[i];
    else
    {
        m = partition(a, i, j);
        if (m == k) return a[m];
        else
        {
            if (m > k) return small(a, i, m-1, k);
            else return small(a, m+1, j, k);
        }
    }
}
where i, j are the starting and ending indices of the array (j - i = n, the number of elements in the array) and k is the kth smallest number to be found.
I want to know the best case and average case of the above algorithm, and briefly how to derive them. I know we should not count the termination condition in the best case, and also that the partition algorithm takes O(n). I do not want asymptotic notation but an exact mathematical result, if possible.
First of all, I'm assuming the array is sorted - something you didn't mention - because that code wouldn't otherwise work. And, well, this looks to me like a regular binary search.
Anyway...
The best-case scenario is when either the array is one element long (you return immediately because i == j), or, for large values of n, the middle position m is the same as k; in that case, no recursive calls are made and it returns immediately as well. That makes it O(1) in the best case.
For the general case, consider that T(n) denotes the time taken to solve a problem of size n using your algorithm. We know that:
T(1) = c
T(n) = T(n/2) + c
Where c is a constant time operation (for example, the time to compare if i is the same as j, etc.). The general idea is that to solve a problem of size n, we consume some constant time c (to decide if m == k, if m > k, to calculate m, etc.), and then we consume the time taken to solve a problem of half the size.
Expanding the recurrence can help you derive a general formula, although it is pretty intuitive that this is O(log(n)):
T(n) = T(n/2) + c = T(n/4) + c + c = T(n/8) + c + c + c = ... = T(1) + c*log(n) = c*(log(n) + 1)
That should be the exact mathematical result. The algorithm runs in O(log(n)) time. An average-case analysis is harder because you need to know the conditions in which the algorithm will be used. What is the typical size of the array? The typical value of k? What is the most likely position of k in the array? If it's in the middle, for example, the average case may be O(1). It really depends on how you use this.
