Number of assignments necessary to find the minimum value in an array? - algorithm

Someone asked me a brainteaser, and I don't know; my knowledge slows down after amortized analysis, and in this case, this is O(n).
public int findMax(int[] array) {
    int count = 0;       // number of times max is updated inside the loop
    int max = array[0];
    for (int i = 0; i < array.length; i++) {
        if (array[i] > max) {
            count++;
            max = array[i];
        }
    }
    return count;
}
What's the expected value of count for an array of size n?
Numbers are randomly picked from a uniform distribution.

Let f(n) be the average number of assignments.
Then if the last element is not the largest, f(n) = f(n-1).
If the last element is the largest, then f(n) = f(n-1) + 1.
Since the last number is largest with probability 1/n, and not the largest with probability (n-1)/n, we have:
f(n) = (n-1)/n*f(n-1) + 1/n*(f(n-1) + 1)
Expand and collect terms to get:
f(n) = f(n-1) + 1/n
And f(1) = 0. So:
f(1) = 0
f(2) = 0 + 1/2
f(3) = 0 + 1/2 + 1/3
f(4) = 0 + 1/2 + 1/3 + 1/4
That is, f(n) is the n_th "Harmonic number", which you can get in closed form only approximately. (Well, one less than the n_th Harmonic number. The problem would be prettier if you initialized max to INT_MIN and just let the loop run, so that f(1) = 1.)
The above is not a rigorous proof, since I was sloppy about expected values versus actual values. But I believe the answer is right anyway :-).
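One way to check the result without the probability argument is to enumerate every ordering of n distinct values and average the count exactly. The sketch below does this in Python for small n. The function names are made up for illustration, and it assumes all values are distinct (which holds with probability 1 for a continuous uniform distribution).

```python
from fractions import Fraction
from itertools import permutations


def count_assignments(values):
    """Mirror the loop in the question: count updates of max inside the loop."""
    count = 0
    current_max = values[0]
    for v in values:
        if v > current_max:
            count += 1
            current_max = v
    return count


def expected_count(n):
    """Exact average of count over all n! orderings of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_assignments(p) for p in perms), len(perms))


def harmonic_minus_one(n):
    """H_n - 1 = 1/2 + 1/3 + ... + 1/n, as an exact fraction."""
    return sum((Fraction(1, k) for k in range(2, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expected_count(n), harmonic_minus_one(n))
```

The two columns agree for every n that is small enough to enumerate, which supports f(n) = H_n - 1 even though it is not a proof.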

I would like to comment on Nemo's answer, but I don't have the reputation to comment. His correct answer can be simplified:
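The chance that the second number is larger than the first is 1/2. Regardless of that, the chance that the 3rd number is larger than the two before it is 1/3, and so on. Each of these contributes its own probability to the expected count (by linearity of expectation; the events also happen to be independent), so the total expectation is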
1/2 + 1/3 + 1/4 + .. + 1/n

You can actually take this analysis a step further when the value of each item comes from a finite set. Let E(N, M) be the expected number of assignments when finding the max of N elements that come uniformly from an alphabet of size M. Then we can say...
E(0, M) = E(N, 0) = 0
E(N, M) = 1 + (1 / M) * SUM[SUM[E(j, i) * (N - 1 Choose j) * ((M - i) / M)^(N-j-1) * (i / M)^j : j from 0 to N - 1] : i from 0 to M - 1]
This is a bit hard to come up with a closed form for, but we can be sure that E(N, M) is in O(log(min(N, M))). This is because E(N, INF) is in THETA(log(N)), as the harmonic series sum grows proportionally to the log function, and E(N, M) < E(N, M + 1). Likewise, when M < N we have E(N, M) < E(M, INF), as there are at most M unique values.
And here's some code to compute E(N, M) yourself. I wonder if anyone can get this to a closed form?
#include <cmath>
#include <iostream>
using namespace std;

#define N 100
#define M 100
double NCR[N + 1][N + 1];  // NCR[a][b] = a choose b
double E[N + 1][M + 1];    // E[n][m]; row 0 and column 0 stay 0

int main() {
    NCR[0][0] = 1;
    for (int i = 1; i <= N; i++) {
        NCR[i][0] = NCR[i][i] = 1;
        for (int j = 1; j < i; j++) {
            NCR[i][j] = NCR[i - 1][j - 1] + NCR[i - 1][j];
        }
    }
    for (int n = 1; n <= N; n++) {
        for (int m = 1; m <= M; m++) {
            E[n][m] = 1;
            for (int i = 1; i < m; i++) {
                for (int j = 1; j < n; j++) {
                    E[n][m] += NCR[n - 1][j] *
                               pow(1.0 * (m - i) / m, n - j - 1) *
                               pow(1.0 * i / m, j) * E[j][i] / m;
                }
            }
        }
    }
    cout << E[N][M] << endl;
}
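To build some trust in the recurrence, it can be compared against a brute-force average over every possible array for tiny N and M. The Python sketch below follows the recurrence as the C++ code implements it (including the division by M, which accounts for the value of the first element) and counts the initial assignment to max. The names `expected_assignments` and `brute_force` are illustrative only.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def expected_assignments(n, m):
    """E(n, m) from the recurrence, counting the initial assignment to max."""
    if n == 0 or m == 0:
        return 0.0
    total = 1.0
    for i in range(1, m):          # i = number of values above the first element
        for j in range(1, n):      # j = how many later elements exceed it
            total += (comb(n - 1, j)
                      * ((m - i) / m) ** (n - j - 1)
                      * (i / m) ** j
                      * expected_assignments(j, i) / m)
    return total


def brute_force(n, m):
    """Average over all m**n arrays; only feasible for tiny n and m."""
    total = 0
    for arr in product(range(m), repeat=n):
        count, current_max = 1, arr[0]   # 1 = the initial assignment
        for v in arr[1:]:
            if v > current_max:
                count += 1
                current_max = v
        total += count
    return total / m ** n
```

For example, `expected_assignments(3, 2)` and `brute_force(3, 2)` both give 1.375, which is 11/8: the eight arrays over a two-letter alphabet need 11 assignments in total.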

I am assuming all elements are distinct and counting the initial assignment to max outside the for loop.
If the array is sorted in increasing order, the variable max gets assigned to exactly n times (each time it gets a greater value).
If the array is sorted in decreasing order, the variable max gets assigned to exactly once (it gets assigned the first time and all subsequent values are smaller).
Edit:
My formulation for a randomly permuted array was actually wrong, as pointed out in the comments. I think @Nemo posts the correct answer to this.
I think just counting the number of assignments is not really a true measure of the cost of this function. Whether or not we actually update the value of max, we compare against it exactly n times. So fewer assignments do not really imply less work done.
Also observe that there are actually no swaps being done. Only assignments and comparisons.

Related

Time and Space algorithm complexity

I am coding a brute-force approach to a coding problem: I need to find the maximum-score path through the array with a maximum step of k.
Input: nums = [1,-1,-2,4,-7,3], k = 2
Output: 7
Explanation: You can choose your jumps forming the subsequence [1,-1,4,3] (underlined above). The sum is 7.
And I encountered a problem with calculating complexity. My thought was that on each element we may call the function k times, so time and space are O(k^n), where n is the length of the array. My second guess: for the first element we call the function at most 1 time, for the second 2 times (that is if k > i) and so on. So we have sum 1 + 2 + ... + k + k + ... + k = ((1 + k) / 2)k + ((k + k) / 2) / (n-k) = O(k^2). I think the first one is correct, but I can't tell for sure why :/
Here's my Java code:
public int maxResult(int[] nums, int k) {
    return maxResult(nums, k, nums.length - 1);
}

private int maxResult(int[] nums, int k, int index) {
    if (index == 0)
        return nums[0];
    int max = Integer.MIN_VALUE;
    int start = index - k < 0 ? 0 : index - k;
    for (int i = start; i < index; i++) {
        int res = maxResult(nums, k, i);
        System.out.println(i);
        max = Math.max(res, max);
    }
    return max + nums[index];
}
The recurrence relation for your code for a particular k is
C(n) = sum(C(n-i) for i = 1...k) for n>k
C(n) = C(1) + C(2) + ... + C(n-1) for n <= k
C(1) = 1
These are the recurrence relations for the higher-order Fibonacci numbers, shifted by k-1 places. That is, C(n) = kFib(k, n+k-1). The k-Fibonacci numbers grow as Theta(alpha^n) where alpha is some constant based on k -- for k=2, alpha is the golden ratio, and as k increases, alpha gets closer and closer to 2. (Specifically, alpha is the positive root of (x^k - x^(k-1) - ... - x - 1)).
Therefore C(n) = kFib(k, n+k-1) = Theta(alpha^(n+k)).
Because alpha is always less than 2, O(2^(n+k)) is a simple correct bound, although not a tight one.
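The recurrence can be checked against the actual recursion on small inputs. The Python sketch below assumes that C(n) counts the base-case calls (index == 0) made for an array of length n. That reading is an assumption, chosen because it matches C(1) = 1 and the pure sum. Both function names are hypothetical.

```python
from functools import lru_cache


def leaf_calls(n, k):
    """Base-case (index == 0) calls made by the recursion on an array of length n."""
    def visit(index):
        if index == 0:
            return 1
        start = max(0, index - k)
        return sum(visit(i) for i in range(start, index))
    return visit(n - 1)


def recurrence(n, k):
    """C(1) = 1, C(n) = C(n-1) + ... + C(n-k), truncated at C(1) when n <= k."""
    @lru_cache(maxsize=None)
    def c(size):
        if size == 1:
            return 1
        return sum(c(size - i) for i in range(1, min(k, size - 1) + 1))
    return c(n)


if __name__ == "__main__":
    print([leaf_calls(n, 2) for n in range(1, 9)])   # Fibonacci numbers for k = 2
    print(recurrence(40, 2) / recurrence(39, 2))     # close to the golden ratio
```

For k = 2 the counts are the ordinary Fibonacci numbers, and the ratio of consecutive terms settles near 1.618, which is the exponential growth the answer describes.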

What's the big O for this triple nested loop?

Outer loop is O(n), 2nd loop is O(n^2) and 3rd loop is also O(n^2), but the 3rd loop is conditional.
Does that mean the 3rd loop only happens 1/n (1 every n) times and therefore total big O is O(n^4)?
for (int i = 1; i < n; i++) {
    for (int j = 1; j < (n*n); j++) {
        if (j % i == 0) {
            for (int k = 1; k < (n*n); k++) {
                // Simple computation
            }
        }
    }
}
For any given value of i between 1 and n, the complexity of this part:
for (int j = 1; j < (n*n); j++) {
    if (j % i == 0) {
        for (int k = 1; k < (n*n); k++) {
            // Simple computation
        }
    }
}
is O(n^4/i), because the if-condition is true one ith of the time. (Note: if i could be larger than n, then we'd need to write O(n^4/i + n^2) to include the cost of the loop iterations where the if-condition was false; but since i is known to be small enough that n^4/i ≥ n^2, we don't need to worry about that.)
So the total complexity of your code, adding together the different loop iterations across all values of i, is O(n^4/1 + n^4/2 + n^4/3 + ⋯ + n^4/n) = O(n^4 · (1/1 + 1/2 + 1/3 + ⋯ + 1/n)) = O(n^4 log n).
(That last bit relies on the fact that, since ln(n) is the integral of 1/x from 1 to n, and 1/x is decreasing over that interval, we have ln(n) < ln(n+1) < (1/1 + 1/2 + 1/3 + ⋯ + 1/n) ≤ 1 + ln(n).)
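The count can also be checked numerically. The Python sketch below runs the i and j loops as written and replaces the innermost loop by its iteration count (n*n - 1) to keep it fast. It then compares the result with the per-i prediction and with n^4 log n. The function names are invented for this example.

```python
import math


def inner_iterations(n):
    """Run the i and j loops as written; add the k loop's iteration count."""
    count = 0
    for i in range(1, n):
        for j in range(1, n * n):
            if j % i == 0:
                count += n * n - 1   # the k loop runs n*n - 1 times
    return count


def predicted(n):
    """For each i, (n*n - 1) // i values of j pass the j % i == 0 test."""
    return sum(((n * n - 1) // i) * (n * n - 1) for i in range(1, n))


if __name__ == "__main__":
    for n in (10, 20, 40):
        ratio = inner_iterations(n) / (n ** 4 * math.log(n))
        print(n, inner_iterations(n), predicted(n), round(ratio, 3))
```

The ratio to n^4 log n stays bounded as n grows, which is what O(n^4 log n) predicts; against n^4 alone it would keep increasing.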

Big O complexity on dependent nested loops

Can I get some help in understanding how to solve this tutorial question? I still do not understand my professor's explanation. I am unsure how to count the big O for the third (innermost) loop. She explains that the answer for this algorithm is O(n^2) and that the 2nd and 3rd loops have to be seen as one loop with a big O of O(n). Can someone please explain the big O notation for the 2nd and 3rd loops in basic layman's terms?
Assuming n = 2^m
for (int i = n; i > 0; i--) {
    for (int j = 1; j < n; j *= 2) {
        for (int k = 0; k < j; k++) {
        }
    }
}
As far as I understand, the first loop has a big O notation of O(n)
Second loop = log(n)
Third loop = log (n) (since the number of times it will be looped has been reduced by logn) * 2^(2^m-1)( to represent the increase in j? )
Let's add a print statement to the innermost loop.
for (int j = 1; j < n; j *= 2) {
    for (int k = 0; k < j; k++) {
        print(1)
    }
}
Output for
j = 1:   1
j = 2:   1 1
j = 4:   1 1 1 1
...
j = n/2: 1 1 1 ... (n/2 times; the loop stops before j reaches n)
The question boils down to how many 1s this will print.
That number is
2^0 + 2^1 + 2^2 + ... + n/2
= n - 1
= O(n),
because a sum of doubling terms is always less than twice its largest term.
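The count is easy to confirm by running the loops. In the Python sketch below (function names are illustrative), the k loop is replaced by the number of times it runs, which is exactly j, and n is assumed to be a power of two as in the question.

```python
def inner_two_loops(n):
    """Count iterations of the k loop across one full pass of the j loop."""
    count = 0
    j = 1
    while j < n:
        count += j      # the k loop runs exactly j times
        j *= 2
    return count


def all_three_loops(n):
    """The outer loop repeats the same inner work n times."""
    return n * inner_two_loops(n)


if __name__ == "__main__":
    for m in range(1, 11):
        n = 2 ** m
        print(n, inner_two_loops(n), all_three_loops(n))
```

For n = 2^m the inner two loops give 1 + 2 + 4 + ... + n/2 = n - 1, so the three loops together do n(n - 1) iterations, which is O(n^2).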
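O-notation is an upper bound, so you can say it is O(n^2). That is also the least upper bound: the two inner loops together run 1 + 2 + 4 + ... + n/2 = n - 1 times per outer iteration, so a smaller bound such as O(n*log(n)*log(n)) would not hold.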
It's because of the logarithm. 2 raised to the power log2(16) is 16, so 2 raised to the power log2(n) is n. That is why your teacher says to view the second and third loops as O(n) together.
The second loop runs log(n) times and j doubles on each pass, so the second and third loops together do 1 + 2 + 4 + ... + 2^(log(n) - 1) = 2^log(n) - 1 = n - 1 = O(n) iterations.
for (int i = n; i > 0; i--) {          // This runs n times
    for (int j = 1; j < n; j *= 2) {   // This runs at most log(n) times, i.e. m times
        for (int k = 0; k < j; k++) {  // This runs j times, and j can be as large as n/2
        }
    }
}
Multiplying the three bounds gives an upper bound on the overall complexity, as mentioned in the comments under the question.
An upper bound can be loose or tight.
Here the product, O(n^2 * m), is loose: because j doubles, the two inner loops together cost only O(n), so the tight bound is O(n^2).

Number of ways to take k steps on a path of length N

We have a path of length N. At each step we can only move one unit. In how many ways can we take K steps while remaining inside the path? Initially we are at position 0.
example N =5
|---|---|---|---|---|
0 1 2 3 4 5
if k = 3 then we move like -
0->1->2->1
0->1->0->1
0->1->2->3
Can you please give some directions/links on how to approach this problem?
It's likely to be solvable using combinatorial methods rather than computational methods. But since you're asking on stackoverflow, I assume you want a computational solution.
There's a recurrence relation defining the number of paths ending at i:
P[N, 0, i] = 1 if i==0 otherwise 0
P[N, K, i] = 0 if i<0 or i>N
P[N, K, i] = P[N, K-1, i-1] + P[N, K-1, i+1]
We can iteratively compute the array of P[N, K, i] for i=0..N for a given K from the array P[N, K-1, i] for i=0..N.
Here's some Python code that does this. It uses a small trick of having an extra 0 at the end of the array so that r[-1] and r[N+1] are both zero.
def paths(N, K):
    # r[i] = number of walks ending at position i; the trailing 0 is the sentinel
    r = [1] + [0] * (N+1)
    for _ in range(K):
        r = [r[i-1]+r[i+1] for i in range(N+1)] + [0]
    return sum(r)

print(paths(5, 3))
This runs in O(NK) time.
A different (but related) solution is to let M be the (N+1) by (N+1) matrix consisting of 1's at positions (i+1,i) and (i,i+1) for i=0..N-1, and 0's elsewhere -- that is, 1's on the subdiagonal and superdiagonal. Then M^K (that is, M raised to the Kth power) contains at position (i, j) the number of paths from i to j in K steps. So sum(M^K[0,i] for i=0..N) is the total number of all paths starting at 0 of length K. This runs in O(N^3logK) time, so is better than the iterative method only if K is much larger than N.
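A minimal Python sketch of the matrix approach follows. It uses plain nested lists and exponentiation by squaring, with no modulus, so the entries are exact integers. The helper names are made up for this example, and a numeric library would be the more practical choice for large N.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    while power > 0:
        if power & 1:
            result = mat_mult(result, m)
        m = mat_mult(m, m)
        power >>= 1
    return result


def paths_matrix(n, k):
    """Number of k-step walks from position 0 that stay within 0..n."""
    size = n + 1
    # 1's on the subdiagonal and superdiagonal, 0's elsewhere
    m = [[int(abs(i - j) == 1) for j in range(size)] for i in range(size)]
    return sum(mat_pow(m, k)[0])


if __name__ == "__main__":
    print(paths_matrix(5, 3))
```

For N = 5 and K = 3 this gives 3, matching the three walks listed in the question.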
Java implementation of first approach in accepted answer -
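int N = 5, K = 3;  // example values from the question
// dp1[i][j] = number of i-step walks starting at position j-1.
// Positions 0..N live in columns 1..N+1; columns 0 and N+2 stay 0 as walls.
int[][] dp1 = new int[K + 1][N + 3];
for (int i = 0; i <= K; i++) {
    for (int j = 1; j <= N + 1; j++) {
        if (i > 0)
            dp1[i][j] = (dp1[i - 1][j - 1] + dp1[i - 1][j + 1]) % 1000000007;
        else
            dp1[i][j] = 1;
    }
}
System.out.println(dp1[K][1]);  // walks that start at position 0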
Complexity O(KN)
Java DP implementation, it computes answers for all starting positions and values 1-N and 1-K -
for (int i = 0; i <= K; i++) {
    for (int j = 1; j <= N; j++) {
        for (int k = 1; k <= j; k++) {
            if (i > 0)
                dp[k][j][i] =
                    (dp[k - 1][j][i - 1] + dp[k + 1][j][i - 1]) % 1000000007;
            else
                dp[k][j][i] = 1;
        }
    }
}
System.out.println(dp[1][5][3]);
O(KN^2)

running time of algorithm does not match the reality

I have the following algorithm:
I analyzed this algorithm as follows:
Since the outer for loop goes from i to n it iterates at most n times,
and the loop on j iterates again from i to n which we can say at most n times,
if we do the same with the whole algorithm we have 4 nested for loop so the running time would be O(n^4).
But when I run this code for different input size I get the following result:
As you can see, the result is much closer to n^3. Can anyone explain why this happens, or what is wrong with my analysis that gives such a loose bound?
Formally, you may proceed like the following, using Sigma Notation, to obtain the order of growth complexity of your algorithm:
Moreover, the equation obtained tells the exact number of iterations executed inside the innermost loop:
int i, j, k, h, n = 10;
int sum = 0;
for (i = 0; i < n; i++)
    for (j = i; j < n; j++)
        for (k = 0; k < j; k++)
            for (h = 0; h < i; h++)
                sum++;
printf("\nsum = %d", sum);
When T(10) = 1155, sum = 1155 also.
I'm sure there's a conceptual way to see why, but you can prove by induction the above has (n + 2) * (n + 1) * n * (n - 1) / 24 loops. Proof left to the reader.
In other words, it is indeed O(n^4).
Edit: Your count increases too frequently. Simply try this code to count the number of loops:
for (int n = 0; n < 30; n++) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            for (int k = 0; k < j; k++) {
                for (int h = k; h < i; h++) {
                    sum++;
                }
            }
        }
    }
    System.out.println(n + ": " + sum + " = " + (n + 2) * (n + 1) * n * (n - 1) / 24);
}
You have a rather complex algorithm. The number of operations is clearly less than n^4, but it isn't at all obvious how much less than n^4, and whether it is O(n^3) or not.
Checking the values n = 1 to 9 and making a guess based on the results is rather pointless.
To get a slightly better idea, assume that the number of steps is either c * n^3 or d * n^4, and make a table of the values c and d for 1 <= n <= 1,000. That might give you a better idea. It's not a foolproof method; there are algorithms changing their behaviour dramatically much later than at n = 1,000.
Best method is of course a proof. Just remember that O (n^4) doesn't mean "approximately n^4 operations", it means "at most c * n^4 operations, for some c". Sometimes c is small.
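The table suggested above takes only a few lines to build. The Python sketch below uses the closed form quoted earlier in this thread, (n + 2)(n + 1)n(n - 1)/24, checks it against the four loops (with h starting at k) for small n, and then prints c = steps/n^3 and d = steps/n^4. The function names are illustrative.

```python
def iterations(n):
    """Closed form quoted in the thread: (n + 2)(n + 1)n(n - 1) / 24."""
    return (n + 2) * (n + 1) * n * (n - 1) // 24


def brute_force(n):
    """The four nested loops, with h starting at k."""
    total = 0
    for i in range(n):
        for j in range(i, n):
            for k in range(j):
                for h in range(k, i):
                    total += 1
    return total


if __name__ == "__main__":
    assert all(iterations(n) == brute_force(n) for n in range(15))
    for n in (10, 100, 1000):
        steps = iterations(n)
        print(n, steps / n ** 3, steps / n ** 4)
```

The c column keeps growing with n while the d column settles near 1/24, so the count is proportional to n^4 with a small constant, which is why small inputs can look cubic.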
