Expected runtime of Previous Larger Element algorithm

The following algorithm computes, for each element of an array, the index of the previous larger element. It is from page 11 of these notes.
// Input: An array of numeric values a[1..n]
// Returns: An array p[1..n] where p[i] contains the index of the previous
//          larger element to a[i], or 0 if no such element exists.
previousLarger(a[1..n])
    for (i = 1 to n)
        j = i-1;
        while (j > 0 and a[j] <= a[i]) j--;
        p[i] = j;
    return p
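For concreteness, here is a direct C++ translation of the pseudocode (my own sketch, not from the notes): it uses 0-based indexing, so p[i] == -1 plays the role of the pseudocode's 0 for "no previous larger element".

#include <cstdio>
#include <vector>

std::vector<int> previousLarger(const std::vector<int>& a) {
    const int n = (int)a.size();
    std::vector<int> p(n);
    for (int i = 0; i < n; ++i) {
        int j = i - 1;
        while (j >= 0 && a[j] <= a[i]) --j;   // scan left until a strictly larger value
        p[i] = j;                             // -1 means "no such element"
    }
    return p;
}

int main() {
    std::vector<int> a = {3, 1, 4, 1, 5, 9, 2, 6};
    std::vector<int> p = previousLarger(a);
    for (int i = 0; i < (int)a.size(); ++i)
        std::printf("p[%d] = %d\n", i, p[i]);
    return 0;
}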
My homework question is: given that the input sequence {a1,...,an} is a random permutation of the set {1,...,n}, what is the expected running time?
I think this requires some sort of probabilistic analysis, but I need some hints since I have only done worst-case analysis in the past. I'm trying to find a formula for the cost of the j-loop for a given i (1 + the number of times we do operation j--), then sum that formula up from 1 to n.
What does "expected" mean? I don't really know how to interpret this.

Building on #Heuster's answer:
1) You know that the answer is between O(n) and O(n^2). This is just to check the final result.
2) The expected number of steps for element i would indeed be:
sum_{k=1}^i 1 / (k+1)
= O(log i)
3) You have to sum all those numbers over i. This gives you:
sum_{i=1}^n O(log i)
= O(n log n)
What I did is not rigorous at all, but you can derive it properly. O(n log n) is between O(n) and O(n^2), so it seems a good candidate :)

For an arbitrary index i, what is the chance that a[i-1] > a[i] (in other words, that the inner while loop takes one step)? That one is easy: all elements in a are different, so P(a[i-1] > a[i]) = P(a[i] > a[i-1]) = 1/2.
Now, look at the case that the inner while loop would need to take two steps. That is, a[i-2] > a[i] > a[i-1]. This is exactly one of the 6 permutations of 3 elements, so the chance is 1 / 3! = 1 / 6.
Let's generalize this and assume that the inner while loop needs k steps. We consider the sublist a[i-k], a[i-k+1], ..., a[i]. We know that a[i-k] is the maximum element of this sublist and a[i] the second largest (otherwise, the inner loop would have stopped sooner). The order of the elements in between is irrelevant. The chance that we take k steps is thus 1 / (k + 1) * 1 / k = 1 / (k * (k + 1)). Note that this indeed reduces to 1/2 for k = 1 and to 1/6 for k = 2.
The chance that no element before a[i] is larger is simply 1 / i (a[i] is the maximum element in that sublist). In that case, the inner loop would need i steps.
The expected number of steps for element i would be (sum of probability times value):
sum_{k=1}^{i} k * 1 / (k * (k + 1)) + i * (1 / i)
= sum_{k=1}^{i} 1 / (k + 1) + 1
= H_{i+1}
where H_i is the i-th harmonic number, which is the discrete variant of log i. That is, the expected number of steps for element i is Θ(log i).
What is remaining now is sum over all i to find the expected running time. With the exact value (H_{i+1}) this doesn't lead to a nice expression, see Wolfram Alpha.
The standard way to proceed, however, is to continue with the approximation log i. Clearly, log 1 + log 2 + ... + log(n-1) is less than n log n. Now, consider the last half of the sum:
log(n/2) + log(n/2 + 1) + ... + log(n-1) > (n/2) log(n/2)
= (n/2) (log n - log 2)
= Ω(n log n)
Therefore, the expected running time is Θ(n log n).
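As a rough empirical check of this (my own sketch, not part of the original answer; the sizes, seed, and trial count are arbitrary), one can count the total number of j-- steps on random permutations and watch it grow roughly like n log n:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    for (int n : {1000, 10000, 100000}) {
        const int trials = 20;
        double totalSteps = 0.0;
        for (int t = 0; t < trials; ++t) {
            std::vector<int> a(n);
            std::iota(a.begin(), a.end(), 1);          // 1, 2, ..., n
            std::shuffle(a.begin(), a.end(), rng);     // random permutation
            long long steps = 0;
            for (int i = 0; i < n; ++i) {
                int j = i - 1;
                while (j >= 0 && a[j] <= a[i]) { --j; ++steps; }
            }
            totalSteps += (double)steps;
        }
        std::printf("n=%7d  average j-- steps=%10.0f  n*ln(n)=%10.0f\n",
                    n, totalSteps / trials, n * std::log((double)n));
    }
    return 0;
}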

Related

How does my randomly partitioned array look in the general case?

I have an array of n random integers
I choose a random integer and partition by the chosen random integer (all integers smaller than the chosen integer will be on the left side, all bigger integers will be on the right side)
What will be the size of my left and right side in the average case, if we assume no duplicates in the array?
I can easily see that there is a 1/n chance that the array is split in half, if we are lucky. Additionally, there is a 1/n chance that the array is split so that the left side has length n/2 - 1 and the right side has length n/2 + 1, and so on.
Could we derive from this observation the "average" case?
You can probably find a better explanation (and certainly the proper citations) in a textbook on randomized algorithms, but here's the gist of average-case QuickSort, in two different ways.
First way
Let C(n) be the expected number of comparisons required for a random permutation of 1...n. Since the expectation of the sum of the number of comparisons required for the two recursive calls equals the sum of the expectations, we can write a recurrence that averages over the n possible divisions:
C(0) = 0
C(n) = n − 1 + (1/n) sum_{i=0}^{n−1} (C(i) + C(n−1−i))
Rather than pull the exact solution out of a hat (or peek at the second way), I'll show you how I'd get an asymptotic bound.
First, I'd guess the asymptotic bound. Obviously I'm familiar with QuickSort and my reasoning here is fabricated, but since the best case is O(n log n) by the Master Theorem, that's a reasonable place to start.
Second, I'd guess an actual bound: 100 n log (n + 1). I use a big constant because why not? It doesn't matter for asymptotic notation and can only make my job easier. I use log (n + 1) instead of log n because log n is undefined for n = 0, and 0 log (0 + 1) = 0 covers the base case.
Third, let's try to verify the inductive step. Assuming that C(i) ≤ 100 i log (i + 1) for all i ∈ {0, ..., n−1},
C(n) = n − 1 + (1/n) sum_{i=0}^{n−1} (C(i) + C(n−1−i))        [by definition]
     = n − 1 + (2/n) sum_{i=0}^{n−1} C(i)                      [by symmetry]
     ≤ n − 1 + (2/n) sum_{i=0}^{n−1} 100 i log(i + 1)          [by the inductive hypothesis]
     ≤ n − 1 + (2/n) ∫_0^n 100 x log(x + 1) dx                 [upper Darboux sum]
     = n − 1 + (2/n) (50 (n² − 1) log(n + 1) − 25 (n − 2) n)   [WolframAlpha FTW, I forgot how to integrate]
     = n − 1 + 100 (n − 1/n) log(n + 1) − 50 (n − 2)
     = 100 (n − 1/n) log(n + 1) − 49 n + 99.
Well that's irritating. It's almost what we want, but that + 99 messes up the induction a little bit. We can extend the base cases to n = 1 and n = 2 by inspection and then assume that n ≥ 3 to finish the bound:
C(n) ≤ 100 (n − 1/n) log(n + 1) − 49 n + 99
     ≤ 100 n log(n + 1) − 49 n + 99
     ≤ 100 n log(n + 1).                                       [since n ≥ 3 implies 49 n ≥ 99]
Once again, no one would publish such a messy derivation. I wanted to show how one could work it out formally without knowing the answer ahead of time.
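For a numeric sanity check (my own sketch, not part of the answer; the table sizes and the use of natural log are arbitrary choices), the recurrence can be evaluated directly and compared against the 100 n log(n + 1) bound:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int N = 1 << 16;
    std::vector<double> C(N + 1, 0.0);
    double prefix = 0.0;                            // running sum C(0) + ... + C(n-1)
    for (int n = 1; n <= N; ++n) {
        C[n] = (n - 1) + 2.0 * prefix / n;          // C(n) = n-1 + (2/n) sum_{i<n} C(i)
        prefix += C[n];
    }
    for (int n : {16, 256, 4096, 65536}) {
        double bound = 100.0 * n * std::log(n + 1.0);   // the bound verified above
        std::printf("n=%6d  C(n)=%12.1f  100 n log(n+1)=%14.1f\n", n, C[n], bound);
    }
    return 0;
}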
Second way
How else can we derive how many comparisons QuickSort does in expectation? Another possibility is to exploit the linearity of expectation by summing, over each pair of elements, the probability that those elements are compared. What is that probability? We observe that a pair {i, j} is compared if and only if, at the leaf-most invocation where both i and j are present in the subarray, either i or j is chosen as the pivot. This happens with probability 2/(j + 1 − i), since the pivot must be i, j, or one of the j − (i + 1) elements that fall between them. Therefore,
C(n) = sum_{i=1}^{n} sum_{j=i+1}^{n} 2 / (j + 1 − i)
     = sum_{i=1}^{n} sum_{d=2}^{n+1−i} 2 / d
     = sum_{i=1}^{n} 2 (H(n+1−i) − 1)        [where H is the harmonic numbers]
     = 2 sum_{i=1}^{n} H(i) − 2n
     = 2 (n + 1) (H(n+1) − 1) − 2n.          [WolframAlpha FTW again]
Since H(n) is Θ(log n), this is Θ(n log n), as expected.
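As a quick check of that closed form (my own sketch, over an arbitrary small range of n), it agrees with the recurrence from the first way:

#include <cstdio>

int main() {
    const int N = 20;
    double C[N + 1];
    C[0] = 0.0;
    double prefix = 0.0;                              // running sum C(0) + ... + C(n-1)
    for (int n = 1; n <= N; ++n) {
        C[n] = (n - 1) + 2.0 * prefix / n;            // recurrence from the first way
        prefix += C[n];
    }
    for (int n = 1; n <= N; ++n) {
        double H = 0.0;                               // harmonic number H(n+1)
        for (int k = 1; k <= n + 1; ++k) H += 1.0 / k;
        double closed = 2.0 * (n + 1) * (H - 1.0) - 2.0 * n;
        std::printf("n=%2d  recurrence=%9.4f  closed form=%9.4f\n", n, C[n], closed);
    }
    return 0;
}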

How can I calculate the time function T(n) of the following code?

x = 0;
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= n; j++) {
        x++;
        n--;
    }
}
By testing the code, I see that the nested for loop runs about ⌈n/2⌉ times (for the current value of n) per iteration of the outer for loop.
But I don't know how to formulate these rules with sigmas. I would really appreciate if anyone can help me with this.
You can express T(n) as T(n-2) + 1, i.e. T(n) = T(n-2) + 1, so its expected time complexity is O(n/2) => O(n).
Edit: the T(n-2) + 1 expression can be seen as follows: if you increase n-2 by 2, so that n-2 becomes n, the loop executes one more time than it does for n-2. The +1 is because you are incrementing j and decrementing n simultaneously, which is exactly the same as incrementing j by 2.
Let's compute the value of x.
TL;DR: x(N) ≈ N - [N/2^(i+1)], where i is the lowest number satisfying the condition (i+1) * 2^i > N. As Mariano Demarchi said, T(N) = O(N).
First we will check how the variables change after each run of the inner loop. Consider the values (n, i, x) just before the inner loop starts.
How many iterations will happen? Each iteration increases j and decreases n, so the distance between them decreases by two. The starting distance is n-1, and the loop runs while j <= n, i.e., while the distance is non-negative. Thus if n = 2k the inner loop runs k times, otherwise k+1 times; in both cases it makes d = [(n+1)/2] iterations.
Thus x will increase by d, n becomes n-d and i becomes i+1:
(n, i, x) -> (n-d, i+1, x+d), or equivalently ([n/2], i+1, x + [(n+1)/2])
Now concentrate on the values of the n and i variables in the big loop:
They change like this: (n, i) -> ([n/2], i+1)
The stop condition is [N/2^i] < i+1, which is equivalent to (i+1) * 2^i > N. Of course, we need the minimal i satisfying the condition.
So i is the first number satisfying the condition, and we do not sum further:
x = [(N+1)/2] + [([N/2]+1)/2] + [([N/2^2]+1)/2] + ... + [([N/2^(i-1)]+1)/2]
Glossing over the floors (the number theory is not really related to this question), this series is approximately N (1 - 1/2^(i+1)). In particular, when N is a power of 2 minus 1 this is easy to see.
So the following code computes essentially the same value in O(log(N)) time:
// finding i
unsigned long long i = 0;
while ((i + 1) * (1ll << i) < n)
++i;
// finding x
x = n - (n >> (i + 1));
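As a brute-force sanity check (my own sketch, with arbitrary test sizes), one can simulate the loops from the question directly and compare x against the O(log N) approximation above; the two agree up to a small additive error:

#include <cstdio>

int main() {
    for (long long N : {10, 100, 1000, 1000000}) {
        // direct simulation of the nested loops in the question
        long long n = N, x = 0;
        for (long long i = 1; i <= n; i++)
            for (long long j = 1; j <= n; j++) {
                x++;
                n--;
            }
        // the O(log N) approximation derived above
        long long i = 0;
        while ((i + 1) * (1LL << i) < N) ++i;
        long long approx = N - (N >> (i + 1));
        std::printf("N=%8lld  simulated x=%8lld  approximation=%8lld\n", N, x, approx);
    }
    return 0;
}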
In the inner loop, since n is decremented at the same time as j is incremented, n drops below j roughly halfway between their initial values, that is, after about (n-1)/2 iterations.
That's why your tests show that the inner loop runs ⌈n/2⌉ times per iteration of the outer loop.
Then the outer loop stops at the i that roughly satisfies n/2^i = i-1; this is what determines the outer loop's stopping point.
T(n) = n/2 + T(n/2)
     = n/2 + n/4 + T(n/4)
     = n (1/2 + 1/4 + ... + 1/(2^i))
This series converges to n, so the algorithm is O(n).

Runtime Analysis of Insertion Sort

I am trying to compute the run-time analysis of this Insertion Sort algorithm:
1) n = length[A]
2) count = 0
3) for (i=1; i<=n; i++)
4) for (j=1; j<=i; j++)
5) if A[j] <= 100
6) for (k=j; k<=j+2*i; k++)
7) A[j] = A[j]-1
8) count = count+1
9) return (count)
I have watched some videoes on youtube like: https://www.youtube.com/watch?v=tmKUHLs21PU
I have also read my book and I cannot find anything online that is similar to this (because of the 3 nested for loops and the if statement).
I am fine up until about line 5.
I understand that the runtime for line 3 is n, and for line 4 it is Σ_{j=1}^{n} t_j.
After that I am completely lost. I know that there are two 'Σ's involved with the if statement and the third for loop. Can somebody please explain in detail what to do next and why it is like that. Thank you.
This sounds a lot like a homework problem, and it wouldn't be doing you any favors to just give you all the answers, but here are some principles that can hopefully help you figure out the rest on your own.
Line 4 will happen once the first time through the outer loop, twice the second time, and so forth up to n times on the nth time through the loop.
1 + 2 + ... + n
If we rearrange these to put the first and last addend together, then the second and the second-to-last, we see a pattern:
1 + 2 + ... + (n-1) + n
= (1 + n) + (2 + (n-1)) + ... + (n/2 + (n/2 + 1))
= (n + 1) + (n + 1) + ... + (n + 1)        [n/2 pairs]
= (n + 1) * n/2
= n²/2 + n/2
In terms of asymptotic complexity, the constant 1/2 and the n are outweighed by the n², so the big-O of line 4 is n².
Line 5 will have to be evaluated as many times as line 4 runs, regardless of what it evaluates to, so that'll be n². But how many times the lines inside it are run will depend on the values in the array. This is where you start running into best-case and worst-case complexity.
In the best case, the value in the array will always be greater than 100, so the complexity of the entire algorithm is equal to the complexity of line 5.
In the worst case, the value in A[j] will always be less than or equal to 100, so the for loop on line 6 will be evaluated, increasing the complexity of the overall algorithm.
I'll leave you to figure out how the remaining lines will affect the overall complexity.
And by the way, this doesn't look like an insertion sort to me. It's not comparing array values to each other and swapping their positions in an array. It's comparing array values to a constant (100) and reducing their values based on their position in the array.
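If you want to experiment, here is a hedged C++ instrumentation of the pseudocode above (my own sketch: it assumes lines 7 and 8 are both the body of the loop on line 6, uses 0-based indexing, and the array size 200 and the test values are arbitrary). It counts how often the innermost statements run for a best-case input (all values > 100) and a worst-case input (all values <= 100):

#include <cstdio>
#include <vector>

long long countOps(std::vector<int> A) {                // A passed by value: each run gets its own copy
    long long count = 0;
    const int n = (int)A.size();
    for (int i = 1; i <= n; i++)                        // line 3
        for (int j = 1; j <= i; j++)                    // line 4
            if (A[j - 1] <= 100)                        // line 5 (0-based indexing)
                for (int k = j; k <= j + 2 * i; k++) {  // line 6
                    A[j - 1] = A[j - 1] - 1;            // line 7
                    count = count + 1;                  // line 8
                }
    return count;                                       // line 9
}

int main() {
    const int n = 200;
    std::vector<int> best(n, 101);    // test on line 5 is always false: inner loop never runs
    std::vector<int> worst(n, 100);   // values only decrease, so the test stays true throughout
    std::printf("best-case count  = %lld\n", countOps(best));
    std::printf("worst-case count = %lld\n", countOps(worst));
    return 0;
}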

What is the running time of this modified selection sort algorithm?

I'm prepping for software developer interviews and reviewing algorithms. I'm stuck on a question that asks "Given a modified selection sort algorithm that returns in sorted order the k smallest elements in an array of size n what is its worst case running time in terms of n and k?"
Modified selection sort algorithm:
A = [1...n]   // an array of size n
for i = 1 to k
    smallest = i
    for j = i + 1 to n
        if A[j] < A[smallest]
            smallest = j
    swap(A[i], A[smallest])
I'm guessing it's O(nk) but not sure why.
The outer loop runs k times. For each iteration of the outer loop, the inner loop makes O(n) iterations.
More mathematically, the inner loop runs:
(n-1) + (n-2) + (n-3) + .... + (n-k) times
= n*k - k*(k+1)/2
= k* (n - k/2 -1/2)
~ k * n
Hence, Complexity = O(n*k)
O(nk)
The outer loop picks k elements, and for each of them the inner loop scans on the order of n elements.
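For completeness, here is a hedged C++ sketch of this modified selection sort (my own translation, 0-based indexing; the sample array and k = 3 are arbitrary choices). It places the k smallest elements, in sorted order, at the front of the vector:

#include <cstdio>
#include <utility>
#include <vector>

void partialSelectionSort(std::vector<int>& a, std::size_t k) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < k && i < n; ++i) {     // outer loop: k passes
        std::size_t smallest = i;                      // index of the current minimum
        for (std::size_t j = i + 1; j < n; ++j)        // inner loop: about n - i steps
            if (a[j] < a[smallest])
                smallest = j;
        std::swap(a[i], a[smallest]);
    }
}

int main() {
    std::vector<int> a = {7, 3, 9, 1, 8, 2, 6};
    partialSelectionSort(a, 3);                        // first 3 slots now hold 1, 2, 3
    for (std::size_t i = 0; i < 3; ++i) std::printf("%d ", a[i]);
    std::printf("\n");
    return 0;
}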

Complexity of a randomized search algorithm

Consider the following randomized search algorithm on a sorted array a of length n (in increasing order). x can be any element of the array.
size_t randomized_search(value_t a[], size_t n, value_t x)
{
    size_t l = 0;
    size_t r = n - 1;
    while (true) {
        size_t j = rand_between(l, r);
        if (a[j] == x) return j;
        if (a[j] < x) l = j + 1;
        if (a[j] > x) r = j - 1;
    }
}
What is the expected Big-Theta complexity (bounded both below and above) of this function when x is selected randomly from a?
Although this seems to be log(n), I carried out an experiment with instruction counts, and found that the result grows a little faster than log(n) (according to my data, even (log(n))^1.1 fits the result better).
Someone told me that this algorithm has an exact big theta complexity (so obviously log(n)^1.1 is not the answer). So, could you please give the time complexity along with your approach to prove it? Thanks.
Update: the data from my experiment (plots omitted): a log(n) fit and a log(n)^1.1 fit, both produced with Mathematica.
If you're willing to switch to counting three-way compares, I can tell you the exact complexity.
Suppose that the key is at position i, and I want to know the expected number of compares with position j. I claim that position j is examined if and only if it's the first position between i and j inclusive to be examined. Since the pivot element is selected uniformly at random each time, this happens with probability 1/(|i - j| + 1).
The total complexity is the expectation over i <- {1, ..., n} of sum_{j=1}^n 1/(|i - j| + 1), which is
sum_{i=1}^n 1/n sum_{j=1}^n 1/(|i - j| + 1)
= 1/n sum_{i=1}^n (sum_{j=1}^i 1/(i - j + 1) + sum_{j=i+1}^n 1/(j - i + 1))
= 1/n sum_{i=1}^n (H(i) + H(n + 1 - i) - 1)
= 1/n sum_{i=1}^n H(i) + 1/n sum_{i=1}^n H(n + 1 - i) - 1
= 1/n sum_{i=1}^n H(i) + 1/n sum_{k=1}^n H(k) - 1 (k = n + 1 - i)
= 2 H(n + 1) - 3 + 2 H(n + 1)/n - 2/n
= 2 H(n + 1) - 3 + O(log n / n)
= 2 log n + O(1)
= Theta(log n).
(log means natural log here.) Note the -3 in the low order terms. This makes it look like the number of compares is growing faster than logarithmic at the beginning, but the asymptotic behavior dictates that it levels off. Try excluding small n and refitting your curves.
Assuming rand_between to implement sampling from a uniform probability distribution in constant time, the expected running time of this algorithm is Θ(lg n). Informal sketch of a proof: the expected value of rand_between(l, r) is (l+r)/2, the midpoint between them. So each iteration is expected to skip half of the array (assuming the size is a power of two), just like a single iteration of binary search would.
More formally, borrowing from an analysis of quickselect, observe that when you pick a random pivot index, half of the time it will be between ¼n and ¾n. Neither the left nor the right subarray has more than ¾n elements. The other half of the time, neither has more than n elements (obviously). That leads to a recurrence relation
T(n) = ½T(¾n) + ½T(n) + f(n)
where f(n) is the amount of work in each iteration. Subtracting ½T(n) from both sides, then doubling both sides, we have
½T(n) = ½T(¾n) + f(n)
T(n) = T(¾n) + 2f(n)
Now, since 2f(n) = Θ(1) = Θ(nᶜ log⁰ n) where c = log_{4/3}(1) = 0, it follows by the master theorem that T(n) = Θ(n⁰ lg n) = Θ(lg n).
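To address the curve-fitting issue directly, here is a hedged Monte Carlo sketch (my own code; the sizes, seed, and trial count are arbitrary). It averages the number of three-way probes over random keys and prints 2 ln n next to it; the two should track each other up to an additive constant, consistent with 2 H(n+1) - 3 + O(log n / n):

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

static std::mt19937_64 rng(12345);

// Uniform integer in [l, r]; stands in for the question's rand_between.
static std::size_t rand_between(std::size_t l, std::size_t r) {
    return std::uniform_int_distribution<std::size_t>(l, r)(rng);
}

// Number of three-way probes the randomized search needs to find x in the
// sorted array a (same logic as the function in the question).
static long long probes(const std::vector<long long>& a, long long x) {
    std::size_t l = 0, r = a.size() - 1;
    long long c = 0;
    while (true) {
        std::size_t j = rand_between(l, r);
        ++c;
        if (a[j] == x) return c;
        if (a[j] < x) l = j + 1;
        if (a[j] > x) r = j - 1;
    }
}

int main() {
    const std::size_t sizes[] = {1000, 10000, 100000};
    for (std::size_t n : sizes) {
        std::vector<long long> a(n);
        for (std::size_t i = 0; i < n; ++i) a[i] = (long long)i;   // sorted, distinct keys
        const int trials = 20000;
        double total = 0.0;
        for (int t = 0; t < trials; ++t)
            total += (double)probes(a, (long long)rand_between(0, n - 1));
        std::printf("n=%7zu  average probes=%6.2f  2*ln(n)=%6.2f\n",
                    n, total / trials, 2.0 * std::log((double)n));
    }
    return 0;
}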
