recursive mergesort function analysis - data-structures

I need some help with a question from my Data Structures course.
I was given this recursive mergesort function (pseudocode):
Mergesort_1/3(A, p, r)
    if p < r
        then q = (p + (r - p)/3)   // round the result down
             Mergesort_1/3(A, p, q)
             Mergesort_1/3(A, q+1, r)
             Merge(A, p, q, r)
and these are the questions:
Let T(n) be the worst-case running time of Mergesort_1/3. Write the recurrence for T(n). Give a short explanation.
Prove that T(n) = Ω(n log n).

The gist of classic mergesort is the following recursion:
Split an unordered list in half.
It suffices to compute the boundary indices of the two halves.
Apply mergesort to each of the partial lists.
After this step, the partial lists will be sorted.
Merge the sorted partial lists by inspecting the elements of both lists in their respective order, copying from and advancing in the list with the smaller current element.
Let TC(n) be the time complexity of classic mergesort. The aforementioned steps take O(1) (*), 2*O(TC(ceil(n/2))), and O(n), respectively. This leads to the recurrence TC(n) = cc_0 + cc_1 * n + 2 * TC(ceil(n/2)).
Consider the generalized mergesort where lists are split unevenly, though always with the same ratio. The complexity of splitting and merging remains the same, leading to the recurrence TG(n) = ca_0 + ca_1 * n + TG(1/a * n + 1) + TG((a-1)/a * n + 1) for the generalized mergesort (using TG(x+1) instead of TG(ceil(x)); ca_0, ca_1 being the constants hidden in the O(_) notation; a = 3 for Mergesort_1/3).
This recurrence can be solved using the Akra-Bazzi method.
To this end, the recurrence needs to be written as
TG(x) = g(x) + \sum_{i=1..k} a_i * TG(b_i * x + h_i(x)) ; x >= x_0.
with
a_i, b_i const.
a_i > 0
0 < b_i < 1
|g(x)| = O(x^c); c const.
|h_i(x)| = O(x / (log(x))^2)
which can be done by setting ...
k = 2
a_1 = 1
a_2 = 1
b_1 = 1/a
b_2 = (a-1)/a
g(x) = ca_0 + ca_1 * x
h_1(x) = 1
h_2(x) = 1
x_0 = 2
-> TG(x) = ca_0 + ca_1 * x + 1 * TG(1/a * x + 1) + 1 * TG((a-1)/a * x + 1) ; x >= x_0.
The Akra-Bazzi theorem requires finding the exponent p that satisfies \sum_{i=1..k} a_i * (b_i)^p = 1. Then the following holds:
TG(x) = Θ( x^p * (1 + ∫_1^x g(u) / u^(p+1) du) )
Specifically,
a_1 * b_1^p + a_2 * b_2^p = 1
=> (1/a)^p + ((a-1)/a)^p = 1
<=> p = 1
... and thus ...
TG(x) = Θ( x * (1 + ∫_1^x (ca_0 + ca_1 * u) / u^2 du) )
= Θ( x * (1 + [ -ca_0/u + ca_1 * log(u) ]_1^x) )
= Θ( x - ca_0 + ca_0 * x + ca_1 * x * log(x) )
= Θ( x * log(x) )
(*) Strictly speaking this is incorrect, since basic arithmetic on binary representations and memory access cost O(log n). However, this makes no difference asymptotically for the Θ(n log n) complexity.
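For reference, here is a minimal Java rendering of the uneven-split mergesort analysed above; the class and method names and the ratio parameter are illustrative choices, not part of the question. ratio = 3 reproduces Mergesort_1/3, ratio = 2 gives classic mergesort.

import java.util.Arrays;

public class UnevenMergesort {

    // Sort A[p..r] (inclusive), splitting at q = p + (r-p)/ratio each time.
    static void mergesort(int[] A, int p, int r, int ratio) {
        if (p < r) {
            int q = p + (r - p) / ratio;   // rounds down, as in the pseudocode
            mergesort(A, p, q, ratio);
            mergesort(A, q + 1, r, ratio);
            merge(A, p, q, r);
        }
    }

    // Standard two-way merge of the sorted runs A[p..q] and A[q+1..r].
    static void merge(int[] A, int p, int q, int r) {
        int[] left = Arrays.copyOfRange(A, p, q + 1);
        int[] right = Arrays.copyOfRange(A, q + 1, r + 1);
        int i = 0, j = 0, k = p;
        while (i < left.length && j < right.length)
            A[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  A[k++] = left[i++];
        while (j < right.length) A[k++] = right[j++];
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7, 3, 8, 6, 4};
        mergesort(a, 0, a.length - 1, 3);        // the "1/3" variant
        System.out.println(Arrays.toString(a));  // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}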

This variant still makes two recursive calls, but it splits the data unevenly: one call handles roughly the first n/3 elements and the other the remaining 2n/3. When merging, you still have to go through all n elements. Hence the recurrence can be written as:
T(n) = T(n/3) + T(2n/3) + O(n)
The standard Master Theorem does not apply directly to this uneven split, but a recursion tree gives the answer. Each of the first log_3(n) levels of the tree still contains all n elements and therefore costs Θ(n), so T(n) >= c * n * log_3(n) = Ω(n log n). Conversely, no level costs more than c*n and there are at most log_3/2(n) levels, so T(n) = O(n log n) as well. Together, T(n) = Θ(n log n), and in particular T(n) = Ω(n log n), as required.
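To double-check the Θ(n log n) claim numerically, here is a quick sketch (not part of the original answers) that iterates the recurrence with the merge cost modelled as exactly n and an arbitrary base case T(1) = 1, and prints T(n)/(n log2 n), which hovers around a constant:

public class Mergesort13Recurrence {
    // T(n) = n + T(ceil(n/3)) + T(floor(2n/3)), T(1) = 1
    static long[] memo;

    static long T(int n) {
        if (n <= 1) return 1;
        if (memo[n] != 0) return memo[n];
        int left = (n + 2) / 3;        // ceil(n/3)
        int right = n - left;          // floor(2n/3)
        return memo[n] = n + T(left) + T(right);
    }

    public static void main(String[] args) {
        memo = new long[1 << 21];
        for (int n = 1 << 10; n <= 1 << 20; n <<= 2) {
            double ratio = T(n) / (n * (Math.log(n) / Math.log(2)));
            System.out.printf("n = %8d   T(n)/(n log2 n) = %.3f%n", n, ratio);
        }
    }
}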

Related

What is the best way to argue about big O or about Theta?

We're asked to show that $n + 4\lfloor\sqrt{n}\rfloor = O(n)$ with a good argumentation and a logical build-up, but it's not said what a good argumentation would look like. I know that, for example, $2n + 4\sqrt{n}$ is always bigger from n = 1 onwards, but I wouldn't know how to argue it or how to logically build it up, since I just thought about it and it happened to be true. Can someone help out with this example so I would know how to do it?
You should look at the following site: https://en.wikipedia.org/wiki/Big_O_notation
For big-O notation we would say that a function such as x^3 + x^2 + 100x is O(x^3). The idea is that as x grows very large, the x^3 term becomes the dominant factor in the expression.
You can apply the same logic to your expression: which term becomes dominant?
If this is not clear, try plotting both terms and see how they scale. That could be clarifying.
A proof is a convincing, logical argument. When in doubt, a good way to write a convincing, logical argument is to use an accepted template for your argument. Then, others can simply check that you have used the template correctly and, if so, the validity of your argument follows.
A useful template for showing asymptotic bounds is mathematical induction. To use this, you show that what you are trying to prove is true for specific simple cases, called base cases, then you assume it is true in all cases up to a certain size (the induction hypothesis) and you finish the proof by showing the hypothesis implies the claim is true for cases of the very next size. If done correctly, you will have shown the claim (parameterized by a natural number n) is true for a fixed n and for all larger n. This is exactly what is required for proving asymptotic bounds.
In your case: we want to show that n + 4 * sqrt(n) = O(n). Recall that the (one?) formal definition of big-Oh is the following:
A function f is bound from above by a function g, written f(n) = O(g(n)), if there exist constants c > 0 and n0 > 0 such that for all n > n0, f(n) <= c * g(n).
Consider the case n = 0. We have n + 4 * sqrt(n) = 0 + 4 * 0 = 0 <= 0 = c * 0 = c * n for any constant c. If we now assume the claim is true for all n up to and including k, can we show it is true for n = k + 1? This would require (k + 1) + 4 * sqrt(k + 1) <= c * (k + 1). There are now two cases:
k + 1 is not a perfect square. Since we are doing analysis of algorithms it is implied that we are using integer math, so sqrt(k + 1) = sqrt(k) in this case. Therefore, (k + 1) + 4 * sqrt(k + 1) = (k + 4 * sqrt(k)) + 1 <= (c * k) + 1 <= c * (k + 1) by the induction hypothesis provided that c > 1.
k + 1 is a perfect square. Since we are doing analysis of algorithms it is implied that we are using integer math, so sqrt(k + 1) = sqrt(k) + 1 in this case. Therefore, (k + 1) + 4 * sqrt(k + 1) = (k + 4 * sqrt(k)) + 5 <= (c * k) + 5 <= c * (k + 1) by the induction hypothesis provided that c >= 5.
Because these two cases cover all possibilities and in each case the claim is true for n = k + 1 when we choose c >= 5, we see that n + 4 * sqrt(n) <= 5 * n for all n >= 0 = n0. This concludes the proof that n + 4 * sqrt(n) = O(n).
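If you want to convince yourself numerically before writing the proof, a throwaway check of the constant c = 5 from the argument above (using (long) Math.sqrt for the integer square root, which is exact in this range) looks like this:

public class BigOCheck {
    public static void main(String[] args) {
        // Verify n + 4*floor(sqrt(n)) <= 5*n for a range of n (c = 5, n0 = 0).
        for (long n = 0; n <= 10_000_000; n++) {
            long s = (long) Math.sqrt((double) n);   // integer square root
            if (n + 4 * s > 5 * n) {
                System.out.println("counterexample at n = " + n);
                return;
            }
        }
        System.out.println("n + 4*floor(sqrt(n)) <= 5*n holds for all n up to 10^7");
    }
}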

Computing expected time complexity of recursive program

I wish to determine the average processing time T(n) of the recursive algorithm:
int myTest( int n ) {
    if ( n <= 0 ) {
        return 0;
    }
    else {
        int i = random( n - 1 );
        return myTest( i ) + myTest( n - 1 - i );
    }
}
provided that the algorithm random( int n ) spends one time unit to return
a random integer value uniformly distributed in the range [0, n] whereas
all other instructions spend a negligibly small time (e.g., T(0) = 0).
This is certainly not of the simpler form T(n) = a * T(n/b) + c, so I am lost as to how to write it. What confuses me is that a random number is drawn from the range 0 to n-1 each time, used in both recursive calls, and the results of those two calls are summed.
The recurrence relations are:
T(0) = 0
T(n) = 1 + sum(T(i) + T(n-1-i) for i = 0..n-1) / n
The second can be simplified to:
T(n) = 1 + 2*sum(T(i) for i = 0..n-1) / n
Multiplying by n:
n T(n) = n + 2*sum(T(i) for i = 0..n-1)
Noting that (n-1) T(n-1) = n-1 + 2*sum(T(i) for i = 0..n-2), we get:
n T(n) = (n-1) T(n-1) + 1 + 2T(n-1)
= (n+1) T(n-1) + 1
Or:
T(n) = ((n+1)T(n-1) + 1) / n
This has the solution T(n) = n, which you can derive by telescoping the series, or by guessing the solution and then substituting it in to prove it works.
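Here is a small sketch (not part of the original answer; the class name and trial count are arbitrary) that both evaluates the recurrence T(n) = ((n+1)T(n-1) + 1)/n exactly and estimates the average number of random() calls by simulation; both agree with T(n) = n:

import java.util.Random;

public class AverageRandomCalls {
    static final Random RNG = new Random();
    static long calls;   // counts invocations of random(), i.e. the time units spent

    // Mirrors the myTest routine from the question; only random() costs time.
    static int myTest(int n) {
        if (n <= 0) return 0;
        calls++;
        int i = RNG.nextInt(n);          // uniform in [0, n-1], like random(n - 1)
        return myTest(i) + myTest(n - 1 - i);
    }

    public static void main(String[] args) {
        // Exact recurrence: T(n) = ((n+1)*T(n-1) + 1) / n, T(0) = 0.
        double t = 0;
        for (int n = 1; n <= 5; n++) {
            t = ((n + 1) * t + 1) / n;
            System.out.printf("T(%d) = %.4f%n", n, t);   // prints exactly n
        }
        // Monte Carlo check: average number of random() calls for n = 50.
        int n = 50, trials = 100_000;
        calls = 0;
        for (int k = 0; k < trials; k++) myTest(n);
        System.out.printf("empirical average for n=%d: %.2f (expected %d)%n",
                          n, (double) calls / trials, n);
    }
}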

Calculating Time Complexity (2 Simple Algorithms)

Here are two algorithms (pseudo-code):
Alg1(n)
1. int x = n
2. int a = 0
3. while(x > 1) do
3.1. for i = 1 to x do
3.1.1 a = a + 1
3.2 x = x - n/5
Alg2(n)
1. int x = n
2. int a = 0
3. while(x > 1) do
3.1. for i = 1 to x do
3.1.1 a = a + 1
3.2 x = x/5
The difference is on line 3.2.
Time complexity:
Alg1: c + c +n*n*2c = 2c + 2cn² = 2c(n² + 1) = O(n²)
Alg2: c + c +n*n*2c = 2c + 2cn² = 2c(n² + 1) = O(n²)
I wanted to know if the calculation is correct.
Thanks!
No, I'm afraid you aren't correct.
In the first algorithm, the line:
x = x - n/5
Makes the while loop O(1) - it will run five times, however large n is. The for loop is O(N), so it's O(N) overall.
In algorithm 2, by contrast, x decreases as
x = x/5
As x = n to start with, the while loop runs O(logN) times. However, the inner for loop's work also shrinks by a factor of 5 each time. Therefore you are carrying out n + n/5 + n/25 + ... operations in total, which is O(N) again.
Below is a more explicit way to deduce the order of growth of both your algorithms (I hope you're comfortable with Sigma notation):
Algorithm 1:
x decreases by n/5 on each pass, so the inner loop performs Σ_{j=0..4} (n - j*n/5) = n + 4n/5 + 3n/5 + 2n/5 + n/5 = 3n iterations in total, which is O(N).
Algorithm 2:
x is divided by 5 on each pass, so the inner loop performs Σ_{j>=0} n/5^j = n + n/5 + n/25 + ... < (5/4)n iterations in total, which is again O(N).
see wikipedia for big-oh {O()} notation
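If you want to see the O(N) behaviour directly, here is a quick sketch (not part of the original answers) that counts how often line 3.1.1 executes in each algorithm; for n divisible by 5 the counts come out to exactly 3n for Alg1 and roughly 1.25n for Alg2:

public class CountIterations {
    // Counts executions of line 3.1.1 (a = a + 1) in Alg1: x decreases by n/5.
    static long alg1(int n) {
        long a = 0;
        int x = n;
        while (x > 1) {
            for (int i = 1; i <= x; i++) a++;
            x = x - n / 5;
        }
        return a;
    }

    // Counts executions of line 3.1.1 in Alg2: x is divided by 5.
    static long alg2(int n) {
        long a = 0;
        int x = n;
        while (x > 1) {
            for (int i = 1; i <= x; i++) a++;
            x = x / 5;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.printf("n = %8d   Alg1: %9d (= 3n)   Alg2: %9d (~1.25n)%n",
                              n, alg1(n), alg2(n));
        }
    }
}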

f(n) = Sigma of i*log(i), where i = 1 to log(n), equals what?

I'm trying to calculate the following:
f(n) = ∑ (i*log(i)), when i = 1 to log(n).
How do I do that?
I have succeeded doing:
f(n) = ∑ (i*log(i)), when i = 1 to n.
Which is: 1*log(1) + 2*log(2) + ... + n*log(n) <= n(n*log(n))
Where in the end: f(n) = ∑ (i*log(i)) = Ω(n^2 log^2(n)) (where i = 1 to n)
But I don't know how to do the first one, any ideas?
Regards
First, you have to remove the ^2 from log^2(n); your current bound is
f(n) = ∑ (i*log(i)) <= n*(n*log(n)), i.e. f(n) = O(n^2*log(n)) (and the bound is tight, so in fact Θ(n^2*log(n))).
Then, for the case where i goes from 1 to log(n), just substitute n by log(n).
Let's define
g(n) = ∑ (i*log(i)), when i=1 to log(n) // The result you are looking for
f(n) = ∑ (i*log(i)), when i=1 to n // The result we have
Then
g(n) = f(log(n)) = Θ(log(n)^2 * log(log(n)))
f(n) = Θ(log^2(n) * log(log(n))), writing f(n) here for the sum with i running from 1 to log(n).
Proof:
Upper bound: each of the log(n) terms is at most log(n) * log(log(n)), so
f(n) = 1 * log(1) + 2 * log(2) + ... + log(n) * log(log(n)) <= log(n) * log(n) * log(log(n)) = O(log^2(n) * loglog(n))
Lower bound: keep only the terms with i >= log(n)/2; there are log(n)/2 of them, and each is at least (log(n)/2) * log(log(n)/2), so
f(n) >= (log(n)/2) * (log(n)/2) * log(log(n)/2) = (1/4) * log^2(n) * (log(log(n)) - log(2)) = Omega(log^2(n) * loglog(n))
If you know some calculus, you can often find the order of growth of such sums by integration.
If f is a positive monotonic function, ∑ f(i) for 1 <= i <= k can be approximated by the integral ∫ f(t) dt (t ranging from 1 to k). So if you know a primitive function F of f (in modern parlance an antiderivative), you can easily evaluate the integral to F(k) - F(1). For growth analysis, the constant term F(1) is irrelevant, so you can approximate the sum (as well as the integral) simply by F(k).
A tool that is often useful in such calculations is partial integration,
∫_a^b f'(t)*g(t) dt = f(b)*g(b) - f(a)*g(a) - ∫_a^b f(t)*g'(t) dt
which follows from the product rule (f*g)' = f' * g + f * g'. It is often helpful to write f as 1*f in order to apply partial integration, for example to find a primitive of the (natural) logarithm,
∫ log t dt = ∫ 1*log t dt = t*log t - ∫ t * (log t)' dt = t*log t - ∫ t*(1/t) dt = t*log t - t
In this case, to find a primitive of t*log t, partial integration (taking f'(t) = t and g(t) = log t) yields
∫ t*log t dt = 1/2*t^2 * log t - ∫ (1/2*t^2) * (log t)' dt
= 1/2*t^2 * log t - 1/2 ∫ t^2*(1/t) dt
= 1/2*t^2 * log t - 1/4*t^2
Since the second term grows slower than the first, it can be ignored for growth analysis, so you obtain
∑_{i=1}^{k} i*log i ≈ 1/2 * k^2 * log k
Since logarithms to different bases only differ by a constant factor, a different choice of logarithm just changes the constant factor, and you see that in all cases
∑_{i=1}^{k} i*log i ∈ Θ(k^2 * log k)
For your specific problem, k = log n, so the sum is Θ((log n)^2 * log(log n)), as has been derived in a different way by the other answers.
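To see how good the 1/2*k^2*log k approximation is, here is a small numeric sketch along the same lines (using the natural logarithm; the class name is arbitrary). The ratio tends to 1 only slowly, since the ignored -1/4*k^2 term is a 1/(2*log k) fraction of the leading term:

public class SumILogI {
    public static void main(String[] args) {
        // Compare S(k) = sum_{i=1..k} i*ln(i) with the approximation (1/2)*k^2*ln(k).
        for (int k = 100; k <= 1_000_000; k *= 10) {
            double sum = 0;
            for (int i = 1; i <= k; i++) sum += i * Math.log(i);
            double approx = 0.5 * k * (double) k * Math.log(k);
            System.out.printf("k = %8d   S(k)/approx = %.4f%n", k, sum / approx);
        }
    }
}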
http://img196.imageshack.us/img196/7012/5f1ff74e3e6e4a72bbd5483.png
Now substitute log(n) for n and you'll get that it's very tightly bounded by log^2(n)*log(log(n)).

Big O Analysis with Recursion Tree of Stooge Sort

I am trying to find the Big O for stooge sort. From Wikipedia
algorithm stoogesort(array L, i = 0, j = length(L)-1)
    if L[j] < L[i] then
        L[i] ↔ L[j]
    if j - i > 1 then
        t = (j - i + 1)/3
        stoogesort(L, i  , j-t)
        stoogesort(L, i+t, j  )
        stoogesort(L, i  , j-t)
    return L
I am bad at performance analysis ... I drew the recursion tree, and I believe the following:
height: log(n)
work on level 0: n // do I start from level 0 or 1?
work on level 1: 2n
work on level 2: 4n
work on level 3: 8n
work on level log(n): (2^log(n)) * n = O(n^2)? 2^(log_2(n)) = n, but what does 2^(log_3(n)) actually give?
So it's O(n^2 * log(n)) = O(n^2)? That's far from Wikipedia's O(n^(log 3 / log 1.5)) ...
The size of the problem at level k is (2/3)^k * n. The size at the lowest level is 1, so setting (2/3)^k * n = 1, the depth is k = log_1.5(n) (divide both sides by (2/3)^k, take logs base 1.5).
The number of invocations at level k is 3^k. At level k = log_1.5(n), this is 3^(log_1.5 n) = (1.5^(log_1.5 3))^(log_1.5 n) = (1.5^(log_1.5 n))^(log_1.5 3) = n^(log_1.5 3) = n^(log 3 / log 1.5).
Since the work at each level increases geometrically, the work at the leaves dominates.
You can use the Master Theorem to find this answer.
We can see from the Algorithm that the recurrence relation is:
T(n) = 3*T(2/3 n) + 1
Applying the theorem:
f(n) = 1 = O(n^c), where c = 0.
a = 3, b = 3/2 => log_3/2(3) ≈ 2.71
Since c < log_3/2(3), we are in Case 1 of the theorem, so:
T(n) = Θ(n^(log_3/2(3))) ≈ Θ(n^2.71)
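The exponent can also be checked numerically. A throwaway sketch (the +1 cost per call and the base case C(n) = 1 for n <= 2 are modelling choices) tabulates C(n) = 3*C(ceil(2n/3)) + 1 and divides by n^(log 3 / log 1.5); the ratio stays within a bounded constant range:

public class StoogeRecurrence {
    // C(n) = 3*C(ceil(2n/3)) + 1, with C(n) = 1 for n <= 2 (one comparison per call).
    static double C(long n) {
        if (n <= 2) return 1;
        return 3 * C((2 * n + 2) / 3) + 1;   // (2n+2)/3 == ceil(2n/3)
    }

    public static void main(String[] args) {
        double p = Math.log(3) / Math.log(1.5);   // log 3 / log 1.5 ≈ 2.7095
        for (long n = 100; n <= 1_000_000; n *= 10) {
            System.out.printf("n = %8d   C(n)/n^p = %.4f%n", n, C(n) / Math.pow(n, p));
        }
    }
}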
Probably this could help:
/**********************************************************************
* Complexity:
* Each call makes exactly one comparison.
* Each call divides its array of size n into
* 3 (overlapping) subproblems of size (2*n / 3):
* N
* / | \
* 2N/3 2N/3 2N/3
* /
* (2N/3)*(2/3)
* /
* ...
* /
* N * (2/3)^k = 1
* By considering this tree we can find the depth - k:
* N * (2/3)^k = 1 =>>
* N = 1 / (2/3)^k =>>
* N = (3/2)^k =>>
* log(3/2, N) = log(3/2, (3/2)^k) =>>
* k = log(3/2, N) (!!!)
*
* At depth k the algorithm makes 3^k comparisons =>> at the last level we get:
* 3^(log(3/2, N)) =>> N^(log(3/2, 3))
* comparisons.
*
* We can compute the full work:
* 1 + 3 + 9 + ... + 3^(log(3/2, N))
* by using geometric progression formulas.
*
*************************************************************************/
public static void sort(Comparable[] a, int lo, int hi) {
if (lo >= hi) return;
if (less(a[hi], a[lo])) exch(a, hi, lo);
if (hi - lo + 1 > 2) {
int t = (hi - lo + 1) / 3;
sort(a, lo, hi - t);
sort(a, lo + t, hi);
sort(a, lo, hi - t);
}
}
If we define T(n) as the answer (j-i+1 = n) we have:
T(n) = 3*T(2n/3) + O(1)
You can solve that using the Master Theorem, and the answer will be Θ(n^(log_1.5 3)) = Θ(n^(log 3 / log 1.5)).
You can prove it using induction on n too.
Using a recursion tree is acceptable too:
k = number of levels = log_1.5(n)
ans = 1 + 3 + 3^2 + ... + 3^k = Θ(3^k) = Θ(3^(log_1.5(n))) = Θ(n^(log_1.5(3)))
t(n) = 3⋅t(2n/3) + Θ(1)
h = 1 + log_3/2(n)   (height of the recursion tree)
Try to calculate it; it is easy.
At every level i we have 3^i calls of cost c each, where c is some constant and i is the depth of that particular level, so
t(n) = Σ_{i=0..h} c⋅3^i
This is a simple geometric progression, so t(n) = Θ(3^h) = Θ(3^(log_3/2(n))) = Θ(n^(log_3/2(3))).
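As a final check, here is a self-contained sketch on plain int arrays (independent of the Comparable version above; the call counter and the reverse-sorted test input are my own choices) that counts the actual recursive invocations of stooge sort and compares them against n^(log 3 / log 1.5):

public class StoogeCount {
    static long calls;

    // Stooge sort on a[lo..hi], counting every recursive invocation.
    static void stoogesort(int[] a, int lo, int hi) {
        calls++;
        if (a[hi] < a[lo]) { int tmp = a[lo]; a[lo] = a[hi]; a[hi] = tmp; }
        if (hi - lo > 1) {
            int t = (hi - lo + 1) / 3;
            stoogesort(a, lo, hi - t);
            stoogesort(a, lo + t, hi);
            stoogesort(a, lo, hi - t);
        }
    }

    public static void main(String[] args) {
        double p = Math.log(3) / Math.log(1.5);   // ≈ 2.7095
        for (int n = 100; n <= 800; n *= 2) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = n - i;   // reverse-sorted input
            calls = 0;
            stoogesort(a, 0, n - 1);
            System.out.printf("n = %5d   calls = %10d   calls/n^p = %.4f%n",
                              n, calls, calls / Math.pow(n, p));
        }
    }
}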
