Understanding geometric improvement approach for obtaining polynomial-time algorithms

I'm reading Network Flows - Theory, Algorithms, and Applications and I'm stuck on the proof of the following theorem (Ch. 3, Page 67):
Theorem. Suppose that 𝑧^π‘˜ is the objective function value of some solution of a minimization problem at the π‘˜π‘‘β„Ž iteration of an algorithm and 𝑧^βˆ— is the minimum objective function value. Furthermore, suppose that the algorithm guarantees that for every iteration π‘˜,
(1) (𝑧^π‘˜ βˆ’ 𝑧^(π‘˜+1)) β‰₯ 𝛼(𝑧^π‘˜ βˆ’ 𝑧^βˆ—)
(i.e., the improvement at iteration π‘˜ + 1 is at least 𝛼 times the total possible improvement) for some constant 𝛼 with 0 < 𝛼 < 1 (which is independent of the problem data). Then the algorithm terminates in 𝑂((π‘™π‘œπ‘”π»)/𝛼) iterations, where 𝐻 is the difference between the maximum and minimum objective function values.
Proof. The quantity (𝑧^π‘˜ βˆ’ 𝑧^βˆ—) represents the total possible improvement in the objective function value after the π‘˜π‘‘β„Ž iteration. Consider a consecutive sequence of 2/𝛼 iterations starting from iteration π‘˜. If each iteration of the algorithm improves the objective function value by at least 𝛼(𝑧^π‘˜ βˆ’ 𝑧^βˆ—)/2 units, the algorithm would determine an optimal solution within these 2/𝛼 iterations. Suppose, instead, that at some iteration π‘ž + 1, the algorithm improves the objective function value by no more than 𝛼(𝑧^π‘˜ βˆ’ 𝑧^βˆ—)/2 units. In other words,
(2) 𝑧^π‘ž βˆ’ 𝑧^(π‘ž+1) ≀ 𝛼(𝑧^π‘˜ βˆ’ 𝑧^βˆ—)/2.
The inequality (1) implies that
(3) 𝛼(𝑧^π‘ž βˆ’ 𝑧^βˆ—) ≀ 𝑧^π‘ž βˆ’ 𝑧^(π‘ž+1)
The inequalities (2) and (3) imply that
(𝑧^π‘ž βˆ’ 𝑧^βˆ—) ≀ (𝑧^π‘˜ βˆ’ 𝑧^βˆ—)/2,
so the algorithm has reduced the total possible improvement (𝑧^π‘˜ βˆ’ 𝑧^βˆ—) by a factor of at least 2. We have thus shown that within 2/𝛼 consecutive iterations, the algorithm either obtains an optimal solution or reduces the total possible improvement by a factor of at least 2. Since 𝐻 is the maximum possible improvement and every objective function value is an integer, the algorithm must terminate within 𝑂((π‘™π‘œπ‘”π»)/𝛼) iterations.
Why does the author focus on 2/Ξ± iterations?

I think it's written this way for computer scientists to understand more easily that we're halving an integer quantity every 2/Ξ± rounds, hence the number of 2/Ξ±-round super-rounds will be O(log H). There is no particular reason we can't do the math more directly:
T(n) is the gap between the nth solution and an optimal solution.
T(n) ≀ (1 βˆ’ Ξ±) T(nβˆ’1)        [assumption (1)]
T(n) ≀ (1 βˆ’ Ξ±)^n T(0)        [by induction]
     < (e^(βˆ’Ξ±))^n T(0)        [by 1 + x < e^x for x β‰  0]
T((ln T(0))/Ξ±) < e^(βˆ’Ξ± Β· (ln T(0))/Ξ±) T(0) = 1,
so (ln T(0))/Ξ± rounds suffice (the gap is an integer, so a gap below 1 means an optimal solution).
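To make the bound concrete, here is a small Python sketch (my own addition, not from the book; the function name and the sample values of H and alpha are arbitrary). It shrinks an integer-valued gap by exactly an alpha fraction per iteration, the slowest progress inequality (1) allows, and compares the observed iteration count with the O((log H)/alpha) bound.

import math

def iterations_to_optimal(H, alpha):
    # Worst case allowed by (1): each iteration closes exactly alpha of the gap.
    gap = H           # total possible improvement; the objective is integer-valued
    count = 0
    while gap >= 1:   # an integer gap below 1 means the solution is optimal
        gap -= alpha * gap
        count += 1
    return count

H, alpha = 10**6, 0.01
print(iterations_to_optimal(H, alpha))    # observed iteration count
print(math.ceil(math.log(H) / alpha))     # the O((log H)/alpha) bound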

Related

Calculate n for n log(n) and n! when time is 1 second (the algorithm takes f(n) microseconds)

Given the following problem from the CLRS algorithms book:
For each function f (n) and time t in the following table, determine
the largest size n of a problem that can be solved in time t, assuming
that the algorithm to solve the problem takes f(n) microseconds.
how can one calculate n for f(n)=nlog(n) when time is 1 second?
how can one calculate n for f(n)=n! when time is 1 second?
It is mentioned that the algorithm takes f(n) microseconds. Then one may consider the algorithm to consist of f(n) steps, each of which takes 1 microsecond.
The questions state that the relevant f(n) values are bounded by 1 second (i.e., 10^6 microseconds). Since you are looking for the largest n that fulfills this condition, your questions boil down to the inequalities below.
1) f(n) = n log(n) <= 10^6
2) f(n) = n! <= 10^6
The rest, I believe, is mainly a matter of juggling algebra and logarithmic equations to find the relevant values.
In the first case, you can use Newton's method to approximate the root (see, for example, the question about using Newton's method to calculate a cube root) or the Lambert W function to estimate n. As far as I can tell, there is no simpler closed-form approach.
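To illustrate, here is a small Python sketch (my own addition; the function name is made up, and I take the logarithm to be base 2, as in CLRS's lg). It sidesteps any closed-form manipulation by binary-searching for the largest n with n * lg(n) <= 10^6.

import math

def largest_n(limit=10**6):
    # Binary search for the largest n with n * log2(n) <= limit.
    lo, hi = 1, limit              # n * log2(n) certainly exceeds limit at n = limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * math.log2(mid) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo

print(largest_n())                 # prints 62746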
In the second case, a short Python script can find n by trial, as shown below.
def calFact(n):
    if n == 0 or n == 1:
        return 1                        # 0! = 1! = 1
    return n * calFact(n - 1)

nVal = 1
while calFact(nVal + 1) <= 1000000:     # 1 second = 10^6 microseconds
    nVal = nVal + 1
print(nVal)                             # largest n with n! <= 10^6 (prints 9)
So in this case we are looking for the largest n such that n! does not exceed 10^6, which the script finds by trial.

Calculate big Theta bound for 2 recursive calls

T(m,n) = 2T(m/2,n)+n, assume T(m,n) is constant if either m<2 or n<2
So what I don't understand is: can this problem be solved using the Master Theorem? If so, how? If not, is this table correct?
level        # of instances   size          cost per instance   total cost
0            1                m, n          n                   n
1            2                m/2, n        n                   2n
2            4                m/4, n        n                   4n
i            2^i              m/(2^i), n    n                   2^i * n
k = log(m)   2^k = m          1, n          n                   n*m
Thanks!
The Master Theorem might be a little bit of overkill here, and your solution method is not bad (log means logarithm to base 2, c = T(1,n)):
T(m,n)=n+2T(m/2,n)=n+2n+4T(m/4,n)=n*(1+2+4+..+2^log(m))+2^log(m)*c
=n*(2^(log(m)+1)-1)+m*c=Theta(n*m)
If you use the Master Theorem by treating n as a constant, then you would easily get T(m,n)=Theta(m*C(n)) with a constant C depending on n, but the Master Theorem does not tell you much about this constant C. If you get too smart and inattentive you could easily get burned:
T(m,n)=n+2T(m/2,n)=n*(1+2/nT(m/2,n))=n*Theta(2^(log(m/n)))
=n*Theta(m/n)=Theta(m)
And now, because you left out C(n) in the third step, you got a wrong result!
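To sanity-check the Theta(n*m) result, here is a quick Python sketch (my own addition) that unrolls the recurrence literally and prints the ratio T(m,n)/(n*m), which settles near the constant 1 + 1/n:

def T(m, n, c=1):
    # Literal unrolling of T(m,n) = 2*T(m/2,n) + n, with T = c when m < 2 or n < 2.
    if m < 2 or n < 2:
        return c
    return 2 * T(m // 2, n, c) + n

for m in (2**6, 2**10, 2**14):
    n = 10
    print(m, n, T(m, n), T(m, n) / (m * n))   # ratio tends to 1 + 1/n, i.e. Theta(n*m)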

Number of Comparisons in Merge-Sort

I was studying merge sort when I ran into the claim that the number of comparisons in merge sort (in the worst case, according to Wikipedia) equals n⌈lg nβŒ‰ βˆ’ 2^⌈lg nβŒ‰ + 1; in fact it's between (n lg n βˆ’ n + 1) and (n lg n + n + O(lg n)). The problem is that I cannot figure out what these expressions are trying to say. I know O(n log n) is the complexity of merge sort, but why count comparisons exactly?
Why to count comparisons
There are basically two operations to any sorting algorithm: comparing data and moving data. In many cases, comparing will be more expensive than moving. Think about long strings in a reference-based typing system: moving data will simply exchange pointers, but comparing might require iterating over a large common part of the strings before the first difference is found. So in this sense, comparison might well be the operation to focus on.
Why an exact count
The numbers appear to be more detailed: instead of simply giving some Landau symbol (big-Oh notation) for the complexity, you get an actual number. Once you have decided what a basic operation is, like a comparison in this case, this approach of actually counting operations becomes feasible. This is particularly important when comparing the constants hidden by the Landau symbol, or when examining the non-asymptotic case of small inputs.
Why this exact count formula
Note that throughout this discussion, lg denotes the logarithm with base 2. When you merge-sort n elements, you have ⌈lg nβŒ‰ levels of merges. Assume you place ⌈lg nβŒ‰ coins on each element to be sorted, and that an element pays one coin for every merge it takes part in. This will certainly be enough to pay for all the merges, as each element is included in ⌈lg nβŒ‰ merges, and a merge never takes more comparisons than the number of elements involved. So this is the n⌈lg nβŒ‰ from your formula.
As a merge of two runs of lengths p and q takes at most p + q βˆ’ 1 comparisons, you still have coins left at the end, one from each merge. Let us for the moment assume that n is a power of two, so that the two runs in every merge have equal length. Then the total number of merges is n βˆ’ 1 (sum of powers of two). Using the fact that n is a power of two, this can also be written as 2^⌈lg nβŒ‰ βˆ’ 1, and subtracting that number of returned coins from the number of all coins yields n⌈lg nβŒ‰ βˆ’ 2^⌈lg nβŒ‰ + 1 as required.
If n is 1 less than a power of two, then there are ⌈lg nβŒ‰ merges where one element less is involved. This includes a merge of two one-element lists which used to take one coin and which now disappears altogether. So the total cost reduces by ⌈lg nβŒ‰, which is exactly the number of coins you'd have placed on the last element if n were a power of two. So you have to place fewer coins up front, but you get back the same number of coins. This is the reason why the formula has 2^⌈lg nβŒ‰ instead of n: the value remains the same unless you drop to a smaller power of two. The same argument holds if the difference between n and the next power of two is greater than 1.
On the whole, this results in the formula given in Wikipedia:
n ⌈lg nβŒ‰ βˆ’ 2^⌈lg nβŒ‰ + 1
Note: I'm pretty happy with the above proof. For those who like my formulation, feel free to distribute it, but don't forget to attribute it to me as the license requires.
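As a cross-check, here is a short Python sketch (my own addition, assuming the usual top-down split into halves). The worst-case comparison count satisfies C(1) = 0 and C(n) = C(⌊n/2βŒ‹) + C(⌈n/2βŒ‰) + n βˆ’ 1, and the script verifies that it matches n⌈lg nβŒ‰ βˆ’ 2^⌈lg nβŒ‰ + 1 for every n it tries.

from functools import lru_cache

@lru_cache(maxsize=None)
def worst_case_comparisons(n):
    # C(1) = 0; merging the two sorted halves costs at most n - 1 comparisons.
    if n <= 1:
        return 0
    return worst_case_comparisons(n // 2) + worst_case_comparisons(n - n // 2) + n - 1

for n in range(2, 1000):
    k = (n - 1).bit_length()        # ceil(lg n) for n >= 2
    assert worst_case_comparisons(n) == n * k - 2**k + 1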
Why this lower bound
To prove the lower bound formula, let's write ⌈lg nβŒ‰ = lg n + d with 0 ≀ d < 1. Now the formula above can be written as
n (lg n + d) βˆ’ 2^(lg n + d) + 1
= n lg n + nd βˆ’ nΒ·2^d + 1
= n lg n βˆ’ n(2^d βˆ’ d) + 1
β‰₯ n lg n βˆ’ n + 1,
where the inequality holds because 2^d βˆ’ d ≀ 1 for 0 ≀ d < 1.
Why this upper bound
I must confess, I'm rather confused why anyone would name n lg n + n + O(lg n) as an upper bound. Even if you wanted to avoid the ceiling function, the computation above suggests something like n lg n βˆ’ 0.9n + 1 as a much tighter upper bound for the exact formula. 2^d βˆ’ d has its minimum (ln(ln(2)) + 1)/ln(2) β‰ˆ 0.914 at d = βˆ’ln(ln(2))/ln(2) β‰ˆ 0.529.
I can only guess that the quoted formula occurs in some publication, either as a rather loose bound for this algorithm, or as the exact number of comparisons for some other algorithm which is compared against this one.
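A quick numeric spot-check of that observation (my own addition): for a few values of n, the exact count stays between the lower bound n lg n βˆ’ n + 1 and the suggested tighter upper bound n lg n βˆ’ 0.9n + 1.

import math

def exact_count(n):
    # The exact worst-case count from above: n*ceil(lg n) - 2^ceil(lg n) + 1.
    k = (n - 1).bit_length()        # ceil(lg n) for n >= 2
    return n * k - 2**k + 1

for n in (10, 100, 1000, 10**6):
    n_lg_n = n * math.log2(n)
    assert n_lg_n - n + 1 <= exact_count(n) <= n_lg_n - 0.9 * n + 1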
(Two different counts)
This issue has been resolved by the comment below; one formula was originally quoted incorrectly.
equals (n lg n - n + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n))
If the first part is true, the second is trivially true as well, but explicitly stating the upper bound then seems pointless. I haven't checked the details myself, but these two statements appear strange when taken together like this: either the first one really is true, in which case I'd omit the second as it is only confusing, or the second one is true, in which case the first one is wrong and should be omitted.

Running time of Euclid's GCD algorithm?

I am trying to learn number theory for RSA cryptography by reading the CLR algorithms book. I was looking at exercise 31.2-5, which claims a bound of 1 + log_Ο†(b / gcd(a,b)).
The full question is:
If a > b >= 0, show that the invocation EUCLID(a,b) makes at most 1 + log_Ο† b recursive calls. Improve this bound to 1 + log_Ο†(b / gcd(a,b)).
Does anyone know how to show this? There are already several other questions and answers about Euclid's algorithm on this site, but none of them seem to give this exact, precise answer.
Refer to the analysis of Euclid's algorithm by Donald Knuth, in TAOCP Vol.2 p.356
This is how I solved the first part; I am still working on the second.
We know that the running time of Euclid's algorithm is a function of the number of recursive calls involved (see the corresponding page of the book).
Let k be the number of recursive calls needed. Then
b β‰₯ F_(k+1) β‰₯ Ο†^(kβˆ’1).
Taking logarithms to base Ο† to isolate k, we have
log_Ο† b β‰₯ k βˆ’ 1, i.e.
k ≀ 1 + log_Ο† b.
Hence the running time is O(1 + log_Ο† b).
1. Show that if EUCLID(a,b) makes n β‰₯ 1 recursive calls, then a β‰₯ F(n+2) and b β‰₯ F(n+1), where F(i) is the i-th Fibonacci number. This can easily be done by induction.
2. Show that F(n+1) β‰₯ Ο†^(nβˆ’1), again by induction.
3. Using the results of steps 1 and 2, we have b β‰₯ F(n+1) β‰₯ Ο†^(nβˆ’1). Taking logarithms to base Ο† on both sides gives log_Ο† b β‰₯ n βˆ’ 1. Hence n ≀ 1 + log_Ο† b.
The second part follows from the observation that EUCLID(ka, kb) makes exactly the same number of recursive calls as EUCLID(a, b) for any positive integer k, since every remainder in the computation is simply scaled by k. Applying this with k = gcd(a,b), i.e. comparing EUCLID(a,b) with EUCLID(a/gcd(a,b), b/gcd(a,b)), improves the bound to 1 + log_Ο†(b / gcd(a,b)).
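Here is a quick empirical check in Python (my own addition; the helper name is mine). It counts EUCLID's recursive calls directly and confirms they stay within the 1 + log_Ο†(b / gcd(a,b)) bound on a few inputs.

import math

PHI = (1 + math.sqrt(5)) / 2

def euclid_calls(a, b):
    # Returns (gcd(a, b), number of recursive calls made by EUCLID(a, b)).
    if b == 0:
        return a, 0
    g, calls = euclid_calls(b, a % b)
    return g, calls + 1

for a, b in [(13, 8), (1000, 617), (10**6, 755)]:
    g, calls = euclid_calls(a, b)
    bound = 1 + math.log(b / g, PHI)
    print(a, b, g, calls, round(bound, 2))
    assert calls <= bound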

Asymptotic runtime for an algorithm

I've decided to try a problem about analyzing the worst possible runtime of an algorithm in order to gain some practice.
Since I'm a beginner, I only need help with expressing my answer in the right way.
I came across this problem in a book that uses the following algorithm:
Input: A set of n points (x1, y1), . . . , (xn, yn) with n β‰₯ 2.
Output: The squared distance of a closest pair of points.
ClosePoints
1. if n = 2 then return (x1 βˆ’ x2)^2 + (y1 βˆ’ y2)^2
2. else
3.   d ← ∞
4.   for i ← 1 to n βˆ’ 1 do
5.     for j ← i + 1 to n do
6.       t ← (xi βˆ’ xj)^2 + (yi βˆ’ yj)^2
7.       if t < d then
8.         d ← t
9.   return d
My question is: how can I give a good proof that T(n) = O(n^2), T(n) = Ξ©(n^2), and T(n) = Θ(n^2), where T(n) represents the worst possible runtime?
I know that we say that f is O(g) if and only if there exist n0 ∈ N and c > 0 in R such that for all n β‰₯ n0 we have f(n) ≀ cg(n).
And also we say that f is Ξ©(g) if there exist n0 ∈ N and c > 0 in R such that for all n β‰₯ n0 we have f(n) β‰₯ cg(n).
Now I know that the algorithm does about c*n(n βˆ’ 1) iterations, yielding T(n) = c*n^2 βˆ’ c*n.
The first 3 lines are executed O(1) times; line 4 loops for n βˆ’ 1 iterations, which is O(n); line 5 loops for n βˆ’ i iterations, which is also O(n). Does each line of the inner loop's body (lines 6-7) take (n βˆ’ 1)(n βˆ’ i) time or just O(1), and why? The only thing that varies is how many times line 8 (d ← t) is performed, but that happens at most O(n^2) times.
So how should I write a good and complete proof that T(n) = O(n^2), T(n) = Ξ©(n^2), and T(n) = Θ(n^2)?
Thanks in advance
Count the number of times t changes its value. Since changing t is the innermost operation performed, finding how many times that happens will allow you to find the complexity of the entire algorithm.
i = 1 => j runs n - 1 times (t changes value n - 1 times)
i = 2 => j runs n - 2 times
...
i = n - 1 => j runs 1 time
So the number of times t changes is 1 + 2 + ... + (n βˆ’ 1). This sum equals n(n βˆ’ 1)/2, which is dominated by 0.5*n^2.
Now just find appropriate constants and you can prove that this is Ξ©(n^2), O(n^2), and Θ(n^2).
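If it helps, here is a direct Python translation of ClosePoints (my own addition) that also counts the inner-loop iterations, one per assignment to t; the count comes out to exactly n(n βˆ’ 1)/2, which pins down the Θ(n^2) behaviour.

import random

def close_points(points):
    # Returns the squared distance of a closest pair and the number of
    # inner-loop iterations executed (one per assignment to t).
    n = len(points)
    d = float("inf")
    count = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            count += 1
            t = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            if t < d:
                d = t
    return d, count

pts = [(random.random(), random.random()) for _ in range(100)]
d, count = close_points(pts)
print(count, 100 * 99 // 2)        # both print 4950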
T(n) = c*n^2 βˆ’ c*n behaves like c*n^2 for large n; picking suitable constants in the definitions above gives O(n^2) and Ξ©(n^2), hence Θ(n^2).
If you look at the two for loops, each gives O(n) because each loop increments its counter in a linear fashion; hence the two loops combined give roughly O(n^2) complexity. The whole point of big-O is to find the dominating term: coefficients do not matter. I would strongly recommend formatting your pseudocode properly so that it is not ambiguous. In any case, the if/else branches do not affect the complexity of the algorithm.
Let's look at the various definitions:
Big-Oh
• f(n) is O(g(n)) if f(n) is asymptotically "less than or equal to" g(n).
Big-Omega
• f(n) is Ξ©(g(n)) if f(n) is asymptotically "greater than or equal to" g(n).
Big-Theta
• f(n) is Θ(g(n)) if f(n) is asymptotically "equal to" g(n).
So all you need to do is find constants c and n0 that satisfy the relevant definition.
