Max-heapify with convergent series - algorithm

I am going through max-heapify, and below are my observations:
1- Observe that max-heapify takes O(1) time for nodes that are one level above the leaves, and in general O(L) time for nodes that are L levels above the leaves.
2- There are n/4 nodes at level 1, n/8 nodes at level 2, and so on.
The total amount of work in the for loop:
n/4 (1c) + n/8 (2c) + n/16 (3c) + ... + 1 (c log n)
Set n/4 = 2^k:
C * 2^k * (1/2^0 + 2/2^1 + 3/2^2 + ... + (k+1)/2^k)
The series in the brackets is a convergent series, bounded by a constant.
The algorithm is:
build-max-heap(A):
    for i = n/2 down to 1:
        do max-heapify(A, i)
I understand most of it from the lecture, but I am confused on some points:
1- Why are we using n/4 (1c) and not n/2? And how do we know that n/4 corresponds to level 1?
2- How does this convergent series lead us to Θ(n) complexity?

1: Consider a binary tree (complete, for simplicity). It has one root, two nodes below the root, four below those, and so on. Hence the number of nodes in a binary tree with h levels is
1 + 2 + 2^2 + ... + 2^(h-1) = 2^h - 1
Since this is the number of nodes n, roughly half of all nodes (n/2) are leaves. The level above contains half as many nodes as there are leaves, so n/4.
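As a concrete check (my numbers, not from the answer above): for n = 15 the complete tree has levels of 1, 2, 4 and 8 nodes, so 8 of the 15 nodes are leaves (about n/2), 4 nodes sit one level above the leaves (about n/4), 2 sit two levels above (about n/8), and 1 is the root.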
2: You have a runtime of
C * 2^k * (1 + 2/2^1 + ... + (k+1)/2^k)
Let's call the term in the brackets S(k) (it depends on k). S(k) converges as k goes to infinity. Furthermore, it is increasing, since S(k+1) is S(k) plus a positive term, so it always stays below its limit: if it ever exceeded the limit, it could never come back down. Therefore there is a constant A (independent of k) such that S(k) < A for all k.
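In fact the limit can be computed exactly (using a standard power-series identity, added here for concreteness): since the sum over j ≥ 0 of (j+1)·x^j equals 1/(1-x)^2, taking x = 1/2 gives S(k) → 1/(1 - 1/2)^2 = 4, so A = 4 works.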
Hence we can write the runtime as
C * 2^k * (1 + 2/2^1 + ... + (k+1)/2^k) < C * 2^k * A = A * C * n/4 = O(n)
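For reference, here is a minimal runnable C sketch of the pseudocode from the question, using 1-based indices stored in a[1..n]; the function names and the sample array are my own choices, not from the lecture:

#include <stdio.h>

/* Sink a[i] until the max-heap property holds below it; a node L levels
 * above the leaves sinks at most L times, i.e. O(L) work. */
void max_heapify(int a[], int n, int i)
{
    int left = 2 * i, right = 2 * i + 1, largest = i;
    if (left <= n && a[left] > a[largest])
        largest = left;
    if (right <= n && a[right] > a[largest])
        largest = right;
    if (largest != i)
    {
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        max_heapify(a, n, largest);
    }
}

/* Run max-heapify on every non-leaf node, from i = n/2 down to 1. */
void build_max_heap(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)
        max_heapify(a, n, i);
}

int main(void)
{
    int a[] = {0, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7};  /* a[0] is unused */
    int n = 10;
    build_max_heap(a, n);
    for (int i = 1; i <= n; i++)
        printf("%d ", a[i]);                          /* prints a valid max-heap */
    printf("\n");
    return 0;
}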

This is a somewhat long explanation; I had the same difficulty understanding heapify() before, so I'd like to share it. Please let me know if you find any issues. Thanks.


Order of Growth Analysis

I am seeking clarification on how the bound of a particular function is determined.
E.g 1: A(n) = log(2^n) + n^(1/3) + 1000
Would I be right to say that the last 2 terms can be "ignored" as they are insignificant as compared to the first? And thus the bound is O(2^n)?
E.g 2: B(n) = n + (1/2)*n + (1/3)*n + (1/4)*n + ... + 1
I am more uncertain about this one, but my guess would be O(n)? The 1 is ignored (as per the reasoning for 1000 in E.g. 1), of that much I'm sure.
I was also thinking: if the fractions in E.g. 2 are modified so that the denominators follow a different pattern instead (e.g. (1/2)*n + (1/4)*n + (1/8)*n + ...), would the order of growth be faster or slower than in E.g. 2?
Appreciate any guidance available! Thank you!
E.g 1: A(n) = log(2^n) + n^(1/3) + 1000
Here log(2^n) = n, which is bigger than n^(1/3), so by the properties of the order function, A(n) = O(n).
E.g 2: B(n) = n + (1/2)*n + (1/3)*n + (1/4)*n + ... + 1
= n*(1 + 1/2 + 1/3 + 1/4 + ... + 1/n)
Now (1 + 1/2 + 1/3 + 1/4 + ... + 1/n) can be approximated by the integral of dx/x from 1 to n, which comes out to log(n), making the resulting order O(n log n).
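To make the integral comparison precise, here is the standard sandwich bound (a worked step added for clarity):
ln(n+1) = integral from 1 to n+1 of dx/x <= 1 + 1/2 + 1/3 + ... + 1/n <= 1 + integral from 1 to n of dx/x = 1 + ln(n)
so the partial sum is Θ(log n) and B(n) = n · Θ(log n) = Θ(n log n).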
E.g 2 Modified = n + (1/2)*n + (1/4)*n + (1/8)*n + ...
= n * (1 + 1/2 + 1/4 + 1/8 + ...)  [geometric series]
= n * (1/(1 - 1/2))
= 2n
So it becomes O(n)
E.g 1: A(n) = log(2^n) + n^(1/3) + 1000 Would I be right to say that the last 2 terms can be "ignored" as they are insignificant as compared to the first? And thus the bound is O(2^n)?
If you simplify the expression, you get A(n) = n*log(2) + n^(1/3) + 1000. The last two terms grow more slowly than the first, n*log(2), which is simply O(n). Hence A(n) is O(n).
E.g 2: B(n) = n + (1/2)*n + (1/3)*n + (1/4)*n + ... + 1 I am more uncertain about this one, but I would give a guess that it would be O(n)? 1 is ignored (as per reasoning for 1000 in E.g. 1), that's what I'm sure.
This one is tricky because, as written, it involves an infinite series. If you only had n + (1/2)*n + (1/3)*n + (1/4)*n, it would be equivalent to a*n for some constant a, and that is O(n). However, the full expression multiplies n by the harmonic series, which is a divergent infinite series, so you cannot conclude that B(n) is O(n). In fact, B(n) can be written as S_k*n + 1 as k tends to infinity, where S_k is the sum of 1/i for i from 1 to k. And because S_k diverges as k tends to infinity, B(n) also tends to infinity, assuming n > 0.
In the end, B(n) is not bounded. It doesn't have a proper order of growth.
Edit: if B(n) does not contain an infinite series but instead stops at (1/n)*n, which is the last 1 in your expression, then the answer is different.
The partial sums of the harmonic series have logarithmic growth. That means that B(n)/n, which is exactly the partial sum, up to n, of the harmonic series, is O(log n). In the end, B(n) is simply O(n log n).
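If you want to convince yourself numerically, here is a small C snippet (my own illustration) that computes the partial harmonic sum H(n) and shows B(n)/(n·ln n) heading toward a constant:

#include <stdio.h>
#include <math.h>

int main(void)
{
    for (long n = 10; n <= 1000000; n *= 10)
    {
        double h = 0.0;
        for (long i = 1; i <= n; i++)
            h += 1.0 / i;                  /* H(n) = 1 + 1/2 + ... + 1/n */
        double b = n * h;                  /* B(n) for the finite series */
        printf("n=%8ld  H(n)=%8.4f  ln(n)=%8.4f  B(n)/(n*ln n)=%.4f\n",
               n, h, log((double)n), b / (n * log((double)n)));
    }
    return 0;
}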

2D Peak finding in linear time

I am reading this O(n) solution of the 2D peak finding problem. The author says that an important detail is to
split by the maximum direction. For square arrays this means that
split directions will be alternating.
Why is this necessary?
This is not necessary. Alternating the split direction gives O(N) for any array.
Let's count the number of comparisons for an array M × N.
First iteration gives 3×M, second gives 3×N/2, third gives 3×M/4, fourth gives 3×N/8, i.e.:
3 * (M + M/4 + M/16 + ...) + 3 * (N/2 + N/8 + N/32 + ...)
We get two geometric series. Because both series have common ratio 1/4, each sums to at most its first term times 1/(1 - 1/4) = 4/3, which gives the upper bound:
3 * (4 * M/3 + 2 * N/3)
Because O(const × N) = O(N) and, for a square array, O(M + N) = O(N), we have an O(N) algorithm.
If we always split in the vertical direction, then the performance of the algorithm is O(N × log M). If M is much larger than N, this algorithm will be faster. For example, if M = 1025 and N = 3, the number of comparisons in the first algorithm is comparable to 1000, and in the second algorithm comparable to 30.
By splitting the array along the maximum direction, we get a faster algorithm for specific values of M and N. Is this algorithm O(N)? Yes, because even if we scanned both a vertical and a horizontal section at every step, we would have 3 × (M + M/2 + M/4 + ...) + 3 × (N + N/2 + N/4 + ...) = 3 × (2M + 2N) comparisons, i.e. O(M + N) = O(N). And we actually scan only one direction at each step.
Splitting along the longer side ensures that the length of the scanned section is at most sqrt(area). He could also have gone through the proof by noticing that he halves the area with each call and looks at most 3·sqrt(area) cells to do so.
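For context, here is a minimal C sketch (my own illustration, not the linked article's code) of the simpler variant discussed above that always splits in the same direction: find the maximum of the middle column, then recurse toward a larger horizontal neighbour. This is the O(N log M) case for an N-row, M-column grid:

#include <stdio.h>

#define ROWS 3
#define COLS 5

/* Find a 2D peak looking only at columns lo..hi. */
int find_peak(int a[ROWS][COLS], int lo, int hi)
{
    int mid = (lo + hi) / 2;

    /* Maximum of the middle column: ROWS comparisons per level. */
    int best_row = 0;
    for (int r = 1; r < ROWS; r++)
        if (a[r][mid] > a[best_row][mid])
            best_row = r;

    /* If a horizontal neighbour is larger, a peak lies on that side. */
    if (mid > lo && a[best_row][mid - 1] > a[best_row][mid])
        return find_peak(a, lo, mid - 1);
    if (mid < hi && a[best_row][mid + 1] > a[best_row][mid])
        return find_peak(a, mid + 1, hi);

    return a[best_row][mid];   /* no larger neighbour: this is a 2D peak */
}

int main(void)
{
    int a[ROWS][COLS] = {
        {10, 8,  5, 2, 1},
        { 9, 7, 12, 4, 3},
        { 6, 5, 11, 3, 2},
    };
    printf("peak value: %d\n", find_peak(a, 0, COLS - 1));   /* prints 12 */
    return 0;
}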

Time complexity of iterating over a k-layer deep loop nest always Θ(nᵏ)?

Many algorithms have loops in them that look like this:
for a from 1 to n
  for b from 1 to a
    for c from 1 to b
      for d from 1 to c
        for e from 1 to d
          ...
          // Do O(1) work
In other words, the loop nest is k layers deep, the outer layer loops from 1 to n, and each inner layer loops up from 1 to the index above it. This shows up, for example, in code to iterate over all k-tuples of positions inside an array.
Assuming that k is fixed, is the runtime of this code always Θ(n^k)? For the special case where k = 1, the work is Θ(n) because it's just a standard loop over an array, and for the case where k = 2 the work is Θ(n^2) because the work done by the inner loop is given by
1 + 2 + ... + n = n(n+1)/2 = Θ(n^2)
Does this pattern continue when k gets large? Or is it just a coincidence?
Yes, the time complexity will be Θ(n^k). One way to measure the complexity of this code is to look at what values it generates. One particularly useful observation is that these loops iterate over every non-increasing k-tuple (a ≥ b ≥ c ≥ ...) drawn from {1, 2, 3, ..., n} (equivalently, every k-element multiset of that set, since equal indices are allowed) and will spend O(1) time producing each one of them. Therefore, we can say that the runtime is given by the number of such multisets. Given an n-element set, the number of k-element multisets is (n + k - 1) choose k, which is equal to
(n + k - 1)! / (k!(n - 1)!)
This is given by
(n + k - 1)(n + k - 2) ... (n + 1) n / k!
This value is certainly no less than this one:
n · n · n · ... · n / k! (with k copies of n)
= n^k / k!
This expression is Ω(n^k), since the 1/k! term is a fixed constant.
Similarly, as soon as n ≥ k - 1, every factor in the numerator is at most 2n, so the value is no greater than
(2n) · (2n) · ... · (2n) / k! (with k copies of 2n)
= 2^k n^k / k!
This is O(n^k), since 2^k / k! is a fixed constant.
Since the runtime is O(n^k) and Ω(n^k), the runtime is Θ(n^k).
Hope this helps!
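If you want a quick sanity check of that count, here is a small C snippet (my own, for the k = 3 case) that counts the iterations of the innermost body and compares them with the multiset count C(n + 2, 3) = (n + 2)(n + 1)n / 6:

#include <stdio.h>

int main(void)
{
    for (long n = 1; n <= 8; n++)
    {
        long count = 0;
        for (long a = 1; a <= n; a++)
            for (long b = 1; b <= a; b++)
                for (long c = 1; c <= b; c++)
                    count++;                         /* the O(1) work */
        long predicted = (n + 2) * (n + 1) * n / 6;  /* C(n + 2, 3) */
        printf("n=%ld  iterations=%ld  C(n+2,3)=%ld\n", n, count, predicted);
    }
    return 0;
}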
You may use the following equation, where c is the number of constant-time operations inside the innermost loop, n is the number of elements, and r is the number of nested loops:
total work = c · (n + r - 1 choose r) = c · (n + r - 1)! / (r! (n - 1)!) = Θ(n^r)

Proving this recursive Fibonacci implementation runs in time O(2^n)?

I'm having difficulty proving that the 'bad' version of fibonacci is O(2^n).
I.e., given the function
int fib(int x)
{
    if (x == 1 || x == 2)
    {
        return 1;
    }
    else
    {
        return fib(x - 1) + fib(x - 2);
    }
}
Can I get help with the proof that this is O(2^n)?
Let's start off by writing a recurrence relation for the runtime:
T(1) = 1
T(2) = 1
T(n+2) = T(n) + T(n + 1) + 1
Now, let's take a guess that
T(n) ≤ 2^n
If we try to prove this by induction, the base cases check out:
T(1) = 1 ≤ 2 = 2^1
T(2) = 1 ≤ 4 = 2^2
Then, in the inductive step, we see this:
T(n + 2) = T(n) + T(n + 1) + 1
≤ 2^n + 2^(n+1) + 1
< 2^(n+1) + 2^(n+1)
= 2^(n+2)
Therefore, by induction, we can conclude that T(n) ≤ 2^n for any n, and therefore T(n) = O(2^n).
With a more precise analysis, you can prove that T(n) = 2F_n - 1, where F_n is the nth Fibonacci number. This proves, more accurately, that T(n) = Θ(φ^n), where φ is the Golden Ratio, which is approximately 1.61. Note that φ^n = o(2^n) (using little-o notation), so this is a much better bound.
Hope this helps!
Try manually running a few test cases like fib(5) and take note of how many times the method fib() is called.
A fat hint would be to notice that every time fib() is called (except when x is 1 or 2), it calls fib() twice more. Each of those calls fib() two more times, and so on...
There's actually a pretty simple proof that the total number of calls to fib is going to be 2·Fib(n) - 1, where Fib(n) is the n'th Fibonacci number. It goes like this:
The set of calls to fib forms a binary tree, where each call is either a leaf (for x = 1 or x = 2) or else spawns two child calls (for x > 2).
Each leaf contributes exactly 1 to the total returned by the original call; therefore there are Fib(n) leaves in total.
The total number of internal nodes in any binary tree in which every internal node has two children is L - 1, where L is the number of leaves, so the total number of nodes in this tree is 2L - 1.
This shows that the running time (measured in terms of total calls to fib) is
T(n) = 2·Fib(n) - 1 = O(Fib(n))
and since Fib(n) = Θ(φ^n), where φ is the golden ratio
φ = (1 + sqrt(5))/2 = 1.618...
this proves that T(n) = Θ(1.618...^n) = O(2^n).
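To see the 2·Fib(n) - 1 count concretely, here is a small C program (my own) that counts the actual calls and compares them with that formula:

#include <stdio.h>

static long calls;                 /* number of times fib() is entered */

long fib(int x)
{
    calls++;
    if (x == 1 || x == 2)
        return 1;
    return fib(x - 1) + fib(x - 2);
}

int main(void)
{
    for (int n = 1; n <= 10; n++)
    {
        calls = 0;
        long f = fib(n);
        printf("n=%2d  Fib(n)=%3ld  calls=%3ld  2*Fib(n)-1=%3ld\n",
               n, f, calls, 2 * f - 1);
    }
    return 0;
}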
Using the Recursion Tree Method :
                    T(n)
                  /      \
             T(n-1)      T(n-2)
            /     \      /     \
        T(n-2)  T(n-3) T(n-3)  T(n-4)
Each node spawns calls to fib(x - 1) and fib(x - 2); if you keep expanding the recursion tree in this manner, you stop when x = 1 or x = 2 (the base case). The tree above shows only three levels of the recursion tree. To solve this tree you need two important pieces of information: 1- the height of the tree, and 2- how much work is done at each level.
The height of this tree is at most n, since each level decreases the argument by at least 1, and level i contains at most 2^i nodes, each doing O(1) work. So the total work is bounded by 1 + 2 + 4 + ... + 2^n = O(2^n).

Big O Speed of Removing Duplicates from Linked List Without Buffer

The approach I'm referring to is the dual-pointer technique, where the first pointer is a straightforward iterator and the second pointer goes through only the previous values relative to the first pointer.
That way, less work is done than if, for each node, we compared against every other node, which would end up being O(n^2).
My question is what is the speed of the dual-pointer method and why?
So if you have N elements in the list, doing your de-duping on element i will require i comparisons (there are i values behind it). So, we can set up the total number of comparisons as sum[i = 0 to N] i. This summation evaluates to N(N+1)/2, which is strictly less than N^2 for N > 1.
Edit:
To solve the summation, you can approach it like this.
Start with
1 + 2 + 3 + 4 + ... + (n-2) + (n-1) + n
From here, you can match up numbers from opposite ends. Pairing the 1 at the start with the n at the end gives
2 + 3 + ... + (n-1) + (n+1)
Do the same with 2 and (n-1):
3 + ... + (n-2) + (n+1) + (n+1)
You can repeat this n/2 times, since you pair up two numbers each time, leaving n/2 occurrences of the term (n+1). Multiplying, we get (n+1)(n/2), or n(n+1)/2.
Also, this shows the summation is still Θ(n^2).
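For completeness, here is a rough, runnable C sketch of the dual-pointer technique described in the question (my own illustration; the names are assumptions). The runner rescans only the values before the current node, so checking node i costs up to i comparisons, matching the summation above:

#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

/* Remove duplicates with no auxiliary buffer: for each node, scan the
 * portion of the list before it; if its value was already seen, unlink it. */
void remove_duplicates(struct node *head)
{
    if (head == NULL)
        return;
    struct node *prev = head;
    for (struct node *current = head->next; current != NULL; )
    {
        int seen = 0;
        for (struct node *runner = head; runner != current; runner = runner->next)
            if (runner->value == current->value) { seen = 1; break; }
        if (seen)
        {
            prev->next = current->next;    /* unlink the duplicate */
            free(current);
            current = prev->next;
        }
        else
        {
            prev = current;
            current = current->next;
        }
    }
}

int main(void)
{
    int vals[] = {3, 1, 3, 2, 1, 2};
    struct node *head = NULL, *tail = NULL;
    for (size_t i = 0; i < sizeof vals / sizeof vals[0]; i++)
    {
        struct node *n = malloc(sizeof *n);
        n->value = vals[i];
        n->next = NULL;
        if (tail) { tail->next = n; tail = n; } else { head = tail = n; }
    }
    remove_duplicates(head);
    for (struct node *p = head; p; p = p->next)
        printf("%d ", p->value);           /* prints: 3 1 2 */
    printf("\n");
    return 0;
}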
