Time complexity of recursive function dividing the input value by 2/3 every time - algorithm

I know that the time complexity of a recursive function that divides its input by 2 each time is log n to the base 2. I have come across some interesting scenarios on
https://stackoverflow.com/a/42038565/8169857
Kindly help me understand the logic behind the scenarios in that answer, in particular the derivation of the formula.

It's back to the recursion tree. Why is the 1/2 case O(log2(n))? Because if n = 2^k, you have to halve k times to reach 1, so the number of operations is k = log2(n) comparisons at most. Now suppose the input shrinks to (c-1)/c of its size each time. Then, if n = (c/(c-1))^k, we need k = log_{c/(c-1)}(n) operations to reach 1.
Now, since for any constant c > 1 the limit of log2(n) / log_{c/(c-1)}(n) as n → ∞ is a constant greater than zero, we have log_{c/(c-1)}(n) = Θ(log2(n)). Indeed, you can say this for any constants a, b > 1: log_a(n) = Θ(log_b(n)). This completes the proof.
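For intuition, here is a small Python sketch (my own illustration, not part of the answer above) that counts how many times you can shrink n, once by keeping 1/2 of it and once by keeping 2/3 of it, before it drops to 1. Both counts grow like log n, and their ratio approaches the constant log(2)/log(3/2) ≈ 1.71 as n grows:

def shrink_count(n, keep):
    # Count how many multiplications by `keep` are needed to bring n down to 1 or below.
    count = 0
    while n > 1:
        n *= keep
        count += 1
    return count

for n in (10**3, 10**6, 10**9):
    halved = shrink_count(n, 1/2)      # input halved each step
    two_thirds = shrink_count(n, 2/3)  # input shrunk to 2/3 each step
    print(n, halved, two_thirds, round(two_thirds / halved, 2))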

Related

complexity/runtime of pseudocode using big-O notation

I need a little help with a problem. I just started reading about O-notation but I'm still new when it comes to analysing code.
So here's the problem:
The following pseudocode is given, where A is an array of numbers whose elements can be accessed via the indices 1 to length(A). i holds whole numbers, so the results of the division are rounded down. What's the complexity of the function SkipPrint?
1: procedure SkipPrint(A)
2:     i <- length(A)
3:     do
4:         print(A[i])
5:         i <- i/2
6:     while i > 0
So I think the complexity is O(n), since the function needs to go through the array, but only once, right (line 2)? Every other line is of lesser magnitude, so it stays O(n)?
Thanks in advance. Your help is appreciated.
Salute
This would be O(n), yes. You are correct. This is because there is a single loop that iterates in one direction (i always gets smaller in this case).
Let's say n = length(A) and assume first that we are in the simpler case where length(A) = 2^m, for some integer m.
Then i, being halved at every step, would have the values:
2^m, 2^{m-1}, 2^{m-2}, ..., 2, 1, 0
which shows that the loop body will execute m + 1 times before i reaches 0. Since n = 2^m, this means that m = lg n, and therefore the complexity is O(lg n).
In the general case, define m := floor(lg n). The analysis above shows that the loop will iterate m + 1 times until i becomes 0. Thus, the complexity is O(floor(lg n)), which is the same as O(lg n).
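As a quick sanity check (my own sketch, not part of the answers above), here is the SkipPrint loop in Python with an iteration counter; for an array of length n the count comes out to floor(lg n) + 1, i.e. Θ(lg n):

import math

def skip_print_iterations(n):
    # Simulate SkipPrint on an array of length n, counting loop iterations.
    i = n
    iterations = 0
    while True:          # do-while: the body runs at least once
        iterations += 1  # stands in for print(A[i])
        i = i // 2       # whole-number division, rounded down
        if i <= 0:
            break
    return iterations

for n in (1, 2, 7, 8, 1000, 1 << 20):
    print(n, skip_print_iterations(n), math.floor(math.log2(n)) + 1)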

Binary vs Linear searches for unsorted N elements

I am trying to work out a formula for when we should sort first (e.g. with quicksort). For instance, we have an array with N = 1_000_000 elements. If we search only once, we should use a simple linear search, but if we search 10 times we should first sort the array in O(n log n) and then use binary search. How can I find the threshold, i.e. for which number of searches and which input size it pays off to sort first and then use binary search?
You want to solve an inequality that roughly might be described as
t * n > C * n * log(n) + t * log(n)
where t is the number of searches and C is some constant for the sort implementation (to be determined experimentally). Once you have estimated this constant, you can solve the inequality numerically (with some uncertainty, of course).
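As an illustration (my own sketch; the constant C = 2.0 below is an arbitrary placeholder that you would have to measure for your actual sort), you can simply scan over t until the inequality flips:

import math

def min_searches_to_sort(n, C=2.0):
    # Smallest t for which t*n > C*n*log2(n) + t*log2(n) holds,
    # i.e. sorting once plus t binary searches beats t linear searches.
    log_n = math.log2(n)
    t = 1
    while t * n <= C * n * log_n + t * log_n:
        t += 1
    return t

print(min_searches_to_sort(1_000_000))  # about 40 with this made-up C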
Like you already pointed out, it depends on the number of searches you want to do. A good threshold can come out of the following statement:
n*log[b](n) + x*log[2](n) <= x*n/2
where x is the number of searches, n is the input size, and b is the base of the logarithm for the sort, depending on the partitioning you use.
When this statement evaluates to true, you should switch methods from linear search to sort and search.
Generally speaking, a linear search through an unordered array will take n/2 steps on average, though this average will only play a big role once x approaches n. If you want to stick with big Omicron or big Theta notation then you can omit the /2 in the above.
Assuming n elements and m searches, with crude approximations
the cost of the sort will be C0.n.log n,
the cost of the m binary searches C1.m.log n,
the cost of the m linear searches C2.m.n,
with C2 ~ C1 < C0.
Now you compare
C0.n.log n + C1.m.log n vs. C2.m.n
or
C0.n.log n / (C2.n - C1.log n) vs. m
For reasonably large n, the breakeven point is about C0.log n / C2.
For instance, taking C0 / C2 = 5, n = 1000000 gives m = 100.
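A quick numeric check of that last example (my own calculation, taking the logarithm to base 2):

import math

C0_over_C2 = 5
n = 1_000_000
print(round(C0_over_C2 * math.log2(n)))  # ~100 searches at the breakeven point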
You should plot the complexities of both operations.
Linear search: O(n)
Sort and binary search: O(n log n + log n)
In the plot, you will see for which values of n it makes sense to choose the one approach over the other.
This actually turned into an interesting question for me as I looked into the expected runtime of a quicksort-like algorithm when the expected split at each level is not 50/50.
The first question I wanted to answer was: for random data, what is the average split at each level? It surely must be greater than 50% (for the larger subdivision). Well, given an array of size N of random values, the smallest value gives a subdivision of (1, N-1), the second smallest value gives a subdivision of (2, N-2), and so on. I put this in a quick script:
split = 0
for x in range(10000):
    # fraction of the array on the larger side when the pivot lands at position x
    split += float(max(x, 10000 - x)) / 10000
split /= 10000
print(split)
And got exactly 0.75 as an answer. I'm sure I could show that this is always the exact answer, but I wanted to move on to the harder part.
Now, let's assume that even a 25/75 split follows an n log n progression for some unknown logarithm base. That means that num_comparisons(n) = n * log_b(n), and the question is to find b via statistical means (since I don't expect that model to be exact at every step). We can do this with a clever application of least-squares fitting after we use a logarithm identity to get:
C(n) = n * log(n) / log(b)
where now the logarithm can have any base, as long as log(n) and log(b) use the same base. This is a linear equation just waiting for some data! So I wrote another script that filled an array xs with n*log(n), filled an array ys with the measured C(n), and used numpy to tell me the slope of the least-squares fit, which I expect to equal 1 / log(b). I ran the script and got b inside of [2.16, 2.3], depending on how high I set n to (I varied n from 100 to 100'000'000). The fact that b seems to vary depending on n shows that my model isn't exact, but I think that's okay for this example.
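Here is a rough sketch of that fitting procedure (my own reconstruction, not the answer's original script). Instead of instrumenting a real quicksort, it feeds the fit with a toy cost model C(n) = n + C(n/4) + C(3n/4), i.e. a fixed 25/75 split at every level, so the fitted base will not necessarily land in the same [2.16, 2.3] range the answer reports:

import math
import numpy as np

def cost_25_75(n):
    # Toy comparison count: n work per level, then recurse on a 25/75 split.
    if n < 2:
        return 0
    quarter = n // 4
    return n + cost_25_75(quarter) + cost_25_75(n - quarter)

ns = np.array([10**3, 10**4, 10**5, 10**6], dtype=float)
xs = ns * np.log(ns)                                          # x = n * log(n)
ys = np.array([cost_25_75(int(n)) for n in ns], dtype=float)  # y = C(n)
slope = np.linalg.lstsq(xs.reshape(-1, 1), ys, rcond=None)[0][0]  # fit through the origin
b = math.exp(1.0 / slope)                                     # since slope = 1 / log(b)
print(round(b, 2))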
To actually answer your question now, with these assumptions, we can solve for the cutoff point of when: N * n/2 = n*log_2.3(n) + N * log_2.3(n). I'm just assuming that the binary search will have the same logarithm base as the sorting method for a 25/75 split. Isolating N you get:
N = n*log_2.3(n) / (n/2 - log_2.3(n))
If your number of searches N exceeds the quantity on the RHS (where n is the size of the array in question) then it will be more efficient to sort once and use binary searches on that.
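Plugging the question's array size into that formula (my own quick check, keeping the assumed logarithm base of 2.3):

import math

def breakeven_searches(n, b=2.3):
    # N = n*log_b(n) / (n/2 - log_b(n)) from the derivation above.
    log_b_n = math.log(n, b)
    return n * log_b_n / (n / 2 - log_b_n)

print(round(breakeven_searches(1_000_000)))  # roughly 33 searches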

Big O Recursive Method

I have a method called binary sum
Algorithm BinarySum(A, i, n):
    Input: An array A and integers i and n
    Output: The sum of the n integers in A starting at index i
    if n = 1 then
        return A[i]
    return BinarySum(A, i, n/2) + BinarySum(A, i + n/2, n/2)
Ignoring the fact that this makes a simple problem complicated, I have been asked to find the Big O. Here is my thought process: for an array of size N I will be making 1 + 2 + 4 + ... + N recursive calls. This is close to half the sum from 1 to N, so I will say it is about N(N + 1)/4. After making this many calls, I now need to add the results together. So once again I need to perform N(N+1)/4 additions. Adding them together, we are left with N^2 as the dominant term.
So would the big O of this algorithm be O(N^2)? Or am I doing something wrong? It feels strange to have binary recursion and not have a 2^n or log n in the final answer.
There are in fact 2^n and log n terms in the final result... sort of.
For each call on a sub-array of length n, two recursive calls are made, one to each half of this array, plus a constant amount of work (the if-statement, an addition, pushing onto the call stack etc.). Thus the recurrence relation is given by:
T(n) = 2*T(n/2) + c,   with T(1) = c
At this point we could just use the Master theorem to directly arrive at the final result - O(n). But let's instead derive it by repeated expansion:
T(n) = 2*T(n/2) + c
     = 4*T(n/4) + 2c + c
     = 8*T(n/8) + 4c + 2c + c
     ...
     = 2^m * T(n/2^m) + c*(2^(m-1) + ... + 2 + 1)
     = 2^m * T(n/2^m) + c*(2^m - 1)            (*)
The stopping condition n = 1 gives the maximum value of m (ignoring rounding): n/2^m = 1, i.e. m = log2 n, so
T(n) = n*T(1) + c*(n - 1) = O(n)
In step (*) we used the standard formula for geometric series. So as you can see the answer does involve log n and 2^n terms in a sense, but they "cancel" out to give a simple linear term, which is the same as for a simple loop.
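To see the linear behaviour concretely, here is a small Python version of BinarySum (my own sketch, assuming n is a power of two as the pseudocode implicitly does) together with a counter for the recursive calls; the count comes out to 2n - 1:

def binary_sum(A, i, n, counter):
    # Sum the n elements of A starting at index i, counting every call.
    counter[0] += 1
    if n == 1:
        return A[i]
    half = n // 2
    return binary_sum(A, i, half, counter) + binary_sum(A, i + half, half, counter)

for n in (1, 2, 8, 1024):
    A = list(range(n))
    counter = [0]
    total = binary_sum(A, 0, n, counter)
    print(n, total, counter[0])  # counter[0] == 2*n - 1, i.e. O(n) calls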

How do I prove that this algorithm is O(loglogn)

How do I prove that this algorithm is O(loglogn)
i <- 2
while i < n
    i <- i*i
Well, I believe we should first start with n / 2^k < 1, but that will yield O(log n). Any ideas?
I want to look at this in a simple way: what happens after one iteration, after two iterations, and after k iterations? I think this way I'll be able to understand better how to compute this correctly. What do you think about this approach? I'm new to this, so excuse me.
Let us use the name A for the presented algorithm. Let us further assume that the input variable is n.
Then, strictly speaking, A is not in the runtime complexity class O(log log n). A must be in Omega(n), i.e. in terms of runtime complexity it is at least linear. Why? There is i*i, a multiplication whose operands depend on i, which in turn depends on n. A naive multiplication approach might require quadratic runtime complexity. More sophisticated approaches reduce the exponent, but not below linear in terms of n.
For the sake of completeness, the comparison < is also a linear operation.
For the purpose of the question, we can assume that multiplication and comparison are done in constant time. Then we can formulate the question: how often do we have to apply the constant-time operations < and * until A terminates for a given n?
Simply speaking, the squaring reduces the effort logarithmically, and the iterated application leads to a further logarithmic reduction. How can we show this? Thanks to the simple structure of A, we can transform A into an equation that we can solve directly.
A repeatedly squares i, starting from 2; therefore, after k iterations A has calculated 2^(2^k). When is 2^(2^k) = n? To solve this for k, we apply the logarithm (base 2) twice, i.e., ignoring the bases, we get k = log log n. The < can be ignored due to the O notation.
To answer the very last part of the original question, we can also look at examples for each iteration. We can note the state of i at the end of the while loop body for each iteration of the while loop:
1: i = 4 = 2^2 = 2^(2^1)
2: i = 16 = 4*4 = (2^2)*(2^2) = 2^(2^2)
3: i = 256 = 16*16 = (4*4)*(4*4) = ((2^2)*(2^2))*((2^2)*(2^2)) = 2^(2^3)
4: i = 65536 = 256*256 = 16*16*16*16 = ... = 2^(2^4)
...
k: i = ... = 2^(2^k)
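As a quick check of that pattern (my own sketch), here is the loop in Python with an iteration counter, compared against ceil(log2(log2(n))):

import math

def squaring_iterations(n):
    # Count the iterations of: i = 2; while i < n: i = i*i
    i = 2
    count = 0
    while i < n:
        i = i * i
        count += 1
    return count

for n in (2, 5, 16, 17, 1000, 10**9, 10**100):
    print(n, squaring_iterations(n), math.ceil(math.log2(math.log2(n))))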

Exponential complexity of a program

I have a program which depends on two inputs of sizes m and n respectively. If T(m,n) is the running time of the program, it satisfies:
T(m,n)=T(m-1,n-1)+T(m-1,n)+T(m,n-1)+C
for a given constant C.
I could prove that the time complexity is in Omega(2^min(m,n)). However, it seems that it is actually of complexity Omega(2^max(m,n)) (this was just confirmed to me), but I can't find a formal proof. Does anyone have a trick?
Thanks in advance!
Off the top of my head:
I assume the recursion of T(m,n) stops when you reach T(0,x) or T(x,0).
You have 3 factors contributing to the complexity:
Factor 1: T(m-1,n-1) decreases both m and n, so its length until m or n becomes 0 is min(m,n) steps (see note below).
Factor 2: T(m-1,n) decreases m only, so its length until m=0 is m steps.
Factor 3: T(m,n-1) same as above but until n=0 is n steps.
The overall complexity is the greater of all these complexities, so it must be related to the maximum of
max( min(m,n) steps, m steps, n steps) = max(m,n)
rather than min(m,n).
I guess you can fill in the details. The constant C does not contribute, or more precisely contributes with O(1), which is the lowest of all complexities here.
Note about item 1: this factor also has a branch that takes m-1 down to 0 and n-1 down to 0, so strictly speaking its complexity will also be max(m,n).
First, you should define that, for all values of x, T(0, x) = T(x, 0) = 0 to have your recursion stop - or at least T(0, x) = T(x, 0) = C.
Second, "time complexity is in Omega(2^min(m,n))" must obviously be wrong. Set m=10000 and n=1. Now try to prove to me that the complexity is the same as m=1 and n=1. The T(m-1, n-1) and T(m, n-1) parts disappear quickly, but you still have to walk the T(m-1, n) part.
Third, this observation directly leads to 2^max(m, n). Try to find out the number of recursion steps for a few low values of m and n. Then try to make up a formula for the number of steps depending on m and n. (Hint: Fibonacci). When you have that formula, you're finished.
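Following that suggestion, here is a small sketch (my own, assuming the recursion stops with constant work at T(0, x) and T(x, 0)) that counts the recursion steps for small m and n, so you can look for a pattern in m and n:

from functools import lru_cache

@lru_cache(maxsize=None)
def calls(m, n):
    # Number of calls made when evaluating T(m, n) naively,
    # with the recursion stopping at T(0, x) and T(x, 0).
    if m == 0 or n == 0:
        return 1
    return 1 + calls(m - 1, n - 1) + calls(m - 1, n) + calls(m, n - 1)

# Small table of call counts; each row is a fixed m, columns are n = 0..5.
for m in range(6):
    print([calls(m, n) for n in range(6)])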
