Expected running time of randomized binary search - algorithm

I want to calculate the expected running time of the randomized binary search given by the following pseudo-code, where instead of the midpoint, a random point is selected as the pivot:
BinarySearch(x, A, start, end)
    if (start == end)
        if (A[end] == x)
            return end
        else
            return -1
    else
        mid = RANDOM(start, end)
        if (A[mid] == x)
            return mid
        else if (A[mid] > x)
            return BinarySearch(x, A, start, mid-1)
        else
            return BinarySearch(x, A, mid+1, end)
I looked at this previous question, which has the following:
T(n) = sum ( T(r)*Pr(search space becomes r) ) + O(1) = sum ( T(r) )/n + O(1)
How is this obtained?
sum( T(r)*Pr(search space becomes r) )
And in the last line of calculation, how was this obtained?
T(n) = 1 + 1/2 + 1/3 + ... + 1/(n-1) = H(n-1) < H(n) = O(log n)

sum( T(r)*Pr(search space becomes r) )
This line is obtained by observing that you can choose any point to partition the array, so to get the expected time you sum over all possibilities, each weighted by its probability. See expected value.
T(n) = 1 + 1/2 + 1/3 + ... + 1/(n-1) = H(n-1) < H(n) = O(log n)
About this line: you can think of the sum as the integral of 1/x over [1, n], which is log(n) - log(1) = log(n). See the Harmonic series.
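To get a concrete feel for that bound, here is a small numerical check (my own sketch, not part of the original answer) comparing the harmonic number H(n) with the natural logarithm:

import math

def harmonic(n):
    # H(n) = 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

for n in [10, 100, 1000, 10**6]:
    # H(n) - ln(n) approaches the Euler-Mascheroni constant (~0.5772)
    print(n, round(harmonic(n), 4), round(math.log(n), 4))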

I would argue that the recurrence only holds when the target element is the first or last element of the array. Assume the target element is in the middle: then in the first call we reduce the size of the array to at most n/2, not n as in the recurrence. Moreover, the position of the target element may change with each recursive call. For a proof of the O(log n) complexity you may want to see my answer, which uses another approach, here.

Related

How can I calculate the time function T(n) of the following code?

x = 0;
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= n; j++) {
        x++;
        n--;
    }
}
By testing the code, I found that the inner for loop runs ⌈n/2⌉ times per iteration of the outer loop.
But I don't know how to formulate these rules with sigmas. I would really appreciate if anyone can help me with this.
You can express T(n) as T(n-2) + 1, so its time complexity is O(n/2) => O(n).
Edit: the T(n-2) + 1 expression arises because incrementing j and decrementing n simultaneously is exactly the same as incrementing j by 2. Each inner-loop iteration does one unit of work and shrinks the remaining range by 2, so the count for n is 1 plus the count for n-2.
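A quick empirical check of the O(n) claim (my own sketch): simulate the loops directly and compare the final x with n.

def simulate(n):
    # mirrors the original nested loops, counting every x++
    x, i = 0, 1
    while i <= n:
        j = 1
        while j <= n:
            x += 1
            n -= 1
            j += 1
        i += 1
    return x

for n in [10, 100, 1000, 10000]:
    print(n, simulate(n))  # x stays a little below n, consistent with O(n)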
Let's compute the exact value of x.
TL;DR: x(N) = N - ⌊N/2^i⌋, where i is the lowest number satisfying (i+1)·2^i > N. As Mariano Demarchi said, T(N) = O(N).
First we will check how the variables change after each inner loop. Suppose (n, i, x) are the values between lines 2 and 3 of the code (just before the inner loop):
How many iterations happen? Each iteration increases j and decreases n, so the distance between them shrinks by two. The starting distance is n-1, and the final one, after the loop, is 0 (if n is odd) or -1 (if n is even). Thus if n = 2k the answer is k, otherwise k+1; that is, the inner loop makes ⌊(n+1)/2⌋ = d iterations.
Thus x increases by d, n becomes n - d, and i becomes i + 1:
(n, i, x) -> (n - d, i + 1, x + d), or equivalently (⌊n/2⌋, i + 1, x + ⌊(n+1)/2⌋)
Now concentrate on values of n and i variables in the big loop:
They change like this: (n, i) -> (⌊n/2⌋, i + 1)
The stop condition is ⌊N/2^i⌋ < i + 1, which is equivalent to (i+1)·2^i > N. Of course, we need the minimal i satisfying the condition.
So, i is the first number satisfying the condition and we DO NOT SUM further:
x = ⌊(N+1)/2⌋ + ⌊(⌊N/2⌋+1)/2⌋ + ⌊(⌊N/2^2⌋+1)/2⌋ + ... + ⌊(⌊N/2^(i-1)⌋+1)/2⌋
Since each pass replaces n by ⌊n/2⌋, this sum telescopes to N - ⌊N/2^i⌋, which is roughly N·(1 - 1/2^i). In particular, if N is a power of 2 minus 1, we can see it easily.
So, this code returns exactly the same value in O(log(N)) time.
// finding i
unsigned long long i = 0;
while ((i + 1) * (1ull << i) <= n)
    ++i;
// finding x
x = n - (n >> i);
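As a sanity check (a sketch of mine, not part of the original answer), the closed form can be tested against a brute-force simulation of the loops:

def brute(n):
    # run the original nested loops and return the final x
    x, i = 0, 1
    while i <= n:
        j = 1
        while j <= n:
            x += 1
            n -= 1
            j += 1
        i += 1
    return x

def closed_form(n):
    i = 0
    while (i + 1) * (1 << i) <= n:  # lowest i with (i+1)*2^i > n
        i += 1
    return n - (n >> i)

assert all(brute(n) == closed_form(n) for n in range(1, 5000))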
In the inner loop, since n decrements at the same time as j increments, n falls below j at the midpoint of the distance between their initial values, that is, after about (n-1)/2 steps.
That's why your tests show that the inner loop runs ⌈n/2⌉ times per iteration of the outer loop.
The outer loop then stops at the i that satisfies n/2^i = i - 1, since halving n on every pass is what drives the stopping condition.
T(n) = n/2 + T(n/2)
     = n/2 + n/4 + T(n/4)
     = ...
     = n·(1/2 + 1/4 + ... + 1/2^i)
This series converges to n, so the algorithm is O(n).

Complexity Algorithm Analysis with if

I have the following code. What time complexity does it have?
I have tried to write a recurrence relation for it but I can't understand when will the algorithm add 1 to n or divide n by 4.
void T(int n) {
    for (int i = 0; i < n; i++);
    if (n == 1 || n == 0)
        return;
    else if (n % 2 == 1)
        T(n + 1);
    else if (n % 2 == 0)
        T(n / 4);
}
You can view it like this: you always divide by four; only when n is odd do you add 1 to it before the division. So you should count how many times 1 gets added. If there are no increments, you have log₄ n recursive calls. Now assume that you always have to add 1 before dividing. Then you can rewrite the code like this:
void T(int n) {
    for (int i = 0; i < n; i++);
    if (n == 1 || n == 0)
        return;
    else if (n % 2 == 0)
        T(n / 4 + 1);
}
But n/4 + 1 < n/2 for n > 4, and a recursion that goes to T(n/2) makes O(log n) calls; the base of the logarithm doesn't matter in big-O notation, since it is just a constant factor. So the number of recursive calls is O(log n).
EDIT:
As ALB pointed out in a comment, there is a loop of length n in each call. So, in accordance with the master theorem, the running time is Theta(n). You can also see it as the sum n·(1 + 1/2 + 1/4 + 1/8 + ...) = 2n, bounding each call's loop by half of the previous one.
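To see the Theta(n) total concretely, here is a small instrumented version (my own sketch) that sums the loop lengths over all recursive calls:

def work(n):
    # total iterations of the empty for-loop, summed over all recursive calls
    total = n
    if n <= 1:
        return total
    if n % 2 == 1:
        return total + work(n + 1)
    return total + work(n // 4)

for n in [10, 100, 10**4, 10**6]:
    print(n, work(n), work(n) / n)  # the ratio stays bounded by a small constant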
Interesting question. Be aware that even though your for loop does nothing, unless the compiler optimizes it away (see Dukeling's comment), it still counts toward the time complexity as if each iteration took computing time.
First part
The first section is definitely O(n).
Second part
Let's suppose, for the sake of simplicity, that half the time n will be odd and the other half it will be even. Hence, the recursion receives (n+1) half the time and (n/4) the other half.
Conclusion
Each time T(n) is called, the loop implicitly iterates n times. Hence, the first half of the time we get a cost of n * (n+1) = n^2 + n, and the other half we deal with n * (n/4) = (1/4)n^2.
For Big O notation we care about the upper bound rather than the precise behavior, so your algorithm is bounded by O(n^2).

How to work out the complexity of an algorithm?

I have two questions for algorithm analysis, and would like to know how to determine the complexity of the following two:
First:
for (int i = 2; i < n; i = i*i*i)
{
    // something O(1)
}
Second:
n/1 + n/2 + n/3 +...+ n/n
To the first:
With i starting at 1 it would be infinite, because 1*1*1 = 1, so i would stay 1 and never reach n.
The second one is not really an algorithm but a sum; evaluating it term by term runs in O(n).
For the first Algorithm:
Suppose that the initial value of i is 2 (rather than 1, which would lead to an infinite loop, as #tschaefemedia remarked).
At the 1st iteration, i == 2
At the 2nd iteration, i == 2*2*2 == 2^3
At the 3rd iteration, i == 2^3 * 2^3 * 2^3 == 2^(3*3) == 2^(3^2)
At the 4th iteration, i == 2^(3^2) * 2^(3^2) * 2^(3^2) == 2^(3^3)
...
At iteration k+1, i == 2^(3^k)
Suppose for simplicity that at iteration k+1, i becomes equal to n and the loop stops. Then:
n == 2^(3^k)
log2(n) == 3^k
log3(log2(n)) == k
So, the complexity is O(log3(log2(n)))
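A quick empirical check of the doubly logarithmic iteration count (my own sketch):

import math

def iterations(n):
    # count iterations of: for (i = 2; i < n; i = i*i*i)
    count, i = 0, 2
    while i < n:
        i = i * i * i
        count += 1
    return count

for e in [3, 9, 27, 81]:
    n = 2**e + 1  # chosen so the loop stops just after i reaches 2^e
    print(n, iterations(n), round(math.log(math.log(n, 2), 3), 2))

The counts grow like log3(log2(n)), up to an additive constant.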
As for the second question, I suppose that you are giving the complexity formula. So,
n/1 + n/2 + n/3 + ... + n/n = n (1+ 1/2 + 1/3 + ... + 1/n)
This is the Harmonic series and it is O(log(n)).
So, the overall complexity is O(n*log(n))
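This is easy to check numerically (my own sketch):

import math

n = 10**6
total = sum(n / k for k in range(1, n + 1))  # n/1 + n/2 + ... + n/n
print(total, n * math.log(n))                # both are about n*ln(n)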

Efficient Algorithm to Solve a Recursive Formula

I am given a formula f(n) where f(n) is defined, for all non-negative integers, as:
f(0) = 1
f(1) = 1
f(2) = 2
f(2n) = f(n) + f(n + 1) + n (for n > 1)
f(2n + 1) = f(n - 1) + f(n) + 1 (for n >= 1)
My goal is to find, for any given number s, the largest n where f(n) = s. If there is no such n return None. s can be up to 10^25.
I have a brute force solution using both recursion and dynamic programming, but neither is efficient enough. What concepts might help me find an efficient solution to this problem?
I want to add a little complexity analysis and estimate the size of f(n).
If you look at one recursive call of f(n), you notice that the input n is basically divided by 2 before f is called twice more, where one call always gets an even input and the other an odd one.
So the call tree is basically a binary tree where about half of the nodes at depth k contribute a summand of approximately n/2^(k+1). The depth of the tree is log₂(n).
So the value of f(n) is in total about Θ(n/2 ⋅ log₂(n)).
Just to note: this holds for even and odd inputs, but for even inputs the value is bigger by an additional summand of about n/2. (I use Θ-notation so as not to think too much about constants.)
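A quick numerical check of this size estimate (my own sketch, using a memoized implementation of the recurrence like the one in a later answer):

import math

D = {}

def f(n):
    if n in D:
        return D[n]
    if n < 2:
        return 1
    if n == 2:
        return 2
    m = n // 2
    y = f(m) + f(m + 1) + m if n % 2 == 0 else f(m - 1) + f(m) + 1
    D[n] = y
    return y

for n in [10**2, 10**4, 10**6]:
    # the ratio should stay roughly constant if f(n) = Theta(n/2 * log2(n))
    print(n, f(n), round(f(n) / (n / 2 * math.log2(n)), 3))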
Now to the complexity:
Naive brute force
To calculate f(n) you have to call f Θ(2^(log₂ n)) = Θ(n) times.
So if you want to calculate the values of f(n) until you reach s (or notice that there is no n with f(n) = s), you have to calculate f(n) about s·log₂(s) times, which is in total Θ(s²·log(s)).
Dynamic programming
If you store every result of f(n), the time to calculate each new f(n) drops to Θ(1) (but it requires much more memory). So the total time complexity would reduce to Θ(s·log(s)).
Note: since we know f(n) ≤ f(n+2) for all n, the values are already sorted along each parity, so you don't have to sort them before doing a binary search.
Using binary search
Algorithm (input is s):
1. Set l = 1 and r = s.
2. Set n = (l+r)/2 and round it to the next even number.
3. Calculate val = f(n).
4. If val == s, then return n.
5. If val < s, then set l = n; else set r = n.
6. Go to step 2.
If you found a solution, fine. If not: try it again but round in step 2 to odd numbers. If this also does not return a solution, no solution exists at all.
This will take you Θ(log(s)) for the binary search and Θ(s) for the calculation of f(n) each time, so in total you get Θ(s⋅log(s)).
As you can see, this has the same complexity as the dynamic programming solution, but you don't have to save anything.
Notice: r = s does not hold as an initial upper limit for every s. However, if s is big enough, it does. To be safe, you can change the algorithm:
First check whether f(s) < s. If it is, you can set l = s and r = 2s (or 2s+1 if n has to be odd).
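Here is a sketch of this binary search in Python (my own illustration, not the asker's code; it searches each parity separately by binary searching over k with n = 2k + parity, and relies on f(n) ≤ f(n+2)):

from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    # the recurrence from the question
    if n < 2:
        return 1
    if n == 2:
        return 2
    m = n // 2
    return f(m) + f(m + 1) + m if n % 2 == 0 else f(m - 1) + f(m) + 1

def search_parity(s, parity):
    # binary search over k, where n = 2k + parity
    lo, hi = 0, s  # k <= s is a safe upper bound, since f(2k + parity) >= k
    while lo <= hi:
        k = (lo + hi) // 2
        val = f(2 * k + parity)
        if val == s:
            return 2 * k + parity
        if val < s:
            lo = k + 1
        else:
            hi = k - 1
    return None

sols = [n for n in (search_parity(9992, 0), search_parity(9992, 1)) if n is not None]
print(max(sols) if sols else None)

As with the algorithm above, if several n of the same parity happened to share the value s, this would return one of them, not necessarily the largest.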
Can you calculate the value of f(x) for every x from 0 to MAX_SIZE just once?
What I mean is: calculate the values by DP.
f(0) = 1
f(1) = 1
f(2) = 2
f(3) = 3
f(4) = 7
f(5) = 4
... ...
f(MAX_SIZE) = ???
If the first step is infeasible, exit. Otherwise, sort the values from smallest to largest.
Such as 1,1,2,3,4,7,...
Now you can find whether there exists an n satisfying f(n) = s in O(log(MAX_SIZE)) time.
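A sketch of this lookup (my own illustration; MAX_SIZE is a hypothetical table size, and f is a memoized implementation of the recurrence such as the one in the sketch above). Sorting (value, n) pairs lets the binary search also return the largest matching n:

import bisect

MAX_SIZE = 10**5  # hypothetical limit; pick it to fit your memory budget
pairs = sorted((f(n), n) for n in range(MAX_SIZE + 1))  # f as defined above
keys = [v for v, _ in pairs]

def lookup(s):
    i = bisect.bisect_right(keys, s)  # first index whose value exceeds s
    if i and pairs[i - 1][0] == s:
        return pairs[i - 1][1]  # ties sort by n, so this is the largest such n
    return None

print(lookup(9992))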
Unfortunately, you don't mention how fast your algorithm should be. Perhaps you need to find some really clever rewrite of your formula to make it fast enough, in this case you might want to post this question on a mathematics forum.
The running time of your formula is O(n) for f(2n + 1) and O(n log n) for f(2n), according to the Master theorem, since:
T_even(n) = 2 * T(n / 2) + n / 2
T_odd(n) = 2 * T(n / 2) + 1
So the running time for the overall formula is O(n log n).
So if n is the answer to the problem, this algorithm would run in approx. O(n^2 log n), because you have to perform the formula roughly n times.
You can make this a little bit quicker by storing previous results, but of course, this is a tradeoff with memory.
Below is such a solution in Python.
D = {}

def f(n):
    if n in D:
        return D[n]
    if n == 0 or n == 1:
        return 1
    if n == 2:
        return 2
    m = n // 2
    if n % 2 == 0:
        # f(2n) = f(n) + f(n + 1) + n (for n > 1)
        y = f(m) + f(m + 1) + m
    else:
        # f(2n + 1) = f(n - 1) + f(n) + 1 (for n >= 1)
        y = f(m - 1) + f(m) + 1
    D[n] = y
    return y

def find(s):
    n = 0
    y = 0
    even_sol = None
    while y < s:
        y = f(n)
        if y == s:
            even_sol = n
            break
        n += 2
    n = 1
    y = 0
    odd_sol = None
    while y < s:
        y = f(n)
        if y == s:
            odd_sol = n
            break
        n += 2
    print(s, even_sol, odd_sol)

find(9992)
This recurrence produces increasing values for 2n and 2n+1 at every step, so the moment you get a value bigger than s you can stop the algorithm.
To make an effective algorithm you either have to find a nice closed formula that computes the value directly, or fill the values in a simple loop, which is much more effective than your recursion. The naive recursion is generally O(2^n), whereas the loop is O(n).
This is how the loop can look:
int[] values = new int[1000];
values[0] = 1;
values[1] = 1;
values[2] = 2;
values[3] = 3; // f(3) = f(0) + f(1) + 1
for (int i = 2; i < values.length / 2 - 1; i++) {
    values[2 * i] = values[i] + values[i + 1] + i;         // f(2n) = f(n) + f(n+1) + n
    values[2 * i + 1] = values[i - 1] + values[i] + 1;     // f(2n+1) = f(n-1) + f(n) + 1
}
Inside this loop, add a condition to break out early with success or failure.
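Here is a Python version of that loop (my own sketch) with the early exit added; since f(n) ≤ f(n+2), the search can stop as soon as both the even and the odd entries exceed s:

def find_largest(s, max_size=10**6):
    values = [0] * max_size
    values[0], values[1], values[2], values[3] = 1, 1, 2, 3
    best = None
    for n in range(4):
        if values[n] == s:
            best = n
    for i in range(2, max_size // 2):
        values[2 * i] = values[i] + values[i + 1] + i
        values[2 * i + 1] = values[i - 1] + values[i] + 1
        if values[2 * i] == s:
            best = 2 * i
        if values[2 * i + 1] == s:
            best = 2 * i + 1
        # f(n) <= f(n+2): once both parities pass s, no later entry can match
        if values[2 * i] > s and values[2 * i + 1] > s:
            break
    return best

print(find_largest(9992))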

complexity of a randomized search algorithm

Consider the following randomized search algorithm on a sorted array a of length n (in increasing order). x can be any element of the array.
size_t randomized_search(value_t a[], size_t n, value_t x)
{
    size_t l = 0;
    size_t r = n - 1;
    while (true) {
        size_t j = rand_between(l, r);
        if (a[j] == x) return j;
        if (a[j] < x) l = j + 1;
        if (a[j] > x) r = j - 1;
    }
}
What is the expected value of the Big-Theta complexity (bounded both below and above) of this function when x is selected randomly from a?
Although this seems to be log(n), I carried out an experiment counting instructions, and found that the result grows a little faster than log(n) (according to my data, even (log(n))^1.1 fits the result better).
Someone told me that this algorithm has an exact big-Theta complexity (so obviously (log(n))^1.1 is not the answer). So, could you please give the time complexity along with your approach to proving it? Thanks.
Update: the data from my experiment, with Mathematica fits against log(n) and (log(n))^1.1 (plots omitted).
If you're willing to switch to counting three-way compares, I can tell you the exact complexity.
Suppose that the key is at position i, and I want to know the expected number of compares with position j. I claim that position j is examined if and only if it is the first position between i and j, inclusive, to be chosen as a pivot. Since the pivot element is selected uniformly at random from the current range each time, this happens with probability 1/(|i - j| + 1).
The total complexity is the expectation over i <- {1, ..., n} of sum_{j=1}^n 1/(|i - j| + 1), which is
sum_{i=1}^n 1/n sum_{j=1}^n 1/(|i - j| + 1)
= 1/n sum_{i=1}^n (sum_{j=1}^i 1/(i - j + 1) + sum_{j=i+1}^n 1/(j - i + 1))
= 1/n sum_{i=1}^n (H(i) + H(n + 1 - i) - 1)
= 1/n sum_{i=1}^n H(i) + 1/n sum_{i=1}^n H(n + 1 - i) - 1
= 1/n sum_{i=1}^n H(i) + 1/n sum_{k=1}^n H(k) - 1 (k = n + 1 - i)
= 2 H(n + 1) - 3 + 2 H(n + 1)/n - 2/n
= 2 H(n + 1) - 3 + O(log n / n)
= 2 log n + O(1)
= Theta(log n).
(log means natural log here.) Note the -3 in the low order terms. This makes it look like the number of compares is growing faster than logarithmic at the beginning, but the asymptotic behavior dictates that it levels off. Try excluding small n and refitting your curves.
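This is easy to check empirically (my own sketch): simulate the search, count three-way compares, and compare with the exact expression 2 H(n + 1) - 3 + 2 H(n + 1)/n - 2/n derived above.

import random

def avg_compares(n, trials=20000):
    total = 0
    for _ in range(trials):
        target = random.randrange(n)  # the key's position, uniform over the array
        l, r = 0, n - 1
        while True:
            j = random.randint(l, r)  # rand_between(l, r)
            total += 1                # one three-way compare
            if j == target:
                break
            if j < target:
                l = j + 1
            else:
                r = j - 1
    return total / trials

for n in [10, 100, 1000]:
    H = sum(1.0 / k for k in range(1, n + 2))  # H(n+1)
    exact = 2 * H - 3 + 2 * H / n - 2 / n
    print(n, round(avg_compares(n), 3), round(exact, 3))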
Assuming rand_between implements sampling from a uniform probability distribution in constant time, the expected running time of this algorithm is Θ(lg n). Informal proof sketch: the expected value of rand_between(l, r) is (l+r)/2, the midpoint between them, so each iteration is expected to skip half of the array (assuming the size is a power of two), just like a single iteration of deterministic binary search would.
More formally, borrowing from an analysis of quickselect, observe that when you pick a random midpoint, half of the time it will be between ¼n and ¾n. Neither the left nor the right subarray has more than ¾n elements. The other half of the time, neither has more than n elements (obviously). That leads to a recurrence relation
T(n) = ½T(¾n) + ½T(n) + f(n)
where f(n) is the amount of work in each iteration. Subtracting ½T(n) from both sides, then doubling both sides, we have
½T(n) = ½T(¾n) + f(n)
T(n) = T(¾n) + 2f(n)
Now, since 2f(n) = Θ(1) = Θ(n^c log⁰ n) where c = log_{4/3}(1) = 0, it follows by the master theorem that T(n) = Θ(n⁰ lg n) = Θ(lg n).
