Changing the randomized select algorithm: effect on runtime - algorithm

What happens to the Randomized-Select algorithm's running time if we change line 8 of the code from q-1 to q (CLRS, page 216)?
What I found is that the algorithm should still work and there shouldn't be any change in running time, since the running time depends only on the RANDOMIZED-PARTITION subroutine. Is that true?
Randomized-Select(A, p, r, i)
// Finds the ith smallest value in A[p .. r].
if (p = r)
    return A[p]
q = Randomized-Partition(A, p, r)
k = q - p + 1    // k = size of low side + 1 (pivot)
if (i = k)
    return A[q]
else if (i < k)
    return Randomized-Select(A, p, q-1, i)
else
    return Randomized-Select(A, q+1, r, i-k)

The i-th order statistic might be in the:
left part - range p .. q-1
right part - range q+1 .. r
exactly at index q
The last case happens when this condition is fulfilled:
if (i = k)
    return A[q]
Otherwise we know that the q-th element can never be the i-th order statistic, so it is not wise to keep processing it again and again at later iterations (recursion levels).
The proposed modification won't change the asymptotic complexity, but the real running time might increase a bit, because the pivot gets carried along into the recursive calls on the low side
(average case: n + n/2 + n/4 + ... + 1 = 2n vs. n + (n/2 + 1) + (n/4 + 1) + ... + 1 = 2n + log(n)).
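For concreteness, here is a small runnable C++ sketch (0-based indices; the helper names, flag and test values are mine, not from CLRS). The includePivot flag models the modification asked about: when the answer lies on the low side, it recurses on A[p..q] instead of A[p..q-1].

#include <vector>
#include <utility>
#include <random>
#include <cstdio>

std::mt19937 rng(12345);

int randomizedPartition(std::vector<int>& A, int p, int r) {
    std::uniform_int_distribution<int> dist(p, r);
    std::swap(A[dist(rng)], A[r]);                 // move a random pivot to the end
    int x = A[r], i = p - 1;
    for (int j = p; j < r; ++j)
        if (A[j] <= x) std::swap(A[++i], A[j]);
    std::swap(A[i + 1], A[r]);
    return i + 1;
}

int randomizedSelect(std::vector<int>& A, int p, int r, int i, bool includePivot) {
    if (p == r) return A[p];
    int q = randomizedPartition(A, p, r);
    int k = q - p + 1;                             // rank of the pivot within A[p..r]
    if (i == k) return A[q];
    if (i < k)                                     // answer is on the low side
        return randomizedSelect(A, p, includePivot ? q : q - 1, i, includePivot);
    return randomizedSelect(A, q + 1, r, i - k, includePivot);
}

int main() {
    std::vector<int> a = {9, 1, 7, 3, 5}, b = a;
    std::printf("%d %d\n", randomizedSelect(a, 0, 4, 2, false),   // 2nd smallest: 3
                randomizedSelect(b, 0, 4, 2, true));              // same result, slightly larger subproblems
}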

Fast algorithm for distributing a value over a histogram?

I am looking for a fast (both in terms of complexity (the size of the problem may get close to 2^32) and in terms of the constant) algorithm, that doesn't necessarily have to compute the optimal solution (so a heuristic is acceptable if it produces results "close" to the optimal and has a "considerable" advantage in terms of computation time compared to computing the optimal solution) for a specific problem.
I have an integer histogram A: |A| = n, A[i] > 0; and a value R: 0 < R <= A[0]+...+A[n-1]. I must distribute -R over the histogram as evenly as possible. Formally this means something like this (there is some additional information in the formal notation too): I need to find B such that |B| = |A| && B[i] = A[i] - C[i], where 0 <= C[i] <= A[i] && C[0]+...+C[n-1] = R, and C must minimize the expressions L_2 = C[0]^2 + ... + C[n-1]^2 and L_infinity = max(C[0], ..., C[n-1]).
Just from the formulation one can see that the problem doesn't necessarily have a unique solution (consider A[0] = 1, A[1] = 1 and R = 1; then both B[0]=0, B[1]=1 and B'[0]=1, B'[1]=0 are optimal solutions). An additional constraint may be added, such as if A[i]<A[j] then C[i]<C[j], but it is not as important in my case.
Naively one can iterate over all possibilities for C[i] (R-combinations with repetitions) and find the optimal solutions, but obviously that is not very fast for larger n.
Another possible solution is finding q = R/n and r = R%n, then iterating over all elements and storing diff[i] = A[i] - q; if diff[i] <= 0 then r -= diff[i] && B[i] = 0 && remove A[i]. Then continue with all non-removed A[i], by setting them to A[i] = diff[i], R = r, and n = n - removedElementsCount. If we iterate this process, then at each step we remove at least one element, until we reach the point where q == 0 or we have only 1 element left. Then we just need to do A[i] -= 1 for R such elements from A (since by then R < n in the q == 0 case), or A[i] -= R if we are in the case where only 1 element is left over (the case where we have 0 elements is trivial). Since we remove at least one element each step, and we need to iterate over (n - step) elements in the worst case, we have a complexity of O(1 + ... + n) = O(n^2).
I am hoping that somebody is already familiar with a better algorithm or if you have any ideas I'll be glad to hear them (I am aware that this can be regarded as an optimization problem also).
edit: made R positive so it would be easier to read.
Edit 2: I realized I messed up the optimization criteria.
Turn your histogram into an array of (value, index) pairs, and then turn it into a min heap. This operation is O(n).
Now your C is going to take some set of values to 0, reduce some by the max amount, and the rest by 1 less than the max amount. The max amount that you'd like to reduce everything by is easy to calculate, it is R/n rounded up.
Now go through the heap. As long as the smallest value in the heap is < ceil(R / size of heap), the value at that index will be set to zero (that bin is fully consumed), and that entry is removed from the heap in O(log(n)) time. Once that loop finishes, you can assign the max value and 1 less than the max value randomly to the rest.
This will run in O(n log(n)) worst time. You will hit that worst case when O(n) elements have to be zeroed out.
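Here is a rough C++ sketch of this heap-based idea (the function name, types and the arbitrary order in which the +1 remainders are handed out are my own choices, not from the answer):

#include <vector>
#include <queue>
#include <utility>
#include <functional>
#include <cstdio>

// Returns the reductions C; the reduced histogram is B[i] = A[i] - C[i].
std::vector<long long> distributeWithHeap(const std::vector<long long>& A, long long R) {
    using VI = std::pair<long long, int>;                      // (value, index)
    std::priority_queue<VI, std::vector<VI>, std::greater<VI>> heap;   // min-heap, O(n log n) to fill
    for (int i = 0; i < (int)A.size(); ++i) heap.push({A[i], i});

    std::vector<long long> C(A.size(), 0);
    // Zero out every bin that cannot absorb its fair share ceil(R / heap size).
    while (!heap.empty()) {
        long long s = (long long)heap.size();
        long long fair = (R + s - 1) / s;                      // ceil(R / s)
        if (heap.top().first >= fair) break;
        auto [v, idx] = heap.top(); heap.pop();
        C[idx] = v;                                            // take the whole bin
        R -= v;
    }
    // The rest get either floor(R/s) or that plus one (r of them), order arbitrary.
    long long s = (long long)heap.size();
    if (s > 0) {
        long long q = R / s, r = R % s;
        while (!heap.empty()) {
            auto [v, idx] = heap.top(); heap.pop();
            C[idx] = q + (r > 0 ? 1 : 0);
            if (r > 0) --r;
        }
    }
    return C;
}

int main() {
    for (long long c : distributeWithHeap({1, 3, 5, 7}, 6))
        std::printf("%lld ", c);                               // prints 1 2 2 1
    std::printf("\n");
}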
I came up with a very simple greedy algorithm in O(n*log(n)) time (if somebody manages to solve it in O(n) though I'll be glad to hear).
Algorithm:
Given: integer array: A[0],...,A[|A|-1]: A[i]>=0; integer: R0: 0<=R0<=A[0]+...+A[|A|-1].
Base:
1. Sort A in ascending order - takes O(n*log(n)) time.
   Set i = 0; R = R0; n = |A|; q = floor(R/n); r = R - q*n; d = q;
2. if(i==|A| or R==0) goto 6;
3. if(i>=|A|-r) d = q + 1;
4. if(A[i]>=d)
   {
       R-=d;
       A[i]-=d;
   }
   else
   {
       R-=A[i];
       A[i] = 0;
       n = |A|-(i+1);
       q = floor(R/n);
       d = q;
       r = R - q*n;
   }
5. i=i+1; goto 2;
6. if(R>0) A[|A|-1] -= R; return A;
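A fairly direct C++ transcription of these steps might look like the following (function and variable names are mine; an n > 0 guard is added so the rescaling in the else branch cannot divide by zero when the last element gets zeroed):

#include <vector>
#include <algorithm>
#include <cstdio>

// A is the histogram, R0 the total amount to subtract; returns the reduced histogram B.
std::vector<long long> distributeEvenly(std::vector<long long> A, long long R0) {
    std::sort(A.begin(), A.end());                 // step 1: sort ascending
    const long long N = (long long)A.size();
    long long R = R0, n = N, q = R / n, r = R - q * n, d = q;
    for (long long i = 0; i < N && R > 0; ++i) {   // steps 2 and 5 folded into the loop
        if (i >= N - r) d = q + 1;                 // step 3: the last r elements take one extra
        if (A[i] >= d) {                           // step 4: element absorbs its full share d
            R -= d;
            A[i] -= d;
        } else {                                   // element too small: zero it and rescale
            R -= A[i];
            A[i] = 0;
            n = N - (i + 1);
            if (n > 0) {                           // guard (added) against dividing by zero
                q = R / n;
                d = q;
                r = R - q * n;
            }
        }
    }
    if (R > 0) A[N - 1] -= R;                      // step 6: dump any leftover on the largest element
    return A;
}

int main() {
    for (long long b : distributeEvenly({1, 3, 5, 7}, 6)) std::printf("%lld ", b);  // prints 0 2 3 5
    std::printf("\n");
}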
Informal solution optimality proof:
Let n = |A|.
Case 0: n==1 -> C[0] = R
Case 1: n>1 && A[i]>=q && A[j]>=q+1 for j>=max(0,n-r)
The optimal solution is given by C[i] = q for i<n-r && C[j] = q+1 for j>=n-r.
Assume there is another optimal solution given by C'[i] = C[i] + E[i], where the constraints for E are: E[0]+...+E[m-1]=0 (otherwise C' would violate C'[0] + ... + C'[n-1] = R), C[i]>=-E[i] (otherwise C'[i] would violate the non-negativity constraint), E[i] <= A[i] - C[i] (from C'[i]<=A[i]), and E[i]<=E[j] for i<=j (from C[i]<=C[j] for A[i]<=A[j] && A[i]<=A[j] for i<=j), then:
L_2' - L_2 = 2*q*(E[0]+...+E[n-r-1]) + 2*(q+1)*(E[n-r]+...+E[n-1]) + (E[0]^2 + ... + E[n-1]^2) = 2*q*0 + (E[0]^2 + ... + E[n-1]^2) + 2*(E[n-r] + ... + E[n-1]) >= 0
The last inequality is true since for every term 2*E[n-i], 1<=i<=r, there is a corresponding term E[n-i]^2, 1<=i<=r to cancel it out if it is negative at least for E[n-i]<-1. Let us analyze the case where 2*E[n-i] = -2, obviously E[n-i]^2 = 1 is not enough to cancel it out in this case. However, since all elements of E sum to 0, there exists j!=n-i: such that E[j] compensates for it, since we have the term E[j]^2. From the last inequality follows L_2<=L_2' for every possible solution C', this implies that C minimizes L_2. It is trivial to see that the L_inf minimization is also satisfied: L_inf = q + (r>0) <= L_inf' = max(q+E[0], ... , q+E[n-r-1], q+1+E[n-r], ... , q+1+E[n-1]), if we were to have an E[i]>1 for i<n-r, or E[j]>0 for j>=n-r, we get a higher maximum, we can also never decrease the maximum, since E sums to 0.
Case 2: n>1 && there exists k: A[k]<q
In this case the optimal solution requires that C[k] = A[k] for all k: A[k]<q. Let us assume that there exists an optimal solution C' such that C'[k]<A[k]<q -> C'[k]<q-1. There exists i>=k, such that C'[i]<q-1 && C'[i+1]>=q-1. Assume there is no such i, then C'[k] == C[n-1] < q-1, and C'[0]+...+C'[n-1]<n*q-n<R, this is a contradiction, which implies that such an i actually does exist. There also exists a j>k such that C[j]>q && C[j-1]<C[j] (if we assume this is untrue we once again get a contradiction with C summing to R). We needed these proofs in order to satisfy C[t]<=C[l] for t<=l. Let us consider the modified solution C''[t] = C'[t] for t!=i,j; and C''[i] = C'[i]+1, and C''[j] = C'[j]-1. L_2' - L_2'' = C'[i]^2 - (C'[i]+1)^2 + C'[j]^2 - (C'[j]-1)^2 = -2*C'[i] + 2*C'[j] - 2 = 2*((C'[j]-C'[i])-1) > 2*(1-1) = 0. The last inequality follows from (C'[i]<q-1 && C'[j]>q) -> C'[j] - C'[i] > 1. We proved that L_2'>L_2'' if we increment C[i]: C[i]<A[i]<q. By induction the optimal solution should have C[l]=A[l] for all l: A[l]<q. Once this is done one can inductively continue with the reduced problem n' = n-(i+1), R' = R - (C[0]+...+C[i]), q' = floor(R'/n'), r' = R' - q'*n', D[0] = A[i+1], ..., D[n'-1] = A[n-1].
Case 3: n>1 && A[i]>=q && A[j]<q+1 for j==max(0,n-r)
Since A[k]>=A[i] for k>=i, that implies that A[i]<q+1 for i<=j. But since we have also q<=A[i] this implies A[i]==q, so we cannot add any of the remainder in any C[i] : i<=j. The optimality of C[i]=A[i]=q for i<j follows from a proof done in case 1 (the proof there was more general with q+1 terms). Since the problem is optimal for 0<=i<j we can start solving a reduced problem: D[0] = A[j],...,D[n-j] = A[n-1].
Case 0, 1, 2, 3 are all the possible cases. Apart from case 0 and case 1 which give the solution explicitly, the solution in 2 and 3 reduces the problem to a smaller one which once again falls in one of the cases. Since the problem is reduced at every step, we get the final solution in a finite number of steps. We also never refer to an element more than once which implies O(n) time, but we need O(n*log(n)) for the sorting, so in the end we have O(n*log(n)) time complexity for the algorithm. I am unsure whether this problem can be solved in O(n) time, but I have the feeling that there is no way to get away without the sorting since case 2 and 3 rely on it heavily, so maybe O(n*log(n)) is the best possible complexity that can be achieved.

How do you determine the average-case complexity of this algorithm?

It's usually easy to calculate the time complexity for the best case and the worst case, but when it comes to the average case, especially when there's a probability p given, I don't know where to start.
Let's look at the following algorithm to compute the product of all the elements in a matrix:
int computeProduct(int[][] A, int m, int n) {
    int product = 1;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            if (A[i][j] == 0) return 0;
            product = product * A[i][j];
        }
    }
    return product;
}
Suppose p is the probability of A[i][j] being 0 (i.e., the algorithm terminates there and returns 0); how do we derive the average-case time complexity of this algorithm?
Let’s consider a related problem. Imagine you have a coin that flips heads with probability p. How many times, on expectation, do you need to flip the coin before it comes up heads? The answer is 1/p, since
There’s a p chance that you need one flip.
There’s a p(1-p) chance that you need two flips (the first flip has to go tails and the second has to go heads).
There’s a p(1-p)^2 chance that you need three flips (the first two flips need to go tails and the third has to go heads)
...
There’s a p(1-p)^(k-1) chance that you need k flips (the first k-1 flips need to go tails and the kth needs to go heads.)
So this means the expected value of the number of flips is
p + 2p(1 - p) + 3p(1 - p)^2 + 4p(1 - p)^3 + ...
= p(1(1 - p)^0 + 2(1 - p)^1 + 3(1 - p)^2 + ...)
So now we need to work out what this summation is. The general form is
p sum from k = 1 to infinity (k(1 - p)^(k-1)).
Rather than solving this particular summation, let's make this more general. Let x be some variable that, later, we'll set equal to 1 - p, but which for now we'll treat as a free value. Then we can rewrite the above summation as
p sum from k = 1 to infinity (kx^(k-1)).
Now for a cute trick: notice that the inside of this expression is the derivative of x^k with respect to x. Therefore, this sum is
p sum from k = 1 to infinity (d/dx x^k).
The derivative is a linear operator, so we can move it out to the front:
p d/dx sum from k = 1 to infinity (x^k)
That inner sum (x + x^2 + x^3 + ...) is the Taylor series for 1 / (1 - x) - 1, so we can simplify this to get
p d/dx (1 / (1 - x) - 1)
= p / (1 - x)^2
And since we picked x = 1 - p, this simplifies to
p / (1 - (1 - p))^2
= p / p^2
= 1 / p
Whew! That was a long derivation. But it shows that the expected number of coin tosses needed is 1/p.
Now, in your case, your algorithm can be thought of as tossing mn coins that come up heads with probability p and stopping if any of them come up heads. Surely, the expected number of coins you’d need to toss won’t be more than the case where you’re allowed to flip infinitely often, so your expected runtime is at most O(1 / p) (assuming p > 0).
If we assume that p is independent of m and n, then we can notice that after some initial growth, each added term in our summation as we increase the number of flips is exponentially smaller than the previous ones. More specifically, after adding in roughly logarithmically many terms into the sum, we'll only be off from the total of the infinite summation by a small amount. Therefore, provided that mn is roughly larger than Θ(log p), the sum ends up being Θ(1 / p). So in a big-O sense, if mn is independent of p, the runtime is Θ(1 / p).
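As a sanity check on the 1/p claim, here is a small Monte Carlo simulation (the value of p, the seed and the trial count are arbitrary demo choices):

#include <random>
#include <cstdio>

int main() {
    const double p = 0.2;
    const int trials = 200000;
    std::mt19937 rng(42);
    std::bernoulli_distribution heads(p);          // true with probability p
    long long totalFlips = 0;
    for (int t = 0; t < trials; ++t) {
        int flips = 0;
        do { ++flips; } while (!heads(rng));       // flip until the first head
        totalFlips += flips;
    }
    std::printf("average flips: %.3f (1/p = %.3f)\n",
                (double)totalFlips / trials, 1.0 / p);
}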

How to solve the equation sum{max(a_i, x)}=y with variable x? Is there any algorithm with O(n) time complexity?

I am trying to find an algorithm to solve the following equation:
∑ max(a_i, x) = y
in which the a_i are constants and x is the variable.
I can find an algorithm with O(n log n) time complexity as follows:
First of all, sort the a_i in O(n log n) time, and arrange the intervals
(−∞, a_0), (a_0, a_1), …, (a_i, a_(i+1)), …, (a_(n−1), a_n), (a_n, ∞)
Then, for each interval, assume x belongs to this interval, and solve the equation. We get a candidate x̂ and test whether x̂ actually belongs to this interval or not. If it does, we assign x̂ to x and return x. Otherwise, we try the next interval, until we get the solution.
The above method is an O(n log n) algorithm due to the sort. Given the form of the equation-solving problem, I expect an algorithm with O(n) time complexity. Is there any reference for this problem?
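For reference, the O(n log n) method above can be sketched like this; after the sort, a running suffix sum lets each interval be tested in O(1) (the function name and error handling are my own choices):

#include <vector>
#include <algorithm>
#include <stdexcept>
#include <cstdio>

// Assumes a is non-empty.
double solveBySorting(std::vector<double> a, double y) {
    std::sort(a.begin(), a.end());
    double suffix = 0;                                  // sum of a_j strictly above the current interval
    for (int i = (int)a.size() - 1; i >= 0; --i) {
        double x = (y - suffix) / (i + 1);              // candidate assuming a_i <= x (and x below a_(i+1))
        if (x >= a[i]) return x;                        // candidate lands in its own interval: done
        suffix += a[i];                                 // otherwise a_i contributes itself; try the next interval
    }
    throw std::invalid_argument("no solution: y < sum of a_i");
}

int main() {
    std::printf("%g\n", solveBySorting({1, 3, 5}, 20)); // prints 6.66667, i.e. x = 20/3
}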
First of all, this only has a solution if the sum of all a_i is smaller than y. You should check this first, because the algorithm below depends on this property.
Assume that we have chosen some pivot p from all a_i and want to calculate the x that corresponds to the interval [p, q), where q is the next larger a_i. This is:
x = (y - (sum of all a_i greater than p)) / (number of a_i less than or equal to p)
If you move p to the next larger a_i, x changes accordingly: it is the same formula evaluated at the new pivot p', where n, the old number of a_i that are smaller than or equal to p, grows by at least one. Under the assumption that the sum of all a_i is smaller than y, this clearly leads to a decrease of x. Similarly, if we choose a smaller p, x is increased.
Coming back to the first equation, we can observe the following: If x is smaller than p, we should choose a smaller p. If x is greater than the smallest of the greater a_is, we should choose a larger p. In every other case, we have found the right x.
This can be utilized in a quick select procedure. #MvG's comment brought me onto this track. All credits for the quick select idea go to him. Here is some pseudo code (modified version from Wikipedia):
findX(list, y)
    left := 0
    right := length(list) - 1
    sumGreater := 0    // the sum of all a_i greater than the current interval
    numSmaller := 0    // the number of all a_i smaller than the current interval
    minGreater := inf  // the minimum of all a_i greater than the current interval
    loop
        if left = right
            return (y - sumGreater) / (numSmaller + 1)
        pivotIndex := medianOfMedians(list, left, right)
        // the partition function will also sum the elements larger than the pivot,
        // count the elements smaller than the pivot, and find the minimum of the
        // larger elements
        (pivotIndex, partialSumGreater, partialNumSmaller, partialMinGreater)
            := partition(list, left, right, pivotIndex)
        x := (y - sumGreater - partialSumGreater) / (numSmaller + partialNumSmaller + 1)
        if (x >= list[pivotIndex] && x < min(partialMinGreater, minGreater))
            return x
        else if x < list[pivotIndex]
            right := pivotIndex - 1
            minGreater := list[pivotIndex]
            sumGreater += partialSumGreater + list[pivotIndex]
        else
            left := pivotIndex + 1
            numSmaller += partialNumSmaller + 1
The key idea is that the partitioning function gathers some additional statistics. This does not change the time complexity of the partitioning function because it requires O(n) additional operations, leaving a total time complexity of O(n) for the partitioning function. The medianOfMedians function is also linear in time. The remaining operations in the loop are constant time. Assuming that the median of medians yields good pivots, the total time of the entire algorithm is approximately O(n + n/2 + n/4 + n/8 ...) = O(n).
Since comments might get deleted, I'm turning my own comments into a coherent answer. Contrary to the original question, I'm using indices 1 through n, avoiding the a_0 originally used. So this is consistent one-based indexing using inclusive indices.
Assume for the moment that the b_i are the coefficients from your input, but in sorted order, so b_i ≤ b_(i+1). As you essentially already wrote, if b_i ≤ x ≤ b_(i+1) then the result is i·x + b_(i+1) + ⋯ + b_n, since the first i terms will use the x and the other terms will use the b_j. Solving for x you get x = (y − b_(i+1) − ⋯ − b_n) / i, and putting that back into your inequality you have i·b_i ≤ y − b_(i+1) − ⋯ − b_n ≤ i·b_(i+1). Concentrating on one of the inequalities, you want the largest i such that
i·b_i ≤ y − b_(i+1) − ⋯ − b_n       (subsequently called “the inequality”)
But in order to make this work on unsorted a_i, you'd need something similar to the median of medians. That is an algorithm which achieves O(n) guaranteed worst-case behavior for the problem of selecting a median, where the typical quickselect would take O(n²) in the worst case although it usually does quite well in practice.
Actually your problem is not that different from quickselect. You can pick a pivot coefficient, and split the remainder into larger and smaller values. Then you evaluate the inequality for the pivot element. If it is satisfied, you recurse into the list of larger elements, otherwise you recurse into the list of smaller elements, until at some point you have two adjacent elements, one which satisfies the inequality and one which does not.
This is O(n²) in the worst case, since you might need O(n) recursive calls, each of them taking O(n) time to process its input. Just like the O(n²) quickselect itself is suboptimal. The median-of-medians shows that that problem can indeed be solved in O(n). So we either need to find a similar solution here, or reformulate this problem here in terms of finding the median, or write some algorithm which makes use of the median in a reasonable way.
Actually Nico Schertler found a way to achieve that last option: Take the algorithm I outlined above, but choose the pivot element to be the median. That way you can guarantee that each recursive call will process at most half as much input as the previous call. Since the median of medians itself is O(n) this can be done without exceeding the O(n) bound for each recursive call.
So in pseudocode it's like this (using inclusive indices throughout):
# f: Process whole problem with coefficients a_1 through a_n
f(y, a, n) := begin
    if y < (sum of a_i for i from 1 through n):    # O(n)
        throw Error "Cannot satisfy equation"      # Or omit check and risk division by zero
    return g(a, 1, n, y)                           # O(n)
end

# g: Recursively process part of the problem, namely a_l through a_r
# Precondition: we know the inequality holds for i = l - 1 and fails for i = r + 1
# a: the array as provided to f; will get modified in place
# l: left index (inclusive)
# r: right index (inclusive)
# y: (original y) - (sum of a_j for j from r + 1 through n)
g(a, l, r, y) := begin                             # process a_l through a_r                O(r-l)
    if r < l:                                      # inequality holds in r but fails in l   O(1)
        return y / r                               # compute x for the case of i = r        O(1)
    m = median(a, l, r)                            # computed using median of medians       O(r-l)
    i = floor((l + r) / 2)                         # index of median, with same tie breaks  O(1)
    partition(a, l, r, m)                          # so a_l…a_(i-1) ≤ a_i = m ≤ a_(i+1)…a_r O(r-l)
    rhs = y - (sum of a_j for j from i + 1 to r)   # O((r-l)/2)
    if i * a_i ≤ rhs:                              # condition holds, check larger i
        return g(a, i + 1, r, y)                   # recurse in right half of list          O((r-l)/2)
    else:                                          # condition fails, check smaller i
        return g(a, l, i - 1, rhs - m)             # recurse in left half of list           O((r-l)/2)
end
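For illustration, here is a compact C++ sketch of the same idea. It uses std::nth_element for the pivot/partition step, which gives expected rather than guaranteed worst-case linear time; swapping in a true median-of-medians partition would restore the worst-case O(n) bound described above. Names and error handling are my own choices.

#include <vector>
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <cstdio>

// Assumes a is non-empty. Solves sum over i of max(a_i, x) = y.
double solveSumMax(std::vector<double> a, double y) {
    double total = std::accumulate(a.begin(), a.end(), 0.0);
    if (y < total) throw std::invalid_argument("no solution: y < sum of a_i");

    std::size_t l = 0, r = a.size();       // unclassified range [l, r)
    double sumGreater = 0.0;                // sum of a_i already known to exceed x
    while (l < r) {
        std::size_t m = l + (r - l) / 2;
        std::nth_element(a.begin() + l, a.begin() + m, a.begin() + r);
        double pivot = a[m];
        // Candidate x assuming every a_i up to and including the pivot is <= x.
        double sumRight = std::accumulate(a.begin() + m + 1, a.begin() + r, 0.0);
        double x = (y - sumGreater - sumRight) / static_cast<double>(m + 1);
        if (x >= pivot) {
            l = m + 1;                      // true x is at least the pivot: try larger pivots
        } else {
            r = m;                          // true x is below the pivot: those elements keep a_i
            sumGreater += sumRight + pivot;
        }
    }
    // All elements classified: the first l of them are covered by x.
    return (y - sumGreater) / static_cast<double>(l);
}

int main() {
    std::printf("%g\n", solveSumMax({1, 3, 5}, 20));  // prints 6.66667, the x with x+x+x = 20
}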

Time complexity analysis of function with recursion inside loop

I am trying to analyze the time complexity of the function below. This function is used to check whether a string is made up of other strings.
set<string> s; // s has been initialized and stores all the strings
bool fun(string word) {
    int len = word.size();
    // something else that can also return true or false with O(1) complexity
    for (int i=1; i<=len; ++i) {
        string prefix = word.substr(0,i);
        string suffix = word.substr(i);
        if (prefix in s && fun(suffix))
            return true;
        else
            return false;
    }
}
I think the time complexity is O(n) where n is the length of word (am I right?). But as the recursion is inside the loop, I don't know how to prove it.
Edit:
This code is not correct C++ (e.g., prefix in s). I just show the idea of this function, and want to know how to analyze its time complexity.
The way to analyze this is by developing a recursion relationship based on the length of the input and the (unknown) probability that a prefix is in s. Let's assume that the probability of a prefix being in s is given by some function pr(L) of the length L of the prefix. Let the complexity (number of operations) be given by T(len).
If len == 0 (word is the empty string), then T = 1. (The function is missing a final return statement after the loop, but we're assuming that the actual code is only a sketch of the idea, not what's actually executing).
For each loop iteration, denote the loop body complexity by T(len; i). If the prefix is not in s, then the body has constant complexity (T(len; i) = 1). This event has probability 1 - pr(i).
If the prefix is in s, then the function returns true or false according to the recursive call to fun(suffix), which has complexity T(len - i). This event has probability pr(i).
So for each value of i, the loop body complexity is:
T(len; i) = 1 * (1 - pr(i)) + T(len - i) * pr(i)
Finally (and this depends on the intended logic, not the posted code), we have
T(len) = sum i=1...len(T(len; i))
For simplicity, let's treat pr(i) as a constant function with value 0.5. Then the recursive relationship for T(len) is (up to a constant factor, which is unimportant for O() calculations):
T(len) = sum i=1...len(1 + T(len - i)) = len + sum i=0...len-1(T(i))
As noted above, the boundary condition is T(0) = 1. This can be solved by standard recursive function methods. Let's look at the first few terms:
len T(len)
0 1
1 1 + 1 = 2
2 2 + 2 + 1 = 5
3 3 + (5 + 2 + 1) = 11
4 4 + (11 + 5 + 2 + 1) = 23
5 5 + (23 + 11 + 5 + 2 + 1) = 47
The pattern is clearly T(len) = 2 * T(len - 1) + 1. This corresponds to exponential complexity:
T(n) = O(2^n)
Of course, this result depends on the assumption we made about pr(i). (For instance, if pr(i) = 0 for all i, then T(n) = O(1). There would also be non-exponential growth if pr(i) had a maximum prefix length—pr(i) = 0 for all i > M for some M.) The assumption that pr(i) is independent of i is probably unrealistic, but this really depends on how s is populated.
Assuming that you've fixed the bugs others have noted, then the i values are the places that the string is being split (each i is the leftmost splitpoint, and then you recurse on everything to the right of i). This means that if you were to unwind the recursion, you are looking at up to n-1 different split points, and asking if each substring is a valid word. Things are ok if the beginning of word doesn't have a lot of elements from your set, since then you can skip the recursion. But in the worst case, prefix in s is always true, and you try every possible subset of the n-1 split points. This gives 2^{n-1} different splitting sets, multiplied by the length of each such set.
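For concreteness, here is one compilable reading of the intended logic that the answers above analyze (assuming s.count(prefix) stands in for "prefix in s", and that a failed split simply moves on to the next i rather than returning false early):

#include <string>
#include <unordered_set>
#include <cstdio>

// Hypothetical dictionary; the poster's extra O(1) check is omitted here.
std::unordered_set<std::string> s = {"a", "ab", "abc", "b", "bc", "c"};

bool fun(const std::string& word) {
    if (word.empty()) return true;                  // an empty suffix means the whole word was covered
    int len = (int)word.size();
    for (int i = 1; i <= len; ++i) {
        std::string prefix = word.substr(0, i);
        std::string suffix = word.substr(i);
        if (s.count(prefix) && fun(suffix))
            return true;                            // found one valid split; otherwise keep trying
    }
    return false;                                   // no split of word works
}

int main() {
    std::printf("%d %d\n", (int)fun("abcb"), (int)fun("abd"));  // prints 1 0
}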

Time complexity

The problem is finding the majority element in an array.
I understand how this algorithm works, but I don't know why it has O(n log n) time complexity.
a. Both return "no majority." Then neither half of the array has a majority element, and the combined array cannot have a majority element. Therefore, the call returns "no majority."
b. The right side has a majority, and the left doesn't. The only possible majority for this level is the value that formed a majority on the right half; therefore, just compare every element in the combined array and count the number of elements that are equal to this value. If it is a majority element then return that element, else return "no majority."
c. Same as above, but with the left returning a majority and the right returning "no majority."
d. Both sub-calls return a majority element. Count the number of elements equal to both of the candidates for majority element. If either is a majority element in the combined array, then return it. Otherwise, return "no majority."
The top level simply returns either a majority element or that no majority element exists in the same way.
Therefore, T(1) = 0 and T(n) = 2T(n/2) + 2n = O(n log n)
I think:
every recursion compares the majority candidate against the whole array, which takes 2n, so
T(n) = 2T(n/2) + 2n = 2(2T(n/4) + 2n) + 2n = ... = 2^k T(n/2^k) + 2n + 4n + 8n + ... + 2^k n = O(n^2)
T(n) = 2T(n/2) + 2n
The question is how many iterations does it take for n to get to 1.
We divide by 2 in each iteration, so we get the series: n, n/2, n/4, n/8, ..., n/(2^k)
So, let's find k that will bring us to 1 (last iteration):
n/(2^k)=1 .. n=2^k ... k=log(n)
So we got log(n) iterations.
Now, in each iteration we do 2n operations (fewer, in fact, because we divide n by 2 each time), but in the worst-case scenario let's say 2n.
So in total, we get log(n) iterations with O(n) operations each: n log(n).
I'm not sure if I understand, but couldn't you just create a hash map, walk over the array, incrementing hash[value] at every step, then sort the hash map (m log m time complexity) and compare the top two elements? This would cost you O(n) + O(m log m) + 2 = O(n + m log m), with n the size of the array and m the number of distinct elements in the vector.
Am I mistaken here? Or ...?
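A rough sketch of that counting idea (note the sort is not actually needed for the majority test itself; it is enough to check whether any count exceeds n/2):

#include <vector>
#include <unordered_map>
#include <cstdio>

// Returns true and sets majority if some value occurs more than n/2 times.
bool findMajority(const std::vector<int>& arr, int& majority) {
    std::unordered_map<int, int> count;
    for (int v : arr) ++count[v];                          // O(n) expected
    for (const auto& [value, c] : count)
        if (2 * c > (int)arr.size()) { majority = value; return true; }
    return false;                                          // no element occurs more than n/2 times
}

int main() {
    std::vector<int> a = {2, 2, 1, 2, 3, 2, 2};
    int m = 0;
    if (findMajority(a, m)) std::printf("majority: %d\n", m);   // prints "majority: 2"
    else std::printf("no majority\n");
}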
When you do this recursively, you split the array in two at each level, make a call for each half, and then perform one of the tests a - d. Test a requires no looping; the other tests require looping through the entire array. On average you will loop through (0 + 1 + 1 + 1) / 4 = 3/4 of the array at each level of the recursion.
The number of levels in the recursion depends on the size of the array. As you split the array in half at each level, the number of levels will be log2(n).
So, the total work is (n * 3/4) * log2(n). As constants are irrelevant to the time complexity, and all logarithms grow at the same rate, the complexity is O(n * log n).
Edit:
If someone is wondering about the algorithm, here's a C# implementation. :)
private int? FindMajority(int[] arr, int start, int len) {
    if (len == 1) return arr[start];
    int len1 = len / 2, len2 = len - len1;
    int? m1 = FindMajority(arr, start, len1);
    int? m2 = FindMajority(arr, start + len1, len2);
    int cnt1 = m1.HasValue ? arr.Skip(start).Take(len).Count(n => n == m1.Value) : 0;
    if (cnt1 * 2 >= len) return m1;
    int cnt2 = m2.HasValue ? arr.Skip(start).Take(len).Count(n => n == m2.Value) : 0;
    if (cnt2 * 2 >= len) return m2;
    return null;
}
This guy has a lot of videos on recurrence relation, and the different techniques you can use to solve them:
https://www.youtube.com/watch?v=TEzbkIggJfo&list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
Basically for this problem I would use the Master Theorem:
https://youtu.be/i5kTZof1LRY
T(1) = 0 and T(n) = 2T(n/2) + 2n
Master Theorem ==> T(n) = A*T(n/B) + O(n^D), so in this case A=2, B=2, D=1
Since D = log_B(A) = 1, according to the Master Theorem this is O(n log n)
You can also use another method to solve this (below) it would just take a little bit more time:
https://youtu.be/TEzbkIggJfo?list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
I hope this helps you out !

Resources