Merge sort time and space complexity - algorithm

Let's take this implementation of merge sort as an example:
void mergesort(Item a[], int l, int r) {
    if (r <= l) return;
    int m = (r+l)/2;
    mergesort(a, l, m);   // ------------ (1)
    mergesort(a, m+1, r); // ------------ (2)
    merge(a, l, m, r);
}
a) The time complexity of this merge sort is O(n lg(n)). Will parallelizing (1) and (2) give any practical gain? Theoretically, it appears that after parallelizing them you would still end up with O(n lg(n)). But practically, can we get any gains?
b) The space complexity of this merge sort is O(n). However, if I choose to perform an in-place merge sort using linked lists (not sure if it can be done with arrays reasonably), will the space complexity become O(lg(n)), since you have to account for the recursion stack frames?
Can we treat O(lg(n)) as constant, since it cannot be more than 64? I may have misunderstood this in a couple of places. What exactly is the significance of 64?
c) Sorting Algorithms Compared - Cprogramming.com says merge sort requires constant space using linked lists. How? Did they treat O(lg(n)) as constant?
d) Added to get more clarity: for space complexity calculations, is it fair to assume the input array or list is already in memory? When I do complexity calculations I always count the "extra" space I will need beyond the space already taken by the input. Otherwise space complexity would always be O(n) or worse.

Merge sort's time complexity is O(n lg n); that much is fundamental. Its space complexity will always be O(n), including with arrays.
If you draw the space tree out, it will seem as though the space complexity is O(n lg n). However, because the code executes depth-first, you are only ever expanding along one branch of the tree at a time, so the total space in use is always bounded by O(3n) = O(n).
For example, if you draw the space tree out, it seems like it is O(nlgn)
16 | 16
/ \
/ \
/ \
/ \
8 8 | 16
/ \ / \
/ \ / \
4 4 4 4 | 16
/ \ / \ / \ / \
2 2 2 2..................... | 16
/ \ /\ ........................
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | 16
where height of tree is O(logn) => Space complexity is O(nlogn + n) = O(nlogn).
However, this is not the case in the actual code, as it does not execute in parallel. For example, in the case where N = 16, this is how the code for mergesort executes:
16
/
8
/
4
/
2
/ \
1 1
notice how the amount of space used is 32 = 2n = 2*16 < 3n
Then it merges upwards:
16
/
8
/
4
/ \
2 2
/ \
1 1
which is 34 < 48 = 3n.
Then it merges upwards:
16
/
8
/ \
4 4
/
2
/ \
1 1
36 < 16 * 3 = 48
then it merges upwards:
16
/ \
8 8
/ \
4 4
/ \
2 2
/\
1 1
16 + 16 + 14 = 46 < 3*n = 48
in a larger case, n = 64
64
/ \
32 32
/ \
16 16
/ \
8 8
/ \
4 4
/ \
2 2
/\
1 1
which is 64 + 32 + 16 + 8 + 4 + 2 + 1 + 1 = 128 <= 3n = 3*64 = 192
You can prove this by induction for the general case.
Therefore, the space complexity is always bounded by O(3n) = O(n), even if you implement it with arrays, as long as you clean up used space after merging and execute sequentially rather than in parallel.
Example of my implementation is given below:
template<class X>
void mergesort(X a[], int n) // X is the element type (a template parameter)
{
    if (n == 1)
    {
        return;
    }
    int q, p;
    q = n/2;
    p = n/2;
    //if (n % 2 == 1) p++; // increment by 1
    if (n & 0x1) p++; // increment by 1
    // note: the bitwise AND is much cheaper in hardware than computing the mod (%)
    X b[q]; // variable-length array: a GCC/Clang extension, not standard C++
    int i = 0;
    for (i = 0; i < q; i++)
    {
        b[i] = a[i];
    }
    mergesort(b, i);
    // do mergesort here to save space
    // http://stackoverflow.com/questions/10342890/merge-sort-time-and-space-complexity/28641693#28641693
    // Only after returning from the previous mergesort do you create the next array.
    X c[p];
    int k = 0;
    for (int j = q; j < n; j++)
    {
        c[k] = a[j];
        k++;
    }
    mergesort(c, k);
    int r, s, t;
    t = 0; r = 0; s = 0;
    while ((r != q) && (s != p))
    {
        if (b[r] <= c[s])
        {
            a[t] = b[r];
            r++;
        }
        else
        {
            a[t] = c[s];
            s++;
        }
        t++;
    }
    if (r == q)
    {
        while (s != p)
        {
            a[t] = c[s];
            s++;
            t++;
        }
    }
    else
    {
        while (r != q)
        {
            a[t] = b[r];
            r++;
            t++;
        }
    }
    return;
}
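A minimal driver for the implementation above might look like this (it needs g++ or clang++ because of the variable-length arrays; the sample values are arbitrary):
#include <iostream>

int main()
{
    int a[] = { 5, 2, 9, 1, 7, 3 };
    int n = sizeof(a) / sizeof(a[0]);
    mergesort(a, n);                 // sorts a[0..n-1] in place
    for (int i = 0; i < n; i++)
        std::cout << a[i] << ' ';    // prints: 1 2 3 5 7 9
    std::cout << std::endl;
    return 0;
}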

a) Yes - in a perfect world you'd have to do log n merges of size n, n/2, n/4 ... (or better said of size 1, 2, 4, ..., n/4, n/2, n - the individual merges can't be parallelized), which gives O(n) along the critical path. The total work is still O(n log n). In a not-so-perfect world you don't have an infinite number of processors, and context switching and synchronization offset any potential gains.
b) Space complexity is always Ω(n) as you have to store the elements somewhere. Additional space complexity can be O(n) in an implementation using arrays and O(1) in linked list implementations. In practice implementations using lists need additional space for list pointers, so unless you already have the list in memory it shouldn't matter.
Edit: if you count stack frames, then it's O(n) + O(log n), so still O(n) in the case of arrays. In the case of lists it's O(log n) additional memory.
c) Lists only need some pointers changed during the merge process. That requires constant additional memory.
d) That's why in merge-sort complexity analysis people mention 'additional space requirement' or things like that. It's obvious that you have to store the elements somewhere, but it's always better to mention 'additional memory' to keep purists at bay.
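To make point (c) concrete, here is a sketch of the merge step on singly linked lists (my illustration, not from the linked article; the Node type and names are made up). It only rewires next pointers, so apart from the O(log n) recursion stack there is no extra storage proportional to n:
struct Node { int val; Node* next; };

// Merge two already-sorted lists by relinking nodes; O(1) extra memory.
Node* merge(Node* a, Node* b) {
    Node dummy{0, nullptr};          // stack-allocated list head
    Node* tail = &dummy;
    while (a && b) {
        if (a->val <= b->val) { tail->next = a; tail = a; a = a->next; }
        else                  { tail->next = b; tail = b; b = b->next; }
    }
    tail->next = a ? a : b;          // append whichever run is left
    return dummy.next;
}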

Simple and smart thinking.
Total levels (L) = log2(N).
At the last level, the number of nodes = N.
Step 1: assume level i has x(i) nodes in total.
Step 2: so the time complexity is x(1) + x(2) + x(3) + ... + x(L-1) + N (for i = L).
Step 3: we know the fact that x(1), x(2), x(3), ..., x(L-1) < N.
Step 4: so upper-bound them by taking x(1) = x(2) = ... = x(L-1) = N.
Step 5: the time complexity is then (N + N + N + ... L times) = O(N*L).
Putting L = log(N):
Time complexity = O(N*log(N)).
We use an extra array while merging, so:
Space complexity: O(N).
Hint: O(x) is an upper bound: it says, with proof, that the running time will never exceed a constant multiple of x.
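Written as a single sum over the levels of the recursion tree, the same bound is:
T(N) = \sum_{i=1}^{L} x_i \le \sum_{i=1}^{L} N = N \cdot L = N \log_2 N = O(N \log N)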

a) Yes, of course, parallelizing merge sort can be very beneficial. It remains O(n log n), but your constant should be significantly lower (see the sketch after point c below).
b) Space complexity with a linked list should be O(n), or more specifically O(n) + O(log n). Note that that's a +, not a *. Don't concern yourself much with constants when doing asymptotic analysis.
c) In asymptotic analysis, only the dominant term in the equation matters much, so the fact that we have a + and not a * makes it O(n). If we were duplicating the sublists all over, I believe that would be O(n log n) space - but a smart linked-list-based merge sort can share regions of the lists.
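Expanding on (a): a minimal sketch of parallelizing the two recursive calls with std::async (the 10000-element cutoff is an assumed tuning value, below which sequential recursion avoids thread overhead):
#include <algorithm>
#include <future>
#include <vector>

// Sketch: parallel merge sort over a[lo, hi). The grain cutoff is an
// arbitrary assumption, not a recommended value.
void pmergesort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo <= 1) return;
    int mid = lo + (hi - lo) / 2;
    if (hi - lo >= 10000) {
        // Sort the left half on another thread, the right half here.
        auto left = std::async(std::launch::async,
                               pmergesort, std::ref(a), lo, mid);
        pmergesort(a, mid, hi);
        left.wait();
    } else {
        pmergesort(a, lo, mid);
        pmergesort(a, mid, hi);
    }
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}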

Worst-case performance of merge sort: O(n log n)
Best-case performance of merge sort: O(n log n) typically, O(n) for the natural variant
Average performance of merge sort: O(n log n)
Worst-case space complexity of merge sort: O(n) total, O(n) auxiliary

For both the best and worst case, the complexity is O(n log(n)).
An extra array of size n is needed at each step, so
the space complexity is O(n + n) = O(2n); since we drop constant factors when calculating complexity, it is O(n).

Merge sort's space complexity is O(nlogn) in this view: the recursion can go at most O(logn) deep, and each recursion level allocates additional O(n) space for the merged array that needs to be reassigned.
For those who are saying O(n): please don't forget that it is O(n) for each stack frame depth.

Related

space complexity of merge sort using array

This algorithm is merge sort. I know it may look weird to you, but my main focus is on calculating the space complexity of this algorithm.
If we look at the recurrence tree of the mergesort function and try to trace the algorithm, the stack size will be log(n). But since the merge function is also there inside mergesort, creating two arrays of size n/2 and n/2, should I first find the space complexity of the recurrence relation and then add to it the n/2 + n/2, which gives O(log(n) + n)?
I know the answer, but I am confused about the process. Can anyone tell me the correct procedure?
This confusion is due to the merge function, which is not itself recursive but is called inside a recursive function.
And why do we say that the space complexity will be O(log(n) + n), when by the definition of the space complexity of a recursive function we usually calculate the height of the recursion tree?
Merge(Leftarray, Rightarray, Array) {
nL <- length(Leftarray)
nR <- length(Rightarray)
i <- j <- k <- 0
while (i < nL && j < nR) {
if (Leftarray[i] <= Rightarray[j])
Array[k++] <- Leftarray[i++]
else
Array[k++] <- Rightarray[j++]
}
while (i < nL) {
Array[k++] <- Leftarray[i++]
}
while (j < nR) {
Array[k++] <- Rightarray[j++]
}
}
Mergesort(Array) {
n <- length(Array)
if (n < 2)
return
mid <- n / 2
Leftarray <- array of size (mid)
Rightarray <- array of size (n-mid)
for i <- 0 to mid-1
Leftarray[i] <- Array[i]
for i <- mid to n-1
Right[i-mid] <- Array[mid]
Mergesort(Leftarray)
Mergesort(Rightarray)
Merge(Leftarray, Rightarray)
}
This implementation of MergeSort is quite inefficient in memory space and has some bugs:
the memory is never freed; I assume you rely on garbage collection.
the target array Array is not passed to Merge by MergeSort.
Extra space equal to the size of Array is allocated at each recursion level, so at least twice the size of the initial array (2N) is required if garbage collection is optimal (for example, if it uses reference counts), and up to N*log2(N) space is used if the garbage collector is lazy. This is much more than necessary, as a careful implementation can use as little as N/2 extra space.
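To illustrate that last point, here is a sketch of a merge sort that needs only N/2 auxiliary elements (my own illustration with made-up names, not the poster's code): each merge copies only the left half into a shared buffer and merges it back into the original array; leftover right-half elements are already in their final place.
#include <vector>

// Sketch: merge sort with an n/2-element auxiliary buffer, sorting
// arr[lo..hi) in place. Call as:
//   std::vector<int> buf(a.size() / 2 + 1);
//   mergesort_halfbuf(a, 0, (int)a.size(), buf);
void mergesort_halfbuf(std::vector<int>& arr, int lo, int hi,
                       std::vector<int>& buf) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    mergesort_halfbuf(arr, lo, mid, buf);
    mergesort_halfbuf(arr, mid, hi, buf);
    int nleft = mid - lo;
    for (int i = 0; i < nleft; i++) buf[i] = arr[lo + i]; // copy left half out
    int i = 0, j = mid, k = lo;
    while (i < nleft && j < hi)                           // merge back into arr
        arr[k++] = (buf[i] <= arr[j]) ? buf[i++] : arr[j++];
    while (i < nleft) arr[k++] = buf[i++];  // right leftovers are already in place
}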

What is the Space Complexity of Tail Recursive Quicksort?

Looking at the following tail recursive quicksort pseudocode
QuickSort(A[1, ..., n], lo, hi)
Input: An array A of n distinct integers, the lower index and the higher index
// For the first call lo = 1 and hi = n
Output: The array A in sorted order
If lo = hi return
// The array A is already sorted in this case
If lo > hi or indices out of the range 1 to n then return
Else
Pick an index k in [lo,hi] as the pivot
// Assume that this can be done using O(1) cells on the stack
i = Partition(A[lo, ..., hi], k)
// Use in-place partitioning here so assume that this can be done
// using O(1) space on the stack
If i - lo <= hi - i
QuickSort(A, lo, i-1) // sort the smaller half first
QuickSort(A, i+1, hi)
Else
QuickSort(A, i+1, hi) // sort the smaller half first
QuickSort(A, lo, i-1)
Assuming that the pivot is chosen adversarially each time, I worked out that it should have a space complexity of O(logn) [which I am not entirely sure is correct], but how would the space complexity be affected if the pivot is instead chosen uniformly at random? I am fairly new to analyzing space complexity as opposed to time complexity, so any feedback is appreciated!
Refer to this article covering tail recursion.
In the article it says that the space complexity of a tail-recursive quicksort is as follows:
space complexity = input + O(log(n))
A few articles to get a more in depth of understanding can be found below:
Pivoting To Understand QuickSort Pt.1
Pivoting To Understand QuickSort Pt.2
QuickSort Notes from Duke
Carnegie Melon Randomized Quicksort Lecture Notes
Algorithmic Analysis of QuickSort
QuickSort Using Random Pivoting
The worst case for time is when you divide the array as unevenly as possible, and that time will be O(n^2). If you're not doing tail recursion, that will also be the worst case for space.
However, if you divide the array unevenly and are doing a tail-recursive sort, the call that sorts the larger half takes no space because you just replace the current call frame. Therefore the maximum space is used when you've made the first (smaller-half) recursive call over and over again - at most 1/2 of at most 1/2 of ... the array, for a total of log_2(n) call frames. A code sketch follows below.
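In code, that strategy looks something like this sketch (partition() is assumed to be the usual in-place partitioning step and is not shown). The recursion goes into the smaller side, and the loop replaces the tail call on the larger side, so at most log2(n) frames are ever live:
// Sketch: quicksort recursing only into the smaller side.
// Assumes an in-place partition(a, lo, hi) returning the pivot's
// final index, using O(1) extra space.
void quicksort(int a[], int lo, int hi) {  // sorts a[lo..hi] inclusive
    while (lo < hi) {
        int i = partition(a, lo, hi);
        if (i - lo < hi - i) {
            quicksort(a, lo, i - 1);       // smaller half: real recursion
            lo = i + 1;                    // larger half: reuse this frame
        } else {
            quicksort(a, i + 1, hi);       // smaller half: real recursion
            hi = i - 1;                    // larger half: reuse this frame
        }
    }
}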
If you switch from worst case to average case with a uniformly chosen pivot, it is O(log(n)) again, but with a better constant. First of all it can't be more than that because the average case cannot exceed the worst case.
The trick is to prove that you can't improve that bound. To demonstrate that, we can prove that the average space to sort an array of size n is at least C log(n+1)/(3 log(2)) where C is the space for a single call.
By inspection this is true for n = 1, 2, ..., 7 because the initial call takes space C and log(n+1)/(3 log(2)) <= 1.
If n is bigger than 7 and the statement is true up to n, our pivot will break us into groups of size m and n-m where m <= n-m. With at least even odds, n <= 4m and our expected maximum cost during the first recursive call is at least
C + f(m)
>= C + f(ceil(n/4))
>= C*(3 log 2)/(3 log 2) + C*log(n/4 + 1)/(3 log 2)
>  C*(3 log 2 + log(n+1) - 2 log 2)/(3 log 2)
=  C*(log(n+1) + log 2)/(3 log 2)
The rest of the time that doesn't hold and our expected maximum cost during the tail-recursive call is at least
f(n-m)
>= f(floor(n/2))
>= C*log(n/2 + 1/2)/(3 log 2)
=  C*(log(n+1) - log 2)/(3 log 2)
When you average those two, you get the desired lower bound of C log(n+1) / (3 log(2)).
(I may have made a small error, but the idea is right.)

Merge sort space and time complexity

Given the following merge sort algorithm :
mergesort(A, p, r)
    if (r <= p) return      // constant amount of time
    int q = (p+r)/2         // constant amount of time
    mergesort(A, p, q)      // these two calls will decide the
    mergesort(A, q+1, r)    // O(logn) factor inside O(n*logn), right?
    merge(A, p, q, r)       // let's dwell further
merge(A, p, q, r) {
    n1 = q - p + 1    // constant time
    n2 = r - q        // constant time
    // Let L[1...n1+1] and R[1...n2+1] be new arrays // idk, let's say constant
    for i, j in L[], R[]
        L[i] = A[p+i-1]
        R[j] = A[q+j]     // shouldn't this take time varying with the size of the array?
                          // also extra space too?
    i = 1, j = 1          // constant time
    for k = p to r        // everything below this will contribute to the O(n)
                          // inside O(n*logn), am I right?
        if L[i] <= R[j]
            A[k] = L[i]
            i++
        else
            A[k] = R[j]
            j++
}
How come we estimate O(nlogn) time complexity for it, keeping in mind that left and right arrays are being created to be merged back?
And how come the space complexity is only O(n) if extra space is being used? Won't both of them grow by n, because filling up an array takes O(n) and L[] and R[] are being created at each recursion step?
I suggest you reason about this by drawing a tree on paper: first write down your whole array:
2 4 7 1 3 4 6 2 3 7 ...
Then write down below it what the recursion causes it to be split into:
2 4 7 1 3 | 4 6 2 3 7 ...
2 4 | 7 1 3 | 4 6 | 2 3 7 ...
And so on with each piece.
Then, count how many rows you've written. This will be close to the base 2 logarithm of the number of elements you started with (O(log n)).
Now, how much work is being done for each row? It's O(n). Merging two arrays of lengths n1, n2 will take O(n1 + n2), even if you have to allocate space for them (and you don't in a proper implementation). Since each row in the recursion tree has n array elements, it follows that the work done for each row is O(n) and therefore the entire algorithm is O(n log n).
And how come the space complexity is only O(n) if extra space is being used? Won't both of them grow by n, because filling up an array takes O(n) and L[] and R[] are being created at each recursion step?
This is more interesting. If you do indeed create new L, R arrays at each recursion step, then the space complexity will be O(n log n). But you don't do that. You create one extra array of size n at the beginning (think of it as a global variable), and then you store the result of each merge into it.
You only pass around things that help you identify the subarrays, such as their sizes and the indexes they begin at. Then you access them using the original array and store the result of the merge in the globally allocated array, resulting in O(n) extra space:
global_temp = array of size equal to the array you're sorting
merge(A, p, q, r) {
    i = p
    j = q
    k = p    // write position in global_temp
    while i < q and j <= r    // or a similar condition
        if A[i] <= A[j]
            global_temp[k++] = A[i]
            i++
        else
            global_temp[k++] = A[j]
            j++
    // TODO: copy remaining elements into global_temp
    // TODO: copy global_temp into A[p..r]
}
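For concreteness, this is roughly what that merge looks like in C++ with the TODOs filled in (my sketch; it uses half-open ranges [p, q) and [q, r), which differ slightly from the pseudocode's bounds):
#include <vector>

std::vector<int> global_temp; // resized once to the array's size, reused by every merge

// Merge the sorted runs A[p..q) and A[q..r) through the shared buffer.
void merge(std::vector<int>& A, int p, int q, int r) {
    int i = p, j = q, k = p;
    while (i < q && j < r)
        global_temp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i < q) global_temp[k++] = A[i++];           // leftover left run
    while (j < r) global_temp[k++] = A[j++];           // leftover right run
    for (int t = p; t < r; t++) A[t] = global_temp[t]; // copy the merged run back
}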
Your question is unclear, but perhaps you are confused by the extra space you need.
Obviously, on the first pass (and every pass) you read the entire data, and merge each partition into one twice as big.
Let's just focus on 8 elements.
8 7 6 5 4 3 2 1
In the first pass, the size of each partition is 1, and you are merging them to size=2. So you read the 8 and the 7, and merge them into a partition:
7 8 5 6 3 4 1 2
The next stage is to merge groups of 2 into groups of 4. Obviously, you have to read every element. So both of these passes take O(n) operations. The number of times to double is log2(n), which is why this algorithm is O(n log n)
In order to merge, you need extra room. You could recycle it. But the worst case is when you merge two n/2 partitions into n (the last time). The easy way to envision that is to allocate a buffer big enough to copy the entire data into. That would be O(n) storage.
5 6 7 8 1 2 3 4
i       j
empty buffer: int buf[8]
k = 0
buf[k++] = (orig[j] < orig[i]) ? orig[j++] : orig[i++];

Exactly how many comparisons does merge sort make?

I have read that quicksort is much faster than mergesort in practice, and the reason for this is the hidden constant.
Well, the solution for the randomized quicksort complexity is 2n ln n ≈ 1.39 n lg n, which means that the constant in quicksort is 1.39.
But what about mergesort? What is the constant in mergesort?
Let's see if we can work this out!
In merge sort, at each level of the recursion, we do the following:
Split the array in half.
Recursively sort each half.
Use the merge algorithm to combine the two halves together.
So how many comparisons are done at each step? Well, the divide step doesn't make any comparisons; it just splits the array in half. Step 2 doesn't (directly) make any comparisons; all comparisons are done by recursive calls. In step 3, we have two arrays of size n/2 and need to merge them. This requires at most n comparisons, since each step of the merge algorithm does a comparison and then consumes some array element, so we can't do more than n comparisons.
Combining this together, we get the following recurrence:
C(1) = 0
C(n) = 2C(n / 2) + n
(As mentioned in the comments, the linear term is more precisely (n - 1), though this doesn’t change the overall conclusion. We’ll use the above recurrence as an upper bound.)
To simplify this, let's define n = 2^k and rewrite this recurrence in terms of k:
C'(0) = 0
C'(k) = 2C'(k - 1) + 2^k
The first few terms here are 0, 2, 8, 24, ... . This looks something like k·2^k, and we can prove this by induction. As our base case, when k = 0, the first term is 0, and the value of k·2^k is also 0. For the inductive step, assume the claim holds for some k and consider k + 1. Then the value is 2(k·2^k) + 2^(k+1) = k·2^(k+1) + 2^(k+1) = (k+1)·2^(k+1), so the claim holds for k + 1, completing the induction. Thus the value of C'(k) is k·2^k. Since n = 2^k, this means that, assuming n is a perfect power of two, the number of comparisons made is
C(n) = n lg n
Impressively, this is better than quicksort! So why on earth is quicksort faster than merge sort? This has to do with other factors that have nothing to do with the number of comparisons made. Primarily, since quicksort works in place while merge sort works out of place, the locality of reference is not nearly as good in merge sort as it is in quicksort. This is such a huge factor that quicksort ends up being much, much better than merge sort in practice, since the cost of a cache miss is pretty huge. Additionally, the time required to sort an array doesn't just take the number of comparisons into account. Other factors like the number of times each array element is moved can also be important. For example, in merge sort we need to allocate space for the buffered elements, move the elements so that they can be merged, then merge back into the array. These moves aren't counted in our analysis, but they definitely add up. Compare this to quicksort's partitioning step, which moves each array element exactly once and stays within the original array. These extra factors, not the number of comparisons made, dominate the algorithm's runtime.
This analysis is a bit less precise than the optimal one, but Wikipedia confirms that the analysis is roughly n lg n and that this is indeed fewer comparisons than quicksort's average case.
Hope this helps!
In the worst case and assuming a straight-forward implementation, the number of comparisons to sort n elements is
n⌈lg n⌉ − 2^⌈lg n⌉ + 1
where lg n indicates the base-2 logarithm of n.
This result can be found in the corresponding Wikipedia article or recent editions of The Art of Computer Programming by Donald Knuth, and I just wrote down a proof for this answer.
Merging two sorted arrays (or lists) of sizes k and m takes at most k+m-1 comparisons, and at least min{k,m}. (After each comparison we can write one value to the target; when one of the two inputs is exhausted, no more comparisons are necessary.)
Let C(n) be the worst case number of comparisons for a mergesort of an array (a list) of n elements.
Then we have C(1) = 0, C(2) = 1, pretty obviously. Further, we have the recurrence
C(n) = C(floor(n/2)) + C(ceiling(n/2)) + (n-1)
An easy induction shows
C(n) <= n*log_2 n
On the other hand, it's easy to see that we can come arbitrarily close to the bound (for every ε > 0, we can construct cases needing more than (1-ε)*n*log_2 n comparisons), so the constant for mergesort is 1.
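As a sanity check, a small program can verify the closed form against the recurrence above for all n up to 1024 (assuming, as above, that a worst-case merge of sizes k and m costs k+m-1 comparisons):
#include <cassert>
#include <cstdio>

int main() {
    static long long C[1025];
    C[1] = 0;
    for (int n = 2; n <= 1024; n++)
        C[n] = C[n / 2] + C[n - n / 2] + (n - 1);   // worst-case recurrence
    for (int n = 1; n <= 1024; n++) {
        long long lg = 0;
        while ((1LL << lg) < n) lg++;               // lg = ceil(log2(n))
        long long closed = (long long)n * lg - (1LL << lg) + 1;
        assert(C[n] == closed);                     // n*ceil(lg n) - 2^ceil(lg n) + 1
    }
    std::puts("closed form matches the recurrence for n = 1..1024");
    return 0;
}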
Merge sort is O(n log n) and at each step, in the "worst" case (for number of comparisons), performs a comparison.
Quicksort, on the other hand, is O(n^2) in the worst case.
C++ program to count the number of comparisons in merge sort.
First the program will sort the given array, then it will show the number of comparisons.
#include<iostream>
using namespace std;
int count=0; /* to count the number of comparisons */
int merge( int arr [ ], int l, int m, int r)
{
int i=l; /* left subarray*/
int j=m+1; /* right subarray*/
int k=l; /* temporary array*/
int temp[r+1];
while( i<=m && j<=r)
{
if ( arr[i]<= arr[j])
{
temp[k]=arr[i];
i++;
}
else
{
temp[k]=arr[j];
j++;
}
k++;
count++;
}
while( i<=m)
{
temp[k]=arr[i];
i++;
k++;
}
while( j<=r)
{
temp[k]=arr[j];
j++;
k++;
}
for( int p=l; p<=r; p++)
{
arr[p]=temp[p];
}
return count;
}
int mergesort( int arr[ ], int l, int r)
{
int comparisons = count; /* if l >= r, no new comparisons are made */
if(l<r)
{
int m= ( l+r)/2;
mergesort(arr,l,m);
mergesort(arr,m+1,r);
comparisons = merge(arr,l,m,r);
}
return comparisons;
}
int main ()
{
int size;
cout<<" Enter the size of an array "<< endl;
cin>>size;
int myarr[size];
cout<<" Enter the elements of array "<<endl;
for ( int i=0; i< size; i++)
{
cin>>myarr[i];
}
cout<<" Elements of array before sorting are "<<endl;
for ( int i=0; i< size; i++)
{
cout<<myarr[i]<<" " ;
}
cout<<endl;
int c=mergesort(myarr, 0, size-1);
cout<<" Elements of array after sorting are "<<endl;
for ( int i=0; i< size; i++)
{
cout<<myarr[i]<<" " ;
}
cout<<endl;
cout<<" Number of comaprisions while sorting the given array"<< c <<endl;
return 0;
}
I am assuming the reader knows merge sort. Comparisons happen only when two sorted arrays are merged. For simplicity, assume n is a power of 2. Merging two arrays of size n/2 needs (n - 1) comparisons in the worst case; the -1 appears because the last element left over in a merge does not require any comparison. First count the total comparisons pretending each merge of a size-n range costs n; we can then correct for the (-1) parts. The number of levels of merging is log2(n) (imagine it as a tree structure). Each level performs n comparisons (minus some number, due to the -1 parts), so the total is n*log2(n) minus a correction. The correction is one comparison per merge performed: (1 + 2 + 4 + 8 + ... + n/2) = n - 1.
Number of total comparisons in merge sort = n*log2(n) - (n - 1).
So, your constant is 1.

Time complexity

The problem is finding the majority element in an array.
I understand how this algorithm works, but I don't know why it has O(nlogn) time complexity.
a. Both return "no majority." Then neither half of the array has a majority element, and the combined array cannot have a majority element. Therefore, the call returns "no majority."
b. The right side has a majority, and the left doesn't. The only possible majority for this level is the value that formed a majority on the right half; therefore, just compare every element in the combined array and count the number of elements that are equal to this value. If it is a majority element then return that element, else return "no majority."
c. Same as above, but with the left returning a majority and the right returning "no majority."
d. Both sub-calls return a majority element. Count the number of elements equal to both of the candidates for majority element. If either is a majority element in the combined array, then return it. Otherwise, return "no majority."
The top level simply returns either a majority element or that no majority element exists in the same way.
Therefore, T(1) = 0 and T(n) = 2T(n/2) + 2n = O(nlogn)
I think:
Every recursion compares the candidate majority element against the whole array, which takes 2n.
T(n) = 2T(n/2) + 2n = 2(2T(n/4) + 2n) + 2n = ... = 2^k T(n/2^k) + 2n + 4n + 8n + ... + 2^k n = O(n^2)
T(n) = 2T(n/2) + 2n
The question is how many iterations does it take for n to get to 1.
We divide by 2 in each iteration, so we get the series: n, n/2, n/4, n/8, ..., n/(2^k)
So, let's find k that will bring us to 1 (last iteration):
n/(2^k) = 1  =>  n = 2^k  =>  k = log(n)
So we got log(n) iterations.
Now, in each iteration we do at most 2n operations (fewer, in fact, because n is halved each time, but for the worst-case scenario let's say 2n).
So in total, we get log(n) iterations with O(n) operations each: O(n log(n)).
I'm not sure if I understand, but couldn't you just create a hash map, walk over the array incrementing hash[value] at every step, then sort the hash map (m log m time) and compare the top two elements? This would cost O(n) + O(m log m) = O(n + m log m), with n the size of the array and m the number of distinct elements in it.
Am I mistaken here? Or ...?
When you do this recursively, you split the array in two at each level, make a call for each half, then perform one of the tests a - d. Test a requires no looping; the other tests require looping through the entire array. On average you will loop through (0 + 1 + 1 + 1) / 4 = 3/4 of the array at each level of the recursion.
The number of levels in the recursion is based on the size of the array. As you split the array in half each level, the number of levels will be log2(n).
So, the total work is (n * 3/4) * log2(n). As constants are irrelevant to the time complexity, and all logarithms are the same, the complexity is O(n * log n).
Edit:
If someone is wondering about the algorithm, here's a C# implementation. :)
// Note: needs "using System.Linq;" for Skip/Take/Count.
private int? FindMajority(int[] arr, int start, int len) {
    if (len == 1) return arr[start];
    int len1 = len / 2, len2 = len - len1;
    int? m1 = FindMajority(arr, start, len1);
    int? m2 = FindMajority(arr, start + len1, len2);
    int cnt1 = m1.HasValue ? arr.Skip(start).Take(len).Count(n => n == m1.Value) : 0;
    if (cnt1 * 2 > len) return m1;   // a majority is strictly more than half
    int cnt2 = m2.HasValue ? arr.Skip(start).Take(len).Count(n => n == m2.Value) : 0;
    if (cnt2 * 2 > len) return m2;
    return null;
}
This guy has a lot of videos on recurrence relations and the different techniques you can use to solve them:
https://www.youtube.com/watch?v=TEzbkIggJfo&list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
Basically, for this problem I would use the Master Theorem:
https://youtu.be/i5kTZof1LRY
T(1) = 0 and T(n) = 2T(n/2) + 2n
Master Theorem ==> T(n) = A*T(n/B) + O(n^D), so in this case A=2, B=2, D=1
Since log_B(A) = log_2(2) = 1 = D, the Master Theorem gives O(n^D * log n) = O(n log n).
You can also use another method to solve this (below); it would just take a little bit more time:
https://youtu.be/TEzbkIggJfo?list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
I hope this helps you out!
