This is a practice question for understanding divide-and-conquer algorithms.
You are given a sorted array of N integers. All the elements are distinct, except that one element appears twice. Design an O(log N) algorithm to find that element.
I get that the array needs to be divided and checked for an equal counterpart at the next index, some variant of binary search, I believe. But I can't find any solution or guidance for it.
You cannot do it in O(log n) time in general, because at any step, even if you divide the array into two parts, you cannot decide which part to keep for further processing and which to discard.
On the other hand, if the array contains consecutive numbers, then by comparing an index with the value stored at that index we can decide whether the duplicate is in the left or the right half of the array.
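For that consecutive-numbers case, a minimal sketch of the binary search (assuming the values start at a[0] and exactly one value appears twice, e.g. {3,4,5,5,6,7}):

int findDuplicate(const int a[], int n)
{
    int lo = 0, hi = n - 1;
    while (lo < hi)
    {
        int mid = (lo + hi) / 2;
        if (a[mid] == a[0] + mid) // no duplicate in a[0..mid]
            lo = mid + 1;         // so it must be to the right
        else                      // a[mid] lags its expected value by one
            hi = mid;             // duplicate is at mid or to its left
    }
    return lo; // index of the second copy of the duplicated value
}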
For the general problem, a D&C solution should look something like this:
int Twice(int a[], int i, int j) {
    if (i >= j)                  // fewer than two elements: no pair here
        return -1;
    int k = (i + j) / 2;
    if (a[k] == a[k + 1])        // k < j here, so a[k+1] is in range
        return k;
    if (k > i && a[k] == a[k - 1])
        return k - 1;
    int m = Twice(a, i, k - 1);
    int n = Twice(a, k + 1, j);
    return m != -1 ? m : n;
}

int Twice(int a[], int n) {
    return Twice(a, 0, n - 1);   // j is an inclusive upper index
}
But it has complexity O(n). As stated above, an O(lg n) algorithm is not possible for this problem in general.
It's quite clear that there is an O(n^2) algorithm to choose the second largest number, and a tree-style (tournament) algorithm in O(n * log(n)), but with extra space cost.
But, eh..., is there an in-place algorithm with time complexity O(n * log(n)) to select the second largest number in an array/vector?
Yes, in fact you can do this with a single pass over the range without modifying it. Here's an example algorithm:
Let m and M be the second largest and largest elements. Initialize them to the smallest possible values the input range could contain.
For each number n in the range, the new second largest number depends on the relative order between n, m and M. The 3 possible orderings are n < m < M, m < n < M, or m < M < n. The new second largest element must be m, n, and M respectively. Essentially, n must be clamped between m and M.
The new largest number can't be m, so it must be the larger of n and M.
Here's a demonstration in C++:
#include <algorithm> // std::clamp (C++17) and std::max
#include <vector>

std::vector<int> v = { /* ... */ };
int m = 0, M = 0; // assuming a range with non-negative values
for (int n : v)
{
    m = std::clamp(n, m, M); // new second largest: n clamped between m and M
    M = std::max(n, M);      // new largest: the larger of n and M
}
If you are looking for something very simple in O(n):
#include <climits> // INT_MIN
#include <vector>
using namespace std;

int getSecondLargest(vector<int>& vec) {
    int firstLargest = INT_MIN, secondLargest = INT_MIN;
    for (auto i : vec) {
        if (i >= firstLargest) {
            if (firstLargest != INT_MIN) {
                secondLargest = firstLargest; // old largest becomes second largest
            }
            firstLargest = i;
        } else if (i > secondLargest) {
            secondLargest = i;
        }
    }
    return secondLargest;
}
nth_element:
Pros:
If tomorrow you want not the second largest but, say, the fifth largest, you won't need many code changes. The algorithm I presented above won't help.
Cons:
If you are just looking for the second largest, nth_element is overkill. It does more swaps and/or writes than the algorithm I showed above.
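For reference, a minimal sketch of the nth_element approach (it assumes at least two elements, and takes the vector by value because nth_element reorders its input):

#include <algorithm>
#include <functional>
#include <vector>

int secondLargest(std::vector<int> v)
{
    // place the 2nd largest at index 1; everything before it is no smaller
    std::nth_element(v.begin(), v.begin() + 1, v.end(), std::greater<int>());
    return v[1];
}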
Why are you guys giving me O(n) when I am asking for O(nlogn)?
You can find various in-place O(nlogn) sorting algorithms. One of them is Block Sort.
No. I want to solve it with a tree style, and I want O(nlogn), and I want it in place. Do you have something like that?
No. That is not possible. When you say in-place, you can't use extra space that depends on n; constant extra space is fine. But a tree-style approach would require O(logn) extra space.
Suppose I have created a Binary Indexed Tree with prefix sums of an array of length N. The main array contains only 0s and 1s. Now I want to find which index has a prefix sum M (that is, has exactly M 1s up to and including it).
For example, my array is a[] = {1,0,0,1,1};
the prefix sums would look like {1,1,1,2,3};
Now the 3rd index (0-based) has a prefix sum of 2.
How can I find this index with the BIT?
Thanks in advance.
Why can't you do a binary search for that index? It will take O(log n * log n) time. Here is a simple implementation:
int findIndex(int sum) {
    int l = 1, r = n; // n and read() come from the surrounding BIT code
    while (l <= r) {
        int mid = (l + r) >> 1;
        int cur = read(mid);            // prefix sum of [1, mid]
        if (cur == sum) return mid;
        else if (cur < sum) l = mid + 1;
        else r = mid - 1;
    }
    return -1;
}
I used the read(x) function; it should return the sum of the interval [1, x] in O(log n) time. The overall complexity will be O(log^2 n).
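For completeness, here is a typical read(x), the standard Fenwick prefix query (the array name and size bound are assumptions):

const int N = 100005; // hypothetical maximum size
int bit[N + 1];       // Fenwick array, 1-indexed

int read(int x) {
    int s = 0;
    for (; x > 0; x -= x & (-x)) // drop the lowest set bit each step
        s += bit[x];
    return s;
}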
Hope it helps.
If the elements in a[n] are non-negative (so the prefix-sum array p[n] is non-decreasing), you can locate an element by its prefix sum in O(log n) time, much like querying a prefix sum by index from the BIT. The only difference is that at each level you compare the prefix sum accumulated so far with your input to decide which subtree to search next: if it is smaller than your input, continue into the right subtree; otherwise, search the left subtree. Repeat this process until you reach a node that sums up to the desired prefix sum, in which case return the index of that node. The idea is analogous to binary search, because the prefix sums are naturally sorted in the BIT. If there are negative values in a[n], this method won't work, since the prefix sums in the BIT won't be sorted in that case.
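A minimal sketch of that descent, assuming a standard 1-indexed Fenwick array bit[] where bit[i] stores the sum of the block of indices it covers. It returns the smallest (1-based) index whose prefix sum reaches target, which for a 0/1 array is exactly the position of the target-th 1:

#include <vector>

int findByPrefixSum(const std::vector<int>& bit, int n, int target)
{
    int pos = 0;
    int pw = 1;
    while (pw * 2 <= n) pw *= 2;   // highest power of two <= n
    for (; pw > 0; pw /= 2)
    {
        // bit[pos + pw] is the sum of the block (pos, pos + pw]
        if (pos + pw <= n && bit[pos + pw] < target)
        {
            pos += pw;             // skip the whole block...
            target -= bit[pos];    // ...and subtract its sum
        }
    }
    return pos + 1 <= n ? pos + 1 : -1; // -1: total sum is below target
}

On the example above (a[] = {1,0,0,1,1}, M = 2), this returns 4, i.e. index 3 in 0-based terms.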
The problem is simple if I sort all the values and pick the first K; but that wastes time, because the smallest K values may already be in place before the whole array is sorted. I think the way to solve this is to add a flag in the code, but I cannot figure out where to put the flag to judge whether the smallest K values are already sorted.
You can use the randomized selection algorithm (quickselect) to solve this problem in O(n) expected time. At the end, just return the first k elements of the array.
I think the problem can be solved by finding the kth smallest value. Suppose the signature of the partition function from quicksort is int partition(int* array, int start, int end); here is pseudocode illustrating the basic idea:
int select(int a[], int start, int end, int k)
{
    int j = partition(a, start, end);   // a[j] is now in its sorted position
    if (k == j)
        return j;
    if (k < j)
        return select(a, start, j - 1, k);
    else
        return select(a, j + 1, end, k); // k stays an absolute index
}

index = select(a, 0, length_of_a - 1, k - 1);
Then a[0...index], i.e. a[0...k-1], holds the k smallest values in array a. You can further sort a[0...index] if you want them in order.
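For completeness, a minimal Lomuto-style partition matching the signature assumed above (a sketch; it simply takes a[end] as the pivot):

int partition(int* a, int start, int end)
{
    int pivot = a[end];
    int i = start; // boundary of the "less than pivot" region
    for (int j = start; j < end; j++)
    {
        if (a[j] < pivot)
        {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[end]; a[end] = t; // put the pivot in place
    return i; // final index of the pivot
}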
This was an interview question that I was asked to solve: given an unsorted array, find two numbers and their sum in the array (that is, find three numbers in the array such that one is the sum of the other two). Please note, I have seen the question about finding 2 numbers when the sum (int k) is given; however, this question expects you to find both the numbers and the sum within the array. Can it be solved in O(n), O(log n), or O(nlogn)?
There is a standard solution of going through each pair of integers and then doing a binary search for their sum. Is there a better solution?
public static void findNumsAndSum(int[] l) {
    if (l == null || l.length < 2) {
        return;
    }
    Arrays.sort(l); // requires java.util.Arrays; the binary search below needs a sorted array
    BinarySearch bs = new BinarySearch();
    for (int i = 0; i < l.length; i++) {
        for (int j = i + 1; j < l.length; j++) {
            int sum = l[i] + l[j];
            if (l[l.length - 1] < sum) {
                break; // the array is sorted, so larger j would only overshoot
            }
            if (bs.binarySearch(l, sum, j + 1, l.length)) {
                System.out.println("Found the sum: " + l[i] + "+" + l[j]
                        + "=" + sum);
            }
        }
    }
}
This is very similar to the standard problem 3SUM, which many related questions here are about.
Your solution is O(n^2 lg n); there are O(n^2) algorithms based on sorting the array that work with slight modification for this variant. The best known lower bound is Ω(n lg n) (because you can use it to perform a comparison sort, if you're clever about it). If you can find a subquadratic algorithm or a tighter lower bound, you'll get some publications out of it. :)
Note that if you're willing to bound the integers to fall in the range [-u, u], there's a solution for the a + b + c = 0 problem in time O(n + u lg u) using the Fast Fourier Transform. It's not immediately obvious to me how to adjust it to the a + b = c problem, though.
You can solve it in O(nlog(n)) as follows:
Sort your array in O(nlog(n)) in ascending order. You need 2 indices pointing to the left and right ends of your array. Let's call them i and j, i being the left one and j the right one.
Now calculate the sum of array[i] + array[j].
If this sum is greater than k, reduce j by one.
If this sum is smaller than k, increase i by one.
Repeat until the sum equals k.
So with this algorithm you can find the solution in O(nlog(n)), and it is pretty simple to implement, as sketched below.
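A minimal sketch of that two-pointer scan (assuming, as in this answer, that the target sum k is given):

#include <algorithm>
#include <utility>
#include <vector>

// returns the pair summing to k, or {-1, -1} as a hypothetical
// "not found" sentinel
std::pair<int, int> findPairWithSum(std::vector<int> v, int k)
{
    std::sort(v.begin(), v.end()); // O(n log n)
    int i = 0, j = (int)v.size() - 1;
    while (i < j)
    {
        int sum = v[i] + v[j];
        if (sum == k) return {v[i], v[j]};
        if (sum > k) --j; // sum too large: move the right index left
        else ++i;         // sum too small: move the left index right
    }
    return {-1, -1};
}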
Sorry. It seems that I didn't read your post carefully enough ;)
I have read that quicksort is much faster than mergesort in practice, and the reason for this is the hidden constant.
Well, the solution of the randomized quicksort recurrence is 2n ln n ≈ 1.39 n log2 n, which means that the constant in quicksort is 1.39.
But what about mergesort? What is the constant in mergesort?
Let's see if we can work this out!
In merge sort, at each level of the recursion, we do the following:
Split the array in half.
Recursively sort each half.
Use the merge algorithm to combine the two halves together.
So how many comparisons are done at each step? Well, the divide step doesn't make any comparisons; it just splits the array in half. Step 2 doesn't (directly) make any comparisons; all comparisons are done by recursive calls. In step 3, we have two arrays of size n/2 and need to merge them. This requires at most n comparisons, since each step of the merge algorithm does a comparison and then consumes some array element, so we can't do more than n comparisons.
Combining this together, we get the following recurrence:
C(1) = 0
C(n) = 2C(n / 2) + n
(As mentioned in the comments, the linear term is more precisely (n - 1), though this doesn’t change the overall conclusion. We’ll use the above recurrence as an upper bound.)
To simplify this, let's define n = 2^k and rewrite the recurrence in terms of k:
C'(0) = 0
C'(k) = 2C'(k - 1) + 2^k
The first few terms here are 0, 2, 8, 24, ... . This looks like k · 2^k, and we can prove this by induction. As our base case, when k = 0, the first term is 0, and the value of k · 2^k is also 0. For the inductive step, assume the claim holds for some k and consider k + 1. Then the value is 2(k · 2^k) + 2^(k + 1) = k · 2^(k + 1) + 2^(k + 1) = (k + 1) · 2^(k + 1), so the claim holds for k + 1, completing the induction. Thus the value of C'(k) is k · 2^k. Since n = 2^k, this means that, assuming n is a perfect power of two, the number of comparisons made is
C(n) = n lg n
Impressively, this is better than quicksort! So why on earth is quicksort faster than merge sort? This has to do with factors that have nothing to do with the number of comparisons made.

Primarily, since quicksort works in place while merge sort works out of place, the locality of reference is not nearly as good in merge sort as it is in quicksort. This is such a huge factor that quicksort ends up being much, much better than merge sort in practice, since the cost of a cache miss is pretty huge.

Additionally, the time required to sort an array doesn't just take the number of comparisons into account. Other factors, like the number of times each array element is moved, can also be important. For example, in merge sort we need to allocate space for the buffered elements, move the elements so that they can be merged, then merge back into the array. These moves aren't counted in our analysis, but they definitely add up. Compare this to quicksort's partitioning step, which moves each array element exactly once and stays within the original array. These extra factors, not the number of comparisons made, dominate the algorithm's runtime.
This analysis is a bit less precise than the optimal one, but Wikipedia confirms that the analysis is roughly n lg n and that this is indeed fewer comparisons than quicksort's average case.
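As a quick sanity check, here is a throwaway sketch that evaluates the recurrence directly and compares it with n lg n:

#include <cstdio>

// C(1) = 0, C(n) = 2 C(n / 2) + n, for n a power of two
long long C(long long n) { return n == 1 ? 0 : 2 * C(n / 2) + n; }

int main()
{
    for (long long n = 2, lg = 1; n <= 1024; n *= 2, ++lg)
        std::printf("n = %4lld: C(n) = %6lld, n lg n = %6lld\n", n, C(n), n * lg);
}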
Hope this helps!
In the worst case and assuming a straight-forward implementation, the number of comparisons to sort n elements is
n ⌈lg n⌉ − 2^⌈lg n⌉ + 1
where lg n indicates the base-2 logarithm of n.
This result can be found in the corresponding Wikipedia article or recent editions of The Art of Computer Programming by Donald Knuth, and I just wrote down a proof for this answer.
Merging two sorted arrays (or lists) of sizes k and m takes at most k + m − 1 comparisons, and at least min{k, m}. (After each comparison we can write one value to the target; once one of the two is exhausted, no more comparisons are necessary.)
Let C(n) be the worst case number of comparisons for a mergesort of an array (a list) of n elements.
Then we have C(1) = 0, C(2) = 1, pretty obviously. Further, we have the recurrence
C(n) = C(floor(n/2)) + C(ceiling(n/2)) + (n-1)
An easy induction shows
C(n) <= n*log_2 n
On the other hand, it's easy to see that we can come arbitrarily close to the bound (for every ε > 0, we can construct cases needing more than (1-ε)*n*log_2 n comparisons), so the constant for mergesort is 1.
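A small throwaway sketch that checks the recurrence against the closed form above for a few values of n:

#include <cmath>
#include <cstdio>

// worst-case comparisons from the recurrence above
long long C(long long n) { return n <= 1 ? 0 : C(n / 2) + C(n - n / 2) + (n - 1); }

int main()
{
    for (long long n : {2, 3, 7, 16, 100, 1000})
    {
        long long lg = (long long)std::ceil(std::log2((double)n));
        std::printf("n = %4lld: recurrence = %7lld, closed form = %7lld\n",
                    n, C(n), n * lg - (1LL << lg) + 1);
    }
}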
Merge sort is O(n log n), and in the worst case (for the number of comparisons) it performs a comparison at each step.
Quicksort, on the other hand, is O(n^2) in the worst case.
C++ program to count the number of comparisons in merge sort.
First the program will sort the given array, then it will show the number of comparisons.
#include <iostream>
using namespace std;

int comparisons = 0; /* global counter for the number of comparisons */

int merge(int arr[], int l, int m, int r)
{
    int i = l;       /* index into the left subarray */
    int j = m + 1;   /* index into the right subarray */
    int k = l;       /* index into the temporary array */
    int temp[r + 1]; /* variable-length array: a g++ extension */
    while (i <= m && j <= r)
    {
        if (arr[i] <= arr[j])
        {
            temp[k] = arr[i];
            i++;
        }
        else
        {
            temp[k] = arr[j];
            j++;
        }
        k++;
        comparisons++;
    }
    while (i <= m)
    {
        temp[k] = arr[i];
        i++;
        k++;
    }
    while (j <= r)
    {
        temp[k] = arr[j];
        j++;
        k++;
    }
    for (int p = l; p <= r; p++)
    {
        arr[p] = temp[p];
    }
    return comparisons;
}

int mergesort(int arr[], int l, int r)
{
    if (l < r)
    {
        int m = (l + r) / 2;
        mergesort(arr, l, m);
        mergesort(arr, m + 1, r);
        merge(arr, l, m, r);
    }
    return comparisons; /* running total kept in the global counter */
}

int main()
{
    int size;
    cout << " Enter the size of an array " << endl;
    cin >> size;
    int myarr[size]; /* variable-length array: a g++ extension */
    cout << " Enter the elements of array " << endl;
    for (int i = 0; i < size; i++)
    {
        cin >> myarr[i];
    }
    cout << " Elements of array before sorting are " << endl;
    for (int i = 0; i < size; i++)
    {
        cout << myarr[i] << " ";
    }
    cout << endl;
    int c = mergesort(myarr, 0, size - 1);
    cout << " Elements of array after sorting are " << endl;
    for (int i = 0; i < size; i++)
    {
        cout << myarr[i] << " ";
    }
    cout << endl;
    cout << " Number of comparisons while sorting the given array: " << c << endl;
    return 0;
}
I am assuming the reader knows merge sort. Comparisons happen only when two sorted subarrays are merged. For simplicity, assume n is a power of 2. Merging two arrays of size n/2 takes, in the worst case, n - 1 comparisons; the -1 appears because the last element left over in a merge does not require any comparison. So first count the total comparisons as if each merge took n, then correct for the -1 parts. There are log2(n) levels of merging (picture it as a tree structure), and each level performs n comparisons minus those corrections, so the total is n·log2(n) minus a yet-to-be-found quantity. That quantity is one -1 per merge: 1 merge at the top level, 2 at the next, then 4, 8, ..., n/2 at the bottom, giving 1 + 2 + 4 + ... + n/2 = n - 1.
Number of total comparisons in merge sort = n·log2(n) - (n - 1). For example, n = 8 gives 8·3 - 7 = 17.
So, your constant is 1.