How to find K smallest values using quick sort - algorithm

The problem is simple if I sort all the values and pick the first K. But that wastes time, because the smallest K values may already be in place long before the whole array is sorted. I think the way to solve this is to add a flag in the code, but I cannot figure out where to put the flag to judge whether the smallest K values have been sorted.

You can use a randomized selection algorithm (quickselect) to solve this problem in O(n) expected time. In the end, just return the sub-array of the first k elements, a[0..k-1].

I think the problem can be solved by finding the kth smallest value. Suppose the signature of the function partition in quicksort is int partition(int* array, int start, int end), and assume that end is inclusive and that the return value is the final index of the pivot. Here is pseudocode which illustrates the basic idea:
int select(int[] a, int start, int end, int k)
{
    int j = partition(a, start, end);
    if (k == j)
        return j;
    if (k < j)
        return select(a, start, j - 1, k);
    // k is an index into the whole array, so it is not adjusted here
    return select(a, j + 1, end, k);
}
index = select(a, 0, length_of_a - 1, k - 1);
Then a[0...index] holds the k smallest values in array a (in no particular order). You can further sort a[0...index] if you want them sorted.
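For reference, here is a minimal runnable sketch of the same idea in Python. It assumes a Lomuto-style partition with a random pivot and an inclusive end index; the names partition and k_smallest are only illustrative, not taken from any library.

```python
import random


def partition(a, start, end):
    """Lomuto partition of a[start..end] (end inclusive).

    Returns the final index of the pivot; smaller values end up to its left.
    """
    p = random.randint(start, end)
    a[p], a[end] = a[end], a[p]
    pivot = a[end]
    store = start
    for i in range(start, end):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[end] = a[end], a[store]
    return store


def k_smallest(a, k):
    """Rearrange a in place so that a[:k] holds the k smallest values."""
    if k <= 0:
        return []
    if k >= len(a):
        return list(a)
    start, end, target = 0, len(a) - 1, k - 1
    while True:
        j = partition(a, start, end)
        if j == target:
            return a[:k]          # unordered; sort it if needed
        if target < j:
            end = j - 1           # the k-th smallest lies left of the pivot
        else:
            start = j + 1         # target is an absolute index, not adjusted
```

The loop replaces the recursion, so only one side of each partition is ever processed, which is what gives the O(n) expected running time.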

Related

How to find which position have prefix sum M in BIT?

Suppose I have created a Binary Indexed Tree with prefix sums of length N. The main array contains only 0s and 1s. Now I want to find which index has prefix sum M (that is, exactly M 1s up to and including that index).
For example, my array is a[]={1,0,0,1,1};
the prefix sums would be {1,1,1,2,3};
so index 3 (0-based) has a prefix sum of 2.
How can I find this index with a BIT?
Thanks in advance.
Why can't you do a binary search for that index? It will take O(log n * log n) time. Here is a simple implementation:
int findIndex(int sum) {
    int l = 1, r = n;
    while (l <= r) {
        int mid = (l + r) >> 1;
        int This = read(mid);   // prefix sum of [1, mid]
        if (This == sum) return mid;
        else if (This < sum) l = mid + 1;
        else r = mid - 1;
    }
    return -1;   // no index has this prefix sum
}
I used the read(x) function, which should return the sum of the interval [1,x] in O(log n) time. The overall complexity will be O(log^2 n). Note that when several indices share the same prefix sum, this returns one of them, not necessarily the first.
Hope it helps.
If the elements of array a[n] are non-negative (so the prefix sum array p[n] is non-decreasing), you can locate an index by its prefix sum in O(log n) time, the same cost as querying a prefix sum by index from the BIT. The only difference is that at each level you compare the sum accumulated so far, plus the value stored at the candidate node, with your input to decide where to search next: if it is smaller than your input, add that node's value to the running sum and continue to the right of it; otherwise, stay to the left and try the next smaller step. Repeat this until the step size reaches zero; the index just after the last accepted position is the first one whose prefix sum reaches the input.
The idea is analogous to binary search, because the prefix sums are naturally sorted in the BIT. If there are negative values in a[n], this method won't work, since the prefix sums won't be sorted in that case.
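A minimal sketch of that descent in Python, assuming a 1-based Fenwick tree and non-negative values. The class and method names here are made up for illustration.

```python
class Fenwick:
    """1-based Binary Indexed Tree over non-negative values."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def first_index_with_prefix_sum(self, target):
        """Smallest 1-based index whose prefix sum is >= target, or -1."""
        if self.n == 0 or target <= 0:
            return -1
        pos = 0
        remaining = target
        step = 1 << (self.n.bit_length() - 1)   # largest power of two <= n
        while step:
            nxt = pos + step
            # tree[nxt] covers exactly the range (pos, nxt]
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step >>= 1
        return pos + 1 if pos < self.n else -1
```

For the array {1,0,0,1,1} and M = 2 this returns the 1-based index 4, i.e. index 3 in 0-based terms, after O(log n) steps instead of O(log^2 n).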

Divide and Conquer Algorithms- Binary search variant

This is a practice question for the understanding of Divide and conquer algorithms.
You are given an array of N sorted integers. All the elements are distinct except one
element is repeated twice. Design an O(log N) algorithm to find that element.
I get that array needs to be divided and see if an equal counterpart is found in the next index, some variant of binary search, I believe. But I can't find any solution or guidance regarding that.
You cannot do it in O(log n) time in general, because at any step, even if you divide the array into two parts, you cannot decide which part to keep for further processing and which to discard.
On the other hand, if the array consists of consecutive numbers, then by comparing an index with the value stored there we can decide whether the duplicate lies in the left or the right part of the array.
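A small sketch of that special case in Python, assuming the array is sorted, holds consecutive integers, and contains exactly one value twice (the function name is just for illustration):

```python
def find_duplicate_in_consecutive(a):
    """Return the repeated value in a sorted run of consecutive integers.

    Before the second copy, a[i] - a[0] == i; from the second copy onward
    the difference falls one behind the index. Binary search for the first
    index where that happens.
    """
    lo, hi = 0, len(a) - 1
    base = a[0]
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] - base == mid:
            lo = mid + 1      # still in step: the duplicate is to the right
        else:
            hi = mid          # already one behind: duplicate is here or left
    return a[lo]
```

This runs in O(log n), but only because the consecutive-values assumption tells us which half to discard.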
D&C should look something like this
int Twice (int a[], int i, int j) {
    // searches a[i..j] (inclusive); returns the index of the first copy, or -1
    if (i >= j)
        return -1;
    int k = (i+j)/2;   // i <= k < j, so k+1 is always in range
    if (a[k] == a[k+1])
        return k;
    if (k > i && a[k] == a[k-1])
        return k-1;
    int m = Twice(a,i,k-1);
    int n = Twice(a,k+1,j);
    return m != -1 ? m : n;
}
int Twice (int a[], int n) {
    return Twice(a,0,n-1);
}
But it has complexity O(n). As said above, it is not possible to find an O(lg n) algorithm for this problem in general.

Insertion sort comparison?

How can I count the number of comparisons in insertion sort in less than O(n^2) time?
When we're inserting an element, we alternate comparisons and swaps until either (1) the element compares not less than the element to its left, or (2) we hit the beginning of the array. In case (1), there is one comparison not paired with a swap. In case (2), every comparison is paired with a swap. So the number of comparisons is the number of swaps plus an upward adjustment, and that adjustment can be computed by counting the successive minima from left to right (or however your insertion sort works), in time O(n).
# num_swaps is assumed to be known already (it equals the number of inversions)
num_comparisons = num_swaps
min_so_far = array[0]
for i in range(1, len(array)):
    if array[i] < min_so_far:
        # case (2): a new minimum travels all the way to the front
        min_so_far = array[i]
    else:
        # case (1): one extra comparison that is not paired with a swap
        num_comparisons += 1
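To make this self-contained, here is a sketch that also computes num_swaps without sorting by insertion. It assumes the number of swaps equals the number of inversions and counts those with a merge sort, which is a standard technique swapped in here rather than something from the answer above; the function names are illustrative.

```python
def count_inversions(values):
    """Return (sorted copy, number of pairs i < j with values[i] > values[j])."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inversions += len(left) - i   # right[j] jumps over the rest of left
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def insertion_sort_comparisons(values):
    """Comparisons made by a stable, swap-based insertion sort, in O(n log n)."""
    if not values:
        return 0
    _, num_swaps = count_inversions(values)
    comparisons = num_swaps
    min_so_far = values[0]
    for x in values[1:]:
        if x < min_so_far:
            min_so_far = x        # reaches the front: no unpaired comparison
        else:
            comparisons += 1      # stops early: one unpaired comparison
    return comparisons
```

For example, [3, 1, 2] has two inversions and one element (the 2) that stops before the front, so the count is 3.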
As commented, doing it in less than O(n^2) is hard, maybe impossible, if you must pay the price for sorting. If you already know the number of comparisons done at each outer iteration then it would be possible in O(n), but the price for sorting was paid sometime before.
Here is a way of counting the comparisons inside the method (in pseudo C++):
void insertion_sort(int p[], const size_t n, size_t & count)
{
    for (long i = 1, j; i < (long) n; ++i)
    {
        auto tmp = p[i];
        for (j = i - 1; j >= 0 and p[j] > tmp; --j) // open a gap where tmp will go
            p[j + 1] = p[j];
        // element comparisons in this iteration: i - j if the scan stopped at
        // p[j] <= tmp, or i if it ran off the front of the array (j == -1)
        count += (j >= 0) ? i - j : i;
        p[j + 1] = tmp;
    }
}
n is the number of elements and count is the comparison counter, which must be passed a variable set to zero.
If I remember correctly, this is how insertion sort works:
A = unsorted input array
B := []; //sorted output array
while(A is not empty) {
remove first element from A and add it to B, preserving B's sorting
}
If the insertion into B is implemented by a linear search from the left until you find a greater element, then the number of comparisons is, up to one extra comparison per insertion for the greater element that stops the search, the number of pairs (i,j) such that i < j and A[i] <= A[j] (I'm considering the stable variant).
In other words, for each element x, count the number of elements before x that have a lesser or equal value. That can be done by scanning A from the left and adding each element to a balanced binary search tree that also remembers the number of elements under each node. In such a tree, you can find the number of elements less than or equal to a given value in O(log n). Total time: O(n log n).
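A sketch of that counting step in Python. Instead of a balanced binary search tree it uses a Fenwick tree over the ranks of the values, which answers the same "how many earlier elements are <= x" query in O(log n); this substitution and the function name are my own choices for the example.

```python
def count_leq_pairs(values):
    """Count pairs i < j with values[i] <= values[j] in O(n log n)."""
    # Map each distinct value to a 1-based rank in sorted order.
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    tree = [0] * (len(ranks) + 1)
    total = 0
    for v in values:
        # Query: number of earlier elements with rank <= rank(v).
        i = ranks[v]
        while i > 0:
            total += tree[i]
            i -= i & -i
        # Update: record v so later elements can count it.
        i = ranks[v]
        while i < len(tree):
            tree[i] += 1
            i += i & -i
    return total
```

This gives only the pair count; any extra comparison that ends each linear search still has to be added separately.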

find maximum sum of n elements in an array such that not more than k elements are adjacent

Almost the same as this:
find maximum sum of elements in an array such that not more than k elements are adjacent
except there is a limit of n elements we can choose. How to modify the DP algorithm to make it work for this?
Add new dimension of DP function:
f[i, j, l] - max sum for first i elements, if used j total elements and last l elements in this sum.
Well, let me make it clearer.
Question: find the maximum sum of n elements in an array such that not more than K elements are adjacent.
Let int f[i][j][k] be the maximum sum for the first i elements, using j elements in total, where the last k elements are all used. Let bool g[i][j][k] denote whether that combination is possible at all; e.g. g[1][1][2] is false. This is important because without this restriction, f may produce impossible answers.
Initially, memset f and g to all zeros and set g[0][0][0] to true. We can use a forward recurrence to solve this DP problem. Each time you encounter a number, you have two choices: choose it, or abandon it. That gives the recurrence:
f[i][j][k] can infer f[i+1][j+1][k+1] (choose it, allowed only if j < n and k < K), or
f[i][j][k] can infer f[i+1][j][0] (abandon it)
So the pseudocode can be as follows:
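memset(f,0,sizeof(f));
memset(g,0,sizeof(g));
g[0][0][0]=true;
for (int i=0;i<array.size();i++)
    for (int j=0;j<=n;j++)
        for (int k=0;k<=K;k++) if (g[i][j][k]) {
            // abandon array[i]: the run of chosen elements resets to 0
            if (!g[i+1][j][0] || f[i][j][k]>f[i+1][j][0])
                f[i+1][j][0]=f[i][j][k];
            g[i+1][j][0]=true;
            // choose array[i]: only while both limits still allow it
            if (j<n && k<K) {
                int s=f[i][j][k]+array[i];
                if (!g[i+1][j+1][k+1] || s>f[i+1][j+1][k+1])
                    f[i+1][j+1][k+1]=s;
                g[i+1][j+1][k+1]=true;
            }
        }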
And the final result will be:
ans=0; // assumes the best sum is non-negative; stays 0 if nothing is reachable
for (i=0;i<=K;i++) if (g[array.size()][n][i])
    ans=max(ans,f[array.size()][n][i]);
return ans;
The above uses exactly n elements. If you want at most n elements, you can change it this way:
ans=0;
for (i=0;i<=n;i++)
    for (j=0;j<=K;j++) if (g[array.size()][i][j])
        ans=max(ans,f[array.size()][i][j]);
return ans;
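For anyone who wants to run it, here is a compact Python sketch of the same DP. It keeps only the current layer of the table and uses negative infinity in place of the g array to mark unreachable states; both are my simplifications, and the function name is illustrative.

```python
def max_sum_limited(values, n, K):
    """Max sum of exactly n chosen elements with no run of more than K
    adjacent chosen elements. Returns None if no such selection exists."""
    NEG = float("-inf")
    # best[j][k]: best sum with j elements chosen so far, where the last k
    # elements seen were all chosen. NEG marks an unreachable state.
    best = [[NEG] * (K + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for x in values:
        nxt = [[NEG] * (K + 1) for _ in range(n + 1)]
        for j in range(n + 1):
            for k in range(K + 1):
                cur = best[j][k]
                if cur == NEG:
                    continue
                # Abandon x: the run of chosen elements resets to 0.
                nxt[j][0] = max(nxt[j][0], cur)
                # Choose x: only while both limits still allow it.
                if j < n and k < K:
                    nxt[j + 1][k + 1] = max(nxt[j + 1][k + 1], cur + x)
        best = nxt
    ans = max(best[n])
    return None if ans == NEG else ans
```

With values [1, 2, 3, 4, 5], n = 3 and K = 2 it picks 2, 4 and 5, since 3, 4, 5 would be three adjacent elements. The running time is O(len(values) * n * K).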

selection algorithm problem

Suppose you have an array A of n items, and you want to find the k items in A closest
to the median of A. For example, if A contains the 9 values {7, 14, 10, 12, 2, 11, 29, 3, 4}
and k = 5, then the answer would be the values {7, 14, 10, 12, 11}, since the median
is 10 and these are the five values in A closest to the value 10. Give an algorithm
to solve this problem in O(n) time.
I know that a selection algorithm (deep selection) is the appropriate algorithm for this problem, but I think that would run in O(n*logn) time instead of O(n). Any help would be greatly appreciated :)
You will first need to find the median, which can be done in O(n) expected time (for example using Hoare's Quickselect algorithm).
Then you will need to order the elements of the array by their absolute distance to the median (smallest distances first).
If you were to sort the entire array this way, it would typically take somewhere from O(n * log n) to O(n^2), depending on the algorithm being used. However, since you only need the first k values, a partial sort is enough, which reduces the cost to roughly O(n + k * log n) with a heap, or O(k * n) with a selection-sort style scan.
If k is treated as a constant that does not depend on the size of the array, the overall complexity is O(n) (for finding the median) + O(k * n) (for the partial sort), which is O(n) overall.
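If k is not a constant, one way to stay linear is to run a selection a second time, this time on the distances. The sketch below does that in Python with a randomized quickselect (expected O(n), not worst case); the helper names are hypothetical, and it takes the lower median when n is even.

```python
import random


def select_smallest(items, k, key):
    """Return the k items with the smallest key, unordered, in expected O(n)."""
    items = list(items)
    if k <= 0:
        return []
    if k >= len(items):
        return items
    result = []
    while True:
        pivot = key(random.choice(items))
        lower = [x for x in items if key(x) < pivot]
        equal = [x for x in items if key(x) == pivot]
        upper = [x for x in items if key(x) > pivot]
        if k <= len(lower):
            items = lower
        elif k <= len(lower) + len(equal):
            return result + lower + equal[:k - len(lower)]
        else:
            result += lower + equal
            k -= len(lower) + len(equal)
            items = upper


def k_closest_to_median(values, k):
    """Return the k values closest to the (lower) median of values."""
    n = len(values)
    if n == 0 or k <= 0:
        return []
    # The median is the largest of the (n + 1) // 2 smallest values.
    median = max(select_smallest(values, (n + 1) // 2, key=lambda x: x))
    return select_smallest(values, k, key=lambda x: abs(x - median))
```

On the example from the question, {7, 14, 10, 12, 2, 11, 29, 3, 4} with k = 5, the median is 10 and the result is {7, 10, 11, 12, 14} in some order.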
I think you can do this using a variant on quicksort.
You start with a set S of n items and are looking for the "middle" k items. You can think of this as partitioning S into three parts of sizes (n - k)/2 (the "lower" items), k (the "middle" items), and (n - k)/2 (the "upper" items).
This gives us a strategy: first remove the lower (n - k)/2 items from S, leaving S'. Then remove the upper (n - k)/2 items from S', leaving S'', which is the middle k items of S.
You can easily partition a set this way using "half a quicksort": choose a pivot, partition the set into L and U (lower and upper elements w.r.t. the pivot), then you know the items to discard in the partition must be either all of L and some of U or vice versa: recurse accordingly.
[Thinking further, this may not be exactly what you want if you define "closest to the median" in some other way, but it's a start.]
Assumption: we care about the k values in A that are closest to the median. If we had an A={1,2,2,2,2,2,2,2,2,2,2,2,3}, and k=3, the answer is {2,2,2}. Similarly, if we have A={0,1,2,3,3,4,5,6}, and k=3, answers {2,3,3} and {3,3,4} are equally valid. Furthermore, we are not interested in the indices from which these values came, though I imagine some small tweaks to the algorithm would work.
As Grodrigues states, first find the median in O(n) time. While we're at it, keep track of the largest and smallest numbers.
Next, create an array K, k items long. This array will contain the distance an item is from the median. (note that
Copy the first k items from A into K.
For each remaining item A[i], compare the distance of A[i] from the median to each item of K. If A[i] is closer to the median than the item of K farthest from the median, replace that item. As an optimization, we could also track K's closest and farthest items from the median, so we have a faster comparison to K, or we could keep K sorted, but neither optimization is necessary: the scan costs O(n * k), which is O(n) when k is treated as a constant.
Pseudocode, C++ ish:
/* n = length of array
* array = A, given in the problem
* result is a pre-allocated array where the result will be placed
* k is the length of result
*
* returns
* 0 for success
* -1 for invalid input
* 1 for other errors
*
* Implementation note: optimizations are skipped.
*/
#include <stdlib.h>
#define SUCCESS 0
#define INVALID_INPUT -1
#define ERROR 1
int find_k_closest(int n, int array[], int k, int result[])
{
    // if we're looking for more results than possible,
    // it's impossible to give a valid result.
    if( k < 0 || k > n ) return INVALID_INPUT;
    // populate result with the first k elements of array.
    for( int i=0; i<k; i++ )
    {
        result[i] = array[i];
    }
    // if we're looking for n items of an n length array (or for none),
    // we don't need to do any comparisons
    // Up to this point, function is O(k). Worst case, k==n,
    // and we're O(n)
    if( k==n || k==0 ) return SUCCESS;
    // Assume an O(n) median function (a helper that is not shown here).
    // Note that we don't bother finding the median if there's an
    // error or if the output is the input.
    int med = median(array, n);
    // Convert the result array to be signed distance, not
    // actual numbers
    for( int i=0; i<k; i++)
    {
        result[i] = result[i]-med;
        // if array[i]=1, med=3, result[i] will be set to -2.
    }
    // Up to this point, function is O(2k+n) = O(n)
    // find the closest items.
    // Outer loop runs n-k times, inner loop is O(k),
    // so this step is O(k*n), which is O(n) when k is a constant.
    // Note that we start at k, since the first k elements
    // of array are already in result.
    for( int i=k; i<n; i++ )
    {
        int distance = array[i]-med;
        // find the result farthest from the median
        int idx = 0;
        for( int j=1; j<k; j++ )
        {
            if( abs(result[j]) > abs(result[idx]) ) idx = j;
        }
        // If array[i] is closer to the median than the farthest element of
        // result, replace the farthest element of result with array[i]
        if( abs(distance) < abs(result[idx]) ){ result[idx] = distance; }
    }
    // convert result from distance back to values
    for( int i=0; i<k; i++)
    {
        result[i] = med + result[i];
        // if result[i]=-2, med=3, result[i] will be set to 1.
    }
    return SUCCESS;
}
