Implementing quickselect - algorithm

I'm trying to implement the quickselect algorithm. Though, I have understood the theory behind it very well; I'm finding it difficult to convert it into a well functioning program.
Here is how I'm going step by step to implement it and where I am facing problem:
Problem: Find the 4th smallest element in A[] = {2,1,3,7,5,4,6}
k = 4.
index:0|1|2|3|4|5|6
Corresponding values: 2|1|3|7|5|4|6
initially, l = 0 and r = 6
Step 1) Taking pivot as the leftmost element (pivot will always be the leftmost in this problem)-
pivot_index = 0
pivot_value = 2
Step 2) Applying the partition algo; putting the pivot at the right place ([<p][p][>p])-
We get the following array: 1|2|3|7|5|4|6
where, pivot_index = i-1 = 1
and therefore, pivot_value = 2
Step 3) Compare pivot_index with k-
k=3, pivot_index = 1; k>pivot_index
Hence, Our k-th smallest number lies in the right part of the array.
Right array = i to r and we do not bother with the left part (l to i-1) anymore.
Step 4) We modify the value of k as k - (pivot_index) => 4-1 = 2; k = 3.
Here is the problem: Should not the value of k be 2? Because we have two values on the left part of the array: 1|2? Should we calculate k as k - (pivot_index+1)?
Let's assume k = 3 is correct.
Step 5) "New" array to work on: 3|7|5|4|6 with corresponding indexes: 2|3|4|5|6
Now, pivot_index = 2 and pivot_index = 3
Step 6) Applying partition algo on the above array-
3|7|5|4|6 (array remains unchanged as pivot itself is the lowest value).
i = 3
pivot_index = i-1 = 2
pivot_value = 3
Step 7) Compare pivot_index with k
k=3 and pivot_index=2
k > pivot_index
and so on....
Is this approach correct?
Here is my code which is not working. I have used a random number generator to select a random pivot, the pivot is then swapped with the first element in the array.
#include<stdio.h>
#include<stdlib.h>
void print_array(int arr[], int array_length){
int i;
for(i=0; i<array_length; ++i) {
printf("%d ", arr[i]);
}
}
int random_no(min, max){
int diff = max-min;
return (int) (((double)(diff+1)/RAND_MAX) * rand() + min);
}
void swap(int *a, int *b){
int temp;
temp = *a;
*a = *b;
*b = temp;
}
int get_kth_small(int arr[], int k, int l, int r){
if((r-l) >= 1){
k = k + (l-1);
int pivot_index = random_no(l, r);
int i, j;
swap(&arr[pivot_index], &arr[l]); //Switch the pivot with the first element in the array. Now, the pivit is in arr[l]
i=l+1;
for(j=l+1; j<=r; ++j){
if(arr[j]<arr[l]){
swap(&arr[j], &arr[i]);
++i;
}
}
swap(&arr[l], &arr[i-1]); //Switch the pivot to the correct place; <p, p, >p
printf("value of i-1: %d\n", i-1);
printf("Value of k: %d\n", k);
if(k == (i-1)){
printf("Found: %d\n", arr[i]);
return 0;
}
if(k>(i-1)){
k=k-(i-1);
get_kth_small(arr, k, i, r);
} else {
get_kth_small(arr, k, l, r-1);
}
//get_kth_small(arr, k, i, r);
//get_kth_small(arr, k, l, i-1);
}
}
void main(){
srand(time(NULL));
int arr[] = {2,1,3,7,5,4,6};
int arr_size = sizeof(arr)/sizeof(arr[0]);
int k = 3, l = 0;
int r = arr_size - 1;
//printf("Enter the value of k: ");
//scanf("%d", &k);
get_kth_small(arr, k, l, r);
print_array(arr, arr_size);
printf("\n");
}

What you describe is a valid way to implement quick select. There are numerous other approaches how to select the pivot and most of them will give a better expected complexity but in essence the algorithm is the same.

"Step 2: putting the pivot at the right place": don't do that. In fact you can't put the pivot at the right place, as you don't know what it is. The partitioning rule is to put all elements smaller or equal than the pivot before those larger. Just leave the pivot where it is!
Quick select goes as follows: to find the Kth among N elements, 1) choose a pivot value, 2) move all elements smaller or equal to the pivot before the others, forming two zones of length Nle and Ngt, 3) recurse on the relevant zone with (K, Nle) or (K-Nle, Ngt), until N=1.
Actually, any value can be taken for the pivot, even one not present in the array; but the partition must be such that Nle and Ngt are nonzero.

Related

SUM exactly using K elements solution

Problem: On a given array with N numbers, find subset of size M (exactly M elements) that equal to SUM.
I am looking for a Dynamic Programming(DP) solution for this problem. Basically looking to understand the matrix filled approach. I wrote below program but didn't add memoization as i am still wondering how to do that.
#include <stdio.h>
#define SIZE(a) sizeof(a)/sizeof(a[0])
int binary[100];
int a[] = {1, 2, 5, 5, 100};
void show(int* p, int size) {
int j;
for (j = 0; j < size; j++)
if (p[j])
printf("%d\n", a[j]);
}
void subset_sum(int target, int i, int sum, int *a, int size, int K) {
if (sum == target && !K) {
show(binary, size);
} else if (sum < target && i < size) {
binary[i] = 1;
foo(target, i + 1, sum + a[i], a, size, K-1);
binary[i] = 0;
foo(target, i + 1, sum, a, size, K);
}
}
int main() {
int target = 10;
int K = 2;
subset_sum(target, 0, 0, a, SIZE(a), K);
}
Is the below recurrence solution makes sense?
Let DP[SUM][j][k] sum up to SUM with exactly K elements picked from 0 to j elements.
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = DP[0][0][k] = 1
DP[i][0][0] = DP[i][j][0] = 0
It means we can either consider this element ( DP[i-a[j]][j-1][k-1] ) or we don't consider the current element (DP[i][j-1][k]). If we consider current element, k is reduced by 1 which reduces the elements that needs to be considered and same goes when current element is not considered i.e. K is not reduced by 1.
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K];
bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
if (sum == target && !K) {
memo[i][sum][K] = true;
return memo[i][sum][K];
} else if (sum < target && i < size) {
if (memo[i][sum][K] != -1)
return memo[i][sum][K];
memo[i][sum][K] = foo(target, i + 1, sum + a[i], a, size, K-1) ||
foo(target, i + 1, sum, a, size, K);
return memo[i][sum][K]
}
return false;
}
Then, look at memo[_all indexes_][target][K]. If this is true, there exists at least one solution. You can store addition information to get you that next solution, or you can iterate with an i from found_index - 1 to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sum up to exactly j for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][k-a[i]] || IsFeasible[i-1][k]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to to not selecting the i-th item. The actual choice of items could be obtained by backtracking or auxiliary information saved during evaluation.

Adding sum of frequencies whille solving Optimal Binary search tree

I am referring to THIS problem and solution.
Firstly, I did not get why sum of frequencies is added in the recursive equation.
Can someone please help understand that with an example may be.
In Author's word.
We add sum of frequencies from i to j (see first term in the above
formula), this is added because every search will go through root and
one comparison will be done for every search.
In code, sum of frequencies (purpose of which I do not understand) ... corresponds to fsum.
int optCost(int freq[], int i, int j)
{
// Base cases
if (j < i) // If there are no elements in this subarray
return 0;
if (j == i) // If there is one element in this subarray
return freq[i];
// Get sum of freq[i], freq[i+1], ... freq[j]
int fsum = sum(freq, i, j);
// Initialize minimum value
int min = INT_MAX;
// One by one consider all elements as root and recursively find cost
// of the BST, compare the cost with min and update min if needed
for (int r = i; r <= j; ++r)
{
int cost = optCost(freq, i, r-1) + optCost(freq, r+1, j);
if (cost < min)
min = cost;
}
// Return minimum value
return min + fsum;
}
Secondly, this solution will just return the optimal cost. Any suggestions regarding how to get the actual bst ?
Why we need sum of frequencies
The idea behind sum of frequencies is to correctly calculate cost of particular tree. It behaves like accumulator value to store tree weight.
Imagine that on first level of recursion we start with all keys located on first level of the tree (we haven't picked any root element yet). Remember the weight function - it sums over all node weights multiplied by node level. For now weight of our tree equals to sum of weights of all keys because any of our keys can be located on any level (starting from first) and anyway we will have at least one weight for each key in our result.
1) Suppose that we found optimal root key, say key r. Next we move all our keys except r one level down because each of the elements left can be located at most on second level (first level is already occupied). Because of that we add weight of each key left to our sum because anyway for all of them we will have at least double weight. Keys left we split in two sub arrays according to r element(to the left from r and to the right) which we selected before.
2) Next step is to select optimal keys for second level, one from each of two sub arrays left from first step. After doing that we again move all keys left one level down and add their weights to the sum because they will be located at least on third level so we will have at least triple weight for each of them.
3) And so on.
I hope this explanation will give you some understanding of why we need this sum of frequencies.
Finding optimal bst
As author mentioned at the end of the article
2) In the above solutions, we have computed optimal cost only. The
solutions can be easily modified to store the structure of BSTs also.
We can create another auxiliary array of size n to store the structure
of tree. All we need to do is, store the chosen ‘r’ in the innermost
loop.
We can do just that. Below you will find my implementation.
Some notes about it:
1) I was forced to replace int[n][n] with utility class Matrix because I used Visual C++ and it does not support non-compile time constant expression as array size.
2) I used second implementation of the algorithm from article which you provided (with memorization) because it is much easier to add functionality to store optimal bst to it.
3) Author has mistake in his code:
Second loop for (int i=0; i<=n-L+1; i++) should have n-L as upper bound not n-L+1.
4) The way we store optimal bst is as follows:
For each pair i, j we store optimal key index. This is the same as for optimal cost but instead of storing optimal cost we store optimal key index. For example for 0, n-1 we will have index of the root key r of our result tree. Next we split our array in two according to root element index r and get their optimal key indexes. We can dot that by accessing matrix elements 0, r-1 and r+1, n-1. And so forth. Utility function 'PrintResultTree' uses this approach and prints result tree in in-order (left subtree, node, right subtree). So you basically get ordered list because it is binary search tree.
5) Please don't flame me for my code - I'm not really a c++ programmer. :)
int optimalSearchTree(int keys[], int freq[], int n, Matrix& optimalKeyIndexes)
{
/* Create an auxiliary 2D matrix to store results of subproblems */
Matrix cost(n,n);
optimalKeyIndexes = Matrix(n, n);
/* cost[i][j] = Optimal cost of binary search tree that can be
formed from keys[i] to keys[j].
cost[0][n-1] will store the resultant cost */
// For a single key, cost is equal to frequency of the key
for (int i = 0; i < n; i++)
cost.SetCell(i, i, freq[i]);
// Now we need to consider chains of length 2, 3, ... .
// L is chain length.
for (int L = 2; L <= n; L++)
{
// i is row number in cost[][]
for (int i = 0; i <= n - L; i++)
{
// Get column number j from row number i and chain length L
int j = i + L - 1;
cost.SetCell(i, j, INT_MAX);
// Try making all keys in interval keys[i..j] as root
for (int r = i; r <= j; r++)
{
// c = cost when keys[r] becomes root of this subtree
int c = ((r > i) ? cost.GetCell(i, r - 1) : 0) +
((r < j) ? cost.GetCell(r + 1, j) : 0) +
sum(freq, i, j);
if (c < cost.GetCell(i, j))
{
cost.SetCell(i, j, c);
optimalKeyIndexes.SetCell(i, j, r);
}
}
}
}
return cost.GetCell(0, n - 1);
}
Below is utility class Matrix:
class Matrix
{
private:
int rowCount;
int columnCount;
std::vector<int> cells;
public:
Matrix()
{
}
Matrix(int rows, int columns)
{
rowCount = rows;
columnCount = columns;
cells = std::vector<int>(rows * columns);
}
int GetCell(int rowNum, int columnNum)
{
return cells[columnNum + rowNum * columnCount];
}
void SetCell(int rowNum, int columnNum, int value)
{
cells[columnNum + rowNum * columnCount] = value;
}
};
And main method with utility function to print result tree in in-order:
//Print result tree in in-order
void PrintResultTree(
Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys)
{
if (startIndex == endIndex)
{
printf("%d\n", keys[startIndex]);
return;
}
else if (startIndex > endIndex)
{
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
PrintResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys);
printf("%d\n", keys[currentOptimalKeyIndex]);
PrintResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
return 0;
}
EDIT:
Below you can find code to create simple tree like structure.
Here is utility TreeNode class
struct TreeNode
{
public:
int Key;
TreeNode* Left;
TreeNode* Right;
};
Updated main function with BuildResultTree function
void BuildResultTree(Matrix& optimalKeyIndexes,
int startIndex,
int endIndex,
int* keys,
TreeNode*& tree)
{
if (startIndex > endIndex)
{
return;
}
tree = new TreeNode();
tree->Left = NULL;
tree->Right = NULL;
if (startIndex == endIndex)
{
tree->Key = keys[startIndex];
return;
}
int currentOptimalKeyIndex = optimalKeyIndexes.GetCell(startIndex, endIndex);
tree->Key = keys[currentOptimalKeyIndex];
BuildResultTree(optimalKeyIndexes, startIndex, currentOptimalKeyIndex - 1, keys, tree->Left);
BuildResultTree(optimalKeyIndexes, currentOptimalKeyIndex + 1, endIndex, keys, tree->Right);
}
int main(int argc, char* argv[])
{
int keys[] = { 10, 12, 20 };
int freq[] = { 34, 8, 50 };
int n = sizeof(keys) / sizeof(keys[0]);
Matrix optimalKeyIndexes;
printf("Cost of Optimal BST is %d \n", optimalSearchTree(keys, freq, n, optimalKeyIndexes));
PrintResultTree(optimalKeyIndexes, 0, n - 1, keys);
TreeNode* tree = new TreeNode();
BuildResultTree(optimalKeyIndexes, 0, n - 1, keys, tree);
return 0;
}

Find the number of possible combinations of three permuted lists

I'm looking for a solution for this task:
There are three permuted integer lists:
index
0 1 2 3
[2,4,3,1]
[3,4,1,2]
[1,2,4,3]
I'd like to know how many combinations of three tuples across the lists there are. For example, after rotating the second list by one to the right and the third list by one to the left:
0 1 2 3
[2,4,3,1]
3 0 1 2
[2,3,4,1]
1 2 3 0
[2,4,3,1]
would result in two combinations (2,2,2) and (1,1,1). I'm only interested in the number of combinations, not the actual combinations themselves.
The lists have always the same length N. From my understanding, there are is at least one combination and maximally N.
I've written an imperative solution, using three nested for loops, but for larger problems sizes (e.g. N > 1000) this quickly becomes unbearable.
Is there are more efficient approach than brute force (trying all combinations)?. Maybe some clever algorithm or a mathematical trick?
Edit:
I'm rephrasing the question to make it (hopefully) more clear:
I have 3 permutations of a list [1..N].
The lists can be individually rotated left or right, until the elements for some indexes line up. In the above example that would be:
Right rotate list 2 by 1
Left rotate list 3 by 1
Now the columns are aligned for 2 and 1.
I've also added the indexes the example above. Please tell me, if it's still unclear.
My code so far:
#include <iostream>
int
solve(int n, int * a, int * b, int * c)
{
int max = 0;
for (int i = 0; i < n; ++i) {
int m = 0;
for (int j = 0; j < n; ++j) {
if (a[i] == b[j]) {
for (int k = 0; k < n; ++k) {
if (a[i] == c[k]) {
for (int l = 0; l < n; ++l) {
if (a[l] == b[(l+j) % n] && a[l] == b[(l+k) % n]) {
++m;
}
}
}
}
}
}
if (m > max) {
max = m;
}
}
return max;
}
int
main(int argc, char ** argv)
{
int n = 5;
int a[] = { 1, 5, 4, 3, 2 };
int b[] = { 1, 3, 2, 4, 5 };
int c[] = { 2, 1, 5, 4, 3 };
std::cout << solve(n, a, b, c) << std::endl;
return 0;
}
Here is an efficient solution:
Let's assume that we have picked a fixed element from the first list and we want to match it to the elements from the second and the third list with the same value. It uniquely determines the rotation of the second and the third list(we can assume that the first list is never rotated). It gives us a pair of two integers: (the position of this element in the first list minus its position in the second list modulo N, the same thing for the first and the third list).
Now we can iterate over all elements of the first list and generate these pairs.
The answer is the number of ocurrences of the most frequent pair.
The time complexity is O(N * log N) if we use standard sort to find the most frequent pair or O(N) if we use radix sort or a hash table.
You can make it by creating all combinations like:
0,0,0
0,0,1
0,1,0
0,1,1
1,0,0
1,0,1
1,1,0
1,1,1
Each 0 / 1 can be your array
This code can help you creating this list above:
private static ArrayList<String> getBinaryArray(ArrayList<Integer> array){
//calculating the possible combinations we can get
int possibleCombinations = (int) Math.pow(2, array.size());
//creating an array with all the possible combinations in binary
String binary = "";
ArrayList<String> binaryArray = new ArrayList<String>();
for (int k = 0; k <possibleCombinations; k++) {
binary = Integer.toBinaryString(k);
//adding '0' as much as we need
int len = (array.size() - binary.length());
for (int w = 1; w<=len; w++) {
binary = "0" + binary;
}
binaryArray.add(binary);
}
return binaryArray;
}
it's also either can be with 0/1/2 numbers which each other number can be the lists you got.
if it's not so clear please tell me

Modifying this Quicksort to always use the last element as the pivot

I have the following Quicksort that always chooses the first element of the subsequence as its pivot:
void qqsort(int array[], int start, int end) {
int i = start; // index of left-to-right scan
int k = end; // index of right-to-left scan
if (end - start >= 1) { // check that there are at least two elements to sort
int pivot = array[start]; // set the pivot as the first element in the partition
while (k > i) { // while the scan indices from left and right have not met,
while (array[i] <= pivot && i <= end && k > i) // from the left, look for the first element greater than the pivot
i++;
while (array[k] > pivot && k >= start && k >= i) // from the right, look for the first element not greater than the pivot
k--;
if (k > i) // if the left seekindex is still smaller than the right index, swap the corresponding elements
swap(array, i, k);
}
swap(array, start, k); // after the indices have crossed, swap the last element in the left partition with the pivot
qqsort(array, start, k - 1); // quicksort the left partition
qqsort(array, k + 1, end); // quicksort the right partition
} else { // if there is only one element in the partition, do not do any sorting
return;
}
}
Now as you can see, this algorithm always takes the first element to be the pivot: int pivot = array[start];
I want to modify this algorithm to make it always use the last element instead of the first element of the subsequence, because I want to analyze the physical running times of both implementations.
I tried changing the line int pivot = array[start]; to int pivot = array[end]; but the algorithm then outputted an unsorted sequence:
//Changes: int pivot = array[end];
unsorted: {5 4 3 2 1}
*sorted*: {1 2 5 4 3}
To test another pivot, I also tried using the center element of the subsequence but the algorithm still failed:
//Changes: int pivot = array[(start + end) / 2];
unsorted: {5 3 4 2 1}
*sorted*: {3 2 4 1 5}
Can someone please help me understand this algorithm correctly and tell me what changes do I need to make to successfully have this implementation always choose the last element of the subsequence as the pivot?
The Cause of the Problem
The problem is that you use int k = end;. It was fine to use int i = start; when you had the pivot element as the first element in the array because your checks in the loop will skim past it (array[i] <= pivot). However, when you use the last element as the pivot, k stops on the end index and switches the pivot to a position in the left half of the partition. Already you're in trouble because your pivot will most likely be somewhere inside of the left partition rather than at the border .
The Solution
To fix this, you need to set int k = end - 1; when you use the rightmost element as the pivot. You'll also need to change the lines for swapping the pivot to the border between the left and right partitions:
swap(array, i, end);
qqsort(array, start, i - 1);
qqsort(array, i + 1, end);
You have to use i for this because i will end up at the leftmost element of the right partition (which can then be swapped with the pivot being in the rightmost element and it will preserver the order). Lastly, you'll want to change k >= i to k > i in the while which decrements k or else there is small change of an array[-1] indexing error. This wasn't possible to happen before because i always at least was equal to i+1 by this point.
That should do it.
Sidenote:
This is a poorly written quicksort which I wouldn't recommend learning from. It has a some extraneous, unnecessary comparisons along with some other faults that I won't waste time listing. I would recommend using the quicksorts in this presentation by Sedgewick and Bentley.
I didn't test it, but check it anyway:
this
// after the indices have crossed,
// swap the last element in the left partition with the pivot
swap(array, start, k);
probably should be
swap(array, end, i);
or something similar, if we choose end as pivot.
Edit: That's an interesting partitioning algorithm, but it's not the standard one.
Well, the pivot is fixed in the logic of the partitioning.
The algorithm treats the first element as the Head and the rest elements as the Body to be partitioned.
After the partitioning is done, as a final step, the head (pivot) is swapped with the last element of the left partitioned part, to keep the ordering.
The only way I figured to use a different pivot, without changing the algorithm, is this:
...
if (end - start >= 1) {
// Swap the 1st element (Head) with the pivot
swap(array, start, pivot_index);
int pivot = array[start];
...
First hint: If the data are random, it does not matter, on the average, which value you choose as pivot. The only way to actually improve the "quality" of the pivot is to take more (e.g. 3) indices and use the one with median value of these.
Second hint: If you change the pivot value, you also need to change the pivot index. This is not named explicitly, but array[start] is swapped into the "middle" of the sorted subsequence at one point. You need to modify this line accordingly. If you take an index which is not at the edge of the subsequence, you need to swap it to the edge first, before the iteration.
Third hint: The code you provided is excessively commented. You should be able to actually understand this implementation.
Put a single
swap(array, start, end)
before initializing pivot
int pivot = array[start]
#include <time.h>
#include <stdlib.h>
#include<iostream>
#include<fstream>
using namespace std;
int counter=0;
void disp(int *a,int n)
{
for(int i=0;i<n;i++)
cout<<a[i]<<" ";
cout<<endl;
}
void swap(int a[],int p,int q)
{
int temp;
temp=a[p];
a[p]=a[q];
a[q]=temp;
}
int partition(int a[], int p, int start, int end)
{
swap(a,p,start);// to swap the pivot with the first element of the partition
counter+=end-start; // instead of (end-start+1)
int i=start+1;
for(int j=start+1 ; j<=end ; j++)
{
if(a[j]<a[start])
{
swap(a,j,i);
i++;
}
}
swap(a,start,i-1); // not swap(a,p,i-1) because p and start were already swaped..... this was the earlier mistake comitted
return i-1; // returning the adress of pivot
}
void quicksort(int a[],int start,int end)
{
if(start>=end)
return;
int p=end; // here we are choosing last element of the sub array as pivot
// here p is the index of the array where pivot is chosen randomly
int index=partition(a,p,start,end);
quicksort(a,start,index-1);
quicksort(a,index+1,end);
}
int main()
{
ifstream fin("data.txt");
int count=0;
int array[100000];
while(fin>>array[count])
{
count++;
}
quicksort(array,0,count-1);
/*
int a[]={32,56,34,45,23,54,78};
int n=sizeof(a)/sizeof(int);
disp(a,n);
quicksort(a,0,n-1);
disp(a,n);*/
cout<<endl<<counter;
return 0;
}
If you start monitoring each element from the 1st element of the array to the last - 1, keeping the last element as the pivot at every recursion, then you will get the answer in exact O(nlogn) time.
#include<stdio.h>
void quicksort(int [], int, int);
int main()
{
int n, i = 0, a[20];
scanf("%d", &n);
while(i < n)
scanf("%d", &a[i++]);
quicksort(a, 0, n - 1);
i = 0;
while(i < n)
printf("%d", a[i++]);
}
void quicksort(int a[], int p, int r)
{
int i, j, x, temp;
if(p < r)
{
i = p;
x = a[r];
for(j = p; j < r; j++)
{
if(a[j] <= x)
{
if(a[j] <a[i])
{
temp = a[j];
a[j] = a[i];
a[i] = temp;
}
i++;
}
else
{
temp = a[i];
a[i] = a[j];
a[j] = temp;
}
}
if(x != i)
{
temp = a[r];
a[r] = a[i];
a[i] = temp;
}
quicksort(a, p, i - 1);
quicksort(a, i + 1, r);
}
}

How to find the kth largest element in an unsorted array of length n in O(n)?

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.
Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):
Select(A,n,i):
Divide input into ⌈n/5⌉ groups of size 5.
/* Partition on median-of-medians */
medians = array of each group’s median.
pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
Left Array L and Right Array G = partition(A, pivot)
/* Find ith element in L, pivot, or G */
k = |L| + 1
If i = k, return pivot
If i < k, return Select(L, k-1, i)
If i > k, return Select(G, n-k, i-k)
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
let r be chosen uniformly at random in the range 1 to length(A)
let pivot = A[r]
let A1, A2 be new arrays
# split into a pile A1 of small elements and A2 of big elements
for i = 1 to n
if A[i] < pivot then
append A[i] to A1
else if A[i] > pivot then
append A[i] to A2
else
# do nothing
end for
if k <= length(A1):
# it's in the pile of small elements
return QuickSelect(A1, k)
else if k > length(A) - length(A2)
# it's in the pile of big elements
return QuickSelect(A2, k - (length(A) - length(A2))
else
# it's equal to the pivot
return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of
T(n) = Theta(n) + T(n-1) = Theta(n2)
But if the choices are indeed random, the expected running time is given by
T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))
where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.
Let's guess that T(n) <= an for some a. Then we get
T(n)
<= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
= cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n ai
and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:
∑i=floor(n/2) to n i
= ∑i=1 to n i - ∑i=1 to floor(n/2) i
= n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
<= n2/2 - (n/4)2/2
= (15/32)n2
where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with
cn + 2 (1/n) ∑i=floor(n/2) to n ai,
<= cn + (2a/n) (15/32) n2
= n (c + (15/16)a)
<= an
provided a > 16c.
This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for 3d largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).
A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.
I answered the letter of your question :)
The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.
const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.
For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
Warren's priority heap solution is neater though.
Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java
public int quickSelect(ArrayList<Integer>list, int nthSmallest){
//Choose random number in range of 0 to array length
Random random = new Random();
//This will give random number which is not greater than length - 1
int pivotIndex = random.nextInt(list.size() - 1);
int pivot = list.get(pivotIndex);
ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
//Split list into two.
//Value smaller than pivot should go to smallerNumberList
//Value greater than pivot should go to greaterNumberList
//Do nothing for value which is equal to pivot
for(int i=0; i<list.size(); i++){
if(list.get(i)<pivot){
smallerNumberList.add(list.get(i));
}
else if(list.get(i)>pivot){
greaterNumberList.add(list.get(i));
}
else{
//Do nothing
}
}
//If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list
if(nthSmallest < smallerNumberList.size()){
return quickSelect(smallerNumberList, nthSmallest);
}
//If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
//The step is bit tricky. If confusing, please see the above loop once again for clarification.
else if(nthSmallest > (list.size() - greaterNumberList.size())){
//nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in
//smallerNumberList
nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
return quickSelect(greaterNumberList,nthSmallest);
}
else{
return pivot;
}
}
I implemented finding kth minimimum in n unsorted elements using dynamic programming, specifically tournament method. The execution time is O(n + klog(n)). The mechanism used is listed as one of methods on Wikipedia page about Selection Algorithm (as indicated in one of the posting above). You can read about the algorithm and also find code (java) on my blog page Finding Kth Minimum. In addition the logic can do partial ordering of the list - return first K min (or max) in O(klog(n)) time.
Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.
Sexy quickselect in Python
def quickselect(arr, k):
'''
k = 1 returns first element in ascending order.
can be easily modified to return first element in descending order
'''
r = random.randrange(0, len(arr))
a1 = [i for i in arr if i < arr[r]] '''partition'''
a2 = [i for i in arr if i > arr[r]]
if k <= len(a1):
return quickselect(a1, k)
elif k > len(arr)-len(a2):
return quickselect(a2, k - (len(arr) - len(a2)))
else:
return arr[r]
As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.
Divide the array in to n/5 lists of 5 elements each.
Find the median in each sub array of 5 elements.
Recursively find the median of all the medians, lets call it M
Partition the array in to two sub array 1st sub-array contains the elements larger than M , lets say this sub-array is a1 , while other sub-array contains the elements smaller then M., lets call this sub-array a2.
If k <= |a1|, return selection (a1,k).
If k− 1 = |a1|, return M.
If k> |a1| + 1, return selection(a2,k −a1 − 1).
Analysis: As suggested in the original paper:
We use the median to partition the list into two halves(the first half,
if k <= n/2 , and the second half otherwise). This algorithm takes
time cn at the first level of recursion for some constant c, cn/2 at
the next level (since we recurse in a list of size n/2), cn/4 at the
third level, and so on. The total time taken is cn + cn/2 + cn/4 +
.... = 2cn = o(n).
Why partition size is taken 5 and not 3?
As mentioned in original paper:
Dividing the list by 5 assures a worst-case split of 70 − 30. Atleast
half of the medians greater than the median-of-medians, hence atleast
half of the n/5 blocks have atleast 3 elements and this gives a
3n/10 split, which means the other partition is 7n/10 in worst case.
That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < 1, the
worst-case running time isO(n).
Now I have tried to implement the above algorithm as:
public static int findKthLargestUsingMedian(Integer[] array, int k) {
// Step 1: Divide the list into n/5 lists of 5 element each.
int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
// Step 2: Find pivotal element aka median of medians.
int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
//Now we need two lists split using medianOfMedian as pivot. All elements in list listOne will be grater than medianOfMedian and listTwo will have elements lesser than medianOfMedian.
List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
for (Integer element : array) {
if (element < medianOfMedian) {
listWithSmallerNumbers.add(element);
} else if (element > medianOfMedian) {
listWithGreaterNumbers.add(element);
}
}
// Next step.
if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
return -1;
}
public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
int[] medians = new int[noOfRequiredLists];
for (int count = 0; count < noOfRequiredLists; count++) {
int startOfPartialArray = 5 * count;
int endOfPartialArray = startOfPartialArray + 5;
Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
// Step 2: Find median of each of these sublists.
int medianIndex = partialArray.length/2;
medians[count] = partialArray[medianIndex];
}
// Step 3: Find median of the medians.
return medians[medians.length / 2];
}
Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).
public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
int p = 0;
int numElements = nums.length;
// create priority queue where all the elements of nums will be stored
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
// place all the elements of the array to this priority queue
for (int n : nums) {
pq.add(n);
}
// extract the kth largest element
while (numElements - k + 1 > 0) {
p = pq.poll();
k++;
}
return p;
}
Both of these algorithms can be tested as:
public static void main(String[] args) throws IOException {
Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
System.out.println(findKthLargestUsingMedian(numbers, 8));
System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}
As expected output is:
18
18
Find the median of the array in linear time, then use partition procedure exactly as in quicksort to divide the array in two parts, values to the left of the median lesser( < ) than than median and to the right greater than ( > ) median, that too can be done in lineat time, now, go to that part of the array where kth element lies,
Now recurrence becomes:
T(n) = T(n/2) + cn
which gives me O (n) overal.
Below is the link to full implementation with quite an extensive explanation how the algorithm for finding Kth element in an unsorted algorithm works. Basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when smallest element is chosen as pivot in every step, so that algorithm degenerates into O(n^2) running time), special pivot selection is applied, called median-of-medians algorithm. The whole solution runs in O(n) time in worst and in average case.
Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):
Finding Kth Smallest Element in an Unsorted Array
How about this kinda approach
Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)
Is it right or am i missing something ?
Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.
There is also one algorithm, that outperforms quickselect algorithm. It's called Floyd-Rivets (FR) algorithm.
Original article: https://doi.org/10.1145/360680.360694
Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf
Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.
Sample code that I used for FR algorithm:
template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
if (n == 0)
return *(std::min_element(data.begin(), data.end()));
else if (n == data.size() - 1)
return *(std::max_element(data.begin(), data.end()));
else
return _FRselect(data, 0, data.size() - 1, n);
}
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
size_t leftIdx = left;
size_t rightIdx = right;
while (rightIdx > leftIdx)
{
if (rightIdx - leftIdx > 600)
{
size_t range = rightIdx - leftIdx + 1;
long long i = n - (long long)leftIdx + 1;
long long z = log(range);
long long s = 0.5 * exp(2 * z / 3);
long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);
size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);
_FRselect(data, newLeft, newRight, n);
}
T t = data[n];
size_t i = leftIdx;
size_t j = rightIdx;
// arrange pivot and right index
std::swap(data[leftIdx], data[n]);
if (data[rightIdx] > t)
std::swap(data[rightIdx], data[leftIdx]);
while (i < j)
{
std::swap(data[i], data[j]);
++i; --j;
while (data[i] < t) ++i;
while (data[j] > t) --j;
}
if (data[leftIdx] == t)
std::swap(data[leftIdx], data[j]);
else
{
++j;
std::swap(data[j], data[rightIdx]);
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= n)
leftIdx = j + 1;
if (n <= j)
rightIdx = j - 1;
}
return data[leftIdx];
}
template <typename T>
int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)
i would like to suggest one answer
if we take the first k elements and sort them into a linked list of k values
now for every other value even for the worst case if we do insertion sort for rest n-k values even in the worst case number of comparisons will be k*(n-k) and for prev k values to be sorted let it be k*(k-1) so it comes out to be (nk-k) which is o(n)
cheers
Explanation of the median - of - medians algorithm to find the k-th largest integer out of n can be found here:
http://cs.indstate.edu/~spitla/presentation.pdf
Implementation in c++ is below:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int findMedian(vector<int> vec){
// Find median of a vector
int median;
size_t size = vec.size();
median = vec[(size/2)];
return median;
}
int findMedianOfMedians(vector<vector<int> > values){
vector<int> medians;
for (int i = 0; i < values.size(); i++) {
int m = findMedian(values[i]);
medians.push_back(m);
}
return findMedian(medians);
}
void selectionByMedianOfMedians(const vector<int> values, int k){
// Divide the list into n/5 lists of 5 elements each
vector<vector<int> > vec2D;
int count = 0;
while (count != values.size()) {
int countRow = 0;
vector<int> row;
while ((countRow < 5) && (count < values.size())) {
row.push_back(values[count]);
count++;
countRow++;
}
vec2D.push_back(row);
}
cout<<endl<<endl<<"Printing 2D vector : "<<endl;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
cout<<vec2D[i][j]<<" ";
}
cout<<endl;
}
cout<<endl;
// Calculating a new pivot for making splits
int m = findMedianOfMedians(vec2D);
cout<<"Median of medians is : "<<m<<endl;
// Partition the list into unique elements larger than 'm' (call this sublist L1) and
// those smaller them 'm' (call this sublist L2)
vector<int> L1, L2;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
if (vec2D[i][j] > m) {
L1.push_back(vec2D[i][j]);
}else if (vec2D[i][j] < m){
L2.push_back(vec2D[i][j]);
}
}
}
// Checking the splits as per the new pivot 'm'
cout<<endl<<"Printing L1 : "<<endl;
for (int i = 0; i < L1.size(); i++) {
cout<<L1[i]<<" ";
}
cout<<endl<<endl<<"Printing L2 : "<<endl;
for (int i = 0; i < L2.size(); i++) {
cout<<L2[i]<<" ";
}
// Recursive calls
if ((k - 1) == L1.size()) {
cout<<endl<<endl<<"Answer :"<<m;
}else if (k <= L1.size()) {
return selectionByMedianOfMedians(L1, k);
}else if (k > (L1.size() + 1)){
return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
}
}
int main()
{
int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
vector<int> vec(values, values + 25);
cout<<"The given array is : "<<endl;
for (int i = 0; i < vec.size(); i++) {
cout<<vec[i]<<" ";
}
selectionByMedianOfMedians(vec, 8);
return 0;
}
There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.
In more detail. Using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection and paying some attention to the final steps of the partitioning part of the algorithm, i've came up with the following algorithm (imaginably named "LefSelect"):
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
int l=0, m = n-1, i=l, j=m;
float x;
while (l<m) {
if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
if( a[j] < a[k] ) F_SWAP(a[k],a[j]);
x=a[k];
while (j>k & i<k) {
do i++; while (a[i]<x);
do j--; while (a[j]>x);
F_SWAP(a[i],a[j]);
}
i++; j--;
if (j<k) {
while (a[i]<x) i++;
l=i; j=m;
}
if (k<i) {
while (x<a[j]) j--;
m=j; i=l;
}
}
return a[k];
}
In benchmarks that i did here, LefSelect is 20-30% faster than QuickSelect.
Haskell Solution:
kthElem index list = sort list !! index
withShape ~[] [] = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys
sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
where
ls = filter (< x)
rs = filter (>= x)
This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.
Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.
#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;
int randomPartition(int arr[], int l, int r);
// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
// If k is smaller than number of elements in array
if (k > 0 && k <= r - l + 1)
{
// Partition the array around a random element and
// get position of pivot element in sorted array
int pos = randomPartition(arr, l, r);
// If position is same as k
if (pos-l == k-1)
return arr[pos];
if (pos-l > k-1) // If position is more, recur for left subarray
return kthSmallest(arr, l, pos-1, k);
// Else recur for right subarray
return kthSmallest(arr, pos+1, r, k-pos+l-1);
}
// If k is more than number of elements in array
return INT_MAX;
}
void swap(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
int x = arr[r], i = l;
for (int j = l; j <= r - 1; j++)
{
if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
{
swap(&arr[i], &arr[j]);
i++;
}
}
swap(&arr[i], &arr[r]); // swap the pivot
return i;
}
// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
int n = r-l+1;
int pivot = rand() % n;
swap(&arr[l + pivot], &arr[r]);
return partition(arr, l, r);
}
// Driver program to test above methods
int main()
{
int arr[] = {12, 3, 5, 7, 4, 19, 26};
int n = sizeof(arr)/sizeof(arr[0]), k = 3;
cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
return 0;
}
The worst case time complexity of the above solution is still O(n2).In worst case, the randomized function may always pick a corner element. The expected time complexity of above randomized QuickSelect is Θ(n)
Have Priority queue created.
Insert all the elements into heap.
Call poll() k times.
public static int getKthLargestElements(int[] arr)
{
PriorityQueue<Integer> pq = new PriorityQueue<>((x , y) -> (y-x));
//insert all the elements into heap
for(int ele : arr)
pq.offer(ele);
// call poll() k times
int i=0;
while(i<k)
{
int result = pq.poll();
}
return result;
}
This is an implementation in Javascript.
If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).
function kthMax(a, k){
var size = a.length;
var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2)
//Create an array with all element lower than the pivot and an array with all element higher than the pivot
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
lowerArray.push(current);
} else if (current > pivot) {
upperArray.push(current);
}
}
//Which one should I continue with?
if(k <= upperArray.length) {
//Upper
return kthMax(upperArray, k);
} else {
var newK = k - (size - lowerArray.length);
if (newK > 0) {
///Lower
return kthMax(lowerArray, newK);
} else {
//None ... it's the current pivot!
return pivot;
}
}
}
If you want to test how it perform, you can use this variation:
function kthMax (a, k, logging) {
var comparisonCount = 0; //Number of comparison that the algorithm uses
var memoryCount = 0; //Number of integers in memory that the algorithm uses
var _log = logging;
if(k < 0 || k >= a.length) {
if (_log) console.log ("k is out of range");
return false;
}
function _kthmax(a, k){
var size = a.length;
var pivot = a[parseInt(Math.random()*size)];
if(_log) console.log("Inputs:", a, "size="+size, "k="+k, "pivot="+pivot);
// This should never happen. Just a nice check in this exercise
// if you are playing with the code to avoid never ending recursion
if(typeof pivot === "undefined") {
if (_log) console.log ("Ops...");
return false;
}
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
comparisonCount += 1;
memoryCount++;
lowerArray.push(current);
} else if (current > pivot) {
comparisonCount += 2;
memoryCount++;
upperArray.push(current);
}
}
if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);
if(k <= upperArray.length) {
comparisonCount += 1;
return _kthmax(upperArray, k);
} else if (k > size - lowerArray.length) {
comparisonCount += 2;
return _kthmax(lowerArray, k - (size - lowerArray.length));
} else {
comparisonCount += 2;
return pivot;
}
/*
* BTW, this is the logic for kthMin if we want to implement that... ;-)
*
if(k <= lowerArray.length) {
return kthMin(lowerArray, k);
} else if (k > size - upperArray.length) {
return kthMin(upperArray, k - (size - upperArray.length));
} else
return pivot;
*/
}
var result = _kthmax(a, k);
return {result: result, iterations: comparisonCount, memory: memoryCount};
}
The rest of the code is just to create some playground:
function getRandomArray (n){
var ar = [];
for (var i = 0, l = n; i < l; i++) {
ar.push(Math.round(Math.random() * l))
}
return ar;
}
//Create a random array of 50 numbers
var ar = getRandomArray (50);
Now, run you tests a few time.
Because of the Math.random() it will produce every time different results:
kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.
I came up with this algorithm and seems to be O(n):
Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until end of the array. The minimum of our three variables is the 3rd largest item in the array.
define variables a=0, b=0, c=0
iterate through the array items
find minimum a,b,c
if item > min then replace the min variable with item value
continue until end of array
the minimum of a,b,c is our answer
And, to find Kth largest item we need K variables.
Example: (k=3)
[1,2,4,1,7,3,9,5,6,2,9,8]
Final variable values:
a=7 (answer)
b=8
c=9
Can someone please review this and let me know what I am missing?
Here is the implementation of the algorithm eladv suggested(I also put here the implementation with random pivot):
public class Median {
public static void main(String[] s) {
int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
System.out.println(selectK(test,8));
/*
int n = 100000000;
int[] test = new int[n];
for(int i=0; i<test.length; i++)
test[i] = (int)(Math.random()*test.length);
long start = System.currentTimeMillis();
random_selectK(test, test.length/2);
long end = System.currentTimeMillis();
System.out.println(end - start);
*/
}
public static int random_selectK(int[] a, int k) {
if(a.length <= 1)
return a[0];
int r = (int)(Math.random() * a.length);
int p = a[r];
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return random_selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return random_selectK(temp,k-small-equal);
}
}
public static int selectK(int[] a, int k) {
if(a.length <= 5) {
Arrays.sort(a);
return a[k-1];
}
int p = median_of_medians(a);
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return selectK(temp,k-small-equal);
}
}
private static int median_of_medians(int[] a) {
int[] b = new int[a.length/5];
int[] temp = new int[5];
for(int i=0; i<b.length; i++) {
for(int j=0; j<5; j++)
temp[j] = a[5*i + j];
Arrays.sort(temp);
b[i] = temp[2];
}
return selectK(b, b.length/2 + 1);
}
}
it is similar to the quickSort strategy, where we pick an arbitrary pivot, and bring the smaller elements to its left, and the larger to the right
public static int kthElInUnsortedList(List<int> list, int k)
{
if (list.Count == 1)
return list[0];
List<int> left = new List<int>();
List<int> right = new List<int>();
int pivotIndex = list.Count / 2;
int pivot = list[pivotIndex]; //arbitrary
for (int i = 0; i < list.Count && i != pivotIndex; i++)
{
int currentEl = list[i];
if (currentEl < pivot)
left.Add(currentEl);
else
right.Add(currentEl);
}
if (k == left.Count + 1)
return pivot;
if (left.Count < k)
return kthElInUnsortedList(right, k - left.Count - 1);
else
return kthElInUnsortedList(left, k);
}
Go to the End of this link : ...........
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
You can find the kth smallest element in O(n) time and constant space. If we consider the array is only for integers.
The approach is to do a binary search on the range of Array values. If we have a min_value and a max_value both in integer range, we can do a binary search on that range.
We can write a comparator function which will tell us if any value is the kth-smallest or smaller than kth-smallest or bigger than kth-smallest.
Do the binary search until you reach the kth-smallest number
Here is the code for that
class Solution:
def _iskthsmallest(self, A, val, k):
less_count, equal_count = 0, 0
for i in range(len(A)):
if A[i] == val: equal_count += 1
if A[i] < val: less_count += 1
if less_count >= k: return 1
if less_count + equal_count < k: return -1
return 0
def kthsmallest_binary(self, A, min_val, max_val, k):
if min_val == max_val:
return min_val
mid = (min_val + max_val)/2
iskthsmallest = self._iskthsmallest(A, mid, k)
if iskthsmallest == 0: return mid
if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
return self.kthsmallest_binary(A, mid+1, max_val, k)
# #param A : tuple of integers
# #param B : integer
# #return an integer
def kthsmallest(self, A, k):
if not A: return 0
if k > len(A): return 0
min_val, max_val = min(A), max(A)
return self.kthsmallest_binary(A, min_val, max_val, k)
What I would do is this:
initialize empty doubly linked list l
for each element e in array
if e larger than head(l)
make e the new head of l
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.
Update:
initialize empty sorted tree l
for each element e in array
if e between head(l) and tail(l)
insert e into l // O(log k)
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
First we can build a BST from unsorted array which takes O(n) time and from the BST we can find the kth smallest element in O(log(n)) which over all counts to an order of O(n).

Resources