Perfect minimal hash for mathematical combinations - algorithm

First, define two integers N and K, where N >= K, both known at compile time. For example: N = 8 and K = 3.
Next, define a set of integers [0, N) (or [1, N] if that makes the answer simpler) and call it S. For example: {0, 1, 2, 3, 4, 5, 6, 7}
The number of subsets of S with K elements is given by the formula C(N, K). For the example, that is C(8, 3) = 56.
My problem is this: Create a perfect minimal hash for those subsets. The size of the example hash table will be C(8, 3) or 56.
I don't care about ordering, only that there be 56 entries in the hash table, and that I can determine the hash quickly from a set of K integers. I also don't care about reversibility.
Example hash: hash({5, 2, 3}) = 42. (The number 42 isn't important, at least not here)
Is there a generic algorithm for this that will work with any values of N and K? I wasn't able to find one by searching Google, nor through my own naive efforts.

There is an algorithm to code and decode a combination into its number in the lexicographical order of all combinations with a given fixed K. The algorithm is linear in N for both coding and decoding of the combination. What language are you interested in?
EDIT: here is example code in C++ (it finds the lexicographical number of a combination in the sequence of all combinations of n elements, as opposed to only those with k elements, but it is a really good starting point):
typedef long long ll;

// Returns the number in the lexicographical order of all combinations of n numbers
// of the provided combination.
ll code(vector<int> a, int n)
{
    sort(a.begin(), a.end());
    int cur = 0;
    ll res = 0;
    for (int i = 0; i < (int)a.size(); i++)
    {
        if (a[i] == cur + 1)
        {
            res++;
            cur = a[i];
        }
        else
        {
            res++;
            int number_of_greater_nums = n - a[i];
            for (int j = a[i] - 1, increment = 1; j > cur; j--, increment++)
                res += 1LL << (number_of_greater_nums + increment);
            cur = a[i];
        }
    }
    return res;
}
// Takes the lexicographical code of a combination of n numbers and returns the
// combination
vector<int> decode(ll kod, int n)
{
    vector<int> res;
    int left = n; // Out of how many numbers are we left to choose.
    while (kod)
    {
        ll all = 1LL << left; // how many are the total combinations
        for (int i = n; i >= 0; i--)
        {
            if (all - (1LL << (n - i + 1)) + 1 <= kod)
            {
                res.push_back(i);
                left = n - i;
                kod -= all - (1LL << (n - i + 1)) + 1;
                break;
            }
        }
    }
    return res;
}
I am sorry I do not have an algorithm for exactly the problem you are asking about right now, but I believe it will be a good exercise to try to understand what I do above. The truth is this is one of the algorithms I teach in the course "Design and Analysis of Algorithms", which is why I had it pre-written.

This is what you (and I) need:
hash() maps k-tuples from [1..n] onto the set {1, ..., C(n,k)} ⊂ N.
The effort is k subtractions (and O(k) is a lower bound anyway, see Strandjev's remark above):
// bino[n][k] is (n "over" k) = C(n,k) = {n \choose k}
// these are assumed to be precomputed globals
int hash(V a, int n, int k) { // V is assumed to be ordered, a_k < ... < a_1
    // hash(a_k, ..., a_2, a_1) = C(n, k) - sum_{i=1}^{k} C(n - a_i, i)
    // ii is "inverse i", runs from left to right
    int res = bino[n][k];
    int i;
    for (unsigned int ii = 0; ii < a.size(); ++ii) {
        i = a.size() - ii;
        res = res - bino[n - a[ii]][i];
    }
    return res;
}
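For a fully generic scheme, the "combinatorial number system" gives exactly such a ranking: for a sorted K-subset c_1 < c_2 < ... < c_K of {0, ..., N-1}, the value C(c_1, 1) + C(c_2, 2) + ... + C(c_K, K) is a bijection onto {0, ..., C(N,K)-1}. A self-contained C++ sketch of this (identifiers are mine; the subsets are 0-based, matching the question):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Pascal's triangle: bino[n][k] = C(n, k), precomputed once.
vector<vector<long long>> buildBino(int maxN) {
    vector<vector<long long>> bino(maxN + 1, vector<long long>(maxN + 1, 0));
    for (int n = 0; n <= maxN; ++n) {
        bino[n][0] = 1;
        for (int k = 1; k <= n; ++k)
            bino[n][k] = bino[n - 1][k - 1] + bino[n - 1][k];
    }
    return bino;
}

// Minimal perfect hash of a K-subset onto 0 .. C(N,K)-1: sum of C(c_i, i),
// with c_1 < ... < c_K the sorted elements and i their 1-based position.
long long rankSubset(vector<int> a, const vector<vector<long long>>& bino) {
    sort(a.begin(), a.end());
    long long r = 0;
    for (size_t i = 0; i < a.size(); ++i)
        r += bino[a[i]][i + 1];
    return r;
}

int main() {
    auto bino = buildBino(8);   // N = 8, K = 3: 56 distinct hashes, 0..55
    cout << rankSubset({5, 2, 3}, bino) << '\n';  // prints 15, a value in 0..55
}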

Related

K-th Smallest in Lexicographical Order

Given integers n and k, find the lexicographically k-th smallest integer in the range from 1 to n.
Note: 1 ≤ k ≤ n ≤ 10^9.
Example:
Input:
n: 13 k: 2
Output:
10
Explanation:
The lexicographical order is [1, 10, 11, 12, 13, 2, 3, 4, 5, 6, 7, 8, 9], so the second smallest number is 10.
I have written code which works fine, but for very large inputs it takes a long time to execute and hence times out. Could someone please suggest how I can make it more efficient?
Thanks!!
public class Solution {
    class MyComp implements Comparator<Integer> {
        @Override
        public int compare(Integer n1, Integer n2) {
            return String.valueOf(n1).compareTo(String.valueOf(n2));
        }
    }

    public int findKthNumber(int n, int k) {
        if (n == 0 || k == 0 || k > n) return 0;
        int[] tracker = new int[9];
        Arrays.fill(tracker, 0);
        Map<Integer, TreeSet<Integer>> map = new HashMap<Integer, TreeSet<Integer>>();
        for (int i = 1; i <= n; i++) {
            String prefix = String.valueOf(i);
            int currIndex = Integer.parseInt(prefix.substring(0, 1));
            // Update count
            tracker[currIndex - 1] = tracker[currIndex - 1] + 1;
            if (map.containsKey(currIndex)) {
                TreeSet<Integer> set = map.get(currIndex);
                set.add(i);
                map.put(currIndex, set);
            } else {
                TreeSet<Integer> set = new TreeSet<Integer>(new MyComp());
                set.add(i);
                map.put(currIndex, set);
            }
        }
        // counter to check whether we are getting near k
        int count = 1;
        for (int i = 0; i < 9; i++) {
            int lookUp = i + 1;
            if (count + map.get(lookUp).size() > k) {
                for (int res : map.get(lookUp)) {
                    if (count == k) return res;
                    count++;
                }
            }
            count = count + map.get(lookUp).size();
        }
        return 0;
    }
}
You can use a QuickSort-style partition for finding the location. That will give you n log n but will be much quicker and simpler.
Pseudo code:
-> Pick a random number out of the list.
-> Put all elements lesser than it on one side, all higher on the other side.
-> If the number of lesser elements is k-1, you got the answer.
-> If there are fewer than k-1 lesser elements, apply the same algorithm on the right side.
-> If there are more than k-1 lesser elements, apply the same algorithm on the left side.
You can do this in place.
Best time complexity is O(n), worst time complexity is O(n*n), but it gives very stable performance over multiple iterations, and the random pivot works on all sorts of data.
Let me know if any of the steps are not clear :)
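If you do go the selection route for moderate n, std::nth_element with a string comparator performs the partitioning described above for you. A hedged C++ sketch (it materializes all n values, so it will not help at n = 10^9, but it shows the idea):

#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
using namespace std;

// kth (1-based) smallest of 1..n in lexicographic (string) order,
// via selection rather than a full sort; expected O(n) comparisons.
int kthLexSmallest(int n, int k) {
    vector<int> a(n);
    iota(a.begin(), a.end(), 1);            // 1, 2, ..., n
    nth_element(a.begin(), a.begin() + (k - 1), a.end(),
                [](int x, int y) { return to_string(x) < to_string(y); });
    return a[k - 1];
}

int main() {
    cout << kthLexSmallest(13, 2) << '\n';  // prints 10, as in the example
}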
For such small numbers you could generate all Strings in an array, sort it and return the k-th entry:
String[] arr = new String[n];
for (int i = 0; i < n; i++)
    arr[i] = String.valueOf(i + 1);
Arrays.sort(arr);
return Integer.parseInt(arr[k - 1]);
It seems to be much easier than counting to the k-th entry. You don't need to sort the whole array, because you only need to find the k-th smallest entry; anyway, for such small numbers it doesn't matter.
Or, even better, use an Integer array and the comparator you already created:
Integer[] arr = new Integer[n];
for (int i = 0; i < n; i++)
    arr[i] = i + 1;
Arrays.sort(arr, new MyComp());
return arr[k - 1].intValue();

SUM exactly using K elements solution

Problem: Given an array of N numbers, find a subset of exactly K elements that sums to SUM.
I am looking for a dynamic programming (DP) solution for this problem. Basically I am looking to understand the matrix-filling approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>

#define SIZE(a) (sizeof(a) / sizeof(a[0]))

int binary[100];
int a[] = {1, 2, 5, 5, 100};

void show(int *p, int size) {
    int j;
    for (j = 0; j < size; j++)
        if (p[j])
            printf("%d\n", a[j]);
}

void subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        show(binary, size);
    } else if (sum < target && i < size) {
        binary[i] = 1;
        subset_sum(target, i + 1, sum + a[i], a, size, K - 1);
        binary[i] = 0;
        subset_sum(target, i + 1, sum, a, size, K);
    }
}

int main() {
    int target = 10;
    int K = 2;
    subset_sum(target, 0, 0, a, SIZE(a), K);
    return 0;
}
Does the recurrence solution below make sense?
Let DP[i][j][k] be true if sum i can be reached with exactly k elements picked from a[0..j].
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1]   { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = 1
DP[i][0][0] = DP[i][j][0] = 0 (for i > 0)
It means we can either consider the current element (DP[i-a[j]][j-1][k-1]) or not consider it (DP[i][j-1][k]). If we consider the current element, k is reduced by 1, which reduces the number of elements that still need to be picked; when the current element is not considered, k is not reduced.
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K]; // initialize all entries to -1

bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        memo[i][sum][K] = true;
        return memo[i][sum][K];
    } else if (sum < target && i < size) {
        if (memo[i][sum][K] != -1)
            return memo[i][sum][K];
        memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K - 1) ||
                          subset_sum(target, i + 1, sum, a, size, K);
        return memo[i][sum][K];
    }
    return false;
}
Then, look at memo[_all indexes_][target][K]. If this is true, there exists at least one solution. You can store additional information to get you the next solution, or you can iterate with an i from found_index - 1 to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sum up to exactly j for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or from auxiliary information saved during evaluation.
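For completeness, a hedged bottom-up C++ sketch of the exact-K recurrence from the question, with the element index rolled away so the table is sum x count (identifiers are mine):

#include <iostream>
#include <vector>
using namespace std;

// dp[s][c]: after processing a prefix of a, can exactly c of the processed
// elements sum to s? Iterating s and c downward makes each element usable once.
bool subsetSumExactlyK(const vector<int>& a, int target, int K) {
    vector<vector<bool>> dp(target + 1, vector<bool>(K + 1, false));
    dp[0][0] = true;
    for (int x : a)
        for (int s = target; s >= x; --s)
            for (int c = K; c >= 1; --c)
                if (dp[s - x][c - 1])
                    dp[s][c] = true;
    return dp[target][K];
}

int main() {
    vector<int> a = {1, 2, 5, 5, 100};
    // The example from the question: target 10 with exactly 2 elements (5 + 5).
    cout << boolalpha << subsetSumExactlyK(a, 10, 2) << '\n';  // prints true
}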

Dynamic programming exercise for string cutting

I have been working on the following problem from this book.
A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.
I need a dynamic programming algorithm that given m cuts, finds the minimum cost of cutting a string into m+1 pieces.
The divide and conquer approach seems to me the best one for this kind of problem. Here is a Java implementation of the algorithm:
Note: the array m should be sorted in ascending order (use Arrays.sort(m);)
public int findMinCutCost(int[] m, int n) {
    int cost = n * m.length;
    for (int i = 0; i < m.length; i++) {
        cost = Math.min(findMinCutCostImpl(m, n, i), cost);
    }
    return cost;
}

private int findMinCutCostImpl(int[] m, int n, int i) {
    if (m.length == 1) return n;
    int cl = 0, cr = 0;
    if (i > 0) {
        cl = Integer.MAX_VALUE;
        int[] ml = Arrays.copyOfRange(m, 0, i);
        int nl = m[i];
        for (int j = 0; j < ml.length; j++) {
            cl = Math.min(findMinCutCostImpl(ml, nl, j), cl);
        }
    }
    if (i < m.length - 1) {
        cr = Integer.MAX_VALUE;
        int[] mr = Arrays.copyOfRange(m, i + 1, m.length);
        int nr = n - m[i];
        for (int j = 0; j < mr.length; j++) {
            mr[j] = mr[j] - m[i];
        }
        for (int j = 0; j < mr.length; j++) {
            cr = Math.min(findMinCutCostImpl(mr, nr, j), cr);
        }
    }
    return n + cl + cr;
}
For example :
int n = 20;
int[] m = new int[] { 10, 3 };
System.out.println(findMinCutCost(m, n));
Will print 30
** Edit **
I have implemented two other methods to answer the problem in the question.
1. Median cut approximation
This method always cuts recursively as close as possible to the middle of the current chunk. The result is not always the best solution, but it offers a non-negligible speed gain (on the order of +100000% in my tests) for a negligible difference from the best cut cost.
public int findMinCutCost2(int[] m, int n) {
    if (m.length == 0) return 0;
    if (m.length == 1) return n;
    float half = n / 2f;
    int bestIndex = 0;
    for (int i = 1; i < m.length; i++) {
        if (Math.abs(half - m[bestIndex]) > Math.abs(half - m[i])) {
            bestIndex = i;
        }
    }
    int cl = 0, cr = 0;
    if (bestIndex > 0) {
        int[] ml = Arrays.copyOfRange(m, 0, bestIndex);
        int nl = m[bestIndex];
        cl = findMinCutCost2(ml, nl);
    }
    if (bestIndex < m.length - 1) {
        int[] mr = Arrays.copyOfRange(m, bestIndex + 1, m.length);
        int nr = n - m[bestIndex];
        for (int j = 0; j < mr.length; j++) {
            mr[j] = mr[j] - m[bestIndex];
        }
        cr = findMinCutCost2(mr, nr);
    }
    return n + cl + cr;
}
2. A constant time multi-cut
Instead of calculating the minimal cost, just split directly using indices and buffers. Every character is copied exactly once, so the method always returns a cost of n. Plus, the method actually splits the string into substrings.
public int findMinCutCost3(int[] m, int n) {
    char[][] charArr = new char[m.length + 1][];
    charArr[0] = new char[m[0]];
    for (int i = 0, j = 0, k = 0; j < n; j++) {
        //charArr[i][k++] = string[j]; // string is the actual string to split
        if (i < m.length && j == m[i]) {
            if (++i >= m.length) {
                charArr[i] = new char[n - m[i - 1]];
            } else {
                charArr[i] = new char[m[i] - m[i - 1]];
            }
            k = 0;
        }
    }
    return n;
}
Note that this last method could easily be modified to accept a String str argument instead of n and set n = str.length(), and to return a String[] array built from charArr[][].
For dynamic programming, I claim that all you really need to know is what the state space should be - how to represent partial problems.
Here we are dividing a string up into m+1 pieces by creating new breaks. I claim that a good state space is a set of (a, b) pairs, where a is the location of the start of a substring and b is the location of the end of the same substring, counted as number of breaks in the final broken down string. The cost associated with each pair is the minimum cost of breaking it up. If b <= a + 1, then the cost is 0, because there are no more breaks to put in. If b is larger, then the possible locations for the next break in that substring are the points a+1, a+2,... b-1. The next break is going to cost b-a regardless of where we put it, but if we put it at position k the minimum cost of later breaks is (a, k) + (k, b).
So to solve this with dynamic programming, build up a table (a, b) of minimum costs, where you can work out the cost of breaks on strings with k sections by considering k - 1 possible breaks and then looking up the costs of strings with at most k - 1 sections.
One way to expand on this would be to start by creating a table T[a, b] and setting all entries in that table to infinity. Then go over the table again and where b <= a+1 put T[a,b] = 0. This fills in entries representing sections of the original string which need no further cuts. Now scan through the table and for each T[a,b] with b > a + 1 consider every possible k such that a < k < b and if min_k ((length between breaks a and b) + T[a,k] + T[k,b]) < T[a,b] set T[a,b] to that minimum value. This recognizes where you now know a way to chop up the substrings represented by T[a,k] and T[k,b] cheaply, so this gives you a better way to chop up T[a,b]. If you now repeat this m times you are done - use a standard dynamic programming backtrack to work out the solution. It might help if you save the best value of k for each T[a,b] in a separate table.
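As one possible concretization of this table (identifiers are mine, not from the answer above), here is a hedged C++ sketch of exactly this T[a, b] scheme, indexing by cut positions with the two string ends added as sentinel cuts:

#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>
using namespace std;

// Minimum total cost of making all cuts; O(m^3) in the number of cuts m.
int minCutCost(int n, vector<int> cuts) {
    sort(cuts.begin(), cuts.end());
    vector<int> p;                           // sentinel ends plus cut positions
    p.push_back(0);
    p.insert(p.end(), cuts.begin(), cuts.end());
    p.push_back(n);
    int m = p.size();
    // T[a][b]: min cost to fully cut the piece between positions p[a] and p[b];
    // zero-initialized, which covers the b <= a+1 base case.
    vector<vector<int>> T(m, vector<int>(m, 0));
    for (int len = 2; len < m; ++len)        // number of gaps spanned
        for (int a = 0; a + len < m; ++a) {
            int b = a + len, best = INT_MAX;
            for (int k = a + 1; k < b; ++k)  // position of the next break
                best = min(best, T[a][k] + T[k][b]);
            T[a][b] = best + (p[b] - p[a]);  // this piece is copied once
        }
    return T[0][m - 1];
}

int main() {
    cout << minCutCost(20, {10, 3}) << '\n'; // prints 30, as in the example
}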
python code:
mincost(n, cut_list) = min over k in cut_list { n + mincost(k, left_cut_list) + mincost(n-k, right_cut_list) }
import sys

def splitstr(n, cut_list):
    if len(cut_list) == 0:
        return [0, []]
    min_positions = []
    min_cost = sys.maxint
    for k in cut_list:
        left_split = [x for x in cut_list if x < k]
        right_split = [x - k for x in cut_list if x > k]
        #print n, k, left_split, right_split
        lcost = splitstr(k, left_split)
        rcost = splitstr(n - k, right_split)
        cost = n + lcost[0] + rcost[0]
        positions = [k] + lcost[1] + [x + k for x in rcost[1]]
        #print "cost:", cost, " min: ", positions
        if cost < min_cost:
            min_cost = cost
            min_positions = positions
    return (min_cost, min_positions)

print splitstr(20, [3, 10, 16])     # (40, [10, 3, 16])
print splitstr(20, [3, 10])         # (30, [10, 3])
print splitstr(5, [1, 2, 3, 4, 5])  # (13, [2, 1, 3, 4, 5])
print splitstr(1, [1])              # (1, [1])  # m cuts, m+1 substrings
Here is a C++ implementation. It is an O(n^3) implementation using DP, assuming that the cut array is sorted. If it is not, sorting it takes O(n log n), so the asymptotic time complexity remains the same.
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <limits.h>
using namespace std;

int main() {
    int i, j, gap, k, m, n;
    while (scanf("%d%d", &n, &k) != EOF) {
        int a[n + 1][n + 1];
        int cut[k];
        memset(a, 0, sizeof(a));
        for (i = 0; i < k; i++)
            cin >> cut[i];
        for (gap = 1; gap <= n; gap++) {
            for (i = 0, j = i + gap; j <= n; j++, i++) {
                if (gap == 1)
                    a[i][j] = 0;
                else {
                    int min = INT_MAX;
                    for (m = 0; m < k; m++) {
                        if (cut[m] < j and cut[m] > i) {
                            int cost = (j - i) + a[i][cut[m]] + a[cut[m]][j];
                            if (cost < min)
                                min = cost;
                        }
                    }
                    if (min >= INT_MAX)
                        a[i][j] = 0;
                    else
                        a[i][j] = min;
                }
            }
        }
        cout << a[0][n] << endl;
    }
    return 0;
}

3-PARTITION problem

here is another dynamic programming question (Vazirani ch6)
Consider the following 3-PARTITION problem. Given integers a1...an, we want to determine whether it is possible to partition {1...n} into three disjoint subsets I, J, K such that

sum(I) = sum(J) = sum(K) = 1/3 * sum(ALL)

For example, for input (1; 2; 3; 4; 4; 5; 8) the answer is yes, because there is the partition (1; 8), (4; 5), (2; 3; 4). On the other hand, for input (2; 2; 3; 5) the answer is no. Devise and analyze a dynamic programming algorithm for 3-PARTITION that runs in time polynomial in n and sum(a_i).
How can I solve this problem? I know the 2-partition problem, but I still can't solve this one.
It's easy to generalize the 2-set solution to the 3-set case.
In the original version, you create an array of boolean sums, where sums[i] tells whether sum i can be reached with numbers from the set or not. Then, once the array is created, you just check whether sums[TOTAL/2] is true.
Since you said you know the old version already, I'll describe only the difference between them.
In the 3-partition case, you keep an array of boolean sums, where sums[i][j] tells whether the first set can have sum i and the second sum j. Then, once the array is created, you just check whether sums[TOTAL/3][TOTAL/3] is true.
If the original complexity is O(TOTAL*n), here it's O(TOTAL^2*n).
It may not be polynomial in the strictest sense of the word, but then the original version isn't strictly polynomial either :)
I think by reduction it goes like this:
Reducing 2-partition to 3-partition:
Let S be the original set, and A be its total sum; then let S' = union({A/2}, S).
Hence, performing a 3-partition on the set S' yields three sets X, Y, Z.
Among X, Y, Z, one of them must be {A/2}; say it is Z. Then X and Y form a 2-partition.
The witnesses of 3-partition on S' are the witnesses of 2-partition on S; thus 2-partition reduces to 3-partition.
If this problem is to be solvable, then sum(ALL)/3 must be an integer. Any solution must have sum(J) + sum(K) = sum(I) + sum(ALL)/3. This represents a solution to the 2-partition problem over concat(ALL, {sum(ALL)/3}).
You say you have a 2-partition implementation: use it to solve that problem. Then (at least) one of the two partitions will contain the number sum(ALL)/3 - remove the number from that partition, and you've found I. For the other partition, run 2-partition again to split J from K; after all, J and K must be equal in sum themselves.
Edit: This solution is probably incorrect - the 2-partition of the concatenated set will have several solutions (at least one for each of I, J, K) - however, if there are other solutions, then the "other side" may not consist of the union of two of I, J, K, and may not be splittable at all. You'll need to actually think, I fear :-).
Try 2: Iterate over the multiset, maintaining the following map: R(i,j,k) :: Boolean, which represents whether, up to the current iteration, the numbers permit division into three multisets that have sums i, j, k. I.e., for any R(i,j,k) and next number n, in the next state R' it holds that R'(i+n,j,k), R'(i,j+n,k) and R'(i,j,k+n). Note that the complexity (as per the exercise) depends on the magnitude of the input numbers; this is a pseudo-polynomial-time algorithm. Nikita's solution is conceptually similar but more efficient than this solution, since it doesn't track the third set's sum: that's unnecessary, as you can trivially compute it.
As I have answered in another similar question, the C++ implementation would look something like this:
bool partition3(vector<int> &A)
{
    int sum = accumulate(A.begin(), A.end(), 0);
    if (sum % 3 != 0)
    {
        return false;
    }
    int size = A.size();
    vector<vector<bool>> dp(sum + 1, vector<bool>(sum + 1, false));
    dp[0][0] = true;
    // process the numbers one by one
    for (int i = 0; i < size; i++)
    {
        for (int j = sum; j >= 0; --j)
        {
            for (int k = sum; k >= 0; --k)
            {
                if (dp[j][k])
                {
                    // guard the indices so we stay inside the table
                    if (j + A[i] <= sum) dp[j + A[i]][k] = true;
                    if (k + A[i] <= sum) dp[j][k + A[i]] = true;
                }
            }
        }
    }
    return dp[sum / 3][sum / 3];
}
Let's say you want to partition the set $X = \{x_1, \ldots, x_n\}$ into $k$ partitions.
Create an $n \times k$ table. Let $M[i,j]$ be the minimal achievable maximum sum when the first $i$ elements are split into $j$ partitions. Just recursively use the following optimality criterion to fill it:
M[n,k] = \min_{i \leq n} \max\left( M[i, k-1], \sum_{j=i+1}^{n} x_j \right)
Using these initial values for the table:
M[i,1] = \sum_{j=1}^{i} x_j \quad \text{and} \quad M[1,j] = x_1
The running time is $O(kn^2)$ (polynomial).
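A hedged C++ sketch of the recurrence exactly as stated; note that, as written, it splits the sequence into k contiguous blocks while minimizing the largest block sum (the classic linear-partition table), so treat it as an illustration of the table filling; identifiers are mine:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// M[i][j]: minimal achievable maximum block sum when x_1..x_i is split into
// j contiguous blocks, filled exactly by the recurrence above. O(k * n^2).
long long linearPartition(const vector<long long>& x, int k) {
    int n = x.size();
    vector<long long> pre(n + 1, 0);                  // pre[i] = x_1 + ... + x_i
    for (int i = 1; i <= n; ++i) pre[i] = pre[i - 1] + x[i - 1];
    const long long INF = 1e18;
    vector<vector<long long>> M(n + 1, vector<long long>(k + 1, INF));
    for (int i = 1; i <= n; ++i) M[i][1] = pre[i];    // one block: the whole prefix
    for (int j = 1; j <= k; ++j) M[1][j] = x[0];      // a single element
    for (int i = 2; i <= n; ++i)
        for (int j = 2; j <= k; ++j)
            for (int s = 1; s < i; ++s)               // last block is x_{s+1}..x_i
                M[i][j] = min(M[i][j], max(M[s][j - 1], pre[i] - pre[s]));
    return M[n][k];
}

int main() {
    // Best split of (1,2,3,4,5) into 3 contiguous blocks is (1,2,3 | 4 | 5),
    // whose largest block sum is 6.
    cout << linearPartition({1, 2, 3, 4, 5}, 3) << '\n';  // prints 6
}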
Create a three-dimensional array, where size is the count of elements and part is equal to the sum of all elements divided by 3. Each cell array[seq][sum1][sum2] tells whether you can create sum1 and sum2 using at most the first seq elements of the given array A[]. Compute all values of the array; the result will be in the cell array[using all elements][sum of all elements / 3][sum of all elements / 3]. If you can create two disjoint sets each equal to sum/3, the remaining elements form the third set.
Logic of checking: exclude element A[seq], sending it to the third sum (not stored), and check the cell without that element for the same two sums; OR include it in sum1: check whether the two sets are possible without element seq, where sum1 is smaller by the value of A[seq] and sum2 is unchanged; OR include it in sum2, checked like the previous case.
int partition3(vector<int> &A)
{
    int part = 0;
    for (int a : A)
        part += a;
    if (part % 3)
        return 0;
    int size = A.size() + 1;
    part = part / 3 + 1;
    bool array[size][part][part];
    // sequence from 0 integers inside to all inside
    for (int seq = 0; seq < size; seq++)
        for (int sum1 = 0; sum1 < part; sum1++)
            for (int sum2 = 0; sum2 < part; sum2++) {
                bool curRes;
                if (seq == 0)
                    curRes = (sum1 == 0 && sum2 == 0);
                else {
                    int curInSeq = seq - 1;
                    bool excludeFrom = array[seq - 1][sum1][sum2];
                    bool includeToSum1 = (sum1 >= A[curInSeq]
                                          && array[seq - 1][sum1 - A[curInSeq]][sum2]);
                    bool includeToSum2 = (sum2 >= A[curInSeq]
                                          && array[seq - 1][sum1][sum2 - A[curInSeq]]);
                    curRes = excludeFrom || includeToSum1 || includeToSum2;
                }
                array[seq][sum1][sum2] = curRes;
            }
    int result = array[size - 1][part - 1][part - 1];
    return result;
}
Another example in C++ (based on the previous answers):
bool partition3(vector<int> const &A) {
    int sum = 0;
    for (int i = 0; i < A.size(); i++) {
        sum += A[i];
    }
    if (sum % 3 != 0) {
        return false;
    }
    vector<vector<vector<int>>> E(A.size() + 1,
        vector<vector<int>>(sum / 3 + 1, vector<int>(sum / 3 + 1, 0)));
    for (int i = 1; i <= A.size(); i++) {
        for (int j = 0; j <= sum / 3; j++) {
            for (int k = 0; k <= sum / 3; k++) {
                E[i][j][k] = E[i - 1][j][k];
                if (A[i - 1] <= k) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j][k - A[i - 1]] + A[i - 1]);
                }
                if (A[i - 1] <= j) {
                    E[i][j][k] = max(E[i][j][k], E[i - 1][j - A[i - 1]][k] + A[i - 1]);
                }
            }
        }
    }
    return (E.back().back().back() / 2 == sum / 3);
}
You really want Korf's Complete Karmarkar-Karp algorithm (http://ac.els-cdn.com/S0004370298000861/1-s2.0-S0004370298000861-main.pdf, http://ijcai.org/papers09/Papers/IJCAI09-096.pdf). A generalization to three-partitioning is given. The algorithm is surprisingly fast given the complexity of the problem, but requires some implementation.
The essential idea of KK is to ensure that large blocks of similar size appear in different partitions. One groups pairs of blocks, and each pair can then be treated as a smaller block, of size equal to the difference of the two sizes, that can be placed as normal: by doing this recursively, one ends up with small blocks that are easy to place. One then does a two-coloring of the block groups to ensure that the opposite placements are respected. The extension to 3-partition is a bit complicated. The Korf extension is to use depth-first search in KK order to find all possible solutions, or to find a solution quickly.
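To make the pairing idea concrete, here is a hedged C++ sketch of the plain two-way Karmarkar-Karp differencing heuristic; the three-way and complete (CKK) variants Korf describes build on this but are considerably more involved:

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

// Two-way KK differencing: repeatedly commit the two largest blocks to
// opposite partitions by replacing them with their difference.
// Returns the final difference between the two partition sums (a heuristic,
// not necessarily the optimum).
long long karmarkarKarp(const vector<long long>& blocks) {
    priority_queue<long long> pq(blocks.begin(), blocks.end());
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();   // largest block
        long long b = pq.top(); pq.pop();   // second largest block
        pq.push(a - b);                     // they go to opposite sides
    }
    return pq.empty() ? 0 : pq.top();
}

int main() {
    // KK returns 2 here, while the optimal split {8,7} vs {6,5,4} has
    // difference 0: plain differencing is only a heuristic.
    cout << karmarkarKarp({8, 7, 6, 5, 4}) << '\n';
}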

How to find the kth largest element in an unsorted array of length n in O(n)?

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.
Everything you need is in these PowerPoint slides. Just to extract the basic algorithm of the O(n) worst-case method (median of medians):
Select(A, n, i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group's median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n^2)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) * sum_{i=1}^{n} T(max(i-1, n-i))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) <= cn + (1/n) * sum_{i=1}^{n} T(max(i-1, n-i))
      = cn + (1/n) * sum_{i=1}^{floor(n/2)} T(n-i) + (1/n) * sum_{i=floor(n/2)+1}^{n} T(i-1)
     <= cn + 2 * (1/n) * sum_{i=floor(n/2)}^{n} T(i)
     <= cn + 2 * (1/n) * sum_{i=floor(n/2)}^{n} ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) * sum_{i=n/2}^{n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big: there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

sum_{i=floor(n/2)}^{n} i
    = sum_{i=1}^{n} i - sum_{i=1}^{floor(n/2)} i
    = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
   <= n^2/2 - (n/4)^2/2
    = (15/32) n^2

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 * (1/n) * sum_{i=floor(n/2)}^{n} ai
   <= cn + (2a/n)(15/32)n^2
    = n(c + (15/16)a)
   <= an

provided a >= 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for the 3rd largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
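A minimal C++ sketch of that heap recipe (heapify in O(n), then discard the top k-1 times), assuming k is small relative to n:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// kth largest (1-based) via a max-heap: O(n) build plus k * O(log n) pops.
int kthLargestHeap(vector<int> a, int k) {
    make_heap(a.begin(), a.end());            // O(n) heapify
    for (int i = 0; i < k - 1; ++i)
        pop_heap(a.begin(), a.end() - i);     // move the current max out of the heap
    return a.front();                         // kth largest is now on top
}

int main() {
    cout << kthLargestHeap({3, 1, 4, 1, 5, 9, 2, 6}, 3) << '\n';  // prints 5
}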
You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).
A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.
I answered the letter of your question :)
The C++ standard library has almost exactly that: the function is called nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.
const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
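For the kth largest specifically, the same call accepts a comparator; a small sketch:

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<double> a = {3.0, 1.0, 4.0, 1.5, 9.0, 2.6};
    int k = 2;  // looking for the 2nd largest
    // Partially sorts in descending order around position k-1: expected O(N).
    nth_element(a.begin(), a.begin() + (k - 1), a.end(), greater<double>());
    cout << a[k - 1] << '\n';  // prints 4 (descending order starts 9, 4, ...)
}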
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.
For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
Warren's priority heap solution is neater though.
Although I am not very sure about the O(n) complexity, it is sure to be between O(n) and O(n log n), and closer to O(n) than O(n log n). The function is written in Java:
public int quickSelect(ArrayList<Integer> list, int nthSmallest) {
    //Choose a random number in the range 0 to list length
    Random random = new Random();
    //This gives a random index which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size());
    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two.
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for (int i = 0; i < list.size(); i++) {
        if (list.get(i) < pivot) {
            smallerNumberList.add(list.get(i));
        } else if (list.get(i) > pivot) {
            greaterNumberList.add(list.get(i));
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list
    if (nthSmallest < smallerNumberList.size()) {
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is a bit tricky. If confusing, please see the above loop once again for clarification.
    else if (nthSmallest > (list.size() - greaterNumberList.size())) {
        //nthSmallest has to be shifted here: [ list.size() - greaterNumberList.size() ] elements are already
        //accounted for by smallerNumberList and the pivot duplicates
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList, nthSmallest);
    } else {
        return pivot;
    }
}
I implemented finding the kth minimum among n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k·log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about selection algorithms (as indicated in one of the postings above). You can read about the algorithm and also find code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do a partial ordering of the list: it returns the first K min (or max) in O(k·log(n)) time.
Though the code provided finds the kth minimum, similar logic can be employed to find the kth maximum in O(k·log(n)), ignoring the pre-work done to create the tournament tree.
Sexy quickselect in Python
import random

def quickselect(arr, k):
    '''
    k = 1 returns first element in ascending order.
    Can be easily modified to return first element in descending order.
    '''
    r = random.randrange(0, len(arr))
    a1 = [i for i in arr if i < arr[r]]  # partition
    a2 = [i for i in arr if i > arr[r]]
    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr) - len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]
As per the paper Finding the Kth largest item in a list of n items, the following algorithm takes O(n) time in the worst case.
Divide the array into n/5 lists of 5 elements each.
Find the median in each sub-array of 5 elements.
Recursively find the median of all the medians, let's call it M.
Partition the array into two sub-arrays: the 1st sub-array contains the elements larger than M (let's say this sub-array is a1), while the other sub-array contains the elements smaller than M (let's call it a2).
If k <= |a1|, return selection(a1, k).
If k - 1 = |a1|, return M.
If k > |a1| + 1, return selection(a2, k - |a1| - 1).
Analysis: As suggested in the original paper:
We use the median to partition the list into two halves (the first half, if k <= n/2, and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse in a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + ... = 2cn = O(n).
Why is the partition size taken as 5 and not 3?
As mentioned in the original paper:
Dividing the list by 5 assures a worst-case split of 70-30. At least half of the medians are greater than the median-of-medians, hence at least half of the n/5 blocks have at least 3 elements greater than it, and this gives a 3n/10 split, which means the other partition is 7n/10 in the worst case. That gives T(n) = T(n/5) + T(7n/10) + O(n). Since 1/5 + 7/10 < 1, the worst-case running time is O(n).
Now I have tried to implement the above algorithm as:
public static int findKthLargestUsingMedian(Integer[] array, int k) {
    // Step 1: Divide the list into n/5 lists of 5 elements each.
    int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
    // Step 2: Find pivotal element aka median of medians.
    int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
    // Now we need two lists split using medianOfMedian as pivot. All elements in
    // listWithGreaterNumbers will be greater than medianOfMedian and
    // listWithSmallerNumbers will have elements lesser than medianOfMedian.
    List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
    List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
    for (Integer element : array) {
        if (element < medianOfMedian) {
            listWithSmallerNumbers.add(element);
        } else if (element > medianOfMedian) {
            listWithGreaterNumbers.add(element);
        }
    }
    // Next step.
    if (k <= listWithGreaterNumbers.size())
        return findKthLargestUsingMedian(listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
    else if ((k - 1) == listWithGreaterNumbers.size())
        return medianOfMedian;
    else if (k > (listWithGreaterNumbers.size() + 1))
        return findKthLargestUsingMedian(listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k - listWithGreaterNumbers.size() - 1);
    return -1;
}
public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
    int[] medians = new int[noOfRequiredLists];
    for (int count = 0; count < noOfRequiredLists; count++) {
        int startOfPartialArray = 5 * count;
        // Guard against running past the end of the array on the last (partial) group.
        int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
        Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
        // Step 2: Find median of each of these sublists (sort the small group first).
        Arrays.sort(partialArray);
        int medianIndex = partialArray.length / 2;
        medians[count] = partialArray[medianIndex];
    }
    // Step 3: Find median of the medians.
    Arrays.sort(medians);
    return medians[medians.length / 2];
}
Just for the sake of completeness, another algorithm makes use of a Priority Queue and takes O(n log n) time.
public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
    int p = 0;
    int numElements = nums.length;
    // create priority queue where all the elements of nums will be stored
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

    // place all the elements of the array into this priority queue
    for (int n : nums) {
        pq.add(n);
    }

    // extract the kth largest element
    while (numElements - k + 1 > 0) {
        p = pq.poll();
        k++;
    }

    return p;
}
Both of these algorithms can be tested as:
public static void main(String[] args) throws IOException {
    Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
    System.out.println(findKthLargestUsingMedian(numbers, 8));
    System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}
As expected, the output is:
18
18
Find the median of the array in linear time, then use the partition procedure, exactly as in quicksort, to divide the array into two parts: values to the left of the median lesser (<) than the median, and values to the right greater (>) than the median. That too can be done in linear time. Now go to the part of the array where the kth element lies.
The recurrence becomes:
T(n) = T(n/2) + cn
which gives O(n) overall (the geometric series cn + cn/2 + cn/4 + ... sums to 2cn).
Below is a link to a full implementation with a fairly extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when the smallest element is chosen as pivot in every step, so that the algorithm degenerates into O(n^2) running time), special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) time in the worst and in the average case.
Here is a link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):
Finding Kth Smallest Element in an Unsorted Array
How about this kind of approach:
Maintain a buffer of length k and a tmp_max; getting tmp_max is O(k), and it is done n times, so something like O(kn).
Is that right, or am I missing something?
Although it doesn't beat the average case of quickselect or the worst case of the median-of-medians method, it's pretty easy to understand and implement.
There is also an algorithm that outperforms the quickselect algorithm. It's called the Floyd-Rivest (FR) algorithm.
Original article: https://doi.org/10.1145/360680.360694
Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf
Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
I tried to implement the quickselect and FR algorithms in C++. I also compared them to the standard C++ library implementation std::nth_element (which is basically an introselect hybrid of quickselect and heapselect). The result: quickselect and nth_element ran comparably on average, but the FR algorithm ran approximately twice as fast compared to them.
Sample code that I used for FR algorithm:
#include <algorithm>
#include <cmath>
#include <vector>

// forward declarations so the templates can reference each other
template <typename T> int sgn(T val);
template <typename T> T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n);

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);
            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);
            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}
Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value, bump the values at positions 1-4 down, and drop value 5 off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).
I would like to suggest one answer.
If we take the first k elements and sort them into a linked list of k values, then for every other value, even in the worst case, doing insertion sort for the remaining n-k values costs at most k*(n-k) comparisons, and sorting the previous k values costs at most k*(k-1) comparisons, so it comes out to k*n - k, which is O(n) for constant k.
Cheers.
An explanation of the median-of-medians algorithm to find the k-th largest integer out of n can be found here:
http://cs.indstate.edu/~spitla/presentation.pdf
An implementation in C++ is below:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int findMedian(vector<int> vec) {
    // Find median of a vector (sort first so the middle element is the median)
    sort(vec.begin(), vec.end());
    int median;
    size_t size = vec.size();
    median = vec[(size / 2)];
    return median;
}
int findMedianOfMedians(vector<vector<int> > values) {
    vector<int> medians;
    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }
    return findMedian(medians);
}
void selectionByMedianOfMedians(const vector<int> values, int k) {
    // Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;
    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;
        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout << endl << endl << "Printing 2D vector : " << endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout << vec2D[i][j] << " ";
        }
        cout << endl;
    }
    cout << endl;

    // Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout << "Median of medians is : " << m << endl;

    // Partition the list into unique elements larger than 'm' (call this sublist L1) and
    // those smaller than 'm' (call this sublist L2)
    vector<int> L1, L2;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            } else if (vec2D[i][j] < m) {
                L2.push_back(vec2D[i][j]);
            }
        }
    }

    // Checking the splits as per the new pivot 'm'
    cout << endl << "Printing L1 : " << endl;
    for (int i = 0; i < L1.size(); i++) {
        cout << L1[i] << " ";
    }
    cout << endl << endl << "Printing L2 : " << endl;
    for (int i = 0; i < L2.size(); i++) {
        cout << L2[i] << " ";
    }

    // Recursive calls
    if ((k - 1) == L1.size()) {
        cout << endl << endl << "Answer : " << m;
    } else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    } else if (k > (L1.size() + 1)) {
        return selectionByMedianOfMedians(L2, k - ((int)L1.size()) - 1);
    }
}
int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
    vector<int> vec(values, values + 25);
    cout << "The given array is : " << endl;
    for (int i = 0; i < vec.size(); i++) {
        cout << vec[i] << " ";
    }
    selectionByMedianOfMedians(vec, 8);
    return 0;
}
There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.
In more detail: using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I've come up with the following algorithm (imaginably named "LefSelect"):
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

// Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l = 0, m = n - 1, i = l, j = m;
    float x;

    while (l < m) {
        if (a[k] < a[i]) F_SWAP(a[i], a[k]);
        if (a[j] < a[i]) F_SWAP(a[i], a[j]);
        if (a[j] < a[k]) F_SWAP(a[k], a[j]);

        x = a[k];
        while (j > k && i < k) {
            do i++; while (a[i] < x);
            do j--; while (a[j] > x);

            F_SWAP(a[i], a[j]);
        }
        i++; j--;

        if (j < k) {
            while (a[i] < x) i++;
            l = i; j = m;
        }
        if (k < i) {
            while (x < a[j]) j--;
            m = j; i = l;
        }
    }
    return a[k];
}
In benchmarks that I did here, LefSelect is 20-30% faster than QuickSelect.
Haskell Solution:
kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
    ls = filter (< x) xs
    rs = filter (>= x) xs
This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.
Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.
#include <iostream>
#include <climits>
#include <cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns the k'th smallest element in arr[l..r] using a
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos - l == k - 1)
            return arr[pos];
        if (pos - l > k - 1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos - 1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos + 1, r, k - pos + l - 1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller elements to the left of it and
// greater elements to the right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x)  // arr[j] is not greater than the pivot, move it left
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]);  // place the pivot in its final position
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r - l + 1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr) / sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n - 1, k);
    return 0;
}
The worst case time complexity of the above solution is still O(n^2). In the worst case, the randomized function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).
Create a priority queue (a max-heap).
Insert all the elements into the heap.
Call poll() k times; the kth result is the answer.
public static int getKthLargestElements(int[] arr, int k)
{
    PriorityQueue<Integer> pq = new PriorityQueue<>((x, y) -> (y - x));
    // insert all the elements into the max-heap
    for (int ele : arr)
        pq.offer(ele);

    // call poll() k times; the kth poll is the kth largest element
    int result = 0;
    for (int i = 0; i < k; i++)
        result = pq.poll();

    return result;
}
This is an implementation in Javascript.
If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).
function kthMax(a, k) {
    var size = a.length;
    var pivot = a[parseInt(Math.random() * size)]; // Another choice could have been (size / 2)

    // Create an array with all elements lower than the pivot and an array with all elements higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i < size; i++) {
        var current = a[i];
        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    // Which one should I continue with?
    if (k <= upperArray.length) {
        // Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);
        if (newK > 0) {
            // Lower
            return kthMax(lowerArray, newK);
        } else {
            // None ... it's the current pivot!
            return pivot;
        }
    }
}
If you want to test how it perform, you can use this variation:
function kthMax(a, k, logging) {
    var comparisonCount = 0; // Number of comparisons that the algorithm uses
    var memoryCount = 0;     // Number of integers in memory that the algorithm uses
    var _log = logging;

    if (k < 0 || k >= a.length) {
        if (_log) console.log("k is out of range");
        return false;
    }

    function _kthmax(a, k) {
        var size = a.length;
        var pivot = a[parseInt(Math.random() * size)];
        if (_log) console.log("Inputs:", a, "size=" + size, "k=" + k, "pivot=" + pivot);

        // This should never happen. Just a nice check in this exercise
        // if you are playing with the code to avoid never-ending recursion
        if (typeof pivot === "undefined") {
            if (_log) console.log("Ops...");
            return false;
        }

        var i, lowerArray = [], upperArray = [];
        for (i = 0; i < size; i++) {
            var current = a[i];
            if (current < pivot) {
                comparisonCount += 1;
                memoryCount++;
                lowerArray.push(current);
            } else if (current > pivot) {
                comparisonCount += 2;
                memoryCount++;
                upperArray.push(current);
            }
        }
        if (_log) console.log("Pivoting:", lowerArray, "*" + pivot + "*", upperArray);

        if (k <= upperArray.length) {
            comparisonCount += 1;
            return _kthmax(upperArray, k);
        } else if (k > size - lowerArray.length) {
            comparisonCount += 2;
            return _kthmax(lowerArray, k - (size - lowerArray.length));
        } else {
            comparisonCount += 2;
            return pivot;
        }
        /*
         * BTW, this is the logic for kthMin if we want to implement that... ;-)
         *
        if (k <= lowerArray.length) {
            return kthMin(lowerArray, k);
        } else if (k > size - upperArray.length) {
            return kthMin(upperArray, k - (size - upperArray.length));
        } else
            return pivot;
        */
    }

    var result = _kthmax(a, k);
    return { result: result, iterations: comparisonCount, memory: memoryCount };
}
The rest of the code is just to create some playground:
function getRandomArray(n) {
    var ar = [];
    for (var i = 0, l = n; i < l; i++) {
        ar.push(Math.round(Math.random() * l));
    }
    return ar;
}

// Create a random array of 50 numbers
var ar = getRandomArray(50);
Now, run your tests a few times. Because of Math.random(), it will produce different results every time:
kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
If you test it a few times, you can see even empirically that the number of iterations is, on average, O(n), i.e. roughly constant * n, and that the value of k does not affect the running time.
I came up with this algorithm, and it seems to be O(n):
Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until the end of the array. The minimum of our three variables is the 3rd largest item in the array.
define variables a=0, b=0, c=0
iterate through the array items
find minimum a,b,c
if item > min then replace the min variable with item value
continue until end of array
the minimum of a,b,c is our answer
And, to find Kth largest item we need K variables.
Example: (k=3)
[1,2,4,1,7,3,9,5,6,2,9,8]
Final variable values:
a=7 (answer)
b=8
c=9
Can someone please review this and let me know what I am missing?
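For what it's worth, a hedged C++ sketch of the scan described above (O(n*k) time, O(k) space; identifiers are mine). One observation: on the example array the duplicate 9 counts twice, so the scan ends with {8, 9, 9} and returns 8 rather than 7, which may be the discrepancy in the traced values:

#include <algorithm>
#include <climits>
#include <iostream>
#include <vector>
using namespace std;

// Track the k largest values seen so far; the smallest of them at the end
// is the kth largest overall.
int kthLargestScan(const vector<int>& a, int k) {
    vector<int> best(k, INT_MIN);             // the k largest so far
    for (int x : a) {
        auto mn = min_element(best.begin(), best.end());
        if (x > *mn) *mn = x;                 // replace the current minimum
    }
    return *min_element(best.begin(), best.end());
}

int main() {
    vector<int> a = {1, 2, 4, 1, 7, 3, 9, 5, 6, 2, 9, 8};
    cout << kthLargestScan(a, 3) << '\n';     // prints 8
}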
Here is the implementation of the algorithm eladv suggested (I also put here the implementation with a random pivot):
public class Median {
    public static void main(String[] s) {
        int[] test = {4, 18, 20, 3, 7, 13, 5, 8, 2, 1, 15, 17, 25, 30, 16};
        System.out.println(selectK(test, 8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for (int i = 0; i < test.length; i++)
            test[i] = (int) (Math.random() * test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length / 2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if (a.length <= 1)
            return a[0];

        int r = (int) (Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < p) small++;
            else if (a[i] == p) equal++;
            else if (a[i] > p) big++;
        }

        if (k <= small) {
            int[] temp = new int[small];
            for (int i = 0, j = 0; i < a.length; i++)
                if (a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }
        else if (k <= small + equal)
            return p;
        else {
            int[] temp = new int[big];
            for (int i = 0, j = 0; i < a.length; i++)
                if (a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp, k - small - equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if (a.length <= 5) {
            Arrays.sort(a);
            return a[k - 1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < p) small++;
            else if (a[i] == p) equal++;
            else if (a[i] > p) big++;
        }

        if (k <= small) {
            int[] temp = new int[small];
            for (int i = 0, j = 0; i < a.length; i++)
                if (a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }
        else if (k <= small + equal)
            return p;
        else {
            int[] temp = new int[big];
            for (int i = 0, j = 0; i < a.length; i++)
                if (a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp, k - small - equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length / 5];
        int[] temp = new int[5];
        for (int i = 0; i < b.length; i++) {
            for (int j = 0; j < 5; j++)
                temp[j] = a[5 * i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }
        return selectK(b, b.length / 2 + 1);
    }
}
It is similar to the quicksort strategy, where we pick an arbitrary pivot and bring the smaller elements to its left and the larger to the right:
public static int kthElInUnsortedList(List<int> list, int k)
{
    if (list.Count == 1)
        return list[0];

    List<int> left = new List<int>();
    List<int> right = new List<int>();

    int pivotIndex = list.Count / 2;
    int pivot = list[pivotIndex]; //arbitrary

    for (int i = 0; i < list.Count; i++)
    {
        if (i == pivotIndex) continue; // skip the pivot itself

        int currentEl = list[i];
        if (currentEl < pivot)
            left.Add(currentEl);
        else
            right.Add(currentEl);
    }

    if (k == left.Count + 1)
        return pivot;

    if (left.Count < k)
        return kthElInUnsortedList(right, k - left.Count - 1);
    else
        return kthElInUnsortedList(left, k);
}
Go to the end of this link:
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
You can find the kth smallest element in O(n log R) time (R being the range between the minimum and maximum values) and constant space, if we consider the array to contain only integers.
The approach is to do a binary search on the range of array values: if we have a min_value and a max_value, both in integer range, we can do a binary search on that range.
We can write a comparator function which tells us whether any value is the kth-smallest, smaller than the kth-smallest, or bigger than the kth-smallest.
Do the binary search until you reach the kth-smallest number.
Here is the code for that
class Solution:
    def _iskthsmallest(self, A, val, k):
        less_count, equal_count = 0, 0
        for i in range(len(A)):
            if A[i] == val: equal_count += 1
            if A[i] < val: less_count += 1
        if less_count >= k: return 1
        if less_count + equal_count < k: return -1
        return 0

    def kthsmallest_binary(self, A, min_val, max_val, k):
        if min_val == max_val:
            return min_val
        mid = (min_val + max_val) // 2
        iskthsmallest = self._iskthsmallest(A, mid, k)
        if iskthsmallest == 0: return mid
        if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
        return self.kthsmallest_binary(A, mid + 1, max_val, k)

    # @param A : tuple of integers
    # @param k : integer
    # @return an integer
    def kthsmallest(self, A, k):
        if not A: return 0
        if k > len(A): return 0
        min_val, max_val = min(A), max(A)
        return self.kthsmallest_binary(A, min_val, max_val, k)
What I would do is this:
initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
    if size(l) > k
        remove last element from l

the last element of l should now be the kth largest element
You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.
Update:
initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
    if size(l) > k
        remove last element from l

the last element of l should now be the kth largest element
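In C++, the updated version maps naturally onto std::multiset playing the role of the sorted tree (a hedged sketch; it inserts unconditionally and then trims back to size k, which is equivalent to the guarded insert above):

#include <iostream>
#include <set>
#include <vector>
using namespace std;

// Keep a multiset of the k largest elements seen so far: O(n log k) total.
int kthLargestMultiset(const vector<int>& a, int k) {
    multiset<int> window;                      // the "sorted tree"
    for (int x : a) {
        window.insert(x);                      // O(log k)
        if ((int)window.size() > k)
            window.erase(window.begin());      // drop the smallest
    }
    return *window.begin();                    // smallest of the k largest
}

int main() {
    vector<int> a = {3, 1, 4, 1, 5, 9, 2, 6};
    cout << kthLargestMultiset(a, 3) << '\n';  // prints 5 (top three are 9, 6, 5)
}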
First we can build a BST from the unsorted array, which takes O(n log n) time in general; if the tree is augmented with subtree sizes, we can then find the kth smallest element in O(log n), for O(n log n) overall.
