About the time complexity of Min Max Heap DeleteMin - algorithm

In the for loop of following code snippet about min-max heap delete-min procedure, why last=(*n)/2? Is that because at the worse case that the x has to be inserted to the grandchild of the grandchild of ..., and for example: tree height 5: min max min max min, floor(5/2)=2, correct since at worse case only two mins following the first level. Now another one: height 4: min max min max, floor(4/2)=2, this time it doesn't work. I think maybe last=(*n) will work, and even for(i=1;;) will work, since it just check something that won't happen? The reason of the title is that IIRC the time complexity of min-max heap deletion is
O(log n) but the (*n)/2 make it looks, hmm, like O(n).
element delete_min(element heap[], int *n) {
int i, last, k, parent;
element temp, x;
if (!(*n)) {
// checking
}
heap[0] = heap[1];
x = heap[(*n)--];
for (i=1, last=(*n)/2; i<=last; ) {
k = min_child_grandchild(i, *n);
if (x.key <= heap[k].key)
break;
heap[i] = heap[k];
if (k <= 2*i+1) {
i = k;
break;
}
parent = k/2;
if (x.key > heap[parent].key)
SWAP(heap[parent], x, temp);
i = k;
} // end for
heap[i] = x;
return heap[0];
}
source:

If you iterate linearly over a range of size n, then the loop executes O(n) times. Linearly here means that you have some loop index which is modified each time through the loop by adding a constant, typically but not necessarily 1:
for(i = 0; i < n; i += c) { /* Loop executes O(n) times */ }
But that's not what is happening here. i is not being incremented by a constant. It is being set to k, where k is the index of some child or grandchild of i. The child of i whose index is the smallest is at index 2i; the grandchild whose index is the smallest is at index 4i. (Those formula are for 1-based indices, which are common in algorithm descriptions.)
So the loop is more like this (where the constant c is usually 4 but never less than 2):
for(i = 0; i < n; i *= c) { /* Loop executes O(log n) times */ }

Related

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding maximum positive sub sequence in given sequence in O(n) time. Can anybody suggest similar algorithm for finding minimum positive contiguous sub sequence.
For example
If given sequence is 1,2,3,4,5 answer should be 1.
[5,-4,3,5,4] ->1 is the minimum positive sum of elements [5,-4].
There cannot be such algorithm. The lower bound for this problem is O(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the difference between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. The original array only has distinct elements if and only if the minimum non-negative subarray is greater than 0.
If we had an O(n) algorithm to your problem, we would have an O(n) algorithm to element distinctness problem, which we know is not possible on a Turing machine.
We can have a O(n log n) algorithm as follow:
Assuming that we have an array prefix, which index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1].
Thus, in order to find the minimum positive sub-array ending at index j, so, we need to find the maximum element prefix[x], which less than prefix[j] and x < j. We can find that element in O(log n) time if we use a binary search tree.
Pseudo code:
int[]prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
for(int i = 0; i < A.length; i++){
if(A[i] > 0)
result = min(result, A[i];
int v = tree.getMaximumElementLessThan(prefix[i]);
result = min(result, prefix[i] - v);
tree.add(prefix[i]);
}
I believe there's a O(n) algorithm, see below.
Note: it has a scale factor that might make it less attractive in practical applications: it depends on the (input) values to be processed, see remarks in the code.
private int GetMinimumPositiveContiguousSubsequenc(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the mimumum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
if (minPositive == 0 || value < minPositive)
minPositive = value;
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSeqence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
bool[] initialSequence = new bool[maxSequence + offset + 1];
int valueCumulative = 0;
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

Interview Ques - find the KTHSMALLEST element in an unsorted array

This problem asked to find k'th smallest element in an unsorted array of non-negative integers.
Here main problem is memory limit :( Here we can use constant extra space.
First I tried a O(n^2) method [without any extra memory] which gave me TLE.
Then i tried to use priority queue [extra memory] which gave me MLE :(
Any idea how to solve the problem with constant extra space and within time limit.
You can use a O(n^2) method with some pruning, which will make the program like O(nlogn) :)
Declare two variable low = maximum value which position is less than k and high = lowest value which position is greater than k
Keep track of the low and high value you already processed.
Whenever a new value comes check if it is in the [low , high] boundary. If yes then process it otherwise skip the value.
That's it :) I think it will pass both TLE and MLE :)
Have a look at my code :
int low=0,high=1e9;
for(int i=0;i<n;i++) // n is the total number of element
{
if(!(A[i]>=low&&A[i]<=high)) // A is the array in which the element are saved
continue;
int cnt=0,cnt1=0; // cnt is for the strictly less value and cnt1 for same value. Because value can be duplicate.
for(int j=0;j<n;j++)
{
if(i!=j&&A[i]>A[j])
cnt++;
if(A[i]==A[j])
cnt1++;
if(cnt>k)
break;
}
if(cnt+cnt1<k)
low=A[i]+1;
else if(cnt>=k)
high=A[i]-1;
if(cnt<k&&(cnt+cnt1)>=k)
{
return A[i];
}
}
You can do an in-place Selection Algorithm.
The idea is similar to quicksort, but recurse only on the relevant part of the array, not all of it. Note that the algorithm can be implemented with O(1) extra space pretty easily - since it's recursive call is a tail call.
This leads to an O(n) solution on average case (just be sure to pick a pivot at random in order to make sure you don't fall into pre-designed edge cases such as a sorted list). That can be improved to worst case O(n) using median of median technique, but with significantly worse constants.
Binary search on the answer for the problem.
2 major observations here :
Given that all values in the array are of type 'int', their range can be defined as [0, 2^31]. That would be your search space.
Given a value x, I can always tell in O(n) if the kth smallest element is smaller than x or greater than x.
A rough pseudocode :
start = 0, end = 2^31 - 1
while start <= end
x = (start + end ) / 2
less = number of elements less than or equal to x
if less > k
end = x - 1
elif less < k
start = x + 1
else
ans = x
end = x - 1
return ans
Hope this helps.
I believe I found a solution that is similar to that of #AliAkber but is slightly easier to understand (I keep track of fewer variables).
It passed all tests on InterviewBit
Here's the code (Java):
public int kthsmallest(final List<Integer> a, int k) {
int lo = Integer.MIN_VALUE;
int hi = Integer.MAX_VALUE;
int champ = -1;
for (int i = 0; i < a.size(); i++) {
int iVal = a.get(i);
int count = 0;
if (!(iVal > lo && iVal < hi)) continue;
for (int j = 0; j < a.size(); j++) {
if (a.get(j) <= iVal) count++;
if (count > k) break;
}
if (count > k && iVal < hi) hi = iVal;
if (count < k && iVal > lo) lo = iVal;
if (count >= k && (champ == -1 || iVal < champ))
champ = iVal;
}
return champ;
}

Numbers of increasing squence using BIT

For EX: A sequence is giving 1 3 2 4 now i have to find the number of increasing sequences.
I came to know about BIT algorithm which is give me O(nlog2n) solution as compared to O(n2).
Code is as follow
void update(int idx ,int val){
while (idx <= MaxVal){
tree[idx] += val;
idx += (idx & -idx);
}
}
To read
int read(int idx){
int sum = 0;
while (idx > 0){
sum += tree[idx];
idx -= (idx & -idx);
}
return sum;
}
I can't understand how they are using BIT algorithms can you please help me
Binary indexed tree's read function will return the number of values which is equals or less than idx.
So, by insert each element one by one, from 0 to n (n is number of elements)
For each element, we need to know how many values that are less than this current element, and has already added to the BIT. Assume that this number is x, so the number of increasing sequence that end at this element is 2^x
After calculating all sequences that ended at this element, we need to add this element into BIT
Pseudo code:
long result = 0;
BIT tree = //initialize BIT tree
for(int i = 0; i < n; i++){
int number = tree.read(data[i] - 1);// Get the number of element that less than data[i];
result += 1L<< number;
tree.update(data[i], 1);
}
As update and read function has O(log n) time complexity, the above algo has time complexity O(n log n)

Find the top k sums of two sorted arrays

You are given two sorted arrays, of sizes n and m respectively. Your task (should you choose to accept it), is to output the largest k sums of the form a[i]+b[j].
A O(k log k) solution can be found here. There are rumors of a O(k) or O(n) solution. Does one exist?
I found the responses at your link mostly vague and poorly structured. Here's a start with a O(k * log(min(m, n))) O(k * log(m + n)) O(k * log(k)) algorithm.
Suppose they are sorted decreasing. Imagine you computed the m*n matrix of the sums as follows:
for i from 0 to m
for j from 0 to n
sums[i][j] = a[i] + b[j]
In this matrix, values monotonically decrease down and to the right. With that in mind, here is an algorithm which performs a graph search through this matrix in order of decreasing sums.
q : priority queue (decreasing) := empty priority queue
add (0, 0) to q with priority a[0] + b[0]
while k > 0:
k--
x := pop q
output x
(i, j) : tuple of int,int := position of x
if i < m:
add (i + 1, j) to q with priority a[i + 1] + b[j]
if j < n:
add (i, j + 1) to q with priority a[i] + b[j + 1]
Analysis:
The loop is executed k times.
There is one pop operation per iteration.
There are up to two insert operations per iteration.
The maximum size of the priority queue is O(min(m, n)) O(m + n) O(k).
The priority queue can be implemented with a binary heap giving log(size) pop and insert.
Therefore this algorithm is O(k * log(min(m, n))) O(k * log(m + n)) O(k * log(k)).
Note that the general priority queue abstract data type needs to be modified to ignore duplicate entries. Alternately, you could maintain a separate set structure that first checks for membership in the set before adding to the queue, and removes from the set after popping from the queue. Neither of these ideas would worsen the time or space complexity.
I could write this up in Java if there's any interest.
Edit: fixed complexity. There is an algorithm which has the complexity I described, but it is slightly different from this one. You would have to take care to avoid adding certain nodes. My simple solution adds many nodes to the queue prematurely.
private static class FrontierElem implements Comparable<FrontierElem> {
int value;
int aIdx;
int bIdx;
public FrontierElem(int value, int aIdx, int bIdx) {
this.value = value;
this.aIdx = aIdx;
this.bIdx = bIdx;
}
#Override
public int compareTo(FrontierElem o) {
return o.value - value;
}
}
public static void findMaxSum( int [] a, int [] b, int k ) {
Integer [] frontierA = new Integer[ a.length ];
Integer [] frontierB = new Integer[ b.length ];
PriorityQueue<FrontierElem> q = new PriorityQueue<MaxSum.FrontierElem>();
frontierA[0] = frontierB[0]=0;
q.add( new FrontierElem( a[0]+b[0], 0, 0));
while( k > 0 ) {
FrontierElem f = q.poll();
System.out.println( f.value+" "+q.size() );
k--;
frontierA[ f.aIdx ] = frontierB[ f.bIdx ] = null;
int fRight = f.aIdx+1;
int fDown = f.bIdx+1;
if( fRight < a.length && frontierA[ fRight ] == null ) {
q.add( new FrontierElem( a[fRight]+b[f.bIdx], fRight, f.bIdx));
frontierA[ fRight ] = f.bIdx;
frontierB[ f.bIdx ] = fRight;
}
if( fDown < b.length && frontierB[ fDown ] == null ) {
q.add( new FrontierElem( a[f.aIdx]+b[fDown], f.aIdx, fDown));
frontierA[ f.aIdx ] = fDown;
frontierB[ fDown ] = f.aIdx;
}
}
}
The idea is similar to the other solution, but with the observation that as you add to your result set from the matrix, at every step the next element in our set can only come from where the current set is concave. I called these elements frontier elements and I keep track of their position in two arrays and their values in a priority queue. This helps keep the queue size down, but by how much I've yet to figure out. It seems to be about sqrt( k ) but I'm not entirely sure about that.
(Of course the frontierA/B arrays could be simple boolean arrays, but this way they fully define my result set, This isn't used anywhere in this example but might be useful otherwise.)
As the pre-condition is the Array are sorted hence lets consider the following
for N= 5;
A[]={ 1,2,3,4,5}
B[]={ 496,497,498,499,500}
Now since we know Summation of N-1 of A&B would be highest hence just insert this in to heap along with the indexes of A & B element ( why, indexes? we'll come to know in a short while )
H.insert(A[N-1]+B[N-1],N-1,N-1);
now
while(!H.empty()) { // the time heap is not empty
H.pop(); // this will give you the sum you are looking for
The indexes which we got at the time of pop, we shall use them for selecting the next sum element.
Consider the following :
if we have i & j as the indexes in A & B , then the next element would be max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ,
So, insert the same if that has not been inserted in the heap
hence
(i,j)= max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ;
if(Hash[i,j]){ // not inserted
H.insert (i,j);
}else{
get the next max from max ( A[i]+B[j-1], A[i-1]+B[j], A[i+1]+B[j+1] ) ; and insert.
}
K pop-ing them will give you max elements required.
Hope this helps
Many thanks to #rlibby and #xuhdev with such an original idea to solve this kind of problem. I had a similar coding exercise interview require to find N largest sums formed by K elements in K descending sorted arrays - means we must pick 1 element from each sorted arrays to build the largest sum.
Example: List findHighestSums(int[][] lists, int n) {}
[5,4,3,2,1]
[4,1]
[5,0,0]
[6,4,2]
[1]
and a value of 5 for n, your procedure should return a List of size 5:
[21,20,19,19,18]
Below is my code, please take a look carefully for those block comments :D
private class Pair implements Comparable<Pair>{
String state;
int sum;
public Pair(String state, int sum) {
this.state = state;
this.sum = sum;
}
#Override
public int compareTo(Pair o) {
// Max heap
return o.sum - this.sum;
}
}
List<Integer> findHighestSums(int[][] lists, int n) {
int numOfLists = lists.length;
int totalCharacterInState = 0;
/*
* To represent State of combination of largest sum as String
* The number of characters for each list should be Math.ceil(log(list[i].length))
* For example:
* If list1 length contains from 11 to 100 elements
* Then the State represents for list1 will require 2 characters
*/
int[] positionStartingCharacterOfListState = new int[numOfLists + 1];
positionStartingCharacterOfListState[0] = 0;
// the reason to set less or equal here is to get the position starting character of the last list
for(int i = 1; i <= numOfLists; i++) {
int previousListNumOfCharacters = 1;
if(lists[i-1].length > 10) {
previousListNumOfCharacters = (int)Math.ceil(Math.log10(lists[i-1].length));
}
positionStartingCharacterOfListState[i] = positionStartingCharacterOfListState[i-1] + previousListNumOfCharacters;
totalCharacterInState += previousListNumOfCharacters;
}
// Check the state <---> make sure that combination of a sum is new
Set<String> states = new HashSet<>();
List<Integer> result = new ArrayList<>();
StringBuilder sb = new StringBuilder();
// This is a max heap contain <State, largestSum>
PriorityQueue<Pair> pq = new PriorityQueue<>();
char[] stateChars = new char[totalCharacterInState];
Arrays.fill(stateChars, '0');
sb.append(stateChars);
String firstState = sb.toString();
states.add(firstState);
int firstLargestSum = 0;
for(int i = 0; i < numOfLists; i++) firstLargestSum += lists[i][0];
// Imagine this is the initial state in a graph
pq.add(new Pair(firstState, firstLargestSum));
while(n > 0) {
// In case n is larger than the number of combinations of all list entries
if(pq.isEmpty()) break;
Pair top = pq.poll();
String currentState = top.state;
int currentSum = top.sum;
/*
* Loop for all lists and generate new states of which only 1 character is different from the former state
* For example: the initial state (Stage 0) 0 0 0 0 0
* So the next states (Stage 1) should be:
* 1 0 0 0 0
* 0 1 0 0 0 (choose element at index 2 from 2nd array)
* 0 0 1 0 0 (choose element at index 2 from 3rd array)
* 0 0 0 0 1
* But don't forget to check whether index in any lists have exceeded list's length
*/
for(int i = 0; i < numOfLists; i++) {
int indexInList = Integer.parseInt(
currentState.substring(positionStartingCharacterOfListState[i], positionStartingCharacterOfListState[i+1]));
if( indexInList < lists[i].length - 1) {
int numberOfCharacters = positionStartingCharacterOfListState[i+1] - positionStartingCharacterOfListState[i];
sb = new StringBuilder(currentState.substring(0, positionStartingCharacterOfListState[i]));
sb.append(String.format("%0" + numberOfCharacters + "d", indexInList + 1));
sb.append(currentState.substring(positionStartingCharacterOfListState[i+1]));
String newState = sb.toString();
if(!states.contains(newState)) {
// The newSum is always <= currentSum
int newSum = currentSum - lists[i][indexInList] + lists[i][indexInList+1];
states.add(newState);
// Using priority queue, we can immediately retrieve the largest Sum at Stage k and track all other unused states.
// From that Stage k largest Sum's state, then we can generate new states
// Those sums composed by recently generated states don't guarantee to be larger than those sums composed by old unused states.
pq.add(new Pair(newState, newSum));
}
}
}
result.add(currentSum);
n--;
}
return result;
}
Let me explain how I come up with the solution:
The while loop in my answer executes N times, consider the max heap
( priority queue).
Poll operation 1 time with complexity O(log(
sumOfListLength )) because the maximum element Pair in
heap is sumOfListLength.
Insertion operations might up to K times,
the complexity for each insertion is log(sumOfListLength).
Therefore, the complexity is O(N * log(sumOfListLength) ),

How to find the kth largest element in an unsorted array of length n in O(n)?

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.
Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):
Select(A,n,i):
Divide input into ⌈n/5⌉ groups of size 5.
/* Partition on median-of-medians */
medians = array of each group’s median.
pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
Left Array L and Right Array G = partition(A, pivot)
/* Find ith element in L, pivot, or G */
k = |L| + 1
If i = k, return pivot
If i < k, return Select(L, k-1, i)
If i > k, return Select(G, n-k, i-k)
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
let r be chosen uniformly at random in the range 1 to length(A)
let pivot = A[r]
let A1, A2 be new arrays
# split into a pile A1 of small elements and A2 of big elements
for i = 1 to n
if A[i] < pivot then
append A[i] to A1
else if A[i] > pivot then
append A[i] to A2
else
# do nothing
end for
if k <= length(A1):
# it's in the pile of small elements
return QuickSelect(A1, k)
else if k > length(A) - length(A2)
# it's in the pile of big elements
return QuickSelect(A2, k - (length(A) - length(A2))
else
# it's equal to the pivot
return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of
T(n) = Theta(n) + T(n-1) = Theta(n2)
But if the choices are indeed random, the expected running time is given by
T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))
where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.
Let's guess that T(n) <= an for some a. Then we get
T(n)
<= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
= cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
<= cn + 2 (1/n) ∑i=floor(n/2) to n ai
and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:
∑i=floor(n/2) to n i
= ∑i=1 to n i - ∑i=1 to floor(n/2) i
= n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
<= n2/2 - (n/4)2/2
= (15/32)n2
where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with
cn + 2 (1/n) ∑i=floor(n/2) to n ai,
<= cn + (2a/n) (15/32) n2
= n (c + (15/16)a)
<= an
provided a > 16c.
This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for 3d largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).
A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.
I answered the letter of your question :)
The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.
const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.
For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
Warren's priority heap solution is neater though.
Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java
public int quickSelect(ArrayList<Integer>list, int nthSmallest){
//Choose random number in range of 0 to array length
Random random = new Random();
//This will give random number which is not greater than length - 1
int pivotIndex = random.nextInt(list.size() - 1);
int pivot = list.get(pivotIndex);
ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
//Split list into two.
//Value smaller than pivot should go to smallerNumberList
//Value greater than pivot should go to greaterNumberList
//Do nothing for value which is equal to pivot
for(int i=0; i<list.size(); i++){
if(list.get(i)<pivot){
smallerNumberList.add(list.get(i));
}
else if(list.get(i)>pivot){
greaterNumberList.add(list.get(i));
}
else{
//Do nothing
}
}
//If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list
if(nthSmallest < smallerNumberList.size()){
return quickSelect(smallerNumberList, nthSmallest);
}
//If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
//The step is bit tricky. If confusing, please see the above loop once again for clarification.
else if(nthSmallest > (list.size() - greaterNumberList.size())){
//nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in
//smallerNumberList
nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
return quickSelect(greaterNumberList,nthSmallest);
}
else{
return pivot;
}
}
I implemented finding kth minimimum in n unsorted elements using dynamic programming, specifically tournament method. The execution time is O(n + klog(n)). The mechanism used is listed as one of methods on Wikipedia page about Selection Algorithm (as indicated in one of the posting above). You can read about the algorithm and also find code (java) on my blog page Finding Kth Minimum. In addition the logic can do partial ordering of the list - return first K min (or max) in O(klog(n)) time.
Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.
Sexy quickselect in Python
def quickselect(arr, k):
'''
k = 1 returns first element in ascending order.
can be easily modified to return first element in descending order
'''
r = random.randrange(0, len(arr))
a1 = [i for i in arr if i < arr[r]] '''partition'''
a2 = [i for i in arr if i > arr[r]]
if k <= len(a1):
return quickselect(a1, k)
elif k > len(arr)-len(a2):
return quickselect(a2, k - (len(arr) - len(a2)))
else:
return arr[r]
As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.
Divide the array in to n/5 lists of 5 elements each.
Find the median in each sub array of 5 elements.
Recursively find the median of all the medians, lets call it M
Partition the array in to two sub array 1st sub-array contains the elements larger than M , lets say this sub-array is a1 , while other sub-array contains the elements smaller then M., lets call this sub-array a2.
If k <= |a1|, return selection (a1,k).
If k− 1 = |a1|, return M.
If k> |a1| + 1, return selection(a2,k −a1 − 1).
Analysis: As suggested in the original paper:
We use the median to partition the list into two halves(the first half,
if k <= n/2 , and the second half otherwise). This algorithm takes
time cn at the first level of recursion for some constant c, cn/2 at
the next level (since we recurse in a list of size n/2), cn/4 at the
third level, and so on. The total time taken is cn + cn/2 + cn/4 +
.... = 2cn = o(n).
Why partition size is taken 5 and not 3?
As mentioned in original paper:
Dividing the list by 5 assures a worst-case split of 70 − 30. Atleast
half of the medians greater than the median-of-medians, hence atleast
half of the n/5 blocks have atleast 3 elements and this gives a
3n/10 split, which means the other partition is 7n/10 in worst case.
That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < 1, the
worst-case running time isO(n).
Now I have tried to implement the above algorithm as:
public static int findKthLargestUsingMedian(Integer[] array, int k) {
// Step 1: Divide the list into n/5 lists of 5 element each.
int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
// Step 2: Find pivotal element aka median of medians.
int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
//Now we need two lists split using medianOfMedian as pivot. All elements in list listOne will be grater than medianOfMedian and listTwo will have elements lesser than medianOfMedian.
List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
for (Integer element : array) {
if (element < medianOfMedian) {
listWithSmallerNumbers.add(element);
} else if (element > medianOfMedian) {
listWithGreaterNumbers.add(element);
}
}
// Next step.
if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
return -1;
}
public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
int[] medians = new int[noOfRequiredLists];
for (int count = 0; count < noOfRequiredLists; count++) {
int startOfPartialArray = 5 * count;
int endOfPartialArray = startOfPartialArray + 5;
Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
// Step 2: Find median of each of these sublists.
int medianIndex = partialArray.length/2;
medians[count] = partialArray[medianIndex];
}
// Step 3: Find median of the medians.
return medians[medians.length / 2];
}
Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).
public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
int p = 0;
int numElements = nums.length;
// create priority queue where all the elements of nums will be stored
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
// place all the elements of the array to this priority queue
for (int n : nums) {
pq.add(n);
}
// extract the kth largest element
while (numElements - k + 1 > 0) {
p = pq.poll();
k++;
}
return p;
}
Both of these algorithms can be tested as:
public static void main(String[] args) throws IOException {
Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
System.out.println(findKthLargestUsingMedian(numbers, 8));
System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}
As expected output is:
18
18
Find the median of the array in linear time, then use partition procedure exactly as in quicksort to divide the array in two parts, values to the left of the median lesser( < ) than than median and to the right greater than ( > ) median, that too can be done in lineat time, now, go to that part of the array where kth element lies,
Now recurrence becomes:
T(n) = T(n/2) + cn
which gives me O (n) overal.
Below is the link to full implementation with quite an extensive explanation how the algorithm for finding Kth element in an unsorted algorithm works. Basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when smallest element is chosen as pivot in every step, so that algorithm degenerates into O(n^2) running time), special pivot selection is applied, called median-of-medians algorithm. The whole solution runs in O(n) time in worst and in average case.
Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):
Finding Kth Smallest Element in an Unsorted Array
How about this kinda approach
Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)
Is it right or am i missing something ?
Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.
There is also one algorithm, that outperforms quickselect algorithm. It's called Floyd-Rivets (FR) algorithm.
Original article: https://doi.org/10.1145/360680.360694
Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf
Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.
Sample code that I used for FR algorithm:
template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
if (n == 0)
return *(std::min_element(data.begin(), data.end()));
else if (n == data.size() - 1)
return *(std::max_element(data.begin(), data.end()));
else
return _FRselect(data, 0, data.size() - 1, n);
}
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
size_t leftIdx = left;
size_t rightIdx = right;
while (rightIdx > leftIdx)
{
if (rightIdx - leftIdx > 600)
{
size_t range = rightIdx - leftIdx + 1;
long long i = n - (long long)leftIdx + 1;
long long z = log(range);
long long s = 0.5 * exp(2 * z / 3);
long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);
size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);
_FRselect(data, newLeft, newRight, n);
}
T t = data[n];
size_t i = leftIdx;
size_t j = rightIdx;
// arrange pivot and right index
std::swap(data[leftIdx], data[n]);
if (data[rightIdx] > t)
std::swap(data[rightIdx], data[leftIdx]);
while (i < j)
{
std::swap(data[i], data[j]);
++i; --j;
while (data[i] < t) ++i;
while (data[j] > t) --j;
}
if (data[leftIdx] == t)
std::swap(data[leftIdx], data[j]);
else
{
++j;
std::swap(data[j], data[rightIdx]);
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= n)
leftIdx = j + 1;
if (n <= j)
rightIdx = j - 1;
}
return data[leftIdx];
}
template <typename T>
int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)
i would like to suggest one answer
if we take the first k elements and sort them into a linked list of k values
now for every other value even for the worst case if we do insertion sort for rest n-k values even in the worst case number of comparisons will be k*(n-k) and for prev k values to be sorted let it be k*(k-1) so it comes out to be (nk-k) which is o(n)
cheers
Explanation of the median - of - medians algorithm to find the k-th largest integer out of n can be found here:
http://cs.indstate.edu/~spitla/presentation.pdf
Implementation in c++ is below:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int findMedian(vector<int> vec){
// Find median of a vector
int median;
size_t size = vec.size();
median = vec[(size/2)];
return median;
}
int findMedianOfMedians(vector<vector<int> > values){
vector<int> medians;
for (int i = 0; i < values.size(); i++) {
int m = findMedian(values[i]);
medians.push_back(m);
}
return findMedian(medians);
}
void selectionByMedianOfMedians(const vector<int> values, int k){
// Divide the list into n/5 lists of 5 elements each
vector<vector<int> > vec2D;
int count = 0;
while (count != values.size()) {
int countRow = 0;
vector<int> row;
while ((countRow < 5) && (count < values.size())) {
row.push_back(values[count]);
count++;
countRow++;
}
vec2D.push_back(row);
}
cout<<endl<<endl<<"Printing 2D vector : "<<endl;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
cout<<vec2D[i][j]<<" ";
}
cout<<endl;
}
cout<<endl;
// Calculating a new pivot for making splits
int m = findMedianOfMedians(vec2D);
cout<<"Median of medians is : "<<m<<endl;
// Partition the list into unique elements larger than 'm' (call this sublist L1) and
// those smaller them 'm' (call this sublist L2)
vector<int> L1, L2;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
if (vec2D[i][j] > m) {
L1.push_back(vec2D[i][j]);
}else if (vec2D[i][j] < m){
L2.push_back(vec2D[i][j]);
}
}
}
// Checking the splits as per the new pivot 'm'
cout<<endl<<"Printing L1 : "<<endl;
for (int i = 0; i < L1.size(); i++) {
cout<<L1[i]<<" ";
}
cout<<endl<<endl<<"Printing L2 : "<<endl;
for (int i = 0; i < L2.size(); i++) {
cout<<L2[i]<<" ";
}
// Recursive calls
if ((k - 1) == L1.size()) {
cout<<endl<<endl<<"Answer :"<<m;
}else if (k <= L1.size()) {
return selectionByMedianOfMedians(L1, k);
}else if (k > (L1.size() + 1)){
return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
}
}
int main()
{
int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
vector<int> vec(values, values + 25);
cout<<"The given array is : "<<endl;
for (int i = 0; i < vec.size(); i++) {
cout<<vec[i]<<" ";
}
selectionByMedianOfMedians(vec, 8);
return 0;
}
There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.
In more detail. Using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection and paying some attention to the final steps of the partitioning part of the algorithm, i've came up with the following algorithm (imaginably named "LefSelect"):
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
int l=0, m = n-1, i=l, j=m;
float x;
while (l<m) {
if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
if( a[j] < a[k] ) F_SWAP(a[k],a[j]);
x=a[k];
while (j>k & i<k) {
do i++; while (a[i]<x);
do j--; while (a[j]>x);
F_SWAP(a[i],a[j]);
}
i++; j--;
if (j<k) {
while (a[i]<x) i++;
l=i; j=m;
}
if (k<i) {
while (x<a[j]) j--;
m=j; i=l;
}
}
return a[k];
}
In benchmarks that i did here, LefSelect is 20-30% faster than QuickSelect.
Haskell Solution:
kthElem index list = sort list !! index
withShape ~[] [] = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys
sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
where
ls = filter (< x)
rs = filter (>= x)
This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.
Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.
#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;
int randomPartition(int arr[], int l, int r);
// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
// If k is smaller than number of elements in array
if (k > 0 && k <= r - l + 1)
{
// Partition the array around a random element and
// get position of pivot element in sorted array
int pos = randomPartition(arr, l, r);
// If position is same as k
if (pos-l == k-1)
return arr[pos];
if (pos-l > k-1) // If position is more, recur for left subarray
return kthSmallest(arr, l, pos-1, k);
// Else recur for right subarray
return kthSmallest(arr, pos+1, r, k-pos+l-1);
}
// If k is more than number of elements in array
return INT_MAX;
}
void swap(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
int x = arr[r], i = l;
for (int j = l; j <= r - 1; j++)
{
if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
{
swap(&arr[i], &arr[j]);
i++;
}
}
swap(&arr[i], &arr[r]); // swap the pivot
return i;
}
// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
int n = r-l+1;
int pivot = rand() % n;
swap(&arr[l + pivot], &arr[r]);
return partition(arr, l, r);
}
// Driver program to test above methods
int main()
{
int arr[] = {12, 3, 5, 7, 4, 19, 26};
int n = sizeof(arr)/sizeof(arr[0]), k = 3;
cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
return 0;
}
The worst case time complexity of the above solution is still O(n2).In worst case, the randomized function may always pick a corner element. The expected time complexity of above randomized QuickSelect is Θ(n)
Have Priority queue created.
Insert all the elements into heap.
Call poll() k times.
public static int getKthLargestElements(int[] arr)
{
PriorityQueue<Integer> pq = new PriorityQueue<>((x , y) -> (y-x));
//insert all the elements into heap
for(int ele : arr)
pq.offer(ele);
// call poll() k times
int i=0;
while(i<k)
{
int result = pq.poll();
}
return result;
}
This is an implementation in Javascript.
If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).
function kthMax(a, k){
var size = a.length;
var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2)
//Create an array with all element lower than the pivot and an array with all element higher than the pivot
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
lowerArray.push(current);
} else if (current > pivot) {
upperArray.push(current);
}
}
//Which one should I continue with?
if(k <= upperArray.length) {
//Upper
return kthMax(upperArray, k);
} else {
var newK = k - (size - lowerArray.length);
if (newK > 0) {
///Lower
return kthMax(lowerArray, newK);
} else {
//None ... it's the current pivot!
return pivot;
}
}
}
If you want to test how it perform, you can use this variation:
function kthMax (a, k, logging) {
var comparisonCount = 0; //Number of comparison that the algorithm uses
var memoryCount = 0; //Number of integers in memory that the algorithm uses
var _log = logging;
if(k < 0 || k >= a.length) {
if (_log) console.log ("k is out of range");
return false;
}
function _kthmax(a, k){
var size = a.length;
var pivot = a[parseInt(Math.random()*size)];
if(_log) console.log("Inputs:", a, "size="+size, "k="+k, "pivot="+pivot);
// This should never happen. Just a nice check in this exercise
// if you are playing with the code to avoid never ending recursion
if(typeof pivot === "undefined") {
if (_log) console.log ("Ops...");
return false;
}
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
comparisonCount += 1;
memoryCount++;
lowerArray.push(current);
} else if (current > pivot) {
comparisonCount += 2;
memoryCount++;
upperArray.push(current);
}
}
if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);
if(k <= upperArray.length) {
comparisonCount += 1;
return _kthmax(upperArray, k);
} else if (k > size - lowerArray.length) {
comparisonCount += 2;
return _kthmax(lowerArray, k - (size - lowerArray.length));
} else {
comparisonCount += 2;
return pivot;
}
/*
* BTW, this is the logic for kthMin if we want to implement that... ;-)
*
if(k <= lowerArray.length) {
return kthMin(lowerArray, k);
} else if (k > size - upperArray.length) {
return kthMin(upperArray, k - (size - upperArray.length));
} else
return pivot;
*/
}
var result = _kthmax(a, k);
return {result: result, iterations: comparisonCount, memory: memoryCount};
}
The rest of the code is just to create some playground:
function getRandomArray (n){
var ar = [];
for (var i = 0, l = n; i < l; i++) {
ar.push(Math.round(Math.random() * l))
}
return ar;
}
//Create a random array of 50 numbers
var ar = getRandomArray (50);
Now, run you tests a few time.
Because of the Math.random() it will produce every time different results:
kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.
I came up with this algorithm and seems to be O(n):
Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until end of the array. The minimum of our three variables is the 3rd largest item in the array.
define variables a=0, b=0, c=0
iterate through the array items
find minimum a,b,c
if item > min then replace the min variable with item value
continue until end of array
the minimum of a,b,c is our answer
And, to find Kth largest item we need K variables.
Example: (k=3)
[1,2,4,1,7,3,9,5,6,2,9,8]
Final variable values:
a=7 (answer)
b=8
c=9
Can someone please review this and let me know what I am missing?
Here is the implementation of the algorithm eladv suggested(I also put here the implementation with random pivot):
public class Median {
public static void main(String[] s) {
int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
System.out.println(selectK(test,8));
/*
int n = 100000000;
int[] test = new int[n];
for(int i=0; i<test.length; i++)
test[i] = (int)(Math.random()*test.length);
long start = System.currentTimeMillis();
random_selectK(test, test.length/2);
long end = System.currentTimeMillis();
System.out.println(end - start);
*/
}
public static int random_selectK(int[] a, int k) {
if(a.length <= 1)
return a[0];
int r = (int)(Math.random() * a.length);
int p = a[r];
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return random_selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return random_selectK(temp,k-small-equal);
}
}
public static int selectK(int[] a, int k) {
if(a.length <= 5) {
Arrays.sort(a);
return a[k-1];
}
int p = median_of_medians(a);
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return selectK(temp,k-small-equal);
}
}
private static int median_of_medians(int[] a) {
int[] b = new int[a.length/5];
int[] temp = new int[5];
for(int i=0; i<b.length; i++) {
for(int j=0; j<5; j++)
temp[j] = a[5*i + j];
Arrays.sort(temp);
b[i] = temp[2];
}
return selectK(b, b.length/2 + 1);
}
}
it is similar to the quickSort strategy, where we pick an arbitrary pivot, and bring the smaller elements to its left, and the larger to the right
public static int kthElInUnsortedList(List<int> list, int k)
{
if (list.Count == 1)
return list[0];
List<int> left = new List<int>();
List<int> right = new List<int>();
int pivotIndex = list.Count / 2;
int pivot = list[pivotIndex]; //arbitrary
for (int i = 0; i < list.Count && i != pivotIndex; i++)
{
int currentEl = list[i];
if (currentEl < pivot)
left.Add(currentEl);
else
right.Add(currentEl);
}
if (k == left.Count + 1)
return pivot;
if (left.Count < k)
return kthElInUnsortedList(right, k - left.Count - 1);
else
return kthElInUnsortedList(left, k);
}
Go to the End of this link : ...........
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
You can find the kth smallest element in O(n) time and constant space. If we consider the array is only for integers.
The approach is to do a binary search on the range of Array values. If we have a min_value and a max_value both in integer range, we can do a binary search on that range.
We can write a comparator function which will tell us if any value is the kth-smallest or smaller than kth-smallest or bigger than kth-smallest.
Do the binary search until you reach the kth-smallest number
Here is the code for that
class Solution:
def _iskthsmallest(self, A, val, k):
less_count, equal_count = 0, 0
for i in range(len(A)):
if A[i] == val: equal_count += 1
if A[i] < val: less_count += 1
if less_count >= k: return 1
if less_count + equal_count < k: return -1
return 0
def kthsmallest_binary(self, A, min_val, max_val, k):
if min_val == max_val:
return min_val
mid = (min_val + max_val)/2
iskthsmallest = self._iskthsmallest(A, mid, k)
if iskthsmallest == 0: return mid
if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
return self.kthsmallest_binary(A, mid+1, max_val, k)
# #param A : tuple of integers
# #param B : integer
# #return an integer
def kthsmallest(self, A, k):
if not A: return 0
if k > len(A): return 0
min_val, max_val = min(A), max(A)
return self.kthsmallest_binary(A, min_val, max_val, k)
What I would do is this:
initialize empty doubly linked list l
for each element e in array
if e larger than head(l)
make e the new head of l
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.
Update:
initialize empty sorted tree l
for each element e in array
if e between head(l) and tail(l)
insert e into l // O(log k)
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
First we can build a BST from unsorted array which takes O(n) time and from the BST we can find the kth smallest element in O(log(n)) which over all counts to an order of O(n).

Resources