Find the top k sums of two sorted arrays - algorithm

You are given two sorted arrays, of sizes n and m respectively. Your task (should you choose to accept it), is to output the largest k sums of the form a[i]+b[j].
An O(k log k) solution can be found here. There are rumors of an O(k) or O(n) solution. Does one exist?

I found the responses at your link mostly vague and poorly structured. Here's a start with an O(k * log(k)) algorithm.
Suppose both arrays are sorted in decreasing order. Imagine you computed the m*n matrix of the sums as follows:
for i from 0 to m - 1
    for j from 0 to n - 1
        sums[i][j] = a[i] + b[j]
In this matrix, values monotonically decrease down and to the right. With that in mind, here is an algorithm which performs a graph search through this matrix in order of decreasing sums.
q : priority queue (decreasing) := empty priority queue
add (0, 0) to q with priority a[0] + b[0]
while k > 0:
    k--
    x := pop q
    output x
    (i, j) : tuple of int,int := position of x
    if i + 1 < m:
        add (i + 1, j) to q with priority a[i + 1] + b[j]
    if j + 1 < n:
        add (i, j + 1) to q with priority a[i] + b[j + 1]
Analysis:
The loop is executed k times.
There is one pop operation per iteration.
There are up to two insert operations per iteration.
The maximum size of the priority queue is O(k).
The priority queue can be implemented with a binary heap giving log(size) pop and insert.
Therefore this algorithm is O(k * log(k)).
Note that the general priority queue abstract data type needs to be modified to ignore duplicate entries. Alternately, you could maintain a separate set structure that first checks for membership in the set before adding to the queue, and removes from the set after popping from the queue. Neither of these ideas would worsen the time or space complexity.
I could write this up in Java if there's any interest.
Edit: fixed complexity. There is an algorithm which has the complexity I described, but it is slightly different from this one. You would have to take care to avoid adding certain nodes. My simple solution adds many nodes to the queue prematurely.
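The search described above can be sketched in Python. This is a minimal sketch, not the author's implementation: the name top_k_sums is hypothetical, both arrays are assumed to be sorted in decreasing order, and a separate seen set is used to skip duplicate positions. As a simplification, the set is kept for the whole run instead of being pruned after each pop.

```python
import heapq

def top_k_sums(a, b, k):
    """Return the k largest sums a[i] + b[j]; a and b are sorted in decreasing order."""
    if not a or not b or k <= 0:
        return []
    # heapq is a min-heap, so sums are stored negated to pop the largest first.
    heap = [(-(a[0] + b[0]), 0, 0)]
    seen = {(0, 0)}  # positions already pushed, to avoid duplicate entries
    result = []
    while heap and len(result) < k:
        neg_sum, i, j = heapq.heappop(heap)
        result.append(-neg_sum)
        # Only the lower and right neighbours of (i, j) become new candidates.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return result
```

Each iteration does one pop and at most two pushes, so the heap holds O(k) entries and the sketch runs in O(k log k), matching the analysis above.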

private static class FrontierElem implements Comparable<FrontierElem> {
int value;
int aIdx;
int bIdx;
public FrontierElem(int value, int aIdx, int bIdx) {
this.value = value;
this.aIdx = aIdx;
this.bIdx = bIdx;
}
@Override
public int compareTo(FrontierElem o) {
return o.value - value;
}
}
public static void findMaxSum( int [] a, int [] b, int k ) {
Integer [] frontierA = new Integer[ a.length ];
Integer [] frontierB = new Integer[ b.length ];
PriorityQueue<FrontierElem> q = new PriorityQueue<MaxSum.FrontierElem>();
frontierA[0] = frontierB[0]=0;
q.add( new FrontierElem( a[0]+b[0], 0, 0));
while( k > 0 ) {
FrontierElem f = q.poll();
System.out.println( f.value+" "+q.size() );
k--;
frontierA[ f.aIdx ] = frontierB[ f.bIdx ] = null;
int fRight = f.aIdx+1;
int fDown = f.bIdx+1;
if( fRight < a.length && frontierA[ fRight ] == null ) {
q.add( new FrontierElem( a[fRight]+b[f.bIdx], fRight, f.bIdx));
frontierA[ fRight ] = f.bIdx;
frontierB[ f.bIdx ] = fRight;
}
if( fDown < b.length && frontierB[ fDown ] == null ) {
q.add( new FrontierElem( a[f.aIdx]+b[fDown], f.aIdx, fDown));
frontierA[ f.aIdx ] = fDown;
frontierB[ fDown ] = f.aIdx;
}
}
}
The idea is similar to the other solution, but with the observation that as you add to your result set from the matrix, at every step the next element in our set can only come from where the current set is concave. I called these elements frontier elements and I keep track of their position in two arrays and their values in a priority queue. This helps keep the queue size down, but by how much I've yet to figure out. It seems to be about sqrt( k ) but I'm not entirely sure about that.
(Of course the frontierA/B arrays could be simple boolean arrays, but this way they fully define my result set. This isn't used anywhere in this example but might be useful otherwise.)

Since the precondition is that the arrays are sorted, consider the following:
for N= 5;
A[]={ 1,2,3,4,5}
B[]={ 496,497,498,499,500}
Since we know the sum of the last elements, A[N-1] + B[N-1], is the highest, insert it into the heap along with the indexes of the A and B elements (why the indexes? we'll see in a short while):
H.insert(A[N-1]+B[N-1],N-1,N-1);
Now:
while(!H.empty()) { // while the heap is not empty
    H.pop(); // this gives you the next largest sum
    // use the indexes (i, j) returned by the pop to select the next candidate sums
}
Consider the following:
if i and j are the indexes popped for A and B, the next largest sum must come from the neighbours A[i]+B[j-1] and A[i-1]+B[j] (A[i+1]+B[j+1] is larger, so it has already been popped).
So insert each neighbour, provided it has not already been inserted into the heap:
for each (i', j') in { (i, j-1), (i-1, j) }:
    if(!Hash[i',j']){ // not inserted yet
        Hash[i',j'] = true;
        H.insert(A[i']+B[j'], i', j');
    }
Popping K times will give you the K largest sums.
Hope this helps

Many thanks to @rlibby and @xuhdev for such an original idea for solving this kind of problem. I had a similar coding exercise in an interview that required finding the N largest sums formed by K elements from K descending sorted arrays, meaning we must pick one element from each sorted array to build each sum.
Example: List findHighestSums(int[][] lists, int n) {}
[5,4,3,2,1]
[4,1]
[5,0,0]
[6,4,2]
[1]
and a value of 5 for n, your procedure should return a List of size 5:
[21,20,19,19,18]
Below is my code; please read the block comments carefully :D
private class Pair implements Comparable<Pair>{
String state;
int sum;
public Pair(String state, int sum) {
this.state = state;
this.sum = sum;
}
@Override
public int compareTo(Pair o) {
// Max heap
return o.sum - this.sum;
}
}
List<Integer> findHighestSums(int[][] lists, int n) {
int numOfLists = lists.length;
int totalCharacterInState = 0;
/*
* To represent State of combination of largest sum as String
* The number of characters for each list should be Math.ceil(Math.log10(lists[i].length))
* For example:
* If list1 length contains from 11 to 100 elements
* Then the State represents for list1 will require 2 characters
*/
int[] positionStartingCharacterOfListState = new int[numOfLists + 1];
positionStartingCharacterOfListState[0] = 0;
// the reason to set less or equal here is to get the position starting character of the last list
for(int i = 1; i <= numOfLists; i++) {
int previousListNumOfCharacters = 1;
if(lists[i-1].length > 10) {
previousListNumOfCharacters = (int)Math.ceil(Math.log10(lists[i-1].length));
}
positionStartingCharacterOfListState[i] = positionStartingCharacterOfListState[i-1] + previousListNumOfCharacters;
totalCharacterInState += previousListNumOfCharacters;
}
// Check the state <---> make sure that combination of a sum is new
Set<String> states = new HashSet<>();
List<Integer> result = new ArrayList<>();
StringBuilder sb = new StringBuilder();
// This is a max heap contain <State, largestSum>
PriorityQueue<Pair> pq = new PriorityQueue<>();
char[] stateChars = new char[totalCharacterInState];
Arrays.fill(stateChars, '0');
sb.append(stateChars);
String firstState = sb.toString();
states.add(firstState);
int firstLargestSum = 0;
for(int i = 0; i < numOfLists; i++) firstLargestSum += lists[i][0];
// Imagine this is the initial state in a graph
pq.add(new Pair(firstState, firstLargestSum));
while(n > 0) {
// In case n is larger than the number of combinations of all list entries
if(pq.isEmpty()) break;
Pair top = pq.poll();
String currentState = top.state;
int currentSum = top.sum;
/*
* Loop for all lists and generate new states of which only 1 character is different from the former state
* For example: the initial state (Stage 0) 0 0 0 0 0
* So the next states (Stage 1) should be:
* 1 0 0 0 0
* 0 1 0 0 0 (choose element at index 1 from 2nd array)
* 0 0 1 0 0 (choose element at index 1 from 3rd array)
* 0 0 0 0 1
* But don't forget to check whether index in any lists have exceeded list's length
*/
for(int i = 0; i < numOfLists; i++) {
int indexInList = Integer.parseInt(
currentState.substring(positionStartingCharacterOfListState[i], positionStartingCharacterOfListState[i+1]));
if( indexInList < lists[i].length - 1) {
int numberOfCharacters = positionStartingCharacterOfListState[i+1] - positionStartingCharacterOfListState[i];
sb = new StringBuilder(currentState.substring(0, positionStartingCharacterOfListState[i]));
sb.append(String.format("%0" + numberOfCharacters + "d", indexInList + 1));
sb.append(currentState.substring(positionStartingCharacterOfListState[i+1]));
String newState = sb.toString();
if(!states.contains(newState)) {
// The newSum is always <= currentSum
int newSum = currentSum - lists[i][indexInList] + lists[i][indexInList+1];
states.add(newState);
// Using priority queue, we can immediately retrieve the largest Sum at Stage k and track all other unused states.
// From that Stage k largest Sum's state, then we can generate new states
// Those sums composed by recently generated states don't guarantee to be larger than those sums composed by old unused states.
pq.add(new Pair(newState, newSum));
}
}
}
result.add(currentSum);
n--;
}
return result;
}
Let me explain how I arrive at the complexity:
The while loop in my answer executes N times. Consider the max heap (priority queue) in each iteration:
There is one poll operation, with complexity O(log(sumOfListLength)), because the maximum number of Pair elements in the heap is sumOfListLength.
There may be up to K insertions, each with complexity O(log(sumOfListLength)).
Therefore, the complexity is O(N * K * log(sumOfListLength)).
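The same state search can be sketched more compactly in Python. This is a sketch under assumptions, not a translation of the Java above: the name find_highest_sums is hypothetical, and a tuple of indexes replaces the zero-padded state string, which removes the need to compute character widths.

```python
import heapq

def find_highest_sums(lists, n):
    """Return the n largest sums that pick one element from each descending-sorted list."""
    if not lists or any(len(lst) == 0 for lst in lists):
        return []
    start = (0,) * len(lists)  # state: the chosen index in each list
    heap = [(-sum(lst[0] for lst in lists), start)]  # negated sums, heapq is a min-heap
    seen = {start}
    result = []
    while heap and len(result) < n:
        neg_sum, state = heapq.heappop(heap)
        result.append(-neg_sum)
        # Successor states advance exactly one list by one position.
        for i, idx in enumerate(state):
            if idx + 1 < len(lists[i]):
                nxt = state[:i] + (idx + 1,) + state[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    new_sum = -neg_sum - lists[i][idx] + lists[i][idx + 1]
                    heapq.heappush(heap, (-new_sum, nxt))
    return result
```

With the five example lists from this answer and n = 5, the sketch returns [21, 20, 19, 19, 18].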

Related

How to maintain a min sliding window for an unsorted array? [duplicate]

Given an array of size n and k, how do you find the maximum for every contiguous subarray of size k?
For example
arr = 1 5 2 6 3 1 24 7
k = 3
ans = 5 6 6 6 24 24
I was thinking of having an array of size k and each step evict the last element out and add the new element and find maximum among that. It leads to a running time of O(nk). Is there a better way to do this?
You may have heard about doing this in O(n) using a deque; that is a well-known algorithm for this question.
The method I describe here is also quite simple and has time complexity O(n).
Your Sample Input:
n=10 , W = 3
10 3
1 -2 5 6 0 9 8 -1 2 0
Answer = 5 6 6 9 9 9 8 2
Concept: Dynamic Programming
Algorithm:
N is the number of elements in the array and W is the window size, so the number of windows is N-W+1.
Now divide the array into blocks of size W, starting from index 1.
Here we divide into blocks of size W = 3.
For your sample input:
[1 -2 5] [6 0 9] [8 -1 2] [0]
We divide into blocks because we will calculate the maximum in 2 ways: A.) by traversing from left to right, B.) by traversing from right to left.
But how?
First, traverse from left to right. For each element ai in a block, find the maximum from the START of that block up to ai.
So here,
LR = 1 1 5 | 6 6 9 | 8 8 8 | 0
Second, traverse from right to left. For each element ai in a block, find the maximum from the END of that block back to ai.
So here,
RL = 5 5 5 | 9 9 9 | 8 2 2 | 0
Now we have to find maximum for each subarray or window of size 'W'.
So, starting from index = 1 to index = N-W+1 .
max_val[index] = max(RL[index], LR[index+w-1]);
for index=1: max_val[1] = max(RL[1],LR[3]) = max(5,5)= 5
Similarly, for every index i (i <= n-w+1), the values RL[i] and LR[i+w-1]
are compared, and the larger of the two is the answer for that subarray.
So Final Answer : 5 6 6 9 9 9 8 2
Time Complexity: O(n)
Implementation code:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#define LIM 100001
using namespace std;
int arr[LIM]; // Input Array
int LR[LIM]; // maximum from Left to Right
int RL[LIM]; // maximum from Right to left
int max_val[LIM]; // number of subarrays (windows) will be n-w+1
int main(){
int n, w, i, k; // 'n' is number of elements in array
// 'w' is Window's Size
cin >> n >> w;
k = n - w + 1; // 'K' is number of Windows
for(i = 1; i <= n; i++)
cin >> arr[i];
for(i = 1; i <= n; i++){ // for maximum Left to Right
if(i % w == 1) // that means START of a block
LR[i] = arr[i];
else
LR[i] = max(LR[i - 1], arr[i]);
}
for(i = n; i >= 1; i--){ // for maximum Right to Left
if(i == n) // Maybe the last block is not of size 'W'.
RL[i] = arr[i];
else if(i % w == 0) // that means END of a block
RL[i] = arr[i];
else
RL[i] = max(RL[i+1], arr[i]);
}
for(i = 1; i <= k; i++) // maximum
max_val[i] = max(RL[i], LR[i + w - 1]);
for(i = 1; i <= k ; i++)
cout << max_val[i] << " ";
cout << endl;
return 0;
}
I'll try to prove it (proof by @johnchen902):
If k % w != 1 (k is not the beginning of a block)
Let k* = the beginning of the block containing k + w - 1
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= max( max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k* - 1]),
max( arr[k*], arr[k* + 1], arr[k* + 2], ..., arr[k + w - 1]) )
= max( RL[k], LR[k+w-1] )
Otherwise (k is the beginning of a block)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= RL[k] = LR[k+w-1]
= max( RL[k], LR[k+w-1] )
The dynamic programming approach is very neatly explained by Shashank Jain. I would like to explain how to do the same using a deque.
The key is to keep the max element at the front of the queue (for the current window), discarding the useless elements as well as the elements whose indexes fall outside the current window.
Useless elements: if the current element is greater than or equal to the last element of the queue, then the last element of the queue is useless.
Note: we store the index in the queue, not the element itself. This will be clearer from the code.
1. If the current element is greater than or equal to the last element of the queue, the last element of the queue is useless and we delete it
(and keep deleting until the last element of the queue is greater than the current element).
2. If current_index - k >= q.front(), the front element has left the window, so we delete it from the front of the queue.
vector<int> max_sub_deque(vector<int> &A,int k)
{
deque<int> q;
for(int i=0;i<k;i++)
{
while(!q.empty() && A[i] >= A[q.back()])
q.pop_back();
q.push_back(i);
}
vector<int> res;
for(int i=k;i<A.size();i++)
{
res.push_back(A[q.front()]);
while(!q.empty() && A[i] >= A[q.back()] )
q.pop_back();
while(!q.empty() && q.front() <= i-k)
q.pop_front();
q.push_back(i);
}
res.push_back(A[q.front()]);
return res;
}
Since each element is enqueued and dequeued at most once, the time complexity is O(n+n) = O(2n) = O(n).
The size of the queue cannot exceed k, so the space complexity is O(k).
An O(n) time solution is possible by combining the two classic interview questions:
Make a stack data-structure (called MaxStack) which supports push, pop and max in O(1) time.
This can be done using two stacks; the second one contains the maximum seen so far.
Model a queue with a stack.
This can be done using two stacks. Enqueues go into one stack, and dequeues come from the other.
For this problem, we basically need a queue, which supports enqueue, dequeue and max in O(1) (amortized) time.
We combine the above two, by modelling a queue with two MaxStacks.
To solve the question, we queue k elements, query the max, dequeue, enqueue k+1 th element, query the max etc. This will give you the max for every k sized sub-array.
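The combination described above can be sketched in Python. The class and function names (MaxStack, MaxQueue, sliding_window_max) are hypothetical, and as an implementation choice each stack entry stores the running maximum next to its value instead of using a separate second stack.

```python
class MaxStack:
    """Stack with O(1) push, pop and max; each entry stores the max at or below it."""

    def __init__(self):
        self._items = []  # list of (value, max_so_far)

    def push(self, value):
        current_max = value if not self._items else max(value, self._items[-1][1])
        self._items.append((value, current_max))

    def pop(self):
        return self._items.pop()[0]

    def max(self):
        return self._items[-1][1]

    def __len__(self):
        return len(self._items)


class MaxQueue:
    """Queue built from two MaxStacks, with amortized O(1) enqueue, dequeue and max."""

    def __init__(self):
        self._inbox = MaxStack()
        self._outbox = MaxStack()

    def enqueue(self, value):
        self._inbox.push(value)

    def dequeue(self):
        if not self._outbox:
            # Move everything over once; this reverses the order to FIFO.
            while self._inbox:
                self._outbox.push(self._inbox.pop())
        return self._outbox.pop()

    def max(self):
        # The queue maximum is the larger of the two non-empty stack maxima.
        return max(s.max() for s in (self._inbox, self._outbox) if s)


def sliding_window_max(values, k):
    queue, result = MaxQueue(), []
    for i, value in enumerate(values):
        queue.enqueue(value)
        if i >= k:
            queue.dequeue()  # drop the element that left the window
        if i >= k - 1:
            result.append(queue.max())
    return result
```

Every element is pushed and popped at most twice (once per stack), which is where the amortized O(1) per operation comes from.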
I believe there are other solutions too.
1)
I believe the queue idea can be simplified. We maintain a queue and a max for every k. We enqueue a new element, and dequeue all elements which are not greater than the new element.
2) Maintain two new arrays which maintain the running max for each block of k, one array for one direction (left to right/right to left).
3) Use a hammer: Preprocess in O(n) time for range maximum queries.
Solution 1) above might be the best of these.
You need a fast data structure that can add, remove and query for the max element in less than O(n) time (you can just use an array if O(n) or O(nlogn) is acceptable). You can use a heap, a balanced binary search tree, a skip list, or any other sorted data structure that performs these operations in O(log(n)).
The good news is that most popular languages have a sorted data structure implemented that supports these operations for you. C++ has std::set and std::multiset (you probably need the latter) and Java has PriorityQueue and TreeSet.
Here is the java implementation
public static Integer[] maxsInEveryWindows(int[] arr, int k) {
Deque<Integer> deque = new ArrayDeque<Integer>();
/* Process first k (or first window) elements of array */
for (int i = 0; i < k; i++) {
// For every element, the previous smaller elements are useless so
// remove them from deque
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast(); // Remove from rear
}
// Add new element at rear of queue
deque.addLast(i);
}
List<Integer> result = new ArrayList<Integer>();
// Process rest of the elements, i.e., from arr[k] to arr[n-1]
for (int i = k; i < arr.length; i++) {
// The element at the front of the queue is the largest element of
// previous window, so add to result.
result.add(arr[deque.getFirst()]);
// Remove all elements smaller than the currently
// being added element (remove useless elements)
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast();
}
// Remove the elements which are out of this window
while (!deque.isEmpty() && deque.getFirst() <= i - k) {
deque.removeFirst();
}
// Add current element at the rear of deque
deque.addLast(i);
}
// Add the maximum element of the last window
result.add(arr[deque.getFirst()]);
return result.toArray(new Integer[0]);
}
Here is the corresponding test case
@Test
public void maxsInWindowsOfSizeKTest() {
Integer[] result = ArrayUtils.maxsInEveryWindows(new int[]{1, 2, 3, 1, 4, 5, 2, 3, 6}, 3);
assertThat(result, equalTo(new Integer[]{3, 3, 4, 5, 5, 5, 6}));
result = ArrayUtils.maxsInEveryWindows(new int[]{8, 5, 10, 7, 9, 4, 15, 12, 90, 13}, 4);
assertThat(result, equalTo(new Integer[]{10, 10, 10, 15, 15, 90, 90}));
}
Using a heap (or tree), you should be able to do it in O(n * log(k)). I'm not sure if this would be indeed better.
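A heap-based variant can be sketched with Python's heapq. The function name is hypothetical, and the sketch swaps in lazy deletion: stale entries are discarded only when they reach the top of the heap. Because of that, the heap can hold up to n entries, so this sketch is O(n log n) in the worst case; the O(n log k) bound needs a structure with true deletion, such as a balanced tree.

```python
import heapq

def sliding_window_max_heap(values, k):
    """Window maxima with a heap and lazy deletion of entries that left the window."""
    heap = []    # entries are (-value, index); heapq is a min-heap
    result = []
    for i, value in enumerate(values):
        heapq.heappush(heap, (-value, i))
        # Discard stale maxima whose index lies left of the current window.
        # The entry just pushed is always inside the window, so the heap never empties.
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        if i >= k - 1:
            result.append(-heap[0][0])
    return result
```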
Here is the Python implementation in O(n). Thanks to @Shashank Jain in advance.
from sys import stdin

n, w = map(int, stdin.readline().strip().split())
Arr = list(map(int, stdin.readline().strip().split()))
k = n - w + 1  # number of windows = k
leftA = [0] * n
rightA = [0] * n
result = [0] * k
for i in range(n):
    if i % w == 0:  # start of a block
        leftA[i] = Arr[i]
    else:
        leftA[i] = max(Arr[i], leftA[i - 1])
for i in range(n - 1, -1, -1):
    if i % w == (w - 1) or i == n - 1:  # end of a block
        rightA[i] = Arr[i]
    else:
        rightA[i] = max(Arr[i], rightA[i + 1])
for i in range(k):
    result[i] = max(rightA[i], leftA[i + w - 1])
print(*result, sep=' ')
Method 1: O(n) time, O(k) space
We use a deque (it is like a list but with constant-time insertion and deletion from both ends) to store the index of useful elements.
The index of the current max is kept at the leftmost element of deque. The rightmost element of deque is the smallest.
In the following, for easier explanation we say an element from the array is in the deque, while in fact the index of that element is in the deque.
Let's say {5, 3, 2} are already in the deque (again, in fact their indexes are).
If the next element we read from the array is bigger than 5 (remember, the leftmost element of deque holds the max), say 7: We clear the deque and keep only 7 in it (we do this because the current elements are useless, we have found a new max).
If the next element is less than 2 (which is the smallest element of deque), say 1: We add it to the right ({5, 3, 2, 1})
If the next element is bigger than 2 but less than 5, say 4: We remove elements from the right that are smaller than the element and then add the element to the right ({5, 4}).
Also, we keep only elements of the current window (we can do this in constant time because we are storing the indexes instead of the elements).
from collections import deque

def max_subarray(array, k):
    deq = deque()  # indexes; their values decrease from left to right
    for index, item in enumerate(array):
        # drop the leftmost index once it is out of the window
        if deq and index - deq[0] >= k:
            deq.popleft()
        # remove smaller (or equal) elements from the right; they can never be the max again
        while deq and item >= array[deq[-1]]:
            deq.pop()
        deq.append(index)
        if index >= k - 1:  # start printing when the first window is filled
            print(array[deq[0]])
Proof of O(n) time: The only part we need to check is the while loop. In the whole runtime of the code, the while loop can perform at most O(n) operations in total. The reason is that the while loop pops elements from the deque, and since in other parts of the code, we do at most O(n) insertions into the deque, the while loop cannot exceed O(n) operations in total. So the total runtime is O(n) + O(n) = O(n)
Method 2: O(n) time, O(n) space
This is the explanation of the method suggested by S Jain (as mentioned in the comments of his post, this method doesn't work with data streams, which most sliding window questions are designed for).
The reason that method works is explained using the following example:
array = [5, 6, 2, 3, 1, 4, 2, 3]
k = 4
blocks: [5, 6, 2, 3 | 1, 4, 2, 3]
LR:      5  6  6  6   1  4  4  4
RL:      6  6  3  3   4  4  3  3
max:     6  6  4  4   4
To get the max for the window [2, 3, 1, 4],
we can get the max of [2, 3] and max of [1, 4], and return the bigger of the two.
Max of [2, 3] is calculated in the RL pass and max of [1, 4] is calculated in LR pass.
Using Fibonacci heap, you can do it in O(n + (n-k) log k), which is equal to O(n log k) for small k, for k close to n this becomes O(n).
The algorithm: in fact, you need:
n inserts to the heap
n-k deletions
n-k findmax's
How much do these operations cost in a Fibonacci heap? Insert and findmax are O(1) amortized; deletion is O(log k) amortized, since the heap never holds more than k elements at a time. So, we have
O(n + (n-k) log k + (n-k)) = O(n + (n-k) log k)
Sorry, this should have been a comment but I am not allowed to comment for now.
@leo and @Clay Goddard
You can save yourselves from re-computing the maximum by storing both maximum and 2nd maximum of the window in the beginning
(2nd maximum will be the maximum only if there are two maximums in the initial window). If the maximum slides out of the window you still have the next best candidate to compare with the new entry. So you get O(n) , otherwise if you allowed the whole re-computation again the worst case order would be O(nk), k is the window size.
class MaxFinder
{
// finds the max and its index
static int[] findMaxByIteration(int arr[], int start, int end)
{
int max, max_ndx;
max = arr[start];
max_ndx = start;
for (int i=start; i<end; i++)
{
if (arr[i] > max)
{
max = arr[i];
max_ndx = i;
}
}
int result[] = {max, max_ndx};
return result;
}
// optimized to skip iteration, when previous windows max element
// is present in current window
static void optimizedPrintKMax(int arr[], int n, int k)
{
int i, j, max, max_ndx;
// for first window - find by iteration.
int result[] = findMaxByIteration(arr, 0, k);
System.out.printf("%d ", result[0]);
max = result[0];
max_ndx = result[1];
for (j=1; j <= (n-k); j++)
{
// if previous max has fallen out of current window, iterate and find
if (max_ndx < j)
{
result = findMaxByIteration(arr, j, j+k);
max = result[0];
max_ndx = result[1];
}
// optimized path, just compare max with new_elem that has come into the window
else
{
int new_elem_ndx = j + (k-1);
if (arr[new_elem_ndx] > max)
{
max = arr[new_elem_ndx];
max_ndx = new_elem_ndx;
}
}
System.out.printf("%d ", max);
}
}
public static void main(String[] args)
{
int arr[] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
//int arr[] = {1,5,2,6,3,1,24,7};
int n = arr.length;
int k = 3;
optimizedPrintKMax(arr, n, k);
}
}
package com;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class SlidingWindow {
public static void main(String[] args) {
int[] array = { 1, 5, 2, 6, 3, 1, 24, 7 };
int slide = 3;//say
List<Integer> result = new ArrayList<Integer>();
for (int i = 0; i < array.length - (slide-1); i++) {
result.add(getMax(array, i, slide));
}
System.out.println("MaxList->>>>" + result.toString());
}
private static Integer getMax(int[] array, int i, int slide) {
List<Integer> intermediate = new ArrayList<Integer>();
System.out.println("Initial::" + intermediate.size());
while (intermediate.size() < slide) {
intermediate.add(array[i]);
i++;
}
Collections.sort(intermediate);
return intermediate.get(slide - 1);
}
}
Here is a solution with O(n) time complexity using an auxiliary deque
import java.util.ArrayDeque;
import java.util.Deque;

public class TestSlidingWindow {
    public static void main(String[] args) {
        int[] arr = { 1, 5, 7, 2, 1, 3, 4 };
        int k = 3;
        printMaxInSlidingWindow(arr, k);
    }

    public static void printMaxInSlidingWindow(int[] arr, int k) {
        Deque<Integer> queue = new ArrayDeque<Integer>();
        Deque<Integer> auxQueue = new ArrayDeque<Integer>();
        int[] resultArr = new int[(arr.length - k) + 1];
        int j = 0;
        for (int i = 0; i < arr.length; i++) {
            queue.addLast(arr[i]);
            /** We maintain the auxiliary deque so that the max element is still known after the
             previous max is removed. Its elements decrease from first to last: before adding the
             current element at the end, we remove all smaller elements from the end, because
             they can never be the max while the current element is in the window. **/
            while (!auxQueue.isEmpty() && auxQueue.peekLast() < arr[i]) {
                auxQueue.pollLast();
            }
            auxQueue.addLast(arr[i]);
            if (queue.size() > k) {
                int removedEl = queue.removeFirst();
                // the element leaving the window is the max only if it heads the auxiliary deque
                if (removedEl == auxQueue.peekFirst()) {
                    auxQueue.pollFirst();
                }
            }
            if (queue.size() == k) {
                resultArr[j++] = auxQueue.peekFirst();
            }
        }
        for (int i = 0; i < resultArr.length; i++) {
            System.out.println(resultArr[i]);
        }
    }
}
static void countDistinct(int arr[], int n, int k)
{
System.out.print("\nMaximum integer in the window : ");
// Traverse through every window
for (int i = 0; i <= n - k; i++) {
System.out.print(findMaximuminAllWindow(Arrays.copyOfRange(arr, i, i + k), k)+ " "); // copy only the current window
}
}
private static int findMaximuminAllWindow(int[] win, int k) {
// TODO Auto-generated method stub
int max= Integer.MIN_VALUE;
for(int i=0; i<k;i++) {
if(win[i]>max)
max=win[i];
}
return max;
}
arr = 1 5 2 6 3 1 24 7
We have to find the maximum of each subarray, right?
So, what is meant by subarray?
A subarray is a contiguous part of the array, with the elements kept in order.
From the above array,
{1,5,2}, {6,3,1} and {1,24,7} are all examples of subarrays.
n = 8 // Array length
k = 3 // window size
For finding the maximum, we have to iterate through the array, and find the maximum.
From the window size k,
{1,5,2} = 5 is the maximum
{5,2,6} = 6 is the maximum
{2,6,3} = 6 is the maximum
and so on..
ans = 5 6 6 6 24 24
The number of windows can be evaluated as n-k+1.
Hence, 8-3+1 = 6,
and the length of the answer is 6, as we have seen.
How can we solve this now?
When data is moving through a pipe, the first data structure that comes to mind is the queue.
But rather than discussing that at length, we jump directly to the deque.
There are two ways of thinking about it:
The window is fixed and data moves in and out
The data is fixed and the window is sliding
Example: a time series database
For the first k elements (the queue stores indexes):
while (queue is not empty and arr[queue.back()] < arr[i]) {
    queue.pop_back();
}
queue.push_back(i);
For the rest:
print arr[queue.front()]
// purge expired elements
while (queue is not empty and queue.front() <= i-k) {
    queue.pop_front();
}
while (queue is not empty and arr[queue.back()] < arr[i]) {
    queue.pop_back();
}
queue.push_back(i);
Finally, print arr[queue.front()] for the last window.
arr = [1, 2, 3, 1, 4, 5, 2, 3, 6]
k = 3
for i in range(len(arr) - k + 1):
    print(max(arr[i:i + k]), end=' ')  # 3 3 4 5 5 5 6
Two approaches.
Segment Tree O(n log n)
Build a maximum segment-tree.
Query between [i, i+k)
Something like..
public static void printMaximums(int[] a, int k) {
int n = a.length;
SegmentTree tree = new SegmentTree(a);
for (int i=0; i<=n-k; i++) System.out.print(tree.query(i, i+k) + " ");
}
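The SegmentTree class used above is not shown in the answer. As an illustration only, here is a minimal Python sketch of a max segment tree with the same query(lo, hi) shape, half-open on the right. The class layout and the helper window_maximums are assumptions, not the author's code.

```python
class SegmentTree:
    """Iterative max segment tree over a fixed list of values."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [float("-inf")] * (2 * self.n)
        self.tree[self.n:] = values            # leaves live in the second half
        for i in range(self.n - 1, 0, -1):     # build internal nodes bottom-up
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Maximum of values[lo:hi] (hi exclusive)."""
        best = float("-inf")
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:          # lo is a right child: take it and move right
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:          # hi is a right child: step left and take it
                hi -= 1
                best = max(best, self.tree[hi])
            lo >>= 1
            hi >>= 1
        return best


def window_maximums(a, k):
    tree = SegmentTree(a)
    return [tree.query(i, i + k) for i in range(len(a) - k + 1)]
```

Building the tree is O(n) and each query is O(log n), so answering all n-k+1 windows costs O(n log n) in total.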
Deque O(n)
If the next element is greater than the rear element, remove the rear element.
If the element in the front of the deque is out of the window, remove the front element.
public static void printMaximums(int[] a, int k) {
int n = a.length;
Deque<int[]> deck = new ArrayDeque<>();
List<Integer> result = new ArrayList<>();
for (int i=0; i<n; i++) {
while (!deck.isEmpty() && a[i] >= deck.peekLast()[0]) deck.pollLast();
deck.offer(new int[] {a[i], i});
while (!deck.isEmpty() && deck.peekFirst()[1] <= i - k) deck.pollFirst();
if (i >= k - 1) result.add(deck.peekFirst()[0]);
}
System.out.println(result);
}
Here is an optimized version of the naive (conditional) nested loop approach I came up with which is much faster and doesn't require any auxiliary storage or data structure.
As the program moves from window to window, the start index and end index moves forward by 1. In other words, two consecutive windows have adjacent start and end indices.
For the first window of size W , the inner loop finds the maximum of elements with index (0 to W-1). (Hence i == 0 in the if in 4th line of the code).
Now instead of computing for the second window which only has one new element, since we have already computed the maximum for elements of indices 0 to W-1, we only need to compare this maximum to the only new element in the new window with the index W.
But if the element at 0, which is the only element not part of the new window, was the maximum, we need to compute the maximum again using the inner loop from 1 to W (hence the second condition maxm == arr[i-1] in the if in line 4); otherwise we just compare the maximum of the previous window with the only new element in the new window.
void print_max_for_each_subarray(int arr[], int n, int k)
{
int maxm;
for(int i = 0; i < n - k + 1 ; i++)
{
if(i == 0 || maxm == arr[i-1]) {
maxm = arr[i];
for(int j = i+1; j < i+k; j++)
if(maxm < arr[j]) maxm = arr[j];
}
else {
maxm = maxm < arr[i+k-1] ? arr[i+k-1] : maxm;
}
cout << maxm << ' ';
}
cout << '\n';
}
You can use the Deque data structure to implement this. A deque lets you insert and remove elements at both ends of the queue, unlike a traditional queue, where you can only insert at one end and remove from the other.
Following is the code for the above problem.
public int[] maxSlidingWindow(int[] nums, int k) {
int n = nums.length;
int[] maxInWindow = new int[n - k + 1];
Deque<Integer> dq = new LinkedList<Integer>();
int i = 0;
for(; i<k; i++){
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
for(; i <n; i++){
maxInWindow[i - k] = nums[dq.peekFirst()];
while(!dq.isEmpty() && dq.peekFirst() <= i - k){
dq.removeFirst();
}
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
maxInWindow[i - k] = nums[dq.peekFirst()];
return maxInWindow;
}
The resulting array will have n - k + 1 elements, where n is the length of the given array and k is the given window size.
We can solve it in Python by applying slicing.
def sliding_window(a, k, n):
    max_val = []
    for i in range(n - k + 1):
        val = a[i:i + k]  # the window starting at index i
        max_val.append(max(val))
    return max_val

# Driver code
a = [15, 2, 3, 4, 5, 6, 2, 4, 9, 1, 5]
n = len(a)
k = 3
sl = sliding_window(a, k, n)
print(sl)
Create a TreeMap holding the k elements of the current window. Put the first k elements in as keys, with the number of occurrences as the value (so that duplicates are handled). A TreeMap keeps its keys sorted, so the first key in the map is the min and the last key is the max of the window. To slide the window, remove one occurrence of the element whose index in arr is i-k and add arr[i]. Here, I have assumed that the input elements are in the array arr and that the map is filled from that array. Each TreeMap insertion or removal costs O(log k), so this approach takes O(n log k) time overall.
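The sorted-window idea can be sketched in Python, with one loud caveat: Python's standard library has no TreeMap, so as a stand-in this sketch keeps a plain list sorted with the bisect module. The function name is hypothetical. Insertion and removal in a list cost O(k) because elements shift, so the sketch runs in O(n * k), not the logarithmic per-operation cost of a real balanced tree.

```python
import bisect

def sliding_window_max_sorted(values, k):
    """Window maxima using a sorted list of the current window's elements."""
    window = []   # kept sorted ascending; duplicates are stored as repeated entries
    result = []
    for i, value in enumerate(values):
        bisect.insort(window, value)
        if i >= k:
            # Remove one copy of the element that just left the window.
            leaving = values[i - k]
            del window[bisect.bisect_left(window, leaving)]
        if i >= k - 1:
            result.append(window[-1])   # the largest element is last
    return result
```

The smallest element of each window is available as window[0] at the same point, which also covers the "min sliding window" wording of the question title.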
100% working Tested (Swift)
func maxOfSubArray(arr:[Int],n:Int,k:Int)->[Int]{
var length = arr.count
var resultArray = [Int]()
for i in 0..<arr.count{
if length+1 > k{
let tempArray = Array(arr[i..<k+i])
resultArray.append(tempArray.max()!)
}
length = length - 1
}
print(resultArray)
return resultArray
}
This way we can use:
maxOfSubArray(arr: [1,2,3,1,4,5,2,3,6], n: 9, k: 3)
Result:
[3, 3, 4, 5, 5, 5, 6]
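Notice that you only have to rescan the new window if both of the following hold:
* The new element in the window is smaller than the previous maximum (if it's bigger or equal, the new element is the maximum for sure).
AND
* The element that just dropped out of the window was the current maximum.
In this case, re-scan the window. Otherwise the previous maximum is still valid. The worst case is still O(nk), for example on a strictly decreasing array.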
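A minimal Python sketch of this rescan-on-demand idea (the function name sliding_max_rescan is made up for illustration). It rescans a window only when the element that left equals the current maximum and the incoming element is smaller:

```python
def sliding_max_rescan(arr, k):
    """Window maxima; rescans a window only when its old maximum may have left."""
    if k <= 0 or k > len(arr):
        return []
    current = max(arr[:k])          # maximum of the first window
    result = [current]
    for i in range(k, len(arr)):
        incoming, outgoing = arr[i], arr[i - k]
        if incoming >= current:
            current = incoming      # the new element is the new maximum
        elif outgoing == current:
            # the old maximum may have left the window: rescan it
            current = max(arr[i - k + 1:i + 1])
        # otherwise the old maximum is still inside the window
        result.append(current)
    return result


print(sliding_max_rescan([1, 5, 2, 6, 3, 1, 24, 7], 3))
```

On random input the rescan is rare, but on a decreasing array it happens at every step.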
How big is k? For a reasonably sized k, you can create k buffers of size k and iterate over the array, keeping track of pointers to the max element in each buffer. This needs no extra data structures and is O(n), with O(k^2) pre-allocation.
A complete working solution in Amortised Constant O(1) Complexity.
https://github.com/varoonverma/code-challenge.git
Compare the first k elements and find the max; this is your first number.
Then compare the next element to the previous max. If the next element is bigger, it is the max of the next subarray; if it's equal or smaller, the max for that subarray stays the same, as long as the previous max is still inside the window (if it has just left the window, rescan the window instead).
Then move on to the next number.
max(1 5 2) = 5
max(5 6) = 6
max(6 6) = 6
... and so on
max(3 24) = 24
max(24 7) = 24
It's only slightly better than your answer

How to find minimum positive contiguous sub sequence in O(n) time?

We have this algorithm for finding maximum positive sub sequence in given sequence in O(n) time. Can anybody suggest similar algorithm for finding minimum positive contiguous sub sequence.
For example
If given sequence is 1,2,3,4,5 answer should be 1.
[5,-4,3,5,4] ->1 is the minimum positive sum of elements [5,-4].
There cannot be such an algorithm. The lower bound for this problem is Ω(n log n). I'll prove it by reducing the element distinctness problem to it (actually to the non-negative variant of it).
Let's suppose we have an O(n) algorithm for this problem (the minimum non-negative subarray).
We want to find out if an array (e.g. A=[1, 2, -3, 4, 2]) has only distinct elements. To solve this problem, I could construct an array with the differences between consecutive elements (e.g. A'=[1, -5, 7, -2]) and run the O(n) algorithm we have. Every subarray sum of A' telescopes to A[j] - A[i] for some i < j, so the original array has only distinct elements if and only if the minimum non-negative subarray sum is greater than 0.
If we had an O(n) algorithm for your problem, we would have an O(n) algorithm for the element distinctness problem, which is known to be impossible in the algebraic decision tree model of computation.
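To make the reduction concrete, here is a small Python sketch. The solver min_nonnegative_subarray_sum is only a brute-force O(n^2) stand-in for the hypothetical fast algorithm, and both function names are invented for this illustration:

```python
def min_nonnegative_subarray_sum(values):
    """Brute-force stand-in (O(n^2)) for the hypothetical fast solver."""
    best = None
    for start in range(len(values)):
        total = 0
        for end in range(start, len(values)):
            total += values[end]
            if total >= 0 and (best is None or total < best):
                best = total
    return best  # None if every subarray sum is negative


def all_distinct(a):
    """Element distinctness via the reduction: a zero-sum run of differences means a repeat."""
    diffs = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    return min_nonnegative_subarray_sum(diffs) != 0


print(all_distinct([1, 2, -3, 4, 2]))   # the example array A from the answer
```

Everything apart from the solver call is linear, which is why a linear solver would give linear element distinctness.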
We can have an O(n log n) algorithm as follows:
Assume that we have an array prefix, where index i stores the sum of array A from 0 to i, so the sum of sub-array (i, j) is prefix[j] - prefix[i - 1] (taking prefix[-1] = 0 for the empty prefix).
Thus, in order to find the minimum positive sub-array ending at index j, we need to find the maximum element prefix[x] which is less than prefix[j], with x < j. We can find that element in O(log n) time if we use a balanced binary search tree.
Pseudo code:
int[] prefix = new int[A.length];
prefix[0] = A[0];
for(int i = 1; i < A.length; i++)
prefix[i] = A[i] + prefix[i - 1];
int result = MAX_VALUE;
BinarySearchTree tree;
tree.add(0); // empty prefix, so that sub-arrays starting at index 0 are covered
for(int i = 0; i < A.length; i++){
// largest earlier prefix sum strictly below prefix[i], if there is one
if(tree.hasElementLessThan(prefix[i])){
int v = tree.getMaximumElementLessThan(prefix[i]);
result = min(result, prefix[i] - v);
}
tree.add(prefix[i]);
}
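The same idea as a runnable Python sketch. As a simplification it keeps the prefix sums in a sorted list via the bisect module instead of a balanced tree; the lookup is O(log n), but list insertion is O(n) in the worst case, so a real tree (or a third-party sorted container) would be needed for a true O(n log n) bound. The function name is made up for this example:

```python
import bisect


def min_positive_subarray_sum(a):
    """Smallest strictly positive sum of a contiguous run, or None if there is none."""
    seen = [0]            # sorted prefix sums seen so far; 0 is the empty prefix
    prefix = 0
    best = None
    for value in a:
        prefix += value
        # first stored prefix >= current one; the entry before it is the
        # largest stored prefix strictly below the current prefix
        pos = bisect.bisect_left(seen, prefix)
        if pos > 0:
            candidate = prefix - seen[pos - 1]
            if best is None or candidate < best:
                best = candidate
        bisect.insort(seen, prefix)
    return best


print(min_positive_subarray_sum([5, -4, 3, 5, 4]))
```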
I believe there's an O(n) algorithm, see below.
Note: its running time also depends on the range of the (input) values to be processed, which might make it less attractive in practical applications; see the remarks in the code.
private int GetMinimumPositiveContiguousSubsequenc(List<Int32> values)
{
// Note: this method has no precautions against integer over/underflow, which may occur
// if large (abs) values are present in the input-list.
// There must be at least 1 item.
if (values == null || values.Count == 0)
throw new ArgumentException("There must be at least one item provided to this method.");
// 1. Scan once to:
// a) Get the minimum positive element;
// b) Get the value of the MAX contiguous sequence
// c) Get the value of the MIN contiguous sequence - allowing negative values: the mirror of the MAX contiguous sequence.
// d) Pinpoint the (index of the) first negative value.
int minPositive = 0;
int maxSequence = 0;
int currentMaxSequence = 0;
int minSequence = 0;
int currentMinSequence = 0;
int indxFirstNegative = -1;
for (int k = 0; k < values.Count; k++)
{
int value = values[k];
if (value > 0)
{
if (minPositive == 0 || value < minPositive)
minPositive = value;
}
else if (indxFirstNegative == -1 && value < 0)
indxFirstNegative = k;
currentMaxSequence += value;
if (currentMaxSequence <= 0)
currentMaxSequence = 0;
else if (currentMaxSequence > maxSequence)
maxSequence = currentMaxSequence;
currentMinSequence += value;
if (currentMinSequence >= 0)
currentMinSequence = 0;
else if (currentMinSequence < minSequence)
minSequence = currentMinSequence;
}
// 2. We're done if (a) there are no negatives, or (b) the minPositive (single) value is 1 (or 0...).
if (minSequence == 0 || minPositive <= 1)
return minPositive;
// 3. Real work to do.
// The strategy is as follows, iterating over the input values:
// a) Keep track of the cumulative value of ALL items - the sequence that starts with the very first item.
// b) Register each such cumulative value as "existing" in a bool array 'initialSequence' as we go along.
// We know already the max/min contiguous sequence values, so we can properly size that array in advance.
// Since negative sequence values occur we'll have an offset to match the index in that bool array
// with the corresponding value of the initial sequence.
// c) For each next input value to process scan the "initialSequence" bool array to see whether relevant entries are TRUE.
// We don't need to go over the complete array, as we're only interested in entries that would produce a subsequence with
// a value that is positive and also smaller than best-so-far.
// (As we go along, the range to check will normally shrink as we get better and better results.
// Also: initially the range is already limited by the single-minimum-positive value that we have found.)
// Performance-wise this approach (which is O(n)) is suitable IFF the number of input values is large (or at least: not small) relative to
// the spread between maxSequence and minSequence: the latter two define the size of the array in which we will do (partial) linear traversals.
// If this condition is not met it may be more efficient to replace the bool array by a (binary) search tree.
// (which will result in O(n logn) performance).
// Since we know the relevant parameters at this point, we may below have the two strategies both implemented and decide run-time
// which to choose.
// The current implementation has only the fixed bool array approach.
// Initialize a variable to keep track of the best result 'so far'; it will also be the return value.
int minPositiveSequence = minPositive;
// The bool array to keep track of which (total) cumulative values (always with the sequence starting at element #0) have occurred so far,
// and the 'offset' - see remark 3b above.
int offset = -minSequence;
bool[] initialSequence = new bool[maxSequence + offset + 1];
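int valueCumulative = 0;
// Register the empty prefix (cumulative value 0), so that sequences starting at element #0 are found as well.
initialSequence[offset] = true;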
for (int k = 0; k < indxFirstNegative; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
}
for (int k = indxFirstNegative; k < values.Count; k++)
{
int value = values[k];
valueCumulative += value;
initialSequence[offset + valueCumulative] = true;
// Check whether the difference with any previous "cumulative" may improve the optimum-so-far.
// the index that, if the entry is TRUE, would yield the best possible result.
int indexHigh = valueCumulative + offset - 1;
// the last (lowest) index that, if the entry is TRUE, would still yield an improvement over what we have so far.
int indexLow = Math.Max(0, valueCumulative + offset - minPositiveSequence + 1);
for (int indx = indexHigh; indx >= indexLow; indx--)
{
if (initialSequence[indx])
{
minPositiveSequence = valueCumulative - indx + offset;
if (minPositiveSequence == 1)
return minPositiveSequence;
break;
}
}
}
return minPositiveSequence;
}
}

How to find all taxicab numbers less than N?

A taxicab number is an integer that can be expressed as the sum of two cubes of integers in two different ways: a^3+b^3 = c^3+d^3. Design an algorithm to find all taxicab numbers with a, b, c, and d less than N.
Please give both the space and time complexity in terms of N.
I could do it in O(N^2 log N) time with O(N^2) space.
Best algorithm I've found so far:
Form all pairs: N^2
Sort the sum: N^2 logN
Find duplicates less than N
But this takes N^2 space. Can we do better?
There exists an O(N) space solution based on a priority queue. Time complexity is O(N^2 logN). To sketch out the idea of the algorithm, here is the matrix M such that M[i][j] = i^3 + j^3 (of course, the matrix is never created in memory):
0 1 8 27 64 125
1 2 9 28 65 126
8 9 16 35 72 133
27 28 35 54 91 152
64 65 72 91 128 189
125 126 133 152 189 250
Observe that every row and every column is sorted in ascending order. Let PQ be the priority queue. First we put the biggest element in the priority queue. Then perform the following, as long as the PQ is not empty:
Pop the biggest element from PQ
add adjacent element above if the PQ doesn't have any element from that row
add adjacent element on the left if the PQ doesn't have any element from that column, and if it is not under the diagonal of the matrix (to avoid redundant elements)
Note that
You don't need to create the matrix in memory to implement the algorithm
The elements will be popped from the PQ in descending order, from the biggest element of the matrix to its smallest one (avoiding elements from the redundant half part of the matrix).
Every time the PQ issues the same value twice, we have found a taxicab number.
As an illustration, here is an implementation in C++. The time complexity is O(N^2 logN) and space complexity O(N).
#include <iostream>
#include <cassert>
#include <queue>
#include <vector>
using namespace std;
// 64-bit type: i*i*i + j*j*j does not fit in 32 bits for N = 2001
typedef unsigned long long value_type;
struct Square
{
value_type i;
value_type j;
value_type sum_of_cubes;
Square(value_type i, value_type j) : i(i), j(j), sum_of_cubes(i*i*i+j*j*j) {}
friend class SquareCompare;
bool taxicab(const Square& sq) const
{
return sum_of_cubes == sq.sum_of_cubes && i != sq.i && i != sq.j;
}
friend ostream& operator<<(ostream& os, const Square& sq);
};
class SquareCompare
{
public:
bool operator()(const Square& a, const Square& b)
{
return a.sum_of_cubes < b.sum_of_cubes;
}
};
ostream& operator<<(ostream& os, const Square& sq)
{
return os << sq.i << "^3 + " << sq.j << "^3 = " << sq.sum_of_cubes;
}
int main()
{
const value_type N=2001;
value_type count = 0;
bool in_i [N];
bool in_j [N];
for (value_type i=0; i<N; i++) {
in_i[i] = false;
in_j[i] = false;
}
priority_queue<Square, vector<Square>, SquareCompare> p_queue;
p_queue.push(Square(N-1, N-1));
in_i[N-1] = true;
in_j[N-1] = true;
while(!p_queue.empty()) {
Square sq = p_queue.top();
p_queue.pop();
in_i[sq.i] = false;
in_j[sq.j] = false;
// cout << "pop " << sq.i << " " << sq.j << endl;
if (sq.i > 0 && !in_i[sq.i - 1] && sq.i-1 >= sq.j) {
p_queue.push(Square(sq.i-1, sq.j));
in_i[sq.i-1] = true;
in_j[sq.j] = true;
// cout << "push " << sq.i-1 << " " << sq.j << endl;
}
if (sq.j > 0 && !in_j[sq.j-1] && sq.i >= sq.j - 1) {
p_queue.push(Square(sq.i, sq.j-1));
in_i[sq.i] = true;
in_j[sq.j - 1] = true;
// cout << "push " << sq.i << " " << sq.j-1 << endl;
}
if (!p_queue.empty() && sq.taxicab(p_queue.top())) {
/* taxicab number */
cout << sq << " " << p_queue.top() << endl;
count++;
}
}
cout << endl;
cout << "there are " << count << " taxicab numbers with a, b, c, d < " << N << endl;
return 0;
}
The answers given by Novneet Nov and user3017842 are both correct ideas for finding the taxicab numbers with storage O(N) using minHeap.
Just a little bit more explanation why the minHeap of size N works.
First, if you had all the sums (O(N^2)) and could sort them (O(N^2lgN)) you would just pick the duplicates as you traverse the sorted array. Well, in our case using a minHeap we can traverse in-order all the sums: we just need to ensure that the minHeap always contains the minimum unprocessed sum.
Now, we have a huge number of sums (O(N^2)). But, notice that this number can be split into N groups each of which has an easily defined minimum!
(fix a, change b from 0 to N-1 => here are your N groups. The sum in one group with a smaller b is smaller than one with a bigger b in the same group - because a is the same).
The minimum of union of these groups is in the union of mins of these
groups. Therefore, if you keep all minimums of these groups in the
minHeap you are guaranteed to have the total minimum in the minHeap.
Now, when you extract Min from the heap, you just add next smallest element from the group of this extracted min (so if you extracted (a, b) you add (a, b+1)) and you are guaranteed that your minHeap still contains the next unprocessed min of all the sums.
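A short Python sketch of this N-group merge, using the standard heapq module. It assumes positive a and b with a <= b < n (so each group starts at (a, a)), and the function name is made up for illustration:

```python
import heapq


def taxicab_numbers(n):
    """Sums a^3 + b^3 with 1 <= a <= b < n that arise from at least two pairs."""
    # One heap entry per group: the smallest sum of group a is the pair (a, a).
    heap = [(2 * a ** 3, a, a) for a in range(1, n)]
    heapq.heapify(heap)
    found = []
    previous = None
    while heap:
        total, a, b = heapq.heappop(heap)
        if total == previous and (not found or found[-1] != total):
            found.append(total)          # same sum popped twice in a row
        previous = total
        if b + 1 < n:                    # advance within the same group
            heapq.heappush(heap, (a ** 3 + (b + 1) ** 3, a, b + 1))
    return found


print(taxicab_numbers(17))
```

The heap never holds more than one entry per group, which is where the O(N) space bound comes from; each of the O(N^2) sums is pushed and popped once, giving O(N^2 log N) time.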
I found the solution/code here : Time complexity O(N^2 logN), space complexity O(N)
The solution is implemented by help of priority queues.
Reverse thinking can be easily done by looking at the code. It can be done in an array of size N because the min sums are deleted from the array after comparing to the next minimum and then the array is made to size N by adding a new sum - (i^3 + (j+1)^3).
An intuitive proof is here:
Initially, we have added (1,1),(2,2),(3,3),...,(N,N) in the min-priority queue.
Suppose a^3+b^3=c^3+d^3, and (a,b) is the minimum that will be taken out of the priority queue next. To be able to detect this taxicab number, (c,d) must also be in the priority queue, and it would be taken out after (a,b).
Note: We would be adding (a,b+1) after extracting (a,b), so there is no way that extraction of (a,b) would result in addition of (c,d) to the priority queue, so it must already exist in the priority queue.
Now let's assume that (c,d) is not in the priority queue, because we haven't gotten to it yet. Instead, there is some (c,d−k) in the priority queue where k>0.
Since (a,b) is being taken out,
a^3+b^3≤c^3+(d−k)^3
However, a^3+b^3=c^3+d^3
Therefore,
c^3+d^3≤c^3+(d−k)^3
d≤d−k
k≤0
Since k>0, this is impossible. Thus our assumption can never come to pass.
Thus for every (a,b) which is being removed from the min-PQ, (c,d) is already in the min-PQ (or was just removed) if a^3+b^3=c^3+d^3
The time complexity of the algorithm can't be less than O(N^2) in any case, since you might print up to O(N^2) taxicab numbers.
To reduce space usage you could, in theory, use the suggestion mentioned here: little link. Basically, the idea is that first you try all possible pairs a, b and find the solution to this:
a = 1 − (p − 3q)(p^2 + 3q^2)
b = −1 + (p + 3q)(p^2 + 3q^2)
Then you can find the appropriate c, d pair using:
c = (p + 3q) − (p^2 + 3q^2)^2
d = −(p − 3q) + (p^2 + 3q^2)^2
and check whether they are both less than N. The issue here is that solving that system of equations might get a bit messy (by 'a bit' I mean very tedious).
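The identity can be checked numerically. The sketch below uses exact rational arithmetic and takes the factor (p^2 + 3q^2) squared in c and d, which is the form for which a^3 + b^3 = c^3 + d^3 actually holds; the function name is invented for this example:

```python
from fractions import Fraction


def cube_identity(p, q):
    """Return (a, b, c, d) from the parametrisation, using exact arithmetic."""
    p, q = Fraction(p), Fraction(q)
    s = p * p + 3 * q * q
    a = 1 - (p - 3 * q) * s
    b = -1 + (p + 3 * q) * s
    c = (p + 3 * q) - s * s      # note the squared factor
    d = -(p - 3 * q) + s * s
    return a, b, c, d


a, b, c, d = cube_identity(0, 1)
print(a, b, c, d, a ** 3 + b ** 3 == c ** 3 + d ** 3)
```

Note that the parametrisation produces negative values as well, so it does not directly enumerate pairs of positive cubes.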
The O(N^2) space solution is much simpler, and it'd probably be efficient enough, since anything of quadratic time complexity that can run within reasonable time limits will probably be fine with quadratic space usage.
I hope that helped!
version1 uses List and sorting
O(n^2*logn) time and O(n^2) space
public static void Taxicab1(int n)
{
// O(n^2) time and O(n^2) space
var list = new List<int>();
for (int i = 1; i <= n; i++)
{
for (int j = i; j <= n; j++)
{
list.Add(i * i * i + j * j * j);
}
}
// O(n^2*log(n^2)) time
list.Sort();
// O(n^2) time
int prev = -1;
foreach (var next in list)
{
if (prev == next)
{
Console.WriteLine(prev);
}
prev = next;
}
}
version2 uses HashSet
O(n^2) time and O(n^2) space
public static void Taxicab2(int n)
{
// O(n^2) time and O(n^2) space
var set = new HashSet<int>();
for (int i = 1; i <= n; i++)
{
for (int j = i; j <= n; j++)
{
int x = i * i * i + j * j * j;
if (!set.Add(x))
{
Console.WriteLine(x);
}
}
}
}
version3 uses min oriented Priority Queue
O(n^2*logn) time and O(n) space
public static void Taxicab3(int n)
{
// O(n) time and O(n) space
var pq = new MinPQ<SumOfCubes>();
for (int i = 1; i <= n; i++)
{
pq.Push(new SumOfCubes(i, i));
}
// O(n^2*logn) time
var sentinel = new SumOfCubes(0, 0);
while (pq.Count > 0)
{
var current = pq.Pop();
if (current.Result == sentinel.Result)
Console.WriteLine($"{sentinel.A}^3+{sentinel.B}^3 = {current.A}^3+{current.B}^3 = {current.Result}");
if (current.B < n) // keep B within 1..n
pq.Push(new SumOfCubes(current.A, current.B + 1));
sentinel = current;
}
}
where SumOfCubes is:
public class SumOfCubes : IComparable<SumOfCubes>
{
public int A { get; private set; }
public int B { get; private set; }
public int Result { get; private set; }
public SumOfCubes(int a, int b)
{
A = a;
B = b;
Result = a * a * a + b * b * b;
}
public int CompareTo(SumOfCubes other)
{
return Result.CompareTo(other.Result);
}
}
Create an array: 1^3, 2^3, 3^3, 4^3, ..., k^3, such that k^3 < N and (k+1)^3 >= N. The array size would be ~ N^(1/3), and the array is in sorted order.
Use the 2sum technique (link) in linear time proportional to the array size. If we find 2 pairs of numbers, that is a hit.
Loop through step 2, decreasing N by 1 each time.
This will use O(N^(1/3)) extra space and ~ O(N^(4/3)) time.
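Step 2 for a single target value could look like the following Python sketch (the two-pointer scan over the sorted cubes; the function name cube_sum_pairs is made up here). A number is a hit when the scan returns at least two pairs:

```python
def cube_sum_pairs(target):
    """All pairs (a, b), 1 <= a <= b, with a^3 + b^3 == target (two-pointer scan)."""
    cubes = []
    k = 1
    while k ** 3 < target:           # integer arithmetic avoids float cube roots
        cubes.append(k ** 3)
        k += 1
    pairs = []
    lo, hi = 0, len(cubes) - 1
    while lo <= hi:
        total = cubes[lo] + cubes[hi]
        if total == target:
            pairs.append((lo + 1, hi + 1))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs


print(cube_sum_pairs(1729))
```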
An easy way to understand the O(N^2 logN) time, O(N) space solution is to think of it as a merge of N sorted arrays plus bookkeeping of the previously merged element.
It seems like a simple brute-force algorithm with proper bounds solves it in time proportional to n^1.33 and space proportional to n. Or could anyone point me to the place where I'm mistaken?
Consider 4 nested loops, each running from 1 to cubic root of n. Using these loops we can go over all possible combinations of 4 values and find the pairs forming taxicab numbers. It means each loop takes time proportional to cubic root of n, or n^(1/3). Multiply this value 4 times and get:
(n^(1/3))^4 = n^(4/3) ≈ n^1.33
I wrote a solution in JavaScript and benchmarked it, and it seems to be working. One caveat is that the result is only partially sorted.
Here is my JavaScript code (it's not optimal yet, could be optimized even more):
function taxicab(n) {
let a = 1, b = 1, c = 1, d = 1,
cubeA = a**3 + b**3,
cubeB = c**3 + d**3,
results = [];
while (cubeA < n) { // loop over a
while (cubeA < n) { // loop over b
// avoid running nested loops if this number is already in results
if (results.indexOf(cubeA) === -1) {
while (cubeB <= cubeA) { // loop over c
while (cubeB <= cubeA) { // loop over d
if (cubeB === cubeA && a!=c && a!=d) { // found a taxicab number!
results.push(cubeA);
}
d++;
cubeB = c**3 + d**3;
} // end loop over d
c++;
d = c;
cubeB = c**3 + d**3;
} // end loop over c
}
b++;
cubeA = a**3 + b**3;
c = d = 1;
cubeB = c**3 + d**3;
} // end loop over b
a++;
b = a;
cubeA = a**3 + b**3;
} // end loop over a
return results;
}
Running taxicab(1E8) takes around 30 seconds in a browser console and yields 485 numbers. A ten times smaller input, taxicab(1E7) (10 million), takes about 1.4 seconds and yields 150 numbers. 10^1.33 * 1.4 = 29.9, i.e. multiplying n by 10 increases the running time by a factor of about 10^1.33. The result array is unsorted, but after sorting it we get what appears to be the correct result:
[1729, 4104, 13832, 20683, 32832, 39312, 40033, 46683, 64232, 65728,
110656, 110808, 134379, 149389, 165464, 171288, 195841, 216027, 216125,
262656, 314496, 320264, 327763, 373464, 402597, 439101, 443889, 513000,
513856, 515375, 525824, 558441, 593047, 684019, 704977, 805688, 842751,
885248, 886464, 920673, 955016, 984067, 994688, 1009736, 1016496, 1061424,
1073375, 1075032, 1080891, 1092728, 1195112, 1260441, 1323712, 1331064,
1370304, 1407672, 1533357, 1566728, 1609272, 1728216, 1729000, 1734264,
1774656, 1845649, 2048391, 2101248, 2301299, 2418271, 2515968, 2562112,
2585375, 2622104, 2691451, 2864288, 2987712, 2991816, 3220776, 3242197,
3375001, 3375008, 3511872, 3512808, 3551112, 3587409, 3628233, 3798613,
3813992, 4033503, 4104000, 4110848, 4123000, 4174281, 4206592, 4342914,
4467528, 4505949, 4511808, 4607064, 4624776, 4673088, …]
Here is a code for benchmarking:
// run taxicab(n) for k trials and return the average running time
function benchmark(n, k) {
let t = 0;
k = k || 1; // how many times to repeat the trial to get an averaged result
for(let i = 0; i < k; i++) {
let t1 = new Date();
taxicab(n);
let t2 = new Date();
t += t2 - t1;
}
return Math.round(t/k);
}
Finally, I tested it:
let T = benchmark(1E7, 3); // 1376 - running time for n = 10 million
let T2 = benchmark(2E7, 3);// 4821 - running time for n = 20 million
let powerLaw = Math.log2(T2/T); // 1.3206693816701993
So it means time is proportional to n^1.32 in this test. Repeating this many times with different values always yields around the same result: from 1.3 to 1.4.
First of all, we will construct the taxicab numbers instead of searching for them. The range we use to construct a taxicab number, i.e. Ta(2), goes up to n^(1/3), not n: if you cube a number bigger than n^(1/3), the result is bigger than n, and by definition we only use positive numbers. We use a HashSet to remember the sums of two cubes seen so far. This lets us look up previous sums in O(1) time while iterating over every possible pair of numbers in that range.
Time complexity: O(n^(2/3))
Space complexity: O(n^(2/3)), since the set can hold one sum per pair
def taxicab_numbers(n: int) -> list[int]:
    taxicab_numbers = []
    # Integer cube root of n; round and adjust to guard against float error
    max_num = round(n ** (1. / 3.))
    while max_num ** 3 > n:
        max_num -= 1
    seen_sums = set()
    for i in range(1, max_num + 1):
        for j in range(i, max_num + 1):
            cube_sum = i ** 3 + j ** 3
            if cube_sum in seen_sums:
                taxicab_numbers.append(cube_sum)
            else:
                seen_sums.add(cube_sum)
    return taxicab_numbers
import java.util.*;
public class A5Q24 {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter number:");
int n = sc.nextInt();
// start checking every int less than the input
for (int a = 2;a <= n;a++) {
int count = 0;
// number of ways that number be expressed in sum of two number cubes
for (int i = 1; Math.pow(i, 3) < a; i++) {
// if the cube of number smaller is greater than the number than it goes out
for (int j = 1; j <= i; j++) {
if (Math.pow(i, 3) + Math.pow(j, 3) == a)
count++;
}
}
if (count == 2)
System.out.println(a);
}
sc.close();
}
}
I think we can also do better on time (O(N^2)) with O(N^2) memory, using a hashmap to check whether a sum of cubes has already been seen. In Python:
from typing import List, Tuple
def find_taxicab_numbers(n: int) -> List[Tuple[int, int, int, int, int]]:
"""
find all taxicab numbers smaller than n, i.e. integers that can be expressed as the sum of two cubes of positive
integers in two different ways so that a^3 + b^3 = c^3 + d^3.
Time: O(n ^ 2) (two loops, one dict lookup). Space: O(n ^ 2)) (all possible cubes)
:param n: upper bound for a, b, c, d
:return: list of tuples of int: a, b, c, d, and taxicab numbers
"""
cubes = [i ** 3 for i in range(n)]
seen_sum_cubes = dict() # mapping sum cubes -> a, b
taxicabs = list() # list of a, b, c, d, taxicab
# check all possible sums of cubes
for i in range(n):
for j in range(i):
sum_cubes = cubes[i] + cubes[j]
if sum_cubes in seen_sum_cubes:
prev_i, prev_j = seen_sum_cubes[sum_cubes]
taxicabs.append((i, j, prev_i, prev_j, sum_cubes))
else:
seen_sum_cubes[sum_cubes] = (i, j)
return taxicabs

Finding maximum for every window of size k in an array

Given an array of size n and k, how do you find the maximum for every contiguous subarray of size k?
For example
arr = 1 5 2 6 3 1 24 7
k = 3
ans = 5 6 6 6 24 24
I was thinking of having an array of size k and each step evict the last element out and add the new element and find maximum among that. It leads to a running time of O(nk). Is there a better way to do this?
You may have heard about doing it in O(n) using a deque.
That is a well-known O(n) algorithm for this question.
The method I am describing is quite simple and also has time complexity O(n).
Your Sample Input:
n=10 , W = 3
10 3
1 -2 5 6 0 9 8 -1 2 0
Answer = 5 6 6 9 9 9 8 2
Concept: Dynamic Programming
Algorithm:
N is the number of elements in the array and W is the window size. So, the number of windows = N-W+1.
Now divide the array into blocks of W, starting from index 1.
Here we divide into blocks of size 'W'=3.
For your sample input:
[1 -2 5] [6 0 9] [8 -1 2] [0]
We divide into blocks because we will calculate the maximum in 2 ways: A.) by traversing from left to right, B.) by traversing from right to left.
But how?
Firstly, traversing from left to right. For each element ai in a block we find the maximum from the START of that block up to ai.
So here,
LR = [1 1 5] [6 6 9] [8 8 8] [0]
Secondly, traversing from right to left. For each element 'ai' in a block we find the maximum from the END of that block down to ai.
So here,
RL = [5 5 5] [9 9 9] [8 2 2] [0]
Now we have to find the maximum for each subarray or window of size 'W'.
So, starting from index = 1 to index = N-W+1:
max_val[index] = max(RL[index], LR[index+w-1]);
for index=1: max_val[1] = max(RL[1],LR[3]) = max(5,5)= 5
Similarly, for every index i (i <= n-w+1), the values at RL[i] and LR[i+w-1]
are compared and the maximum of those two is the answer for that subarray.
So Final Answer : 5 6 6 9 9 9 8 2
Time Complexity: O(n)
Implementation code:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#define LIM 100001
using namespace std;
int arr[LIM]; // Input Array
int LR[LIM]; // maximum from Left to Right
int RL[LIM]; // maximum from Right to left
int max_val[LIM]; // number of subarrays(windows) will be n-k+1
int main(){
int n, w, i, k; // 'n' is number of elements in array
// 'w' is Window's Size
cin >> n >> w;
k = n - w + 1; // 'K' is number of Windows
for(i = 1; i <= n; i++)
cin >> arr[i];
for(i = 1; i <= n; i++){ // for maximum Left to Right
if(i % w == 1) // that means START of a block
LR[i] = arr[i];
else
LR[i] = max(LR[i - 1], arr[i]);
}
for(i = n; i >= 1; i--){ // for maximum Right to Left
if(i == n) // Maybe the last block is not of size 'W'.
RL[i] = arr[i];
else if(i % w == 0) // that means END of a block
RL[i] = arr[i];
else
RL[i] = max(RL[i+1], arr[i]);
}
for(i = 1; i <= k; i++) // maximum
max_val[i] = max(RL[i], LR[i + w - 1]);
for(i = 1; i <= k ; i++)
cout << max_val[i] << " ";
cout << endl;
return 0;
}
I'll try to prove it (by @johnchen902):
If k % w != 1 (k is not the beginning of a block)
Let k* = the beginning of the block containing k + w - 1 (the block after the one containing k)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= max( max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k* - 1]),
max( arr[k*], arr[k* + 1], arr[k* + 2], ..., arr[k + w - 1]) )
= max( RL[k], LR[k+w-1] )
Otherwise (k is the beginning of a block)
ans[k] = max( arr[k], arr[k + 1], arr[k + 2], ..., arr[k + w - 1])
= RL[k] = LR[k+w-1]
= max( RL[k], LR[k+w-1] )
The dynamic programming approach is very neatly explained by Shashank Jain. I would like to explain how to do the same using a deque.
The key is to maintain the max element at the front of the queue (for a window), discarding the useless elements; we also need to discard the elements that are outside the current window.
Useless elements: if the current element is greater than the last element of the queue, then the last element of the queue is useless.
Note: we are storing the index in the queue, not the element itself. It will be clearer from the code.
1. If the current element is greater than or equal to the last element of the queue, then the last element of the queue is useless. We need to delete that last element
(and keep deleting until the last element of the queue is greater than the current element).
2. If current_index - k >= q.front(), the front element is outside the window, so we need to delete it from the front of the queue.
vector<int> max_sub_deque(vector<int> &A,int k)
{
deque<int> q;
for(int i=0;i<k;i++)
{
while(!q.empty() && A[i] >= A[q.back()])
q.pop_back();
q.push_back(i);
}
vector<int> res;
for(int i=k;i<A.size();i++)
{
res.push_back(A[q.front()]);
while(!q.empty() && A[i] >= A[q.back()] )
q.pop_back();
while(!q.empty() && q.front() <= i-k)
q.pop_front();
q.push_back(i);
}
res.push_back(A[q.front()]);
return res;
}
Since each element is enqueued and dequeued at most once, the time complexity is O(n+n) = O(2n) = O(n).
The size of the queue cannot exceed k, so the space complexity is O(k).
An O(n) time solution is possible by combining two classic interview questions:
Make a stack data structure (called MaxStack) which supports push, pop and max in O(1) time.
This can be done using two stacks, where the second one contains the maximum seen so far.
Model a queue with a stack.
This can be done using two stacks. Enqueues go into one stack, and dequeues come from the other.
For this problem, we basically need a queue which supports enqueue, dequeue and max in O(1) (amortized) time.
We combine the above two by modelling a queue with two MaxStacks.
To solve the question, we enqueue k elements, query the max, dequeue, enqueue the (k+1)-th element, query the max, and so on. This gives the max for every k-sized sub-array.
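A compact Python sketch of the two-MaxStack queue. Instead of a separate second stack, each stack entry stores the running maximum next to the value, which is an equivalent simplification; the names MaxQueue and window_maxima are made up for this illustration:

```python
class MaxStack:
    """Stack whose push, pop and max all take O(1) time."""

    def __init__(self):
        self._items = []   # pairs (value, maximum of the stack up to this value)

    def push(self, value):
        top_max = value if not self._items else max(value, self._items[-1][1])
        self._items.append((value, top_max))

    def pop(self):
        return self._items.pop()[0]

    def max(self):
        return self._items[-1][1]

    def __len__(self):
        return len(self._items)


class MaxQueue:
    """FIFO queue built from two MaxStacks; operations are O(1) amortized."""

    def __init__(self):
        self._inbox, self._outbox = MaxStack(), MaxStack()

    def enqueue(self, value):
        self._inbox.push(value)

    def dequeue(self):
        if not self._outbox:               # refill only when the outbox is empty
            while self._inbox:
                self._outbox.push(self._inbox.pop())
        return self._outbox.pop()

    def max(self):
        # assumes the queue is non-empty
        return max(s.max() for s in (self._inbox, self._outbox) if s)


def window_maxima(arr, k):
    queue, result = MaxQueue(), []
    for i, value in enumerate(arr):
        queue.enqueue(value)
        if i >= k:
            queue.dequeue()                # drop the element that left the window
        if i >= k - 1:
            result.append(queue.max())
    return result


print(window_maxima([1, 5, 2, 6, 3, 1, 24, 7], 3))
```

Each element moves from the inbox to the outbox at most once, which is what makes the dequeue cost O(1) amortized.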
I believe there are other solutions too.
1)
I believe the queue idea can be simplified. We maintain a queue and a max for every k. We enqueue a new element, and dequeue all elements which are not greater than the new element.
2) Maintain two new arrays which hold the running max for each block of k, one array per direction (left to right/right to left).
3) Use a hammer: Preprocess in O(n) time for range maximum queries.
Solution 1) above is probably the best of these.
You need a fast data structure that can add, remove and query for the max element in less than O(n) time (you can just use an array if O(n) or O(nlogn) is acceptable). You can use a heap, a balanced binary search tree, a skip list, or any other sorted data structure that performs these operations in O(log(n)).
The good news is that most popular languages have a sorted data structure implemented that supports these operations for you. C++ has std::set and std::multiset (you probably need the latter) and Java has PriorityQueue and TreeSet.
Here is the java implementation
public static Integer[] maxsInEveryWindows(int[] arr, int k) {
Deque<Integer> deque = new ArrayDeque<Integer>();
/* Process first k (or first window) elements of array */
for (int i = 0; i < k; i++) {
// For every element, the previous smaller elements are useless so
// remove them from deque
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast(); // Remove from rear
}
// Add new element at rear of queue
deque.addLast(i);
}
List<Integer> result = new ArrayList<Integer>();
// Process rest of the elements, i.e., from arr[k] to arr[n-1]
for (int i = k; i < arr.length; i++) {
// The element at the front of the queue is the largest element of
// previous window, so add to result.
result.add(arr[deque.getFirst()]);
// Remove all elements smaller than the currently
// being added element (remove useless elements)
while (!deque.isEmpty() && arr[i] >= arr[deque.peekLast()]) {
deque.removeLast();
}
// Remove the elements which are out of this window
while (!deque.isEmpty() && deque.getFirst() <= i - k) {
deque.removeFirst();
}
// Add current element at the rear of deque
deque.addLast(i);
}
// Print the maximum element of last window
result.add(arr[deque.getFirst()]);
return result.toArray(new Integer[0]);
}
Here is the corresponding test case
@Test
public void maxsInWindowsOfSizeKTest() {
Integer[] result = ArrayUtils.maxsInEveryWindows(new int[]{1, 2, 3, 1, 4, 5, 2, 3, 6}, 3);
assertThat(result, equalTo(new Integer[]{3, 3, 4, 5, 5, 5, 6}));
result = ArrayUtils.maxsInEveryWindows(new int[]{8, 5, 10, 7, 9, 4, 15, 12, 90, 13}, 4);
assertThat(result, equalTo(new Integer[]{10, 10, 10, 15, 15, 90, 90}));
}
Using a heap (or tree), you should be able to do it in O(n * log(k)). I'm not sure if this would be indeed better.
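For comparison, a heap-based Python sketch using heapq with lazy deletion (stale entries are only dropped when they reach the top). This variant is an assumption about how one might implement the idea: because stale entries can pile up, its worst case is O(n log n) rather than O(n log k); a balanced tree with explicit removal would give O(n log k). The function name is made up:

```python
import heapq


def window_maxima_heap(arr, k):
    """Window maxima using a max-heap of (-value, index) with lazy deletion."""
    heap, result = [], []
    for i, value in enumerate(arr):
        heapq.heappush(heap, (-value, i))
        # discard stale tops whose index fell out of the window [i-k+1, i]
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        if i >= k - 1:
            result.append(-heap[0][0])
    return result


print(window_maxima_heap([1, 5, 2, 6, 3, 1, 24, 7], 3))
```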
Here is a Python implementation that runs in O(n) overall, i.e. O(1) amortized per window. Thanks to @Shahshank Jain in advance.
from sys import stdin
n, w = map(int, stdin.readline().strip().split())  # n = array length, w = window size
Arr = list(map(int, stdin.readline().strip().split()))
k = n - w + 1  # number of windows
leftA = [0] * n
rightA = [0] * n
result = [0] * k
for i in range(n):
    if i % w == 0:
        leftA[i] = Arr[i]
    else:
        leftA[i] = max(Arr[i], leftA[i - 1])
for i in range(n - 1, -1, -1):
    if i % w == (w - 1) or i == n - 1:
        rightA[i] = Arr[i]
    else:
        rightA[i] = max(Arr[i], rightA[i + 1])
for i in range(k):
    result[i] = max(rightA[i], leftA[i + w - 1])
print(*result, sep=' ')
Method 1: O(n) time, O(k) space
We use a deque (it is like a list but with constant-time insertion and deletion from both ends) to store the index of useful elements.
The index of the current max is kept at the leftmost element of deque. The rightmost element of deque is the smallest.
In the following, for easier explanation we say an element from the array is in the deque, while in fact the index of that element is in the deque.
Let's say {5, 3, 2} are already in the deque (again, in fact their indexes are).
If the next element we read from the array is bigger than 5 (remember, the leftmost element of deque holds the max), say 7: We delete the deque and create a new one with only 7 in it (we do this because the current elements are useless, we have found a new max).
If the next element is less than 2 (which is the smallest element of deque), say 1: We add it to the right ({5, 3, 2, 1})
If the next element is bigger than 2 but less than 5, say 4: We remove elements from right that are smaller than the element and then add the element from right ({5, 4}).
Also we keep elements of the current window only (we can do this in constant time because we are storing the indexes instead of elements).
from collections import deque
def max_subarray(array, k):
    deq = deque()  # indexes; the values they point to decrease from left to right
    for index, item in enumerate(array):
        # the max element is out of the window
        if deq and index - deq[0] >= k:
            deq.popleft()
        # remove elements from the right that are not bigger than the new item;
        # this also covers the "found a new max" case, which empties the deque
        while deq and item >= array[deq[-1]]:
            deq.pop()
        deq.append(index)
        if index >= k - 1:  # start printing when the first window is filled
            print(array[deq[0]])
Proof of O(n) time: The only part we need to check is the while loop. In the whole runtime of the code, the while loop can perform at most O(n) operations in total. The reason is that the while loop pops elements from the deque, and since in other parts of the code, we do at most O(n) insertions into the deque, the while loop cannot exceed O(n) operations in total. So the total runtime is O(n) + O(n) = O(n)
Method 2: O(n) time, O(n) space
This is the explanation of the method suggested by S Jain (as mentioned in the comments of his post, this method doesn't work with data streams, which most sliding window questions are designed for).
The reason that method works is explained using the following example:
array = [5, 6, 2, 3, 1, 4, 2, 3]
k = 4
array: [5, 6, 2, 3 | 1, 4, 2, 3]
LR:     5  6  6  6 | 1  4  4  4
RL:     6  6  3  3 | 4  4  3  3
max:    6  6  4  4   4
To get the max for the window [2, 3, 1, 4],
we can get the max of [2, 3] and max of [1, 4], and return the bigger of the two.
Max of [2, 3] is calculated in the RL pass and max of [1, 4] is calculated in LR pass.
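The two passes in the example can be sketched as a function. This is only an illustration of the idea described above; the names sliding_max_blocks, lr and rl are assumptions chosen to match the LR and RL rows of the example.

```python
def sliding_max_blocks(arr, k):
    """Window maxima via per-block prefix (LR) and suffix (RL) maxima."""
    n = len(arr)
    lr = [0] * n  # lr[i]: max from the start of i's block up to i
    rl = [0] * n  # rl[i]: max from i up to the end of i's block
    for i in range(n):
        lr[i] = arr[i] if i % k == 0 else max(arr[i], lr[i - 1])
    for i in range(n - 1, -1, -1):
        at_block_end = (i % k == k - 1) or (i == n - 1)
        rl[i] = arr[i] if at_block_end else max(arr[i], rl[i + 1])
    # A window [i, i+k-1] is a suffix of one block plus a prefix of the next.
    return [max(rl[i], lr[i + k - 1]) for i in range(n - k + 1)]

print(sliding_max_blocks([5, 6, 2, 3, 1, 4, 2, 3], 4))  # [6, 6, 4, 4, 4]
```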
Using Fibonacci heap, you can do it in O(n + (n-k) log k), which is equal to O(n log k) for small k, for k close to n this becomes O(n).
The algorithm: in fact, you need:
n inserts to the heap
n-k deletions
n-k findmax's
How much do these operations cost in a Fibonacci heap? Insert and findmax are O(1) amortized, and deletion is O(log k) amortized because the heap never holds more than about k elements at once. So, we have
O(n + (n-k) log k + (n-k)) = O(n + (n-k) log k)
Sorry, this should have been a comment but I am not allowed to comment for now.
@leo and @Clay Goddard
You can save yourselves from re-computing the maximum by storing both maximum and 2nd maximum of the window in the beginning
(2nd maximum will be the maximum only if there are two maximums in the initial window). If the maximum slides out of the window you still have the next best candidate to compare with the new entry. So you get O(n) , otherwise if you allowed the whole re-computation again the worst case order would be O(nk), k is the window size.
class MaxFinder
{
// finds the max and its index
static int[] findMaxByIteration(int arr[], int start, int end)
{
int max, max_ndx;
max = arr[start];
max_ndx = start;
for (int i=start; i<end; i++)
{
if (arr[i] > max)
{
max = arr[i];
max_ndx = i;
}
}
int result[] = {max, max_ndx};
return result;
}
// optimized to skip iteration, when previous windows max element
// is present in current window
static void optimizedPrintKMax(int arr[], int n, int k)
{
int i, j, max, max_ndx;
// for first window - find by iteration.
int result[] = findMaxByIteration(arr, 0, k);
System.out.printf("%d ", result[0]);
max = result[0];
max_ndx = result[1];
for (j=1; j <= (n-k); j++)
{
// if previous max has fallen out of current window, iterate and find
if (max_ndx < j)
{
result = findMaxByIteration(arr, j, j+k);
max = result[0];
max_ndx = result[1];
}
// optimized path, just compare max with new_elem that has come into the window
else
{
int new_elem_ndx = j + (k-1);
if (arr[new_elem_ndx] > max)
{
max = arr[new_elem_ndx];
max_ndx = new_elem_ndx;
}
}
System.out.printf("%d ", max);
}
}
public static void main(String[] args)
{
int arr[] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
//int arr[] = {1,5,2,6,3,1,24,7};
int n = arr.length;
int k = 3;
optimizedPrintKMax(arr, n, k);
}
}
package com;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class SlidingWindow {
public static void main(String[] args) {
int[] array = { 1, 5, 2, 6, 3, 1, 24, 7 };
int slide = 3;//say
List<Integer> result = new ArrayList<Integer>();
for (int i = 0; i < array.length - (slide-1); i++) {
result.add(getMax(array, i, slide));
}
System.out.println("MaxList->>>>" + result.toString());
}
private static Integer getMax(int[] array, int i, int slide) {
List<Integer> intermediate = new ArrayList<Integer>();
System.out.println("Initial::" + intermediate.size());
while (intermediate.size() < slide) {
intermediate.add(array[i]);
i++;
}
Collections.sort(intermediate);
return intermediate.get(slide - 1);
}
}
Here is the solution in O(n) time complexity with auxiliary deque
import java.util.ArrayDeque;
import java.util.Deque;
public class TestSlidingWindow {
public static void main(String[] args) {
int[] arr = { 1, 5, 7, 2, 1, 3, 4 };
int k = 3;
printMaxInSlidingWindow(arr, k);
}
public static void printMaxInSlidingWindow(int[] arr, int k) {
// auxQueue holds the candidates for the window maximum in non-increasing order,
// so its first element is always the maximum of the current window.
Deque<Integer> auxQueue = new ArrayDeque<Integer>();
int[] resultArr = new int[(arr.length - k) + 1];
int j = 0;
for (int i = 0; i < arr.length; i++) {
// Elements smaller than arr[i] can never be the maximum while arr[i]
// is in the window, so remove them from the rear before adding arr[i].
while (!auxQueue.isEmpty() && arr[i] > auxQueue.peekLast()) {
auxQueue.pollLast();
}
auxQueue.addLast(arr[i]);
// arr[i - k] has just left the window; if it was the maximum, drop it.
if (i >= k && auxQueue.peekFirst() == arr[i - k]) {
auxQueue.pollFirst();
}
// Use k (not a hard-coded 3) so that any window size works.
if (i >= k - 1) {
resultArr[j++] = auxQueue.peekFirst();
}
}
for (int i = 0; i < resultArr.length; i++) {
System.out.println(resultArr[i]);
}
}
}
static void countDistinct(int arr[], int n, int k)
{
System.out.print("\nMaximum integer in the window : ");
// Traverse through every window
for (int i = 0; i <= n - k; i++) {
System.out.print(findMaximuminAllWindow(Arrays.copyOfRange(arr, i, arr.length), k)+ " ");
}
}
private static int findMaximuminAllWindow(int[] win, int k) {
// TODO Auto-generated method stub
int max= Integer.MIN_VALUE;
for(int i=0; i<k;i++) {
if(win[i]>max)
max=win[i];
}
return max;
}
arr = 1 5 2 6 3 1 24 7
We have to find the maximum of subarray, Right?
So, What is meant by subarray?
SubArray = Partial set and it should be in order and contiguous.
From the above array
{1,5,2} {6,3,1} {1,24,7} all are the subarray examples
n = 8 // Array length
k = 3 // window size
For finding the maximum, we have to iterate through the array, and find the maximum.
From the window size k,
{1,5,2} = 5 is the maximum
{5,2,6} = 6 is the maximum
{2,6,3} = 6 is the maximum
and so on..
ans = 5 6 6 6 24 24
The number of windows is n-k+1.
Hence, 8-3+1 = 6
and the length of the answer is 6, as we have seen.
How can we solve this now?
When data moves through a pipe, the first data structure that comes to mind is the queue.
We will not discuss it further here and jump directly to the deque.
Thinking Would be:
Window is fixed and data is in and out
Data is fixed and window is sliding
EX: Time series database
For the first k elements (i from 0 to k-1):
While (Queue is not empty and arr[Queue.back()] < arr[i]) {
Queue.pop_back();
}
Queue.push_back(i);
For the rest (i from k to n-1):
Print arr[Queue.front()]
// purge expired elements
While (Queue is not empty and Queue.front() <= i-k) {
Queue.pop_front();
}
While (Queue is not empty and arr[Queue.back()] < arr[i]) {
Queue.pop_back();
}
Queue.push_back(i);
Finally, print arr[Queue.front()] for the last window.
arr = [1, 2, 3, 1, 4, 5, 2, 3, 6]
k = 3
for i in range(len(arr) - k + 1):
    print(max(arr[i:i + k]), end=' ')  # 3 3 4 5 5 5 6
Two approaches.
Segment Tree O(n + (n-k) log n)
Build a maximum segment-tree.
Query between [i, i+k)
Something like..
public static void printMaximums(int[] a, int k) {
int n = a.length;
SegmentTree tree = new SegmentTree(a);
for (int i=0; i<=n-k; i++) System.out.print(tree.query(i, i+k));
}
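The Java snippet above relies on a SegmentTree class that is not shown. As a rough sketch of what such a class could look like, here is an iterative max segment tree in Python; MaxSegmentTree and window_maximums are hypothetical names, and query uses the same half-open range [i, i+k) as the text.

```python
class MaxSegmentTree:
    """Iterative segment tree answering max over a half-open range [lo, hi)."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at positions n .. 2n-1; internal nodes at 1 .. n-1.
        self.tree = [float("-inf")] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        best = float("-inf")
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:          # lo is a right child: take it and move right
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:          # hi is exclusive: step left and take that node
                hi -= 1
                best = max(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best

def window_maximums(a, k):
    tree = MaxSegmentTree(a)    # O(n) build
    return [tree.query(i, i + k) for i in range(len(a) - k + 1)]  # O(log n) each

print(window_maximums([1, 5, 2, 6, 3, 1, 24, 7], 3))  # [5, 6, 6, 6, 24, 24]
```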
Deque O(n)
If the next element is greater than the rear element, remove the rear element.
If the element in the front of the deque is out of the window, remove the front element.
public static void printMaximums(int[] a, int k) {
int n = a.length;
Deque<int[]> deck = new ArrayDeque<>();
List<Integer> result = new ArrayList<>();
for (int i=0; i<n; i++) {
while (!deck.isEmpty() && a[i] >= deck.peekLast()[0]) deck.pollLast();
deck.offer(new int[] {a[i], i});
while (!deck.isEmpty() && deck.peekFirst()[1] <= i - k) deck.pollFirst();
if (i >= k - 1) result.add(deck.peekFirst()[0]);
}
System.out.println(result);
}
Here is an optimized version of the naive (conditional) nested loop approach I came up with which is much faster and doesn't require any auxiliary storage or data structure.
As the program moves from window to window, the start index and end index moves forward by 1. In other words, two consecutive windows have adjacent start and end indices.
For the first window of size W , the inner loop finds the maximum of elements with index (0 to W-1). (Hence i == 0 in the if in 4th line of the code).
Now instead of computing for the second window which only has one new element, since we have already computed the maximum for elements of indices 0 to W-1, we only need to compare this maximum to the only new element in the new window with the index W.
But if the element at 0 was the maximum which is the only element not part of the new window, we need to compute the maximum using the inner loop from 1 to W again using the inner loop (hence the second condition maxm == arr[i-1] in the if in line 4), otherwise just compare the maximum of the previous window and the only new element in the new window.
void print_max_for_each_subarray(int arr[], int n, int k)
{
int maxm;
for(int i = 0; i < n - k + 1 ; i++)
{
if(i == 0 || maxm == arr[i-1]) {
maxm = arr[i];
for(int j = i+1; j < i+k; j++)
if(maxm < arr[j]) maxm = arr[j];
}
else {
maxm = maxm < arr[i+k-1] ? arr[i+k-1] : maxm;
}
cout << maxm << ' ';
}
cout << '\n';
}
You can use the Deque data structure to implement this. A deque has the unique facility that you can insert and remove elements at both ends, unlike a traditional queue where you can only insert at one end and remove from the other.
Following is the code for the above problem.
public int[] maxSlidingWindow(int[] nums, int k) {
int n = nums.length;
int[] maxInWindow = new int[n - k + 1];
Deque<Integer> dq = new LinkedList<Integer>();
int i = 0;
for(; i<k; i++){
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
for(; i <n; i++){
maxInWindow[i - k] = nums[dq.peekFirst()];
while(!dq.isEmpty() && dq.peekFirst() <= i - k){
dq.removeFirst();
}
while(!dq.isEmpty() && nums[dq.peekLast()] <= nums[i]){
dq.removeLast();
}
dq.addLast(i);
}
maxInWindow[i - k] = nums[dq.peekFirst()];
return maxInWindow;
}
the resultant array will have n - k + 1 elements where n is length of the given array, k is the given window size.
We can solve it using the Python , applying the slicing.
def sliding_window(a, k, n):
    max_val = []
    # there are n - k + 1 windows; window i covers a[i:i + k]
    for i in range(n - k + 1):
        val = a[i:i + k]
        max_val.append(max(val))
    return max_val
Driver Code
a = [15, 2, 3, 4, 5, 6, 2, 4, 9, 1, 5]
n = len(a)
k = 3
sl = sliding_window(a, k, n)
print(sl)
Create a TreeMap of size k. Put the first k elements in it as keys and assign any value like 1 (it doesn't matter as long as the elements are distinct; with duplicates, store the count of each key instead). A TreeMap keeps its entries sorted by key, so the first key in the map is the min and the last key is the max of the window. To slide the window, remove the element whose index in arr is i-k and insert arr[i]. Here I assume that the input elements are in an array arr, from which we fill the map of size k. Each TreeMap insertion or removal costs O(log k), so this approach takes O(n log k) time overall.
100% working Tested (Swift)
func maxOfSubArray(arr:[Int],n:Int,k:Int)->[Int]{
var lenght = arr.count
var resultArray = [Int]()
for i in 0..<arr.count{
if lenght+1 > k{
let tempArray = Array(arr[i..<k+i])
resultArray.append(tempArray.max()!)
}
lenght = lenght - 1
}
print(resultArray)
return resultArray
}
This way we can use:
maxOfSubArray(arr: [1,2,3,1,4,5,2,3,6], n: 9, k: 3)
Result:
[3, 3, 4, 5, 5, 5, 6]
Just notice that you only have to re-scan the new window if both of these hold:
* The new element in the window is smaller than the previous maximum (if it's bigger, the new element is the maximum for sure).
AND
* The element that just dropped out of the window was the current maximum.
In every other case the maximum either stays the same or becomes the new element.
for how big k? for reasonable-sized k. you can create k k-sized buffers and just iterate over the array keeping track of max element pointers in the buffers - needs no data structures and is O(n) k^2 pre-allocation.
A complete working solution in Amortised Constant O(1) Complexity.
https://github.com/varoonverma/code-challenge.git
Compare the first k elements and find the max, this is your first number
then compare the next element to the previous max. If the next element is bigger, that is your max of the next subarray, if its equal or smaller, the max for that sub array is the same
then move on to the next number
max(1 5 2) = 5
max(5 6) = 6
max(6 6) = 6
... and so on
max(3 24) = 24
max(24 7) = 24
It's only slightly better than your answer

How to find the kth largest element in an unsorted array of length n in O(n)?

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called median of medians) taking O(n) worst case time. Introselect is the hybrid that starts with quickselect and falls back to median of medians. There's some info on Wikipedia, but it's not very good.
Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (median of medians):
Select(A,n,i):
Divide input into ⌈n/5⌉ groups of size 5.
/* Partition on median-of-medians */
medians = array of each group’s median.
pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
Left Array L and Right Array G = partition(A, pivot)
/* Find ith element in L, pivot, or G */
k = |L| + 1
If i = k, return pivot
If i < k, return Select(L, k-1, i)
If i > k, return Select(G, n-k, i-k)
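The pseudocode above can be turned into a short runnable sketch. The Python below is one possible reading of it, not the slides' own code: the name select, the base case for inputs of at most 5 elements, and the handling of duplicates (counting elements equal to the pivot) are assumptions added here.

```python
def select(a, i):
    """Return the i-th smallest element of a (1-based) via median of medians."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Median of each group of (at most) 5 elements.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Pivot = median of the medians, found recursively.
    pivot = select(medians, (len(medians) + 1) // 2)
    lower = [x for x in a if x < pivot]
    upper = [x for x in a if x > pivot]
    equal_count = len(a) - len(lower) - len(upper)
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal_count:
        return pivot
    return select(upper, i - len(lower) - equal_count)

data = [9, 1, 8, 2, 7, 3, 6, 4, 5, 10, 12, 11]
print(select(data, 3))                    # 3rd smallest: 3
print(select(data, len(data) - 2 + 1))    # 2nd largest: 11
```

The k-th largest of n elements is the (n-k+1)-th smallest, which is how the second call is phrased.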
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
    let r be chosen uniformly at random in the range 1 to length(A)
    let pivot = A[r]
    let A1, A2 be new arrays
    # split into a pile A1 of small elements and A2 of big elements
    for i = 1 to length(A)
        if A[i] < pivot then
            append A[i] to A1
        else if A[i] > pivot then
            append A[i] to A2
        else
            # do nothing
    end for
    if k <= length(A1):
        # it's in the pile of small elements
        return QuickSelect(A1, k)
    else if k > length(A) - length(A2)
        # it's in the pile of big elements
        return QuickSelect(A2, k - (length(A) - length(A2)))
    else
        # it's equal to the pivot
        return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of
$$T(n) = \Theta(n) + T(n-1) = \Theta(n^2)$$
But if the choices are indeed random, the expected running time is given by
$$T(n) \le \Theta(n) + \frac{1}{n} \sum_{i=1}^{n} T(\max(i-1, n-i))$$
where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.
Let's guess that T(n) <= an for some a. Then we get
$$\begin{aligned}
T(n) &\le cn + \frac{1}{n} \sum_{i=1}^{n} T(\max(i-1, n-i)) \\
&= cn + \frac{1}{n} \sum_{i=1}^{\lfloor n/2 \rfloor} T(n-i) + \frac{1}{n} \sum_{i=\lfloor n/2 \rfloor + 1}^{n} T(i) \\
&\le cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} T(i) \\
&\le cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} ai
\end{aligned}$$
and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as $\frac{2}{n} \sum_{i=n/2}^{n} an$, we get roughly $\frac{2}{n} \cdot \frac{n}{2} \cdot an = an$. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:
$$\begin{aligned}
\sum_{i=\lfloor n/2 \rfloor}^{n} i &= \sum_{i=1}^{n} i - \sum_{i=1}^{\lfloor n/2 \rfloor} i \\
&= \frac{n(n+1)}{2} - \frac{\lfloor n/2 \rfloor (\lfloor n/2 \rfloor + 1)}{2} \\
&\le \frac{n^2}{2} - \frac{(n/4)^2}{2} \\
&= \frac{15}{32} n^2
\end{aligned}$$
where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with
$$\begin{aligned}
cn + \frac{2}{n} \sum_{i=\lfloor n/2 \rfloor}^{n} ai &\le cn + \frac{2a}{n} \cdot \frac{15}{32} n^2 \\
&= n \left(c + \frac{15}{16} a\right) \\
&\le an
\end{aligned}$$
provided a > 16c.
This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for the 3rd largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
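The heap recipe above generalizes from 3 to any k. A small sketch with Python's heapq follows; the function name kth_largest_heap is made up here, and values are negated because heapq only provides a min-heap. The cost is O(n + k log n), which stays linear only while k is small compared with n / log n.

```python
import heapq

def kth_largest_heap(values, k):
    heap = [-v for v in values]   # negate so the largest value sits on top
    heapq.heapify(heap)           # build the heap in O(n)
    for _ in range(k - 1):        # discard the k-1 largest, O(log n) each
        heapq.heappop(heap)
    return -heap[0]

print(kth_largest_heap([7, 2, 9, 4, 1, 8], 3))  # 7
```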
You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).
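A sketch of that description as an in-place routine, assuming a Lomuto-style partition with a random pivot. The name quickselect_inplace is hypothetical; the function reorders its argument, so pass a copy if the original order matters. Expected time is O(n), while an unlucky run of pivots can still cost O(n^2).

```python
import random

def quickselect_inplace(a, k):
    """Return the k-th smallest element (1-based); reorders a in place."""
    lo, hi = 0, len(a) - 1
    target = k - 1                       # zero-based position in sorted order
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)       # random pivot, moved to the end
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):          # shove everything smaller to the left
            if a[i] < pivot:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final position
        if target == store:
            return a[store]
        if target < store:               # keep only the bin that holds the target
            hi = store - 1
        else:
            lo = store + 1

nums = [7, 2, 9, 4, 1, 8]
print(quickselect_inplace(list(nums), len(nums) - 3 + 1))  # 3rd largest: 7
```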
A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.
I answered the letter of your question :)
The C++ standard library has almost exactly that function, called nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.
const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.
For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
Warren's priority heap solution is neater though.
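As a sketch of the "keep the k largest seen so far" idea, the linear scan over the k candidates can be replaced by a min-heap of size k, which brings the cost down to O(n log k) with O(k) space. The function name kth_largest_streaming is an assumption of this example, and it is not claimed to match any particular answer in this thread.

```python
import heapq

def kth_largest_streaming(values, k):
    heap = []  # min-heap holding the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k: swap it in
            heapq.heapreplace(heap, v)
    return heap[0]  # smallest of the k largest = k-th largest

print(kth_largest_streaming([7, 2, 9, 4, 1, 8], 3))  # 7
```

The standard library's heapq.nlargest(k, values)[-1] gives the same answer in one call.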
The expected running time is O(n); the worst case is O(n^2) when the random pivots are consistently bad. The function is written in Java, and nthSmallest is 1-based.
public int quickSelect(ArrayList<Integer>list, int nthSmallest){
//Choose random number in range of 0 to array length
Random random = new Random();
//This will give random number which is not greater than length - 1
int pivotIndex = random.nextInt(list.size());
int pivot = list.get(pivotIndex);
ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
//Split list into two.
//Value smaller than pivot should go to smallerNumberList
//Value greater than pivot should go to greaterNumberList
//Do nothing for value which is equal to pivot
for(int i=0; i<list.size(); i++){
if(list.get(i)<pivot){
smallerNumberList.add(list.get(i));
}
else if(list.get(i)>pivot){
greaterNumberList.add(list.get(i));
}
else{
//Do nothing
}
}
//If smallerNumberList size is greater than or equal to nthSmallest (1-based), nthSmallest number must be in this list
if(nthSmallest <= smallerNumberList.size()){
return quickSelect(smallerNumberList, nthSmallest);
}
//If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
//The step is bit tricky. If confusing, please see the above loop once again for clarification.
else if(nthSmallest > (list.size() - greaterNumberList.size())){
//nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in
//smallerNumberList
nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
return quickSelect(greaterNumberList,nthSmallest);
}
else{
return pivot;
}
}
I implemented finding the kth minimum in n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about selection algorithms (as indicated in one of the postings above). You can read about the algorithm and also find code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do partial ordering of the list: return the first K min (or max) in O(k log(n)) time.
Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.
Sexy quickselect in Python
import random
def quickselect(arr, k):
    '''
    k = 1 returns first element in ascending order.
    can be easily modified to return first element in descending order
    '''
    r = random.randrange(0, len(arr))
    pivot = arr[r]
    # partition
    a1 = [i for i in arr if i < pivot]
    a2 = [i for i in arr if i > pivot]
    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr) - len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return pivot
As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.
Divide the array in to n/5 lists of 5 elements each.
Find the median in each sub array of 5 elements.
Recursively find the median of all the medians, lets call it M
Partition the array into two sub-arrays: the first contains the elements larger than M (call it a1), and the other contains the elements smaller than M (call it a2).
If k <= |a1|, return selection(a1, k).
If k − 1 = |a1|, return M.
If k > |a1| + 1, return selection(a2, k − |a1| − 1).
Analysis: As suggested in the original paper:
We use the median to partition the list into two halves(the first half,
if k <= n/2 , and the second half otherwise). This algorithm takes
time cn at the first level of recursion for some constant c, cn/2 at
the next level (since we recurse in a list of size n/2), cn/4 at the
third level, and so on. The total time taken is cn + cn/2 + cn/4 +
.... = 2cn = O(n).
Why partition size is taken 5 and not 3?
As mentioned in original paper:
Dividing the list into groups of 5 assures a worst-case split of 70 − 30. At least
half of the medians are greater than the median-of-medians, hence at least
half of the n/5 blocks have at least 3 elements greater than it, and this gives a
3n/10 split, which means the other partition is 7n/10 in worst case.
That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < n, the
worst-case running time is O(n).
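The last step of the quoted argument can be made explicit with the substitution method. This is a sketch: the constants c (for the linear work per call) and d (for the guessed bound) are introduced here for illustration, and floors and ceilings are ignored.

```latex
\begin{aligned}
T(n) &\le T(n/5) + T(7n/10) + cn \\
     &\le \tfrac{1}{5}\,dn + \tfrac{7}{10}\,dn + cn
          && \text{(inductive hypothesis } T(m) \le dm \text{ for } m < n\text{)} \\
     &= \tfrac{9}{10}\,dn + cn \\
     &\le dn \qquad \text{whenever } d \ge 10c .
\end{aligned}
```

With groups of 3 the same reasoning gives T(n) = T(n/3) + T(2n/3) + O(n); the two fractions then add up to exactly n, the slack disappears, and the bound degrades to O(n log n).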
Now I have tried to implement the above algorithm as:
public static int findKthLargestUsingMedian(Integer[] array, int k) {
// Step 1: Divide the list into n/5 lists of 5 element each.
int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
// Step 2: Find pivotal element aka median of medians.
int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
//Now we need two lists split using medianOfMedian as pivot. One list holds the elements greater than medianOfMedian and the other holds the elements less than it.
List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
for (Integer element : array) {
if (element < medianOfMedian) {
listWithSmallerNumbers.add(element);
} else if (element > medianOfMedian) {
listWithGreaterNumbers.add(element);
}
}
// Next step.
if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
return -1;
}
public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
int[] medians = new int[noOfRequiredLists];
for (int count = 0; count < noOfRequiredLists; count++) {
int startOfPartialArray = 5 * count;
// The last group may hold fewer than 5 elements; do not read past the end.
int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
// Step 2: Find median of each of these sublists (the group must be sorted first).
Arrays.sort(partialArray);
medians[count] = partialArray[partialArray.length / 2];
}
// Step 3: Find median of the medians. Sorting is simpler than recursing,
// but it costs O((n/5) log n), so the worst case is no longer strictly linear.
Arrays.sort(medians);
return medians[medians.length / 2];
}
Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).
public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
int p = 0;
int numElements = nums.length;
// create priority queue where all the elements of nums will be stored
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
// place all the elements of the array to this priority queue
for (int n : nums) {
pq.add(n);
}
// extract the kth largest element
while (numElements - k + 1 > 0) {
p = pq.poll();
k++;
}
return p;
}
Both of these algorithms can be tested as:
public static void main(String[] args) throws IOException {
Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
System.out.println(findKthLargestUsingMedian(numbers, 8));
System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}
As expected output is:
18
18
Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to divide the array into two parts: values less than ( < ) the median to its left and values greater than ( > ) the median to its right. That too can be done in linear time. Now go to the part of the array where the kth element lies.
Now the recurrence becomes:
T(n) = T(n/2) + cn
which gives O(n) overall.
Below is the link to a full implementation with quite an extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when the smallest element is chosen as pivot in every step, so that the algorithm degenerates into O(n^2) running time), a special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) time in the worst and in the average case.
Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):
Finding Kth Smallest Element in an Unsorted Array
How about this kinda approach
Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)
Is it right or am i missing something ?
Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.
There is also an algorithm that outperforms the quickselect algorithm. It's called the Floyd-Rivest (FR) algorithm.
Original article: https://doi.org/10.1145/360680.360694
Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf
Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.
Sample code that I used for FR algorithm:
template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
if (n == 0)
return *(std::min_element(data.begin(), data.end()));
else if (n == data.size() - 1)
return *(std::max_element(data.begin(), data.end()));
else
return _FRselect(data, 0, data.size() - 1, n);
}
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
size_t leftIdx = left;
size_t rightIdx = right;
while (rightIdx > leftIdx)
{
if (rightIdx - leftIdx > 600)
{
size_t range = rightIdx - leftIdx + 1;
long long i = n - (long long)leftIdx + 1;
long long z = log(range);
long long s = 0.5 * exp(2 * z / 3);
long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);
size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);
_FRselect(data, newLeft, newRight, n);
}
T t = data[n];
size_t i = leftIdx;
size_t j = rightIdx;
// arrange pivot and right index
std::swap(data[leftIdx], data[n]);
if (data[rightIdx] > t)
std::swap(data[rightIdx], data[leftIdx]);
while (i < j)
{
std::swap(data[i], data[j]);
++i; --j;
while (data[i] < t) ++i;
while (data[j] > t) --j;
}
if (data[leftIdx] == t)
std::swap(data[leftIdx], data[j]);
else
{
++j;
std::swap(data[j], data[rightIdx]);
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= n)
leftIdx = j + 1;
if (n <= j)
rightIdx = j - 1;
}
return data[leftIdx];
}
template <typename T>
int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)
i would like to suggest one answer
If we take the first k elements and sort them into a linked list of k values,
then insert each of the remaining n-k values with an insertion-sort step, the worst case needs k*(n-k) comparisons. Sorting the first k values takes at most k*(k-1) comparisons, so the total is about nk-k, which is O(nk), i.e. O(n) for constant k.
cheers
An explanation of the median-of-medians algorithm for finding the k-th largest integer out of n can be found here:
http://cs.indstate.edu/~spitla/presentation.pdf
A C++ implementation is below:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int findMedian(vector<int> vec){
// Find the median of a vector. vec is passed by value, so sorting this copy
// does not disturb the caller's data.
sort(vec.begin(), vec.end());
return vec[vec.size()/2];
}
int findMedianOfMedians(vector<vector<int> > values){
vector<int> medians;
for (int i = 0; i < values.size(); i++) {
int m = findMedian(values[i]);
medians.push_back(m);
}
return findMedian(medians);
}
void selectionByMedianOfMedians(const vector<int> values, int k){
// Divide the list into n/5 lists of 5 elements each
vector<vector<int> > vec2D;
int count = 0;
while (count != values.size()) {
int countRow = 0;
vector<int> row;
while ((countRow < 5) && (count < values.size())) {
row.push_back(values[count]);
count++;
countRow++;
}
vec2D.push_back(row);
}
cout<<endl<<endl<<"Printing 2D vector : "<<endl;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
cout<<vec2D[i][j]<<" ";
}
cout<<endl;
}
cout<<endl;
// Calculating a new pivot for making splits
int m = findMedianOfMedians(vec2D);
cout<<"Median of medians is : "<<m<<endl;
// Partition the list into elements larger than 'm' (call this sublist L1) and
// those smaller than 'm' (call this sublist L2). Elements equal to 'm' are dropped,
// so this assumes the input values are unique.
vector<int> L1, L2;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
if (vec2D[i][j] > m) {
L1.push_back(vec2D[i][j]);
}else if (vec2D[i][j] < m){
L2.push_back(vec2D[i][j]);
}
}
}
// Checking the splits as per the new pivot 'm'
cout<<endl<<"Printing L1 : "<<endl;
for (int i = 0; i < L1.size(); i++) {
cout<<L1[i]<<" ";
}
cout<<endl<<endl<<"Printing L2 : "<<endl;
for (int i = 0; i < L2.size(); i++) {
cout<<L2[i]<<" ";
}
// Recursive calls
if ((k - 1) == L1.size()) {
cout<<endl<<endl<<"Answer :"<<m;
}else if (k <= L1.size()) {
return selectionByMedianOfMedians(L1, k);
}else if (k > (L1.size() + 1)){
return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
}
}
int main()
{
int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
vector<int> vec(values, values + 25);
cout<<"The given array is : "<<endl;
for (int i = 0; i < vec.size(); i++) {
cout<<vec[i]<<" ";
}
selectionByMedianOfMedians(vec, 8);
return 0;
}
There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.
In more detail: using Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I came up with the following algorithm (imaginatively named "LefSelect"):
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
// Note: the code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
int l=0, m = n-1, i=l, j=m;
float x;
while (l<m) {
if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
if( a[j] < a[k] ) F_SWAP(a[k],a[j]);
x=a[k];
while (j>k && i<k) {
do i++; while (a[i]<x);
do j--; while (a[j]>x);
F_SWAP(a[i],a[j]);
}
i++; j--;
if (j<k) {
while (a[i]<x) i++;
l=i; j=m;
}
if (k<i) {
while (x<a[j]) j--;
m=j; i=l;
}
}
return a[k];
}
In benchmarks that I ran, LefSelect is 20-30% faster than QuickSelect.
Haskell Solution:
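kthElem :: Ord a => Int -> [a] -> a
kthElem index list = sort list !! index

-- withShape xs ys yields the elements of xs on a list spine taken from ys
withShape :: [a] -> [b] -> [a]
withShape ~[] [] = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort :: Ord a => [a] -> [a]
sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
    ls = filter (< x) xs
    rs = filter (>= x) xs
This is a lazy quicksort-style selection (the pivot is simply the first element, so it is not a true median-of-medians). The withShape function lets (++) and (!!) discover the size of a partition without actually sorting it, so only the partitions that contain the requested index are ever sorted.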
Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.
#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;
int randomPartition(int arr[], int l, int r);
// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
// If k is smaller than number of elements in array
if (k > 0 && k <= r - l + 1)
{
// Partition the array around a random element and
// get position of pivot element in sorted array
int pos = randomPartition(arr, l, r);
// If position is same as k
if (pos-l == k-1)
return arr[pos];
if (pos-l > k-1) // If position is more, recur for left subarray
return kthSmallest(arr, l, pos-1, k);
// Else recur for right subarray
return kthSmallest(arr, pos+1, r, k-pos+l-1);
}
// If k is more than number of elements in array
return INT_MAX;
}
void swap(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
int x = arr[r], i = l;
for (int j = l; j <= r - 1; j++)
{
if (arr[j] <= x) // arr[j] is not larger than the pivot, so move it into the left part
{
swap(&arr[i], &arr[j]);
i++;
}
}
swap(&arr[i], &arr[r]); // swap the pivot
return i;
}
// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
int n = r-l+1;
int pivot = rand() % n;
swap(&arr[l + pivot], &arr[r]);
return partition(arr, l, r);
}
// Driver program to test above methods
int main()
{
int arr[] = {12, 3, 5, 7, 4, 19, 26};
int n = sizeof(arr)/sizeof(arr[0]), k = 3;
cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
return 0;
}
The worst-case time complexity of the above solution is still O(n^2): in the worst case, the random function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).
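Create a priority queue (a max-heap).
Insert all the elements into the heap.
Call poll() k times; the last value polled is the k-th largest.
import java.util.Collections;
import java.util.PriorityQueue;

public static int getKthLargestElements(int[] arr, int k)
{
    // max-heap; reverseOrder() avoids the overflow risk of comparing with (y - x)
    PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
    // insert all the elements into the heap
    for (int ele : arr)
        pq.offer(ele);
    // call poll() k times; assumes 1 <= k <= arr.length
    int result = 0;
    for (int i = 0; i < k; i++)
        result = pq.poll();
    return result;
}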
This is an implementation in JavaScript.
If you relax the constraint that you cannot modify the array, you can avoid using extra memory by using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).
function kthMax(a, k){
var size = a.length;
var pivot = a[ Math.floor(Math.random()*size) ]; //Another choice could have been (size / 2)
//Create an array with all elements lower than the pivot and an array with all elements higher than the pivot
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
lowerArray.push(current);
} else if (current > pivot) {
upperArray.push(current);
}
}
//Which one should I continue with?
if(k <= upperArray.length) {
//Upper
return kthMax(upperArray, k);
} else {
var newK = k - (size - lowerArray.length);
if (newK > 0) {
///Lower
return kthMax(lowerArray, newK);
} else {
//None ... it's the current pivot!
return pivot;
}
}
}
If you want to test how it performs, you can use this variation:
function kthMax (a, k, logging) {
var comparisonCount = 0; //Number of comparison that the algorithm uses
var memoryCount = 0; //Number of integers in memory that the algorithm uses
var _log = logging;
if(k < 1 || k > a.length) { // k is 1-based: k = 1 is the maximum
if (_log) console.log ("k is out of range");
return false;
}
function _kthmax(a, k){
var size = a.length;
var pivot = a[Math.floor(Math.random()*size)];
if(_log) console.log("Inputs:", a, "size="+size, "k="+k, "pivot="+pivot);
// This should never happen. Just a nice check in this exercise
// if you are playing with the code to avoid never ending recursion
if(typeof pivot === "undefined") {
if (_log) console.log ("Ops...");
return false;
}
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
comparisonCount += 1;
memoryCount++;
lowerArray.push(current);
} else if (current > pivot) {
comparisonCount += 2;
memoryCount++;
upperArray.push(current);
}
}
if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);
if(k <= upperArray.length) {
comparisonCount += 1;
return _kthmax(upperArray, k);
} else if (k > size - lowerArray.length) {
comparisonCount += 2;
return _kthmax(lowerArray, k - (size - lowerArray.length));
} else {
comparisonCount += 2;
return pivot;
}
/*
* BTW, this is the logic for kthMin if we want to implement that... ;-)
*
if(k <= lowerArray.length) {
return kthMin(lowerArray, k);
} else if (k > size - upperArray.length) {
return kthMin(upperArray, k - (size - upperArray.length));
} else
return pivot;
*/
}
var result = _kthmax(a, k);
return {result: result, iterations: comparisonCount, memory: memoryCount};
}
The rest of the code is just to create some playground:
function getRandomArray (n){
var ar = [];
for (var i = 0, l = n; i < l; i++) {
ar.push(Math.round(Math.random() * l))
}
return ar;
}
//Create a random array of 50 numbers
var ar = getRandomArray (50);
Now run your tests a few times.
Because of Math.random(), it will produce different results every time:
kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
If you test it a few times you can see, even empirically, that the number of iterations is on average O(n) ~= constant * n, and that the value of k does not affect the algorithm.
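The in-place variant mentioned at the start of this answer (two indexes marking the "current partition" instead of new arrays) could look roughly like the following Python sketch. The name `kth_max_in_place` and the Lomuto-style partition are assumptions made for illustration, not the author's code; note that it reorders the input list.

```python
import random

def kth_max_in_place(a, k):
    """Return the k-th largest element (k is 1-based), reordering `a` in place."""
    if not 1 <= k <= len(a):
        raise ValueError("k is out of range")
    target = k - 1          # index of the answer if `a` were sorted descending
    lo, hi = 0, len(a) - 1  # the two indexes delimiting the current partition
    while True:
        if lo == hi:
            return a[lo]
        # move a randomly chosen pivot to the end of the current partition
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] > pivot:  # larger elements are gathered on the left
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final position
        if store == target:
            return a[store]
        if target < store:
            hi = store - 1
        else:
            lo = store + 1

print(kth_max_in_place([12, 3, 5, 7, 4, 19, 26], 3))  # 12
```

Only a constant number of extra variables is used, at the cost of modifying the caller's array. Expected time is still linear, but many equal elements can degrade it.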
I came up with this algorithm, and it seems to be O(n):
Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than our minimum, we replace the min variable with the item value. We continue the same way until the end of the array. The minimum of our three variables is then the 3rd largest item in the array.
define variables a=0, b=0, c=0
iterate through the array items
find minimum a,b,c
if item > min then replace the min variable with item value
continue until end of array
the minimum of a,b,c is our answer
And to find the Kth largest item we need K variables.
Example: (k=3)
[1,2,4,1,7,3,9,5,6,2,9,8]
Final variable values:
a=7 (answer)
b=8
c=9
Can someone please review this and let me know what I am missing?
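One way to review the idea is to write it down with the K variables held in a min-heap. The sketch below is an assumption-laden illustration (the name `kth_largest` is hypothetical) using Python's standard `heapq`. Finding the minimum of K plain variables costs O(K) per item, so the scan is O(n*K) rather than O(n); the heap brings it to O(n log K). Seeding the heap with the first K items also avoids the problem that starting from 0 gives wrong answers for negative inputs.

```python
import heapq

def kth_largest(items, k):
    """Return the k-th largest item (duplicates counted) using a min-heap of size k."""
    if not 1 <= k <= len(items):
        raise ValueError("k is out of range")
    heap = list(items[:k])  # the k "variables", seeded from the data itself
    heapq.heapify(heap)
    for item in items[k:]:
        if item > heap[0]:  # heap[0] is the minimum of the k kept values
            heapq.heapreplace(heap, item)
    return heap[0]

data = [1, 2, 4, 1, 7, 3, 9, 5, 6, 2, 9, 8]
print(kth_largest(data, 3))               # 8 (the kept values are 8, 9, 9)
print(kth_largest(sorted(set(data)), 3))  # 7 (distinct values only)
```

On the example array the procedure as described keeps 8, 9, 9, because the duplicate 9 occupies a slot. The answer 7 shown above only results if duplicates are removed first.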
Here is an implementation of the algorithm eladv suggested (I have also included an implementation with a random pivot):
import java.util.Arrays;
public class Median {
public static void main(String[] s) {
int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
System.out.println(selectK(test,8));
/*
int n = 100000000;
int[] test = new int[n];
for(int i=0; i<test.length; i++)
test[i] = (int)(Math.random()*test.length);
long start = System.currentTimeMillis();
random_selectK(test, test.length/2);
long end = System.currentTimeMillis();
System.out.println(end - start);
*/
}
public static int random_selectK(int[] a, int k) {
if(a.length <= 1)
return a[0];
int r = (int)(Math.random() * a.length);
int p = a[r];
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return random_selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return random_selectK(temp,k-small-equal);
}
}
public static int selectK(int[] a, int k) {
if(a.length <= 5) {
Arrays.sort(a);
return a[k-1];
}
int p = median_of_medians(a);
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return selectK(temp,k-small-equal);
}
}
private static int median_of_medians(int[] a) {
int[] b = new int[a.length/5];
int[] temp = new int[5];
for(int i=0; i<b.length; i++) {
for(int j=0; j<5; j++)
temp[j] = a[5*i + j];
Arrays.sort(temp);
b[i] = temp[2];
}
return selectK(b, b.length/2 + 1);
}
}
It is similar to the quicksort strategy: we pick an arbitrary pivot, then move the smaller elements to its left and the larger ones to its right.
public static int kthElInUnsortedList(List<int> list, int k)
{
if (list.Count == 1)
return list[0];
List<int> left = new List<int>();
List<int> right = new List<int>();
int pivotIndex = list.Count / 2;
int pivot = list[pivotIndex]; //arbitrary
for (int i = 0; i < list.Count; i++)
{
if (i == pivotIndex) continue; // skip the pivot itself, but keep scanning the elements after it
int currentEl = list[i];
if (currentEl < pivot)
left.Add(currentEl);
else
right.Add(currentEl);
}
if (k == left.Count + 1)
return pivot;
if (left.Count < k)
return kthElInUnsortedList(right, k - left.Count - 1);
else
return kthElInUnsortedList(left, k);
}
See the end of this article:
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
You can find the kth smallest element in O(n) time and constant space, provided the array holds only fixed-width integers. Strictly speaking the running time is O(n * log(max_value - min_value)), but the log factor is bounded by the integer bit width.
The approach is to do a binary search on the range of array values. If we have a min_value and a max_value, both in integer range, we can do a binary search on that range.
We can write a comparator function which tells us whether a given value is the kth smallest, smaller than the kth smallest, or bigger than the kth smallest.
Do the binary search until you reach the kth smallest number.
Here is the code for that:
class Solution:
    def _iskthsmallest(self, A, val, k):
        # Returns 1 if val is above the kth smallest, -1 if below, 0 if it is the kth smallest
        less_count, equal_count = 0, 0
        for i in range(len(A)):
            if A[i] == val: equal_count += 1
            if A[i] < val: less_count += 1
        if less_count >= k: return 1
        if less_count + equal_count < k: return -1
        return 0

    def kthsmallest_binary(self, A, min_val, max_val, k):
        if min_val == max_val:
            return min_val
        mid = (min_val + max_val) // 2  # integer midpoint; plain / gives a float in Python 3
        iskthsmallest = self._iskthsmallest(A, mid, k)
        if iskthsmallest == 0: return mid
        if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
        return self.kthsmallest_binary(A, mid+1, max_val, k)

    # @param A : tuple of integers
    # @param k : integer
    # @return an integer
    def kthsmallest(self, A, k):
        if not A: return 0
        if k > len(A): return 0
        min_val, max_val = min(A), max(A)
        return self.kthsmallest_binary(A, min_val, max_val, k)
What I would do is this:
initialize empty doubly linked list l
for each element e in array
if e larger than head(l)
make e the new head of l
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.
Update:
initialize empty sorted tree l
for each element e in array
if e between head(l) and tail(l)
insert e into l // O(log k)
if size(l) > k
remove last element from l
the last element of l should now be the kth largest element
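First we can build a balanced BST from the unsorted array. Note that this takes O(n log n) in general, not O(n); a linear-time build is only possible when the input is already sorted. If each node also stores the size of its subtree, the kth smallest element can then be found in O(log n), so the overall cost is dominated by building the tree.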
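A minimal sketch of the BST idea in Python, assuming each node stores the size of its subtree (that field, and the names `Node`, `insert` and `kth_smallest`, are additions for illustration). The tree here is a plain unbalanced BST, so building it costs O(n log n) on average and O(n^2) in the worst case; a self-balancing tree would be needed to guarantee O(log n) lookups.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here

def insert(root, key):
    """Insert key iteratively (duplicates go right) and return the root."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        node.size += 1  # the new key will end up somewhere below this node
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def kth_smallest(root, k):
    """Return the k-th smallest key (1-based) by following subtree sizes."""
    node = root
    while node is not None:
        left_size = node.left.size if node.left else 0
        if k <= left_size:
            node = node.left
        elif k == left_size + 1:
            return node.key
        else:
            k -= left_size + 1
            node = node.right
    raise ValueError("k is out of range")

root = None
for value in [12, 3, 5, 7, 4, 19, 26]:
    root = insert(root, value)
print(kth_smallest(root, 3))  # 5
```

Each lookup walks a single root-to-leaf path, which is why the subtree sizes matter: without them, finding the kth smallest needs an in-order traversal of up to k nodes.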

Resources