Most of us are familiar with the maximum sum subarray problem. I came across a variant of this problem which asks the programmer to output the maximum of all subarray sums modulo some number M.
The naive approach to solve this variant would be to find all possible subarray sums (which would be of the order of N^2 where N is the size of the array). Of course, this is not good enough. The question is - how can we do better?
Example: Let us consider the following array:
6 6 11 15 12 1
Let M = 13. In this case the subarray 6 6 (or the single element 12, or 6 6 11 15, or 11 15 12) yields the maximum sum (= 12).
We can do it as follows:
Maintain an array sum which at index i contains the prefix sum from 0 to i, taken modulo M.
For each index i, we need to find the maximum subarray sum that ends at this index:
For each subarray (start + 1, i), we know that the modular sum of this subarray is
int a = (sum[i] - sum[start] + M) % M
So we can only achieve a subarray sum larger than sum[i] itself if sum[start] is larger than sum[i], and as close to sum[i] as possible.
This can be done easily using a binary search tree.
Pseudo code:

int[] sum;
sum[0] = A[0] % M;
Tree tree;
tree.add(sum[0]);
int result = sum[0];

for (int i = 1; i < n; i++) {
    sum[i] = (sum[i - 1] + A[i]) % M;
    int a = tree.getMinimumValueLargerThan(sum[i]);
    result = max((sum[i] - a + M) % M, result);
    // if no such a exists, compare result with sum[i] itself
    tree.add(sum[i]);
}
print result;
Time complexity: O(n log n)
Let A be our input array with zero-based indexing. We can reduce A modulo M without changing the result.
First of all, let's reduce the problem to a slightly easier one by computing an array P representing the prefix sums of A, modulo M:
A = 6 6 11 2 12 1
P = 6 12 10 12 11 12
Now let's process the possible left borders of our solution subarrays in decreasing order. This means that we will first determine the optimal solution that starts at index n - 1, then the one that starts at index n - 2 etc.
In our example, if we choose i = 3 as our left border, the possible subarray sums are represented by the suffix P[3..n-1] plus a constant a = A[i] - P[i]:
a = A[3] - P[3] = 2 - 12 = 3 (mod 13)
P + a = * * * 2 1 2
The global maximum is attained at one of these left borders as well. Since we can insert the suffix values from right to left as we iterate, we have now reduced the problem to the following:
Given a set of values S and integers x and M, find the maximum of (s + x) modulo M over all s in S
This one is easy: Just use a balanced binary search tree to manage the elements of S. Given a query x, we want to find the largest value in S that is smaller than M - x (that is the case where no overflow occurs when adding x). If there is no such value, just use the largest value of S. Both can be done in O(log |S|) time.
Total runtime of this solution: O(n log n)
Here's some C++ code to compute the maximum sum. It would need some minor adaptations to also return the borders of the optimal subarray:
#include <bits/stdc++.h>
using namespace std;

int max_mod_sum(const vector<int>& A, int M) {
    // assumes the entries of A are already reduced modulo M (see above)
    vector<int> P(A.size());
    for (int i = 0; i < A.size(); ++i)
        P[i] = (A[i] + (i > 0 ? P[i-1] : 0)) % M;
    set<int> S;
    int res = 0;
    for (int i = A.size() - 1; i >= 0; --i) {
        S.insert(P[i]);
        int a = (A[i] - P[i] + M) % M;
        auto it = S.lower_bound(M - a);
        if (it != begin(S))
            res = max(res, *prev(it) + a);
        res = max(res, (*prev(end(S)) + a) % M);
    }
    return res;
}
int main() {
    // random testing to the rescue
    for (int i = 0; i < 1000; ++i) {
        int M = rand() % 1000 + 1, n = rand() % 1000 + 1;
        vector<int> A(n);
        for (int i = 0; i < n; ++i)
            A[i] = rand() % M;
        int should_be = 0;
        for (int i = 0; i < n; ++i) {
            int sum = 0;
            for (int j = i; j < n; ++j) {
                sum = (sum + A[j]) % M;
                should_be = max(should_be, sum);
            }
        }
        assert(should_be == max_mod_sum(A, M));
    }
}
For me, none of the explanations here quite worked, since I didn't get the searching/sorting part: how exactly we search/sort was unclear.
We all know that we need to build prefixSum, meaning the sum of all elements from 0 to i, modulo m.
I guess what we are looking for is clear by now.
Knowing that subarray[i][j] = (prefix[i] - prefix[j] + m) % m (the modular sum of the subarray ending at index i and starting just after index j), the maximum for a given prefix[i] is always achieved by the prefix[j] that is as close as possible to prefix[i], but slightly bigger.
E.g. for m = 8 and prefix[i] = 5, we are looking for the next value after 5 that is present in our prefix array.
For efficient search (binary search) we keep the prefixes sorted.
What we cannot do is build the whole prefixSum array first and then, in a second pass, look up partners in the sorted prefix array, because the match we find might have an endIndex smaller than our startIndex, which is no good.
Therefore, we iterate from 0 to n, treating each position as the endIndex of a potential maximum subarray, and look in our sorted prefix array (which is empty at the beginning and only ever contains the sorted prefixes between 0 and endIndex).
import bisect

def maximumSum(coll, m):
    n = len(coll)
    maxSum, prefixSum = 0, 0
    sortedPrefixes = []
    for endIndex in range(n):
        prefixSum = (prefixSum + coll[endIndex]) % m
        maxSum = max(maxSum, prefixSum)
        startIndex = bisect.bisect_right(sortedPrefixes, prefixSum)
        if startIndex < len(sortedPrefixes):
            maxSum = max(maxSum, prefixSum - sortedPrefixes[startIndex] + m)
        bisect.insort(sortedPrefixes, prefixSum)
    return maxSum
From your question, it seems that you have created an array to store the cumulative sums (Prefix Sum Array), and are calculating the sum of the sub-array arr[i:j] as (sum[j] - sum[i] + M) % M. (arr and sum denote the given array and the prefix sum array respectively)
Calculating the sum of every sub-array results in an O(n^2) algorithm.
The question that arises is -
Do we really need to consider the sum of every sub-array to reach the desired maximum?
No!
For a value of j, the value (sum[j] - sum[i] + M) % M will be maximum when sum[i] is just greater than sum[j] (ideally sum[j] + 1, so that the difference is M - 1).
This reduces the algorithm to O(n log n).
You can take a look at this explanation! https://www.youtube.com/watch?v=u_ft5jCDZXk
There are already a bunch of great solutions listed here, but I wanted to add one that has O(n log n) runtime without using a balanced binary tree, which isn't in the Python standard library. This solution isn't my idea, but I had to think a bit as to why it works. Here's the code, explanation below:
def maximumSum(a, m):
    prefixSums = [(0, -1)]
    for idx, el in enumerate(a):
        prefixSums.append(((prefixSums[-1][0] + el) % m, idx))
    prefixSums = sorted(prefixSums)
    maxSeen = prefixSums[-1][0]
    for (a, a_idx), (b, b_idx) in zip(prefixSums[:-1], prefixSums[1:]):
        if a_idx > b_idx and b > a:
            maxSeen = max((a - b) % m, maxSeen)
    return maxSeen
As with the other solutions, we first calculate the prefix sums, but this time we also keep track of the index of the prefix sum. We then sort the prefix sums, as we want to find the smallest difference between prefix sums modulo m - sorting lets us just look at adjacent elements as they have the smallest difference.
At this point you might think we're neglecting an essential part of the problem - we want the smallest difference between prefix sums, but the larger prefix sum needs to appear before the smaller prefix sum (meaning it has a smaller index). In the solutions using trees, we ensure that by adding prefix sums one by one and recalculating the best solution.
However, it turns out that we can look at adjacent elements and just ignore ones that don't satisfy our index requirement. This confused me for some time, but the key realization is that the optimal solution will always come from two adjacent elements. I'll prove this via a contradiction. Let's say that the optimal solution comes from two non-adjacent prefix sums x and z, at indices k and i respectively, where z > x (it's sorted!) and k > i:
x ... z
k ... i
Let's consider one of the numbers between x and z, and let's call it y with index j. Since the list is sorted, x < y < z.
x ... y ... z
k ... j ... i
The prefix sum y must have index j < i, otherwise it would be part of a better solution with z. But if j < i, then j < k and y and x form a better solution than z and x! So any elements between x and z must form a better solution with one of the two, which contradicts our original assumption. Therefore the optimal solution must come from adjacent prefix sums in the sorted list.
Here is Java code for maximum subarray sum modulo. We also handle the case where we cannot find any element in the tree strictly greater than s[i]:
public static long maxModulo(long[] a, final long k) {
    long[] s = new long[a.length];
    TreeSet<Long> tree = new TreeSet<>();
    s[0] = a[0] % k;
    tree.add(s[0]);
    long result = s[0];
    for (int i = 1; i < a.length; i++) {
        s[i] = (s[i - 1] + a[i]) % k;
        // find the least element in the tree strictly greater than s[i]
        Long v = tree.higher(s[i]);
        if (v == null) {
            // no such v: the best subarray ending at i starts at index 0
            result = Math.max(s[i], result);
        } else {
            result = Math.max((s[i] - v + k) % k, result);
        }
        tree.add(s[i]);
    }
    return result;
}
A few points from my side that might hopefully help someone understand the problem better.
You do not need to add +M to the modulo calculation here, because in Python the % operator handles negative numbers well: a % M == (a + M) % M for positive M. (This is not the case in C++ or Java, where % can return negative values.)
As mentioned, the trick is to build the proxy sum table such that
proxy[n] = (a[1] + ... + a[n]) % M
This then allows one to represent the maxSubarraySum[i, j] as
maxSubarraySum[i, j] = (proxy[j] - proxy[i]) % M
The implementation trick is to build the proxy table as we iterate through the elements, instead of first pre-building it and then using it. This is because for each new element a[i] we want to compute proxy[i] and find a proxy[j] that is bigger than, but as close as possible to, proxy[i] (ideally bigger by exactly 1, because this results in a remainder of M - 1). For this we need a clever data structure for building the proxy table while keeping it sorted and
being able to quickly find the closest bigger element to proxy[i]. bisect.bisect_right is a good choice in Python.
See my Python implementation below (I hope it helps, though I am aware it might not necessarily be as concise as others' solutions):
import bisect

def maximumSum(a, m):
    prefix_sum = [a[0] % m]
    prefix_sum_sorted = [a[0] % m]
    current_max = prefix_sum_sorted[0]
    for elem in a[1:]:
        prefix_sum_next = (prefix_sum[-1] + elem) % m
        prefix_sum.append(prefix_sum_next)
        idx_closest_bigger = bisect.bisect_right(prefix_sum_sorted, prefix_sum_next)
        if idx_closest_bigger >= len(prefix_sum_sorted):
            current_max = max(current_max, prefix_sum_next)
            bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
            continue
        if prefix_sum_sorted[idx_closest_bigger] > prefix_sum_next:
            current_max = max(current_max, (prefix_sum_next - prefix_sum_sorted[idx_closest_bigger]) % m)
        bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
    return current_max
A complete Java implementation with O(n*log(n)):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.TreeSet;
import java.util.stream.Stream;

public class MaximizeSumMod {

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        Long times = Long.valueOf(in.readLine());
        while (times-- > 0) {
            long[] pair = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
            long mod = pair[1];
            long[] numbers = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
            printMaxMod(numbers, mod);
        }
    }

    private static void printMaxMod(long[] numbers, Long mod) {
        Long maxSoFar = (numbers[numbers.length - 1] + numbers[numbers.length - 2]) % mod;
        maxSoFar = (maxSoFar > (numbers[0] % mod)) ? maxSoFar : numbers[0] % mod;
        numbers[0] %= mod;
        for (Long i = 1L; i < numbers.length; i++) {
            long currentNumber = numbers[i.intValue()] % mod;
            maxSoFar = maxSoFar > currentNumber ? maxSoFar : currentNumber;
            numbers[i.intValue()] = (currentNumber + numbers[i.intValue() - 1]) % mod;
            maxSoFar = maxSoFar > numbers[i.intValue()] ? maxSoFar : numbers[i.intValue()];
        }
        if (mod.equals(maxSoFar + 1) || numbers.length == 2) {
            System.out.println(maxSoFar);
            return;
        }
        long previousNumber = numbers[0];
        TreeSet<Long> set = new TreeSet<>();
        set.add(previousNumber);
        for (Long i = 2L; i < numbers.length; i++) {
            Long currentNumber = numbers[i.intValue()];
            Long ceiling = set.ceiling(currentNumber);
            if (ceiling == null) {
                set.add(numbers[i.intValue() - 1]);
                continue;
            }
            if (ceiling.equals(currentNumber)) {
                set.remove(ceiling);
                Long greaterCeiling = set.ceiling(currentNumber);
                if (greaterCeiling == null) {
                    set.add(ceiling);
                    set.add(numbers[i.intValue() - 1]);
                    continue;
                }
                set.add(ceiling);
                ceiling = greaterCeiling;
            }
            Long newMax = (currentNumber - ceiling + mod);
            maxSoFar = maxSoFar > newMax ? maxSoFar : newMax;
            set.add(numbers[i.intValue() - 1]);
        }
        System.out.println(maxSoFar);
    }
}
Adding STL C++11 code based on the solution suggested by @Pham Trung. Might be handy.
#include <iostream>
#include <set>

int main() {
    int N;
    std::cin >> N;
    for (int nn = 0; nn < N; nn++) {
        long long n, m;
        std::set<long long> mSet;
        long long maxVal = 0; // positive input values
        long long sumVal = 0;
        std::cin >> n >> m;
        mSet.insert(m);
        for (long long q = 0; q < n; q++) {
            long long tmp;
            std::cin >> tmp;
            sumVal = (sumVal + tmp) % m;
            auto itSub = mSet.upper_bound(sumVal);
            maxVal = std::max(maxVal, (m + sumVal - *itSub) % m);
            mSet.insert(sumVal);
        }
        std::cout << maxVal << "\n";
    }
}
As you can read on Wikipedia, there is a solution called Kadane's algorithm, which computes the maximum subarray sum by watching the maximum subarray ending at position i, for all positions i, iterating once over the array. This solves the problem with runtime complexity O(n).
Unfortunately, I think Kadane's algorithm isn't able to find all possible solutions when more than one solution exists.
An implementation in Java; I didn't test it:
public int[] kadanesAlgorithm(int[] array) {
    int start_old = 0;
    int start = 0;
    int end = 0;
    int found_max = array[0]; // was 0, which breaks for all-negative arrays
    int max = array[0];
    for (int i = 1; i < array.length; i++) { // starting at 0 would count array[0] twice
        max = Math.max(array[i], max + array[i]);
        found_max = Math.max(found_max, max);
        if (max < 0)
            start = i + 1;
        else if (max == found_max) {
            start_old = start;
            end = i;
        }
    }
    return Arrays.copyOfRange(array, start_old, end + 1);
}
I feel my thoughts are aligned with what has already been posted, but just in case - a Kotlin O(N log N) solution:
val seen = sortedSetOf(0L)
var prev = 0L
return max(a.map { x ->
    val z = (prev + x) % m
    prev = z
    seen.add(z)
    seen.higher(z)?.let { y ->
        (z - y + m) % m
    } ?: z
})
Implementation in Java using a TreeSet...
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.TreeSet;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
        String[] str = read.readLine().trim().split(" ");
        int n = Integer.parseInt(str[0]);
        long m = Long.parseLong(str[1]);
        str = read.readLine().trim().split(" ");
        long[] arr = new long[n];
        for (int i = 0; i < n; i++) {
            arr[i] = Long.parseLong(str[i]);
        }
        long maxCount = 0L;
        TreeSet<Long> tree = new TreeSet<>();
        tree.add(0L);
        long prefix = 0L;
        for (int i = 0; i < n; i++) {
            prefix = (prefix + arr[i]) % m;
            maxCount = Math.max(prefix, maxCount);
            // least prefix strictly greater than the current one
            Long temp = tree.higher(prefix);
            if (temp != null) {
                maxCount = Math.max((prefix - temp + m) % m, maxCount);
            }
            tree.add(prefix);
        }
        System.out.println(maxCount);
    }
}
Here is an optimized Java implementation for this problem, using a TreeSet:
public static long maximumSum2(long[] arr, long n, long m) {
    long prefix = 0;
    long maxim = 0;
    TreeSet<Long> S = new TreeSet<Long>();
    S.add((long) 0);
    // Traversing the array.
    for (int i = 0; i < n; i++) {
        // Finding prefix sum.
        prefix = (prefix + arr[i]) % m;
        // Finding maximum of prefix sum.
        maxim = Math.max(maxim, prefix);
        // Finding the first element that is not less than "prefix + 1",
        // i.e. strictly greater than prefix. 0 is used as a "not found"
        // marker, which is safe because higher() can never return the 0
        // that is in the set.
        long it = S.higher(prefix) != null ? S.higher(prefix) : 0;
        if (it != 0) {
            maxim = Math.max(maxim, prefix - it + m);
        }
        // Adding prefix to the set.
        S.add(prefix);
    }
    return maxim;
}
public static int MaxSequence(int[] arr)
{
    int maxSum = 0;
    int partialSum = 0;
    int negative = 0;
    for (int i = 0; i < arr.Length; i++)
    {
        if (arr[i] < 0)
        {
            negative++;
        }
    }
    if (negative == arr.Length)
    {
        return 0;
    }
    foreach (int item in arr)
    {
        partialSum += item;
        maxSum = Math.Max(maxSum, partialSum);
        if (partialSum < 0)
        {
            partialSum = 0;
        }
    }
    return maxSum;
}
Modify Kadane's algorithm to keep track of the number of occurrences. Below is the code.
# python3
# source: https://github.com/harishvc/challenges/blob/master/dp-largest-sum-sublist-modulo.py
# Time complexity: O(n)
# Space complexity: O(n)
def maxContiguousSum(a, K):
    sum_so_far = 0
    max_sum = 0
    count = {}  # keep track of occurrences
    for i in range(0, len(a)):
        sum_so_far += a[i]
        sum_so_far = sum_so_far % K
        if sum_so_far > 0:
            max_sum = max(max_sum, sum_so_far)
            if sum_so_far in count.keys():
                count[sum_so_far] += 1
            else:
                count[sum_so_far] = 1
        else:
            # sum_so_far % K is never negative in Python; reset on 0
            sum_so_far = 0

    return max_sum, count[max_sum]

a = [6, 6, 11, 15, 12, 1]
K = 13
max_sum, count = maxContiguousSum(a, K)
print("input >>> %s max sum=%d #occurrence=%d" % (a, max_sum, count))
Magnitude Pole: an element in an array whose left-hand-side elements are less than or equal to it and whose right-hand-side elements are greater than or equal to it.
example input
3,1,4,5,9,7,6,11
desired output
4,5,11
I was asked this question in an interview; I have to return the index of the element, and only the first element that meets the condition.
My logic
Take two MultiSets (so that we can handle duplicates as well), one for the right-hand side of the element and one for the left-hand side of the element (the pole).
Start with the 0th element and put all the remaining elements in the "right set".
Base condition: if this 0th element is less than or equal to every element in the "right set", return its index.
Else put it into the "left set" and start with the element at index 1.
Traverse the array; each time, pick the maximum value from the "left set" and the minimum value from the "right set" and compare.
At any instant of time, for the current element, all values to its left are in the "left set" and all values to its right are in the "right set".
Code
int magnitudePole(const vector<int> &A) {
    multiset<int> left, right;
    int left_max, right_min;
    int size = A.size();
    for (int i = 1; i < size; ++i)
        right.insert(A[i]);
    right_min = *(right.begin());
    if (A[0] <= right_min)
        return 0;
    left.insert(A[0]);
    for (int i = 1; i < size; ++i) {
        right.erase(right.find(A[i]));
        left_max = *(--left.end());
        if (right.size() > 0)
            right_min = *(right.begin());
        if (A[i] > left_max && A[i] <= right_min)
            return i;
        else
            left.insert(A[i]);
    }
    return -1;
}
My questions
I was told that my logic is incorrect. I am not able to understand why it is incorrect (though I have checked it on some cases and it returns the right index).
Out of my own curiosity: how can this be done without using any set/multiset, in O(n) time?
For an O(n) algorithm:
Compute the largest element from n[0] to n[k] for all k in [0, length(n)), saving the answers in an array maxOnTheLeft. This costs O(n);
Compute the smallest element from n[k] to n[length(n)-1] for all k in [0, length(n)), saving the answers in an array minOnTheRight. This costs O(n);
Loop through the whole thing and find any n[k] with maxOnTheLeft <= n[k] <= minOnTheRight. This costs O(n).
And your code is (at least) wrong here:
if (A[i] > left_max && A[i] <= right_min) // <-- should be >= and <=
Create two bool[N] arrays called NorthPole and SouthPole (just to be humorous).
Step forward through A[], tracking the maximum element found so far, and set SouthPole[i] true if A[i] > Max(A[0..i-1]).
Step backward through A[] and set NorthPole[i] true if A[i] < Min(A[i+1..N-1]).
Step forward through NorthPole and SouthPole to find the first element with both set true.
Each step above is O(N), visiting each node once, so it is O(N) overall.
Java implementation:
Collection<Integer> magnitudes(int[] A) {
    int length = A.length;
    // what's the maximum number from the beginning of the array till the current position
    int[] maxes = new int[A.length];
    // what's the minimum number from the current position till the end of the array
    int[] mins = new int[A.length];
    // build mins
    int min = mins[length - 1] = A[length - 1];
    for (int i = length - 2; i >= 0; i--) {
        if (A[i] < min) {
            min = A[i];
        }
        mins[i] = min;
    }
    // build maxes
    int max = maxes[0] = A[0];
    for (int i = 1; i < length; i++) {
        if (A[i] > max) {
            max = A[i];
        }
        maxes[i] = max;
    }
    Collection<Integer> result = new ArrayList<>();
    // use them to find the magnitudes, if any exist
    for (int i = 0; i < length; i++) {
        if (A[i] >= maxes[i] && A[i] <= mins[i]) {
            // return here if only the first one is needed
            result.add(A[i]);
        }
    }
    return result;
}
Your logic seems perfectly correct (didn't check the implementation, though) and can be implemented to give an O(n) time algorithm! Nice job thinking in terms of sets.
Your right set can be implemented as a stack which supports a min, and the left set can be implemented as a stack which supports a max and this gives an O(n) time algorithm.
Having a stack which supports max/min is a well known interview question, and it can be done so that each operation (push/pop/min/max) is O(1).
To use this for your logic, the pseudo code will look something like this
foreach elem in a[n-1 to 0]
    right_set.push(elem)

while (right_set.has_elements()) {
    candidate = right_set.pop();
    if (left_set.has_elements() && left_set.max() <= candidate <= right_set.min()) {
        break;
    } else if (!left_set.has_elements() && candidate <= right_set.min()) {
        break;
    }
    left_set.push(candidate);
}
return candidate
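For completeness, a minimal Python sketch of a stack with O(1) max, the building block assumed above (the min version is symmetric; the class name MaxStack is mine):

class MaxStack:
    def __init__(self):
        self._items = []  # pairs (value, max of the stack up to this entry)

    def push(self, x):
        cur_max = x if not self._items else max(x, self._items[-1][1])
        self._items.append((x, cur_max))

    def pop(self):
        return self._items.pop()[0]

    def max(self):
        return self._items[-1][1]

    def has_elements(self):
        return bool(self._items)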
I saw this problem on Codility, solved it with Perl:
sub solution {
    my (@A) = @_;
    my ($max, $min) = ($A[0], $A[-1]);
    my %candidates;
    for my $i (0..$#A) {
        if ($A[$i] >= $max) {
            $max = $A[$i];
            $candidates{$i}++;
        }
    }
    for my $i (reverse 0..$#A) {
        if ($A[$i] <= $min) {
            $min = $A[$i];
            return $i if $candidates{$i};
        }
    }
    return -1;
}
How about the following code? I think its efficiency is not good in the worst case, but its expected efficiency should be good.
int getFirstPole(int* a, int n)
{
    int leftPole = a[0];
    for (int i = 1; i < n; i++)
    {
        if (a[i] >= leftPole) // was a[j], which is not yet defined here
        {
            int j = i;
            for (; j < n; j++)
            {
                if (a[j] < a[i])
                {
                    i = j + 1; // jump the elements between i and j
                    break;
                }
                else if (a[j] > a[i])
                    leftPole = a[j];
            }
            if (j == n) // if nothing is less than a[i], then return i
                return i;
        }
    }
    return 0;
}
Create an array of ints called mags and an int variable called maxMag.
For each element in the source array, check whether the element is greater than or equal to maxMag.
If it is: add the element to the mags array and set maxMag = element.
If it isn't: remove from mags all candidates greater than the current element (see the sketch below).
Result: the mags array contains the magnitude poles.
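A small Python sketch of these steps (the names are mine). Since mags only ever holds a non-decreasing sequence, the removals could also be done by popping from the end, keeping the whole scan amortized O(n):

def magnitude_poles(a):
    mags = []       # current candidates for magnitude poles
    max_mag = None  # maximum element seen so far
    for x in a:
        if max_mag is None or x >= max_mag:
            mags.append(x)  # x dominates everything to its left
            max_mag = x
        else:
            # x invalidates every candidate greater than x
            mags = [m for m in mags if m <= x]
    return mags

For the example input 3, 1, 4, 5, 9, 7, 6, 11 this returns [4, 5, 11].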
Interesting question. I have my own solution in C#, given below; read the comments to understand my approach.
public int MagnitudePoleFinder(int[] A)
{
    //Create a variable to store the maximum valued item, i.e. maxOfUp
    int maxOfUp = A[0];
    //if the list has only one value, return this value
    if (A.Length <= 1) return A[0];
    //create a collection for all candidates for magnitude pole that will be found in the iteration
    var magnitudeCandidates = new List<KeyValuePair<int, int>>();
    //add the first element as the first candidate
    var a = A[0];
    magnitudeCandidates.Add(new KeyValuePair<int, int>(0, a));
    //let's iterate
    for (int i = 1; i < A.Length; i++)
    {
        a = A[i];
        //if this item is greater than or equal to all items above (maxOfUp holds their max value)
        if (a >= maxOfUp)
        {
            //add it to the candidate list
            magnitudeCandidates.Add(new KeyValuePair<int, int>(i, a));
            maxOfUp = a;
        }
        else
        {
            //remove all the candidates having greater values than this item
            magnitudeCandidates = magnitudeCandidates.Except(magnitudeCandidates.Where(c => c.Value > a)).ToList();
        }
    }
    //if no candidate, return -1
    if (magnitudeCandidates.Count == 0) return -1;
    else
        //return the index of the first candidate
        return magnitudeCandidates.First().Key;
}
We need to find pairs of numbers in an array whose sum is equal to a given value.
A = {6,4,5,7,9,1,2}
Sum = 10
Then the pairs are - {6,4} , {9,1}
I have two solutions for this:
an O(n log n) solution - sort + check the sum with 2 iterators (beginning and end).
an O(n) solution - hashing the array, then checking whether sum - hash[i] exists in the hash table or not.
But the problem is that although the second solution is O(n) time, it uses O(n) space as well.
So I was wondering if we could do it in O(n) time and O(1) space. And this is NOT homework!
Use an in-place radix sort and the OP's first solution with 2 iterators, coming towards each other.
If numbers in the array are not some sort of multi-precision numbers and are, for example, 32-bit integers, you can sort them in 2*32 passes using practically no additional space (1 bit per pass). Or 2*8 passes and 16 integer counters (4 bits per pass).
Details for the 2 iterators solution:
First iterator initially points to first element of the sorted array and advances forward. Second iterator initially points to last element of the array and advances backward.
If sum of elements, referenced by iterators, is less than the required value, advance first iterator. If it is greater than the required value, advance second iterator. If it is equal to the required value, success.
Only one pass is needed, so time complexity is O(n). Space complexity is O(1). If radix sort is used, complexities of the whole algorithm are the same.
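A sketch of that single pass in Python, assuming the array is already sorted (e.g. by the in-place radix sort above):

def find_pair(sorted_arr, target):
    lo, hi = 0, len(sorted_arr) - 1
    while lo < hi:
        s = sorted_arr[lo] + sorted_arr[hi]
        if s < target:
            lo += 1   # need a bigger sum: advance the first iterator
        elif s > target:
            hi -= 1   # need a smaller sum: advance the second iterator
        else:
            return sorted_arr[lo], sorted_arr[hi]
    return None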
If you are interested in related problems (with sum of more than 2 numbers), see "Sum-subset with a fixed subset size" and "Finding three elements in an array whose sum is closest to an given number".
This is a classic interview question from Microsoft research Asia.
How to Find 2 numbers in an unsorted array equal to a given sum.
[1] Brute force solution
This algorithm is very simple. The time complexity is O(N^2).
[2] Using binary search
Using binary search to find Sum - arr[i] for every arr[i], the time complexity can be reduced to O(N*logN).
[3] Using a hash
Based on algorithm [2], using a hash the time complexity can be reduced to O(N), but this solution adds O(N) space for the hash.
[4] Optimal algorithm:
Pseudo-code:

i = 0; j = n - 1;
while (i < j)
    if (arr[i] + arr[j] == sum) return (i, j);
    else if (arr[i] + arr[j] < sum) i++;
    else j--;
return (-1, -1);
or
If a[M] + a[m] > I then M--
If a[M] + a[m] < I then m++
If a[M] + a[m] == I you have found it
If m > M, no such numbers exist.
And is this question completely solved? No. If we are asked for N numbers instead of 2, the problem becomes much more complex.
The question then:
How can I find all the combinations of values adding up to a given number?
This is a classic NP-complete problem called subset-sum.
To understand NP/NP-complete/NP-hard, you'd better read some professional books.
References:
[1]http://www.quora.com/Mathematics/How-can-I-find-all-the-combination-cases-with-a-given-number
[2]http://en.wikipedia.org/wiki/Subset_sum_problem
for (int i = 0; i < array.size(); i++) {
    int value = array[i];
    int diff = sum - value;
    if (!hashSet.contains(diff)) {   // was an undefined diffvalue
        hashSet.add(value);
    } else {
        System.out.println(sum + " = " + value + " + " + diff);
    }
}
--------
Sum being the sum of the 2 numbers.
public void printPairsOfNumbers(int[] a, int sum) {
    // O(n^2)
    for (int i = 0; i < a.length; i++) {
        for (int j = i + 1; j < a.length; j++) {
            if (sum - a[i] == a[j]) {
                // match..
                System.out.println(a[i] + "," + a[j]);
            }
        }
    }

    // O(n) time and O(n) space
    Set<Integer> cache = new HashSet<Integer>();
    cache.add(a[0]);
    for (int i = 1; i < a.length; i++) {
        if (cache.contains(sum - a[i])) {
            // match//
            System.out.println(a[i] + "," + (sum - a[i]));
        } else {
            cache.add(a[i]);
        }
    }
}
Create a dictionary whose keys are the numbers from the list and whose values are the numbers needed to reach the desired sum. Then check for the presence of each such pair of numbers in the list.
def check_sum_in_list(p_list, p_check_sum):
    l_dict = {i: (p_check_sum - i) for i in p_list}
    for key, value in l_dict.items():
        if key in p_list and value in p_list:
            return True
    return False

if __name__ == '__main__':
    l1 = [1, 3, 7, 12, 72, 2, 8]
    l2 = [1, 2, 2, 4, 7, 4, 13, 32]
    print(check_sum_in_list(l1, 10))
    print(check_sum_in_list(l2, 99))

Output:
True
False
version 2
import random

def check_sum_in_list(p_list, p_searched_sum):
    print(list(p_list))
    l_dict = {i: p_searched_sum - i for i in set(p_list)}
    for key, value in l_dict.items():
        if key in p_list and value in p_list:
            if p_list.index(key) != p_list.index(value):
                print(key, value)
                return True
    return False

if __name__ == '__main__':
    l1 = []
    for i in range(1, 2000000):
        l1.append(random.randrange(1, 1000))

    j = 0
    i = 9
    while i < len(l1):
        if check_sum_in_list(l1[j:i], 100):
            print('Found')
            break
        else:
            print('Continue searching')
            j = i
            i = i + 10
Output:
...
[154, 596, 758, 924, 797, 379, 731, 278, 992, 167]
Continue searching
[808, 730, 216, 15, 261, 149, 65, 386, 670, 770]
Continue searching
[961, 632, 39, 888, 61, 18, 166, 167, 474, 108]
39 61
Found
[Finished in 3.9s]
If you assume that the value M to which the pairs are supposed to sum is constant and that the entries in the array are positive, then you can do this in one pass (O(n) time) using M/2 pointers (O(1) space) as follows. The pointers are labeled P1, P2, ..., Pk where k = floor(M/2). Then do something like this:
for (int i = 0; i < N; ++i) {
    int j = array[i];
    if (j < M/2) {
        if (Pj == 0) {
            Pj = -(i+1);       // found smaller unpaired
        } else if (Pj > 0) {
            print(Pj-1, i);    // found a pair
            Pj = 0;
        }
    } else {
        if (Pj == 0) {
            Pj = (i+1);        // found larger unpaired
        } else if (Pj < 0) {
            print(-Pj-1, i);   // found a pair (the index was stored negated)
            Pj = 0;
        }
    }
}
You can handle repeated entries (e.g. two 6's) by storing the indices as digits in base N, for example. For M/2, you can add the conditional
if (j == M/2) {
    if (Pj == 0) {
        Pj = i+1;          // found unpaired middle
    } else {
        print(Pj-1, i);    // found a pair
        Pj = 0;
    }
}
But now you have the problem of putting the pairs together.
Does the obvious solution not work (iterating over every consecutive pair), or can the two numbers be anywhere in the array?
In that case, you could sort the list of numbers and use random sampling to partition the sorted list until you have a sublist that is small enough to be iterated over.
public static ArrayList<Integer> find(int[] A, int target) {
    HashSet<Integer> set = new HashSet<Integer>();
    ArrayList<Integer> list = new ArrayList<Integer>();
    int difference = 0;
    for (int i : A) { // was Integer i, which does not compile for an int[]
        set.add(i);
    }
    for (int i = 0; i < A.length; i++) {
        difference = target - A[i];
        if (set.contains(difference) && A[i] != difference) {
            list.add(A[i]);
            list.add(difference);
            return list;
        }
    }
    return null;
}
package algorithmsDesignAnalysis;

public class USELESStemp {
    public static void main(String[] args) {
        int A[] = {6, 8, 7, 5, 3, 11, 10};
        int sum = 12;
        int[] B = new int[A.length];
        int Max = A.length;
        for (int i = 0; i < A.length; i++) {
            B[i] = sum - A[i];
            if (B[i] > Max)
                Max = B[i];
            if (A[i] > Max)
                Max = A[i];
            System.out.print(" " + B[i] + "");
        } // O(n) here;
        System.out.println("\n Max = " + Max);
        int[] Array = new int[Max + 1];
        for (int i = 0; i < B.length; i++) {
            Array[B[i]] = B[i];
        } // O(n) here;
        for (int i = 0; i < A.length; i++) {
            if (Array[A[i]] > 0) // was >= 0, which is true for every entry of a fresh int[]
                System.out.println("We got one: " + A[i] + " and " + (sum - A[i]));
        } // O(n) here;
    } // end main();

    /******
     Running time: 3*O(n)
     *******/
}
The code below takes the array and the number N as the target sum.
First the array is sorted; then a new array containing the
remainders (N minus each element) is built, and then the remainders
and the array are scanned simultaneously, by simple scanning rather than binary search.
public static int solution(int[] a, int N) {
    quickSort(a, 0, a.length - 1); // nlog(n)
    int[] remainders = new int[a.length];
    for (int i = 0; i < a.length; i++) {
        remainders[a.length - 1 - i] = N - a[i]; // n
    }
    int previous = 0;
    for (int j = 0; j < a.length; j++) { // ~~ n
        int k = previous;
        while (k < remainders.length && remainders[k] < a[j]) {
            k++;
        }
        if (k < remainders.length && remainders[k] == a[j]) {
            return 1;
        }
        previous = k;
    }
    return 0;
}
Shouldn't iterating from both ends just solve the problem?
Sort the array and start comparing from both ends:

if ((arr[start] + arr[end]) < sum) start++;
if ((arr[start] + arr[end]) > sum) end--;
if ((arr[start] + arr[end]) == sum) { print arr[start] "," arr[end]; start++; }
if (start > end) break;

Time complexity: O(n log n)
If it's a sorted array and we need only one pair of numbers and not all the pairs, we can do it like this:

public void sums(int a[], int x) { // A = 1,2,3,9,11,20  x = 11
    int i = 0, j = a.length - 1;
    while (i < j) {
        if (a[i] + a[j] == x) {
            System.out.println("the numbers : " + a[i] + " " + a[j]);
            return; // without this the loop would never terminate on a match
        } else if (a[i] + a[j] < x) {
            i++;
        } else {
            j--;
        }
    }
}

1 2 3 9 11 20 || i=0, j=5, sum=21 > 11, so j--
1 2 3 9 11 20 || i=0, j=4, sum=12 > 11, so j--
1 2 3 9 11 20 || i=0, j=3, sum=10 < 11, so i++
1 2 3 9 11 20 || i=1, j=3, sum=11 == 11
END
The following code returns true if two integers in the array sum to the compared integer.

function compareArraySums(array, compare) {
    var candidates = [];
    function compareAdditions(element, index, array) {
        if (element <= compare) { // was an undefined y
            candidates.push(element);
        }
    }
    array.forEach(compareAdditions);
    for (var i = 0; i < candidates.length; i++) {
        for (var j = i + 1; j < candidates.length; j++) {
            if (candidates[i] + candidates[j] === compare) { // compare values, not indices
                return true;
            }
        }
    }
    return false;
}
Python 2.7 Implementation:
import itertools

list = [1, 1, 2, 3, 4, 5]
uniquelist = set(list)
targetsum = 5

for n in itertools.combinations(uniquelist, 2):
    if n[0] + n[1] == targetsum:
        print str(n[0]) + " + " + str(n[1])
Output:
1 + 4
2 + 3
https://github.com/clockzhong/findSumPairNumber
#! /usr/bin/env python
import sys
import os
import re

# get the number list
numberListStr = raw_input("Please input your number list (separated by spaces)...\n")
numberList = [int(i) for i in numberListStr.split()]
print 'you have input the following number list:'
print numberList

# get the sum target value
sumTargetStr = raw_input("Please input your target number:\n")
sumTarget = int(sumTargetStr)
print 'your target is: '
print sumTarget

def generatePairsWith2IndexLists(list1, list2):
    result = []
    for item1 in list1:
        for item2 in list2:
            result.append([item1 + 1, item2 + 1])
    return result

def generatePairsWithOneIndexLists(list1):
    result = []
    index = 0
    while index < (len(list1) - 1):
        index2 = index + 1
        while index2 < len(list1):
            result.append([list1[index] + 1, list1[index2] + 1])
            index2 += 1
        index += 1
    return result

def getPairs(numList, target):
    pairList = []
    candidateSlots = []  # we have (target + 1) slots
    # init the candidateSlots list
    index = 0
    while index < target + 1:
        candidateSlots.append(None)
        index += 1

    # generate the candidateSlots, contributing O(n) complexity
    index = 0
    while index < len(numList):
        if numList[index] <= target and numList[index] >= 0:
            if candidateSlots[numList[index]] == None:
                candidateSlots[numList[index]] = [index]
            else:
                candidateSlots[numList[index]].append(index)
        index += 1

    # generate the pairs list based on the candidateSlots[] we just created,
    # contributing O(target) complexity
    index = 0
    while index <= (target / 2):
        if candidateSlots[index] != None and candidateSlots[target - index] != None:
            if index != (target - index):
                newPairList = generatePairsWith2IndexLists(candidateSlots[index], candidateSlots[target - index])
            else:
                newPairList = generatePairsWithOneIndexLists(candidateSlots[index])
            pairList += newPairList
        index += 1
    return pairList

print getPairs(numberList, sumTarget)
I've successfully implemented a solution in Python with O(n+m) time and space cost.
Here "m" means the target value which those two numbers' sum needs to equal.
I believe this is the lowest cost one could get. Erict2k used itertools.combinations; it costs similar or higher time & space compared to this algorithm.
If the numbers aren't very big, you can use the fast Fourier transform to multiply two polynomials and then in O(1) check if the coefficient before x^(needed sum) is more than zero. O(n log n) total!
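A rough sketch of that idea in Python with NumPy (the function name is mine; it assumes small non-negative integers, and for very large inputs the floating-point FFT would need care with rounding):

import numpy as np

def pair_sum_exists(arr, s):
    cnt = np.bincount(arr)   # cnt[v] = multiplicity of v, read as polynomial coefficients
    size = 2 * len(cnt)      # room for degrees up to 2 * max(arr)
    f = np.fft.rfft(cnt, size)
    sq = np.rint(np.fft.irfft(f * f, size)).astype(int)  # coefficients of the squared polynomial
    if s >= size:
        return False
    ordered = sq[s]          # ordered pairs (i, j) with arr[i] + arr[j] == s, i == j allowed
    if s % 2 == 0 and s // 2 < len(cnt):
        ordered -= cnt[s // 2]  # drop the i == j self-pairs
    return ordered > 0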
// Java implementation using Hashing
import java.io.*;
class PairSum
{
private static final int MAX = 100000; // Max size of Hashmap
static void printpairs(int arr[],int sum)
{
// Declares and initializes the whole array as false
boolean[] binmap = new boolean[MAX];
for (int i=0; i<arr.length; ++i)
{
int temp = sum-arr[i];
// checking for condition
if (temp>=0 && binmap[temp])
{
System.out.println("Pair with given sum " +
sum + " is (" + arr[i] +
", "+temp+")");
}
binmap[arr[i]] = true;
}
}
// Main to test the above function
public static void main (String[] args)
{
int A[] = {1, 4, 45, 6, 10, 8};
int n = 16;
printpairs(A, n);
}
}
public static void Main(string[] args)
{
int[] myArray = {1,2,3,4,5,6,1,4,2,2,7 };
int Sum = 9;
for (int j = 1; j < myArray.Length; j++)
{
if (myArray[j-1]+myArray[j]==Sum)
{
Console.WriteLine("{0}, {1}",myArray[j-1],myArray[j]);
}
}
Console.ReadLine();
}
I want an efficient algorithm to find the next greater permutation of the given string.
Wikipedia has a nice article on lexicographical order generation. It also describes an algorithm to generate the next permutation.
Quoting:
The following algorithm generates the next permutation lexicographically after a given permutation. It changes the given permutation in-place.
Find the highest index i such that s[i] < s[i+1]. If no such index exists, the permutation is the last permutation.
Find the highest index j > i such that s[j] > s[i]. Such a j must exist, since i+1 is such an index.
Swap s[i] with s[j].
Reverse the order of all of the elements after index i till the last element.
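These four steps translate almost line for line into code; a minimal Python sketch (operating in place on a list and returning False when the input is already the last permutation):

def next_permutation(s):
    # step 1: highest index i with s[i] < s[i + 1]
    i = len(s) - 2
    while i >= 0 and s[i] >= s[i + 1]:
        i -= 1
    if i < 0:
        return False  # last permutation
    # step 2: highest index j > i with s[j] > s[i]
    j = len(s) - 1
    while s[j] <= s[i]:
        j -= 1
    # steps 3 and 4: swap, then reverse the suffix
    s[i], s[j] = s[j], s[i]
    s[i + 1:] = reversed(s[i + 1:])
    return True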
A great solution that works is described here: https://www.nayuki.io/page/next-lexicographical-permutation-algorithm. And here is a solution that returns the next permutation if it exists, otherwise false:
function nextPermutation(array) {
    var i = array.length - 1;
    while (i > 0 && array[i - 1] >= array[i]) {
        i--;
    }
    if (i <= 0) {
        return false;
    }
    var j = array.length - 1;
    while (array[j] <= array[i - 1]) {
        j--;
    }
    var temp = array[i - 1];
    array[i - 1] = array[j];
    array[j] = temp;
    j = array.length - 1;
    while (i < j) {
        temp = array[i];
        array[i] = array[j];
        array[j] = temp;
        i++;
        j--;
    }
    return array;
}
Using the source cited by @Fleischpfanzerl:
We follow the steps below to find the next lexicographical permutation:

nums = [0, 1, 2, 5, 3, 3, 0]
# nums = [0] * 5  # this input would hit the 'break' case below: already the last permutation

curr = nums[-1]
pivot = -1
for items in nums[-2::-1]:
    if items >= curr:
        pivot -= 1
        curr = items
    else:
        break

if pivot == -len(nums):
    print('break')  # the input is already the last possible permutation
else:
    j = len(nums) - 1
    while nums[j] <= nums[pivot - 1]:
        j -= 1
    nums[j], nums[pivot - 1] = nums[pivot - 1], nums[j]
    nums[pivot:] = nums[pivot:][::-1]

> [0, 1, 3, 0, 2, 3, 5]
So the idea is to follow these steps:
Find an index 'pivot' from the end of the array such that nums[pivot - 1] < nums[pivot]
Find an index j such that nums[j] > nums[pivot - 1]
Swap the elements at these two indices
Reverse the suffix starting at pivot
Homework? Anyway, you can look at the C++ function std::next_permutation, or this:
http://blog.bjrn.se/2008/04/lexicographic-permutations-using.html
We can find the next largest lexicographic string for a given string S using the following steps.
1. Iterate over every character; we will get the last value i (starting from the first character) that satisfies the condition S[i] < S[i + 1].
2. Now, we will get the last value j such that S[i] < S[j].
3. We then interchange S[i] and S[j], and sort the characters from i+1 to the end, i.e., sort(S[i+1]..S[len(S) - 1]).
The resulting string is the next largest lexicographic string of S. One can also use the next_permutation function in C++.
nextperm(a, n)
1. Find the smallest index j such that a[j…n - 1] forms a monotonically decreasing sequence.
2. If j == 0, the next permutation is not possible.
3. Else:
   1. Reverse the array a[j…n - 1]
   2. Binary search for the index of a[j - 1] in a[j…n - 1]
   3. Let i be the returned index
   4. Increment i until a[j - 1] < a[i]
   5. Swap a[j - 1] and a[i]
O(n) for each permutation. A Python sketch of these steps follows.
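Here is that sketch (using bisect for the binary search; since the reversed suffix is in increasing order, bisect_right already lands past any duplicates, which covers step 3.4):

import bisect

def nextperm(a):
    n = len(a)
    # 1. smallest j such that a[j:] is monotonically decreasing
    j = n - 1
    while j > 0 and a[j - 1] >= a[j]:
        j -= 1
    if j == 0:
        return False  # 2. whole array is non-increasing: last permutation
    # 3.1 reverse the suffix, making it non-decreasing
    a[j:] = a[j:][::-1]
    # 3.2-3.4 first element of the suffix strictly greater than a[j - 1]
    i = bisect.bisect_right(a, a[j - 1], j)
    # 3.5 swap
    a[j - 1], a[i] = a[i], a[j - 1]
    return True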
I came across a great tutorial.
Link: https://www.youtube.com/watch?v=quAS1iydq7U
void Solution::nextPermutation(vector<int> &a) {
    int k = 0;
    int n = a.size();
    for (int i = 0; i < n - 1; i++) {
        if (a[i] < a[i + 1]) {
            k = i;
        }
    }
    int ele = INT_MAX;
    int pos = 0;
    for (int i = k + 1; i < n; i++) {
        if (a[i] > a[k] && a[i] < ele) {
            ele = a[i];
            pos = i;
        }
    }
    if (pos != 0) {
        swap(a[k], a[pos]);
        reverse(a.begin() + k + 1, a.end());
    }
}
void Solution::nextPermutation(vector<int> &a) {
    int i, j = -1, k, n = a.size();
    for (i = 0; i < n - 1; i++)
        if (a[i] < a[i + 1]) j = i;
    if (j == -1)
        reverse(a.begin(), a.end());
    else {
        for (i = j + 1; i < n; i++)
            if (a[j] < a[i]) k = i;
        swap(a[j], a[k]);
        reverse(a.begin() + j + 1, a.end());
    }
}
A great solution that works is described here: https://www.nayuki.io/page/next-lexicographical-permutation-algorithm.
And if you are looking for source code:
/**
 * method to find the next lexicographically greater string
 *
 * @param w
 * @return a new string
 */
static String biggerIsGreater(String w) {
    char charArray[] = w.toCharArray();
    int n = charArray.length;
    int endIndex = 0;
    // step-1) Start from the rightmost character and find the first character
    // that is smaller than the previous character.
    for (endIndex = n - 1; endIndex > 0; endIndex--) {
        if (charArray[endIndex] > charArray[endIndex - 1]) {
            break;
        }
    }
    // If no such char is found, then all characters are in descending order,
    // which means there cannot be a greater string with the same set of characters.
    if (endIndex == 0) {
        return "no answer";
    } else {
        int firstSmallChar = charArray[endIndex - 1], nextSmallChar = endIndex;
        // step-2) Find the smallest character on the right side of the (endIndex - 1)'th
        // character that is greater than charArray[endIndex - 1]
        for (int startIndex = endIndex + 1; startIndex < n; startIndex++) {
            if (charArray[startIndex] > firstSmallChar && charArray[startIndex] < charArray[nextSmallChar]) {
                nextSmallChar = startIndex;
            }
        }
        // step-3) Swap the above found next smallest character with charArray[endIndex - 1]
        swap(charArray, endIndex - 1, nextSmallChar);
        // step-4) Sort the charArray after (endIndex - 1) in ascending order
        Arrays.sort(charArray, endIndex, n);
    }
    return new String(charArray);
}

/**
 * method to swap the ith character with the jth character inside charArray
 *
 * @param charArray
 * @param i
 * @param j
 */
static void swap(char charArray[], int i, int j) {
    char temp = charArray[i];
    charArray[i] = charArray[j];
    charArray[j] = temp;
}
If you are looking for a video explanation of the same, you can visit here.
This problem can be solved just by using two simple algorithms, searching and finding the smaller element, in just O(1) extra space and O(n log n) time, and it is also easy to implement.
To understand this approach clearly, watch this video: https://www.youtube.com/watch?v=DREZ9pb8EQI
def result(lst):
    if len(lst) == 0:
        return 0
    if len(lst) == 1:
        return [lst]
    l = []
    for i in range(len(lst)):
        m = lst[i]
        remLst = lst[:i] + lst[i+1:]
        for p in result(remLst):
            l.append([m] + p)
    return l

result(['1', '2', '3'])
Start traversing from the end of the list, comparing each entry with the value at the previous index.
If the value at the previous index (say at index i-1), call it x, is lower than the value at the current index i, sort the sublist on the right side starting from the current position i.
Pick the value from position i to the end that is just higher than x and put it at index i-1; at the index the value was picked from, put x. That is:
swap(list[i-1], list[j]) where j >= i, and the list is sorted from index i onwards
Code:
public void nextPermutation(ArrayList<Integer> a) {
    for (int i = a.size() - 1; i > 0; i--) {
        if (a.get(i) > a.get(i - 1)) {
            Collections.sort(a.subList(i, a.size()));
            for (int j = i; j < a.size(); j++) {
                if (a.get(j) > a.get(i - 1)) {
                    int replaceWith = a.get(j); // just higher than the (i-1)th element on the right side
                    a.set(j, a.get(i - 1));
                    a.set(i - 1, replaceWith);
                    return;
                }
            }
        }
    }
    // It means the values are already in non-increasing order, i.e. the lexicographically highest.
    // So reset it back to the lowest possible order by making it non-decreasing.
    for (int i = 0, j = a.size() - 1; i < j; i++, j--) {
        int tmp = a.get(i);
        a.set(i, a.get(j));
        a.set(j, tmp);
    }
}
Example :
10 40 30 20 => 20 10 30 40 // 20 is just bigger than 10
10 40 30 20 5 => 20 5 10 30 40 // 20 is just bigger than 10. Numbers on right side are just sorted form of this set {numberOnRightSide - justBigger + numberToBeReplaced}.
This is efficient enough up to strings with 11 letters.
// next_permutation example
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;

void nextPerm(string word, string key) { // was declared with one parameter but called with two
    vector<char> v(word.begin(), word.end());
    vector<string> permvec; // permutation vector
    string perm;
    int counter = 0;
    int position = 0; // position of the key in the permutation vector
    sort(v.begin(), v.end());
    do {
        perm = "";
        for (vector<char>::const_iterator i = v.begin(); i != v.end(); ++i) {
            perm += *i;
        }
        permvec.push_back(perm); // add permutation to vector
        if (perm == key) {
            position = counter + 1; // index of the permutation right after the key
        }
        counter++;
    } while (next_permutation(v.begin(), v.end()));

    if (position == 0 || position >= (int)permvec.size() || word.length() < 2) {
        cout << "No answer" << endl;
    } else {
        cout << "Answer: " << permvec.at(position) << endl;
    }
}

int main() {
    string word = "nextperm";
    string key = "mreptxen";
    nextPerm(word, key); // checks whether key is a permutation of word and prints the permutation after key
    return 0;
}
I hope this code might be helpful.
#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;

int main() {
    char str[100];
    cin >> str;
    int len = strlen(str); // was strlen(len), which doesn't compile
    bool f = next_permutation(str, str + len);
    if (f) {
        cout << str;
    } else {
        cout << "no answer";
    }
}
I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.
Everything you need is in these powerpoint slides. Just to extract the basic O(n) worst-case algorithm (median of medians, which introselect falls back to):
Select(A, n, i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group's median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find i-th element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k - 1, i)
    If i > k, return Select(G, n - k, i - k)
It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)
The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.
Here is the algorithm.
QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot
What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of
T(n) = Theta(n) + T(n-1) = Theta(n^2)
But if the choices are indeed random, the expected running time is given by
T(n) <= Theta(n) + (1/n) Σ_{i=1}^{n} T(max(i, n-i-1))
where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.
Let's guess that T(n) <= an for some a. Then we get
T(n) <= cn + (1/n) Σ_{i=1}^{n} T(max(i-1, n-i))
      = cn + (1/n) Σ_{i=1}^{⌊n/2⌋} T(n-i) + (1/n) Σ_{i=⌊n/2⌋+1}^{n} T(i)
     <= cn + 2 (1/n) Σ_{i=⌊n/2⌋}^{n} T(i)
     <= cn + 2 (1/n) Σ_{i=⌊n/2⌋}^{n} ai
and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) Σ_{i=n/2}^{n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

Σ_{i=⌊n/2⌋}^{n} i = Σ_{i=1}^{n} i - Σ_{i=1}^{⌊n/2⌋} i
                  = n(n+1)/2 - ⌊n/2⌋(⌊n/2⌋+1)/2
                 <= n^2/2 - (n/4)^2/2
                  = (15/32) n^2
where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with
cn + 2 (1/n) Σ_{i=⌊n/2⌋}^{n} ai
  <= cn + (2a/n)(15/32) n^2
   = n (c + (15/16) a)
  <= an
provided a > 16c.
This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).
A quick Google on 'kth largest element array' returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17
"Make one pass through tracking the three largest values so far."
(it was specifically for the 3rd largest)
and this answer:
Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
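In Python the same three pops look like this (heapq is a min-heap, so values are negated to pop the largest; heapq.nlargest(3, nums)[-1] wraps the same idea into one call):

import heapq

def third_largest(nums):
    heap = [-x for x in nums]
    heapq.heapify(heap)          # O(n)
    heapq.heappop(heap)          # O(log n)
    heapq.heappop(heap)          # O(log n)
    return -heapq.heappop(heap)  # O(log n)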
You do it like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done; otherwise you repeat with the bin (higher or lower) that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n: O(n).
A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.
I answered the letter of your question :)
The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.
const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.
For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
Warren's priority heap solution is neater though.
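A sketch of that bookkeeping in Python, using a min-heap of size k (the priority-heap variant mentioned above) so the "replace the smallest of the k" test costs O(log k) instead of O(k):

import heapq

def kth_largest(nums, k):
    heap = nums[:k]  # the k largest seen so far, as a min-heap
    heapq.heapify(heap)
    for x in nums[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the smallest of the k
    return heap[0]   # the kth largest overall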
Although I am not very sure about O(n) complexity, it is sure to be between O(n) and O(n log n), and closer to O(n) than to O(n log n). The function is written in Java:
public int quickSelect(ArrayList<Integer> list, int nthSmallest) {
    // Choose a random pivot index in the range 0 to list size - 1.
    // (random.nextInt(list.size() - 1) would exclude the last index and
    // throws for single-element lists, so use list.size() here.)
    Random random = new Random();
    int pivotIndex = random.nextInt(list.size());
    int pivot = list.get(pivotIndex);
    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();
    // Split the list into two.
    // Values smaller than the pivot go to smallerNumberList,
    // values greater than the pivot go to greaterNumberList,
    // values equal to the pivot are dropped.
    for (int i = 0; i < list.size(); i++) {
        if (list.get(i) < pivot) {
            smallerNumberList.add(list.get(i));
        } else if (list.get(i) > pivot) {
            greaterNumberList.add(list.get(i));
        }
    }
    // If smallerNumberList has at least nthSmallest elements, the answer must be in it
    // (nthSmallest is 1-based, hence <= rather than <).
    if (nthSmallest <= smallerNumberList.size()) {
        return quickSelect(smallerNumberList, nthSmallest);
    }
    // If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], the answer
    // must be in greaterNumberList. The step is a bit tricky: [ list.size() - greaterNumberList.size() ]
    // elements (the smaller ones plus the copies of the pivot) are already accounted for.
    else if (nthSmallest > (list.size() - greaterNumberList.size())) {
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList, nthSmallest);
    } else {
        return pivot;
    }
}
I implemented finding the kth minimum in n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about selection algorithms (as indicated in one of the postings above). You can read about the algorithm and also find code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do partial ordering of the list - returning the first K min (or max) in O(k log(n)) time.
Though the code provided results in the kth minimum, similar logic can be employed to find the kth maximum in O(k log(n)), ignoring the pre-work done to create the tournament tree.
Sexy quickselect in Python
import random

def quickselect(arr, k):
    '''
    k = 1 returns the first element in ascending order.
    Can be easily modified to return the first element in descending order.
    '''
    r = random.randrange(0, len(arr))
    a1 = [i for i in arr if i < arr[r]]  # partition: smaller than the pivot
    a2 = [i for i in arr if i > arr[r]]  # partition: larger than the pivot
    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr) - len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]
As per the paper Finding the Kth largest item in a list of n items, the following algorithm takes O(n) time in the worst case.
Divide the array into n/5 lists of 5 elements each.
Find the median of each sub-array of 5 elements.
Recursively find the median of all the medians, let's call it M.
Partition the array into two sub-arrays: the 1st sub-array (a1) contains the elements larger than M, while the other sub-array (a2) contains the elements smaller than M.
If k <= |a1|, return selection(a1, k).
If k - 1 = |a1|, return M.
If k > |a1| + 1, return selection(a2, k - |a1| - 1).
Analysis: As suggested in the original paper:
We use the median to partition the list into two halves (the first half,
if k <= n/2, and the second half otherwise). This algorithm takes
time cn at the first level of recursion for some constant c, cn/2 at
the next level (since we recurse into a list of size n/2), cn/4 at the
third level, and so on. The total time taken is cn + cn/2 + cn/4 +
... = 2cn = O(n).
Why is the partition size taken as 5 and not 3?
As mentioned in original paper:
Dividing the list by 5 assures a worst-case split of 70-30. At least
half of the medians are greater than the median-of-medians, hence at least
half of the n/5 blocks have at least 3 elements above it, and this gives a
3n/10 split, which means the other partition is 7n/10 in the worst case.
That gives T(n) = T(n/5) + T(7n/10) + O(n). Since n/5 + 7n/10 < n, the
worst-case running time is O(n).
Now I have tried to implement the above algorithm as:
public static int findKthLargestUsingMedian(Integer[] array, int k) {
    // Step 1: Divide the list into n/5 lists of 5 elements each.
    int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
    // Step 2: Find the pivotal element, aka the median of medians.
    int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
    // Now we need two lists split using medianOfMedian as pivot. All elements in listWithGreaterNumbers
    // will be greater than medianOfMedian and listWithSmallerNumbers will have elements lesser than medianOfMedian.
    List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
    List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
    for (Integer element : array) {
        if (element < medianOfMedian) {
            listWithSmallerNumbers.add(element);
        } else if (element > medianOfMedian) {
            listWithGreaterNumbers.add(element);
        }
    }
    // Next step.
    if (k <= listWithGreaterNumbers.size())
        return findKthLargestUsingMedian(listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
    else if ((k - 1) == listWithGreaterNumbers.size())
        return medianOfMedian;
    else if (k > (listWithGreaterNumbers.size() + 1))
        return findKthLargestUsingMedian(listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k - listWithGreaterNumbers.size() - 1);
    return -1;
}

public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
    int[] medians = new int[noOfRequiredLists];
    for (int count = 0; count < noOfRequiredLists; count++) {
        int startOfPartialArray = 5 * count;
        int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length); // don't run past the end
        Integer[] partialArray = Arrays.copyOfRange(mainList, startOfPartialArray, endOfPartialArray);
        // Step 2: Find the median of each of these sublists.
        Arrays.sort(partialArray); // sort the block so its middle element really is its median
        int medianIndex = partialArray.length / 2;
        medians[count] = partialArray[medianIndex];
    }
    // Step 3: Find the median of the medians.
    Arrays.sort(medians);
    return medians[medians.length / 2];
}
Just for the sake of completeness, another algorithm makes use of a Priority Queue and takes O(n log n) time.
public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
int p = 0;
int numElements = nums.length;
// create priority queue where all the elements of nums will be stored
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
// place all the elements of the array to this priority queue
for (int n : nums) {
pq.add(n);
}
    // poll the min-heap (n - k + 1) times: the last value polled is the kth largest
    while (numElements - k + 1 > 0) {
        p = pq.poll();
        k++;
    }
return p;
}
Both of these algorithms can be tested as:
public static void main(String[] args) throws IOException {
Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
System.out.println(findKthLargestUsingMedian(numbers, 8));
System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}
As expected output is:
18
18
Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to divide the array into two parts, values to the left of the median less than (<) the median and values to the right greater than (>) it; that too can be done in linear time. Then go to the part of the array where the kth element lies.
Now the recurrence becomes:
T(n) = T(n/2) + cn
which gives O(n) overall.
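A minimal sketch of that recursion in Python, assuming a linear-time median helper is available (the names kth_smallest and median are illustrative, not from the answer above):

def kth_smallest(arr, k, median):
    # 'median' is assumed to be a linear-time true-median routine
    m = median(arr)
    left = [x for x in arr if x < m]    # values below the median
    right = [x for x in arr if x > m]   # values above the median
    if k <= len(left):
        return kth_smallest(left, k, median)
    if k > len(arr) - len(right):
        return kth_smallest(right, k - (len(arr) - len(right)), median)
    return m  # k falls among the elements equal to the median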
Below is a link to a full implementation, with a fairly extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array as in QuickSort, but in order to avoid extreme cases (e.g. when the smallest element is chosen as pivot in every step, so that the algorithm degenerates into O(n^2) running time), special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) time in the worst and average case.
Here is a link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):
Finding Kth Smallest Element in an Unsorted Array
How about this kind of approach:
Maintain a buffer of length k and a tmp_max. Getting tmp_max is O(k) and it is done n times, so something like O(kn).
Is it right, or am I missing something?
Although it doesn't beat the average case of quickselect or the worst case of the median-of-medians method, it's pretty easy to understand and implement (see the sketch below).
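One possible reading of that buffer idea in Python (hypothetical names; assumes k <= len(arr)):

def kth_largest_buffer(arr, k):
    # keep a buffer of the k largest values seen so far
    buf = list(arr[:k])
    for x in arr[k:]:
        m = buf.index(min(buf))  # position of the smallest buffered value: the O(k) scan
        if x > buf[m]:
            buf[m] = x           # x displaces the current minimum
    return min(buf)              # the smallest of the k largest is the kth largest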
There is also an algorithm that outperforms the quickselect algorithm. It's called the Floyd-Rivest (FR) algorithm.
Original article: https://doi.org/10.1145/360680.360694
Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf
Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
I tried to implement quickselect and the FR algorithm in C++, and compared them to the standard C++ library implementation std::nth_element (which is basically an introselect hybrid of quickselect and heapselect). The result: quickselect and nth_element ran comparably on average, but the FR algorithm ran approximately twice as fast.
Sample code that I used for FR algorithm:
#include <algorithm>
#include <cmath>
#include <vector>

// forward declarations -- both are used before their definitions below
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n);
template <typename T>
int sgn(T val);

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
if (n == 0)
return *(std::min_element(data.begin(), data.end()));
else if (n == data.size() - 1)
return *(std::max_element(data.begin(), data.end()));
else
return _FRselect(data, 0, data.size() - 1, n);
}
template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
size_t leftIdx = left;
size_t rightIdx = right;
while (rightIdx > leftIdx)
{
if (rightIdx - leftIdx > 600)
{
size_t range = rightIdx - leftIdx + 1;
long long i = n - (long long)leftIdx + 1;
long long z = log(range);
long long s = 0.5 * exp(2 * z / 3);
long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);
size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);
_FRselect(data, newLeft, newRight, n);
}
T t = data[n];
size_t i = leftIdx;
size_t j = rightIdx;
// arrange pivot and right index
std::swap(data[leftIdx], data[n]);
if (data[rightIdx] > t)
std::swap(data[rightIdx], data[leftIdx]);
while (i < j)
{
std::swap(data[i], data[j]);
++i; --j;
while (data[i] < t) ++i;
while (data[j] > t) --j;
}
if (data[leftIdx] == t)
std::swap(data[leftIdx], data[j]);
else
{
++j;
std::swap(data[j], data[rightIdx]);
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= n)
leftIdx = j + 1;
if (n <= j)
rightIdx = j - 1;
}
return data[leftIdx];
}
template <typename T>
int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value, bump values 1-4 down, and drop 5 off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This does it in O(nk) comparisons, which is O(n) for a fixed k such as 5.
I would like to suggest one answer.
Take the first k elements and sort them into a linked list of k values.
Now, for every other value, even in the worst case, if we do an insertion step for the remaining n-k values, the number of comparisons will be k*(n-k), and sorting the initial k values costs at most k*(k-1); so it comes out to be O(nk), which is O(n) for constant k (see the sketch below).
Cheers
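A Python rendering of that sorted-k-list idea (illustrative names; assumes k <= len(arr)):

import bisect

def kth_largest_sorted_list(arr, k):
    # sort the first k values, then insertion-step through the rest: O(n*k)
    window = sorted(arr[:k])
    for x in arr[k:]:
        if x > window[0]:
            bisect.insort(window, x)  # insert at sorted position (the shift is O(k))
            window.pop(0)             # evict the smallest to keep size k
    return window[0]                  # kth largest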
An explanation of the median-of-medians algorithm to find the k-th largest integer out of n can be found here:
http://cs.indstate.edu/~spitla/presentation.pdf
An implementation in C++ is below:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int findMedian(vector<int> vec){
    // Find median of a vector. Sort first: the middle element of an
    // unsorted vector is not, in general, the median.
    sort(vec.begin(), vec.end());
    return vec[vec.size() / 2];
}
int findMedianOfMedians(vector<vector<int> > values){
vector<int> medians;
for (int i = 0; i < values.size(); i++) {
int m = findMedian(values[i]);
medians.push_back(m);
}
return findMedian(medians);
}
void selectionByMedianOfMedians(const vector<int> values, int k){
// Divide the list into n/5 lists of 5 elements each
vector<vector<int> > vec2D;
int count = 0;
while (count != values.size()) {
int countRow = 0;
vector<int> row;
while ((countRow < 5) && (count < values.size())) {
row.push_back(values[count]);
count++;
countRow++;
}
vec2D.push_back(row);
}
cout<<endl<<endl<<"Printing 2D vector : "<<endl;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
cout<<vec2D[i][j]<<" ";
}
cout<<endl;
}
cout<<endl;
// Calculating a new pivot for making splits
int m = findMedianOfMedians(vec2D);
cout<<"Median of medians is : "<<m<<endl;
// Partition the list into elements larger than 'm' (call this sublist L1) and
// those smaller than 'm' (call this sublist L2); duplicates of 'm' are dropped
vector<int> L1, L2;
for (int i = 0; i < vec2D.size(); i++) {
for (int j = 0; j < vec2D[i].size(); j++) {
if (vec2D[i][j] > m) {
L1.push_back(vec2D[i][j]);
}else if (vec2D[i][j] < m){
L2.push_back(vec2D[i][j]);
}
}
}
// Checking the splits as per the new pivot 'm'
cout<<endl<<"Printing L1 : "<<endl;
for (int i = 0; i < L1.size(); i++) {
cout<<L1[i]<<" ";
}
cout<<endl<<endl<<"Printing L2 : "<<endl;
for (int i = 0; i < L2.size(); i++) {
cout<<L2[i]<<" ";
}
// Recursive calls
if ((k - 1) == L1.size()) {
cout<<endl<<endl<<"Answer :"<<m;
}else if (k <= L1.size()) {
return selectionByMedianOfMedians(L1, k);
}else if (k > (L1.size() + 1)){
return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
}
}
int main()
{
int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
vector<int> vec(values, values + 25);
cout<<"The given array is : "<<endl;
for (int i = 0; i < vec.size(); i++) {
cout<<vec[i]<<" ";
}
selectionByMedianOfMedians(vec, 8);
return 0;
}
There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.
In more detail: using Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I came up with the following algorithm (imaginably named "LefSelect"):
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
// Note: the code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
int l=0, m = n-1, i=l, j=m;
float x;
while (l<m) {
if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
if( a[j] < a[k] ) F_SWAP(a[k],a[j]);
x=a[k];
    while (j>k && i<k) {
do i++; while (a[i]<x);
do j--; while (a[j]>x);
F_SWAP(a[i],a[j]);
}
i++; j--;
if (j<k) {
while (a[i]<x) i++;
l=i; j=m;
}
if (k<i) {
while (x<a[j]) j--;
m=j; i=l;
}
}
return a[k];
}
In benchmarks that I did here, LefSelect is 20-30% faster than QuickSelect.
Haskell Solution:
kthElem index list = sort list !! index
withShape ~[] [] = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys
sort [] = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
    where
        ls = filter (< x) xs
        rs = filter (>= x) xs
This implements selection via a lazy quicksort, using the withShape helper to discover the size of a partition without actually computing its sorted contents, so only the branch containing the requested index is ever fully evaluated.
Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.
#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;
int randomPartition(int arr[], int l, int r);
// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
// If k is smaller than number of elements in array
if (k > 0 && k <= r - l + 1)
{
// Partition the array around a random element and
// get position of pivot element in sorted array
int pos = randomPartition(arr, l, r);
// If position is same as k
if (pos-l == k-1)
return arr[pos];
if (pos-l > k-1) // If position is more, recur for left subarray
return kthSmallest(arr, l, pos-1, k);
// Else recur for right subarray
return kthSmallest(arr, pos+1, r, k-pos+l-1);
}
// If k is more than number of elements in array
return INT_MAX;
}
void swap(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
int x = arr[r], i = l;
for (int j = l; j <= r - 1; j++)
{
        if (arr[j] <= x) // arr[j] is not greater than the pivot, so move it to the left part
{
swap(&arr[i], &arr[j]);
i++;
}
}
swap(&arr[i], &arr[r]); // swap the pivot
return i;
}
// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
int n = r-l+1;
int pivot = rand() % n;
swap(&arr[l + pivot], &arr[r]);
return partition(arr, l, r);
}
// Driver program to test above methods
int main()
{
int arr[] = {12, 3, 5, 7, 4, 19, 26};
int n = sizeof(arr)/sizeof(arr[0]), k = 3;
cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
return 0;
}
The worst-case time complexity of the above solution is still O(n^2): in the worst case, the randomized function may always pick a corner element. The expected time complexity of the above randomized QuickSelect, however, is Θ(n).
Create a priority queue (max-heap).
Insert all the elements into the heap.
Call poll() k times; the last value returned is the kth largest.
public static int getKthLargestElements(int[] arr, int k)
{
    // max-heap: the comparator reverses the natural ordering
    PriorityQueue<Integer> pq = new PriorityQueue<>((x, y) -> (y - x));
    // insert all the elements into the heap
    for (int ele : arr)
        pq.offer(ele);
    // call poll() k times; the kth poll returns the kth largest
    int result = -1;
    for (int i = 0; i < k; i++)
        result = pq.poll();
    return result;
}
This is an implementation in JavaScript.
If you relax the constraint that you cannot modify the array, you can avoid using extra memory by keeping two indexes that identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/); a sketch of that in-place variant appears at the end of this answer.
function kthMax(a, k){
var size = a.length;
var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2)
//Create an array with all element lower than the pivot and an array with all element higher than the pivot
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
lowerArray.push(current);
} else if (current > pivot) {
upperArray.push(current);
}
}
//Which one should I continue with?
if(k <= upperArray.length) {
//Upper
return kthMax(upperArray, k);
} else {
var newK = k - (size - lowerArray.length);
if (newK > 0) {
///Lower
return kthMax(lowerArray, newK);
} else {
//None ... it's the current pivot!
return pivot;
}
}
}
If you want to test how it performs, you can use this variation:
function kthMax (a, k, logging) {
var comparisonCount = 0; //Number of comparison that the algorithm uses
var memoryCount = 0; //Number of integers in memory that the algorithm uses
var _log = logging;
    if(k < 1 || k > a.length) { // k is 1-based here
if (_log) console.log ("k is out of range");
return false;
}
function _kthmax(a, k){
var size = a.length;
var pivot = a[parseInt(Math.random()*size)];
if(_log) console.log("Inputs:", a, "size="+size, "k="+k, "pivot="+pivot);
// This should never happen. Just a nice check in this exercise
// if you are playing with the code to avoid never ending recursion
if(typeof pivot === "undefined") {
if (_log) console.log ("Ops...");
return false;
}
var i, lowerArray = [], upperArray = [];
for (i = 0; i < size; i++){
var current = a[i];
if (current < pivot) {
comparisonCount += 1;
memoryCount++;
lowerArray.push(current);
} else if (current > pivot) {
comparisonCount += 2;
memoryCount++;
upperArray.push(current);
}
}
if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);
if(k <= upperArray.length) {
comparisonCount += 1;
return _kthmax(upperArray, k);
} else if (k > size - lowerArray.length) {
comparisonCount += 2;
return _kthmax(lowerArray, k - (size - lowerArray.length));
} else {
comparisonCount += 2;
return pivot;
}
/*
* BTW, this is the logic for kthMin if we want to implement that... ;-)
*
if(k <= lowerArray.length) {
return kthMin(lowerArray, k);
} else if (k > size - upperArray.length) {
return kthMin(upperArray, k - (size - upperArray.length));
} else
return pivot;
*/
}
var result = _kthmax(a, k);
return {result: result, iterations: comparisonCount, memory: memoryCount};
}
The rest of the code is just to create some playground:
function getRandomArray (n){
var ar = [];
for (var i = 0, l = n; i < l; i++) {
ar.push(Math.round(Math.random() * l))
}
return ar;
}
//Create a random array of 50 numbers
var ar = getRandomArray (50);
Now, run your tests a few times.
Because of Math.random(), it will produce different results every time:
kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
If you test it a few times, you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n, and that the value of k does not affect the algorithm.
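For illustration, here is a sketch of the in-place, two-index variant mentioned at the top of this answer (in Python for brevity; kth_max_inplace is a hypothetical name):

import random

def kth_max_inplace(a, k):
    # In-place quickselect: the kth largest sits at index len(a)-k in
    # ascending order. Two indexes (lo, hi) delimit the current
    # partition; no auxiliary arrays are allocated.
    lo, hi, target = 0, len(a) - 1, len(a) - k
    while True:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]              # move pivot out of the way
        store = lo
        for i in range(lo, hi):                # Lomuto-style partition
            if a[i] < a[hi]:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]      # pivot into its final place
        if store == target:
            return a[store]
        elif store < target:
            lo = store + 1
        else:
            hi = store - 1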
I came up with this algorithm, and it seems to be O(n) (for a fixed k; each step scans the k variables, so it is O(nk) in general):
Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than our minimum, we replace the min variable with the item value. We continue the same thing until the end of the array. The minimum of our three variables is the 3rd largest item in the array.
define variables a=0, b=0, c=0
iterate through the array items
find minimum a,b,c
if item > min then replace the min variable with item value
continue until end of array
the minimum of a,b,c is our answer
And, to find Kth largest item we need K variables.
Example: (k=3)
[1,2,4,1,7,3,9,5,6,2,9,8]
Final variable values:
a=7 (answer)
b=8
c=9
Can someone please review this and let me know what I am missing?
Here is the implementation of the algorithm eladv suggested (I also put here the implementation with a random pivot):
import java.util.Arrays;

public class Median {
public static void main(String[] s) {
int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
System.out.println(selectK(test,8));
/*
int n = 100000000;
int[] test = new int[n];
for(int i=0; i<test.length; i++)
test[i] = (int)(Math.random()*test.length);
long start = System.currentTimeMillis();
random_selectK(test, test.length/2);
long end = System.currentTimeMillis();
System.out.println(end - start);
*/
}
public static int random_selectK(int[] a, int k) {
if(a.length <= 1)
return a[0];
int r = (int)(Math.random() * a.length);
int p = a[r];
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return random_selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return random_selectK(temp,k-small-equal);
}
}
public static int selectK(int[] a, int k) {
if(a.length <= 5) {
Arrays.sort(a);
return a[k-1];
}
int p = median_of_medians(a);
int small = 0, equal = 0, big = 0;
for(int i=0; i<a.length; i++) {
if(a[i] < p) small++;
else if(a[i] == p) equal++;
else if(a[i] > p) big++;
}
if(k <= small) {
int[] temp = new int[small];
for(int i=0, j=0; i<a.length; i++)
if(a[i] < p)
temp[j++] = a[i];
return selectK(temp, k);
}
else if (k <= small+equal)
return p;
else {
int[] temp = new int[big];
for(int i=0, j=0; i<a.length; i++)
if(a[i] > p)
temp[j++] = a[i];
return selectK(temp,k-small-equal);
}
}
private static int median_of_medians(int[] a) {
int[] b = new int[a.length/5];
int[] temp = new int[5];
for(int i=0; i<b.length; i++) {
for(int j=0; j<5; j++)
temp[j] = a[5*i + j];
Arrays.sort(temp);
b[i] = temp[2];
}
return selectK(b, b.length/2 + 1);
}
}
It is similar to the quicksort strategy: we pick an arbitrary pivot and bring the smaller elements to its left and the larger to its right.
public static int kthElInUnsortedList(List<int> list, int k)
{
if (list.Count == 1)
return list[0];
List<int> left = new List<int>();
List<int> right = new List<int>();
int pivotIndex = list.Count / 2;
int pivot = list[pivotIndex]; //arbitrary
    for (int i = 0; i < list.Count; i++)
    {
        if (i == pivotIndex) // skip the pivot itself rather than stopping at it
            continue;
        int currentEl = list[i];
        if (currentEl < pivot)
            left.Add(currentEl);
        else
            right.Add(currentEl);
    }
if (k == left.Count + 1)
return pivot;
if (left.Count < k)
return kthElInUnsortedList(right, k - left.Count - 1);
else
return kthElInUnsortedList(left, k);
}
Go to the end of this link:
http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/
You can find the kth smallest element in O(n log R) time (where R is the range of values) and constant extra space, if the array contains only integers.
The approach is to do a binary search on the range of array values: given min_value and max_value, we can binary search on that range.
We can write a comparator function which tells us, for any value, whether it is the kth smallest, smaller than the kth smallest, or bigger than the kth smallest.
Do the binary search until you reach the kth smallest number.
Here is the code for that
class Solution:
def _iskthsmallest(self, A, val, k):
less_count, equal_count = 0, 0
for i in range(len(A)):
if A[i] == val: equal_count += 1
if A[i] < val: less_count += 1
if less_count >= k: return 1
if less_count + equal_count < k: return -1
return 0
def kthsmallest_binary(self, A, min_val, max_val, k):
if min_val == max_val:
return min_val
        mid = (min_val + max_val) // 2  # integer division (correct in Python 3 as well)
iskthsmallest = self._iskthsmallest(A, mid, k)
if iskthsmallest == 0: return mid
if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
return self.kthsmallest_binary(A, mid+1, max_val, k)
    # @param A : tuple of integers
    # @param B : integer
    # @return an integer
def kthsmallest(self, A, k):
if not A: return 0
if k > len(A): return 0
min_val, max_val = min(A), max(A)
return self.kthsmallest_binary(A, min_val, max_val, k)
What I would do is this:
initialize empty doubly linked list l (kept sorted, largest first)
for each element e in array
    if size(l) < k or e larger than tail(l)
        insert e at its sorted position in l
        if size(l) > k
            remove last element from l
the last element of l should now be the kth largest element
You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.
Update:
initialize empty sorted tree l (holding the k largest seen so far)
for each element e in array
    if size(l) < k or e larger than smallest(l)
        insert e into l // O(log k)
        if size(l) > k
            remove smallest element from l
the smallest element of l should now be the kth largest element
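In Python, a min-heap gives exactly this O(n log k) behaviour; a small sketch (kth_largest_heap is an illustrative name):

import heapq

def kth_largest_heap(arr, k):
    # min-heap holding the k largest values seen so far
    heap = list(arr[:k])
    heapq.heapify(heap)
    for x in arr[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x: O(log k)
    return heap[0]  # the smallest of the k largest == the kth largest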
First we can build a BST from the unsorted array, which takes O(n log n) time on average (not O(n)); if the BST is augmented with subtree sizes, the kth smallest element can then be found in O(log n) per query, so the build step dominates the overall cost.
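A sketch of that subtree-size augmentation (hypothetical names; the tree here is unbalanced, so O(log n) heights hold only on average):

class Node:
    # BST node augmented with the size of its subtree
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size += 1
    return root

def kth_smallest(root, k):
    left_size = root.left.size if root.left else 0
    if k == left_size + 1:
        return root.key
    if k <= left_size:
        return kth_smallest(root.left, k)
    return kth_smallest(root.right, k - left_size - 1)

root = None
for v in [7, 2, 9, 4, 1]:
    root = insert(root, v)
print(kth_smallest(root, 2))  # -> 2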