Maximum of all possible subarrays of an array - algorithm

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I built a segment tree over the array and queried it for each possible sub-array, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
E.g. arr[] = { 1, 2, 3 }; // the array need not be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3

I don't think it is possible to store all those values in O(n), since there are n(n+1)/2 sub-arrays. But it is pretty easy to create, in O(n), a structure that makes it possible to answer in O(1) the query "how many subarrays are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could employ a simple O(n) algorithm that counts how many consecutive elements immediately to the left and to the right of A[i] are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
The 5 above has 3 elements to its left and 2 to its right that are less than it. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum: 4*3 because the subarray can extend over 0..3 elements to the left and 0..2 elements to the right.
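A minimal sketch of this naïve count for a single position, assuming Python 3. The helper name count_as_max is made up for illustration, and the elided parts of the example array are simply dropped, leaving the two 10s as its ends:

```python
def count_as_max(A, i):
    """Count the subarrays of A in which A[i] is the maximum (naive O(n) scan).

    Ties are not handled: this sketch only absorbs neighbours that are
    strictly smaller than A[i], as in the example above.
    """
    left = 0
    while i - left - 1 >= 0 and A[i - left - 1] < A[i]:
        left += 1
    right = 0
    while i + right + 1 < len(A) and A[i + right + 1] < A[i]:
        right += 1
    # Any of left+1 start positions can be combined with any of right+1 ends.
    return (left + 1) * (right + 1)


A = [10, 1, 1, 1, 5, 1, 1, 10]
print(count_as_max(A, 4))  # 12
```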
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Push every element's index onto the stack, but before pushing it, pop all the indexes whose values are less than the current value (the code below also pops equal values on one of the two sides, so that ties are not counted twice). The index left on top of the stack, if any, is the nearest larger element.
To find the same values on the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I also implemented the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    # For each i, find the nearest already-visited index that op does not
    # discard; use fallback when there is none.
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]
        stack.append(i)
    return bound

def optimized_version(A):
    n = len(A)
    # Left pass pops on <=, right pass pops on <, so ties are counted once.
    T = list(zip(make_bounds(A, -1, range(n), lambda x, y: x <= y),
                 make_bounds(A, n, reversed(range(n)), lambda x, y: x < y)))
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)

I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
On the other hand, when initialising the subarrays, instead of making them normal arrays you could build heaps: min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.

I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following:
1. The subarray of maximum/minimum length or some other criterion (in which case the problem reduces to finding the max element in a 1-dimensional array)
2. The maximum elements of all your sub-arrays, either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating over your super-array and storing a reference to the largest element, or by building a heap as nbro said. Problem 2 has a similar solution. However, a linear scan through n arrays of length m is not going to be linear. So you will have to keep your class invariants such that the maximum/minimum is known after every operation, maybe with the help of some data structure like a heap.

Assuming you mean contiguous sub-arrays, create the array of partial sums where Y_0 = 0 and Y_i = X_1 + ... + X_i, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the sum of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
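The partial-sum scan described above could look roughly like this in Python 3. This is a sketch, not the answerer's code; the function names are made up for illustration, and the partial sums are kept as a running value instead of a stored array:

```python
def max_subarray_sum(xs):
    """Largest sum of a non-empty contiguous subarray, via running partial sums."""
    if not xs:
        raise ValueError("xs must be non-empty")
    prefix = 0      # partial sum of the elements seen so far
    lowest = 0      # smallest earlier partial sum (including the initial zero)
    best = xs[0]
    for x in xs:
        prefix += x
        best = max(best, prefix - lowest)
        lowest = min(lowest, prefix)
    return best


def min_subarray_sum(xs):
    # min(a, b) = -max(-a, -b): negate, reuse the max version, negate back.
    return -max_subarray_sum([-x for x in xs])


print(max_subarray_sum([1, 4, 2, 3]))  # 10
print(min_subarray_sum([1, 4, 2, 3]))  # 1
```

Keeping the index at which `lowest` and `best` were last updated would also recover where the sub-array starts and ends.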

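I think the question you are asking is how to find the maximum sum of a subarray.
Below is code that can do that in O(n) time.
#include <algorithm>
#include <vector>

int maxSumSubArr(const std::vector<int>& a)
{
    // Assumes a is non-empty.
    int maxsum = *std::max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum; // all negative: the best is the largest single element
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); i++)
    {
        sum += a[i];
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0; // a negative running sum never helps a later subarray
    }
    return maxsum;
}
Note: This code is not tested; please add a comment if you find any issues.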

Related

Longest Length sub array with elements in a given range

If I have a list of integers, in an array, how do I find the length of the longest sub array, such that the difference between the minimum and maximum element of that array is less than a given integer, say M.
So if we had an array with 3 elements,
[1, 2, 4]
And if M were equal to 2
Then the longest subarray would be [1, 2]
Because if we included 4, and we started from the beginning, the difference would be 3, which is greater than M ( = 2), and if we started from 2, the difference between the largest (4) and smallest element (2) would be 2 and that is not less than 2 (M).
The best I can think of is to start from the left, then go as far right as possible without the sub array range getting too high. Of course at each step we have to keep track of the minimum and maximum element so far. This has an n squared time complexity though, can't we get it faster?
I have an improvement to David Winder's algorithm. The idea is that instead of using two heaps to find the minimum and maximum elements, we can use what I call the deque DP optimization trick (there's probably a proper name for this somewhere).
To understand this, we can look at a simpler problem: finding the minimum element in all subarrays of some size k in an array. The idea is that we keep a double-ended queue containing potential candidates for the minimum element. When we encounter a new element, we pop off all the elements at the back end of the queue that are greater than or equal to the current element before pushing the current element onto the back.
We can do this because we know that any subarray we encounter in the future which includes an element that we pop off will also include the current element, and since the current element is no larger than the elements that get popped off, those elements are never needed as the minimum.
After pushing the current element, we pop off the front element in the queue if it is more than k elements away. The minimum element in the current subarray is simply the first element in the queue because the way we popped off the elements from the back of the queue kept it increasing.
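As an illustration of the simpler problem only (minimum of every window of size k), here is a short Python 3 sketch using collections.deque. The function name and the sample array are assumptions made for the example, and k is assumed to be at least 1:

```python
from collections import deque


def sliding_window_min(a, k):
    """Minimum of every length-k window of a, in O(n) total time."""
    candidates = deque()  # indices; their values increase from front to back
    result = []
    for i, x in enumerate(a):
        # Elements >= x can never be the minimum again once x has arrived.
        while candidates and a[candidates[-1]] >= x:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it has slid out of the current window.
        if candidates[0] <= i - k:
            candidates.popleft()
        if i >= k - 1:
            result.append(a[candidates[0]])
    return result


print(sliding_window_min([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [-1, -3, -3, -3, 3, 3]
```

Each index is pushed once and popped at most once, which is where the linear bound comes from.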
To use this algorithm in your problem, we would have two deques to store the minimum and maximum elements. When we encounter a new element which is too much larger than the minimum element, we pop off the front of the deque until the element is no longer too large. The beginning of the longest array ending at that position is then the index of the last element we popped off plus 1.
This makes the solution O(n).
C++ implementation:
// Assumes a (the input array), n (its length) and m (the allowed difference)
// are already defined; needs <algorithm>, <deque> and <limits>.
int best = std::numeric_limits<int>::lowest(), beg = 0;
//best = length of the longest subarray that meets the requirements so far
//beg = the beginning of the longest subarray ending at the current index
std::deque<int> least, greatest;
//these two deques store the indices of the elements which could cause trouble
for (int i = 0; i < n; i++)
{
    while (!least.empty() && a[least.back()] >= a[i])
    {
        least.pop_back();
        //we can pop this off since any subarray we encounter in the future
        //which includes this will also include the current element
    }
    least.push_back(i);
    while (!greatest.empty() && a[greatest.back()] <= a[i])
    {
        greatest.pop_back();
        //we can pop this off since any subarray we encounter in the future
        //which includes this will also include the current element
    }
    greatest.push_back(i);
    //as written, a window survives while max - min <= m; compare with <= and >=
    //in the next two loops to require a difference strictly less than m
    while (a[least.front()] < a[i] - m)
    {
        beg = least.front() + 1;
        least.pop_front();
        //remove elements from the beginning if they are too small
    }
    while (a[greatest.front()] > a[i] + m)
    {
        beg = greatest.front() + 1;
        greatest.pop_front();
        //remove elements from the beginning if they are too large
    }
    best = std::max(best, i - beg + 1);
}
Consider the following idea:
Let's create a MaxLen array (of size n) defined as: MaxLen[i] = length of the longest valid sub-array ending at the i-th place.
Once this array is filled, it is easy (O(n)) to find your longest sub-array.
How do we fill the MaxLen array? Assume you know MaxLen[i]. What will MaxLen[i+1] be?
We have 2 options. If the number in originalArr[i+1] does not break the constraint (a difference of m or more) within the longest sub-array ending at index i, then MaxLen[i+1] = MaxLen[i] + 1 (because we are just able to make the previous sub-array a little bit longer). On the other hand, if originalArr[i+1] is bigger or smaller by m or more than some element of that sub-array, we need to find the last element that has such a difference (call its index k) and set MaxLen[i+1] = i - k + 1, because the new sub-array has to exclude the originalArr[k] element.
How do we find this "bad" element? We will use a heap. After every element we pass, we insert its value and index into both a min heap and a max heap (done in log(n)). When you are at the i-th element and want to check whether something in the previous sub-array breaks your sequence, you extract elements from the heap until no element is too much bigger or smaller than originalArr[i]. Take the max index of the extracted elements and that is your k, the index of the element that broke your sequence.
I will try to simplify with pseudo code (I only demonstrate the min-heap, but the max-heap is handled the same way):
Array is the input array of size n
min-heap = new heap()
maxLen = array(n) // of size n
maxLen[0] = 1 // the sub-array consisting of Array[0] alone
min-heap.push(Array[0], 0)
for (i in (1,n)) {
    if (Array[i] - min-heap.top.value < m) // then all good
        maxLen[i] = maxLen[i-1] + 1
    else {
        maxIndex = min-heap.top.index
        while (Array[i] - min-heap.top.value >= m) {
            maxIndex = max(maxIndex, min-heap.pop.index)
            if (empty(min-heap))
                break // all elements are "bad", so a new sub-array starts at i
        }
        // maxIndex is our k, the last "bad" index
        maxLen[i] = i - maxIndex
    }
    min-heap.push(Array[i], i)
}
When you are done, run over the max length array and choose the max value (from its index you can extract the begin and end indexes in the original array).
So we loop over the array (n) and in each step insert into 2 heaps (log n).
You would probably say: Hey! But you also have an unknown number of heap extractions, each of which forces a heapify (log n)! But notice that each heap can hold at most n elements and no element can be extracted twice, so if you calculate the accumulated complexity you will see the number of extractions is still O(1) per element, amortized.
So bottom line: O(n*logn).
Edited:
This solution can be simplified by using an AVL tree instead of 2 heaps. Finding the min and the max are both O(logn) in an AVL tree, and the same goes for insert, find and delete, so just use a tree whose elements hold the value and its index in the original array.
Edited 2:
@Fei Xiang even came up with a better O(n) solution using deques.

Finding the sum of maximum difference possible from all subintervals of a given array

I have a problem designing an algorithm. The problem is that this should be executed in O(n) time.
Here is the assignment:
There is an unsorted array "a" with n numbers.
m_{ij} = min{a_i, a_{i+1}, ..., a_j}, M_{ij} = max{a_i, a_{i+1}, ..., a_j}
Calculate:
S = SUM[i=1..n] { SUM[j=i..n] { (M_{ij} - m_{ij}) } }
I am able to solve this in O(nlogn) time. This is a university research assignment. Everything that I tried suggests that this is not possible. I would be very thankful if you could point me in the right direction where to find the solution. Or at least prove that this is not possible.
Further explanation:
Given i and j, find the maximum and minimum elements of the array slice a[i:j]. Subtract those to get the range of the slice, a[max]-a[min].
Now, add up the ranges of all slices for all (i, j) such that 1 <= i <= j <= n. Do it in O(n) time.
This is a pretty straightforward problem.
I will assume that it is an array of objects (like pairs of values or tuples), not numbers. The first value is the index in the array and the second is the value.
The right question here is how many times we need to multiply each number and add/subtract it from the sum, i.e. in how many sub-sequences it is the maximum and the minimum element.
This problem is connected to finding the next greater element (NGE), you can see here, just to know it for future problems.
I will write it in pseudo code.
subsum(A):
    returnSum = 0
    // I am pushing objects onto the stack. First value is the index in the array, second is the value
    // the sentinel (-1, MAX_INT) is never popped by the first loop
    stack.push(-1, Integer.MAX_INT)
    for (int i = 0; i < n; i++)
        while (stack.peek().value < A[i].value)
            last = stack.pop()
            beforeLast = stack.peek()
            returnSum = returnSum + last.value*(i-last.index)*(last.index-beforeLast.index)
        stack.push(A[i])
    while stack holds more than the sentinel:
        last = stack.pop()
        beforeLast = stack.peek()
        returnSum = returnSum + last.value*(A.length-last.index)*(last.index-beforeLast.index)
    return returnSum
sum(A):
    // first we calculate the sum of maximum values over all subarrays, and then the sum of minimum values.
    // The latter is done by simply multiplying each value in the array by -1
    return subsum(A) + subsum(-1 * x for x in A.value)
Time complexity of this code is O(n).
The peek function just reads the top value of the stack without popping it.
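For readers who prefer something executable, here is a Python 3 sketch of the same monotonic-stack idea. It is an adaptation rather than the answerer's exact code: it stores plain indices instead of (index, value) objects, replaces the sentinel with an explicit -1, and the function names are made up for illustration.

```python
def sum_of_subarray_maxima(a):
    """Sum of max(a[i..j]) over all i <= j, using a monotonic stack in O(n)."""
    n = len(a)
    total = 0
    stack = []  # indices whose values are non-increasing from bottom to top
    for i in range(n + 1):
        # i == n is a final flush that pops everything still on the stack.
        while stack and (i == n or a[stack[-1]] < a[i]):
            last = stack.pop()
            before = stack[-1] if stack else -1
            # a[last] is the maximum of (last - before) * (i - last) subarrays.
            total += a[last] * (i - last) * (last - before)
        if i < n:
            stack.append(i)
    return total


def sum_of_ranges(a):
    # Sum of minima is minus the sum of maxima of the negated array.
    return sum_of_subarray_maxima(a) + sum_of_subarray_maxima([-x for x in a])


print(sum_of_ranges([1, 2, 3]))  # 4
```

For [1, 2, 3] the non-zero ranges are 1 for {1,2}, 1 for {2,3} and 2 for {1,2,3}, which gives the printed total of 4.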

Given N arrays, how many ways are there for each array to contribute one element and add to k?

Say I had N arrays. These N arrays are in an array of arrays A. How many N-tuples are there such that for a tuple t,
sum = 0
for i = 0 ... N-1
    sum += A[i][t[i]]
sum == k
What is an efficient way to solve this? The best I can come up with is just enumerating all possibilities.
P.S. This isn't a homework question. I saw this on LeetCode and was curious about a solution to the general case.
Conceptual solution (can be improved):
1. sort the elements in each array
2. shift the elements in each array by the absolute minimum of all arrays (abs_min - to shift all arrays you'll subtract the abs_min from each element of all arrays) - you now have all arrays with non-negative elements and you are searching for a target_sum = initial_sum - num_arrays*abs_min
3. set your curr_array as the first one
4. binary search for the position of target_sum in the curr_array. You will need to consider all the elements in curr_array with indices under this position. Take one such element, subtract it from the target_sum, and recursively repeat the search with the next array.
I believe the (amortised) complexity will be somewhere of O(num_arrays*N*log(N)) where N is the (maximum) number of elements in the arrays.
Opportunities for improvement:
I kinda feel that shifting all arrays by abs_min is unnecessary (just an artifice that helps the thinking). Maybe before going one step deeper in recursion in step 4, the target_sum may be shifted by the min of current array?
reordering the arrays so that the shorter ones are considered first will perhaps improve the performance (lower number of elements in the upper levels of the recursion to consider) [Edit] or maybe reordering the arrays in the descending order of their min value (take out from the target_sum in the most aggressive way possible)?
adopting a scheme which eliminate/multiplexes the duplicates inside the initial arrays may help - i.e a map with the index_key=unique_value and the map-value the set of indexes). If the specific tuples are not required, then a map of unique-value->occurrence_count would be enough. (this may be useful if one can be sure that duplicate exist - e.g. the values in the arrays are within tight ranges and arrays are pretty long - pigeonhole principle)
[Edited to show how it works in the example of {{1, 2, 3}, {42,43, 44, 45, 46, 47}}]
Upper limit = index of the element strictly greater than the provided value. If you want values lesser or equal, take values strictly below that index!!
Zero-based index convention
49 target sum in the first array gets the upper limit of index=3 (so all indexes under 3 need to to be considered)
first array - start at index=2/value=3 in the first array; you will be looking for a target_sum of 46 in the second. The upper limit found by binary search in the second is index=5 (and we look strictly under it), so start with index=4/value=46 (the algo cuts out the value of 47). 46 is good and retained; index=3/value=45 is not enough and (not having a 3rd array to recurse into) the algo won't consider anything at or under index=3/value=45.
first array, index=1/value=2, looking for a target_sum of 47 in the second array. The upper limit (binary search) is index=6 (to search strictly under it), so start with index=5/value=47. 47 is retained; 46 and below are cut out by the algo.
down in the first array, index=0/value=1, looking for a target_sum of 48 in the second array. The upper limit is again 6; at index=5/value=47 we get an insufficient value and terminate.
So, grand totals:
Total binary searches: 1-in the first array, 3 in the second.
Total successful equalities tested=2 (two tuples found).
Total unsuccessful equalities tested=3 (until the second array no longer offers a satisfactory answer).
Total additions/subtraction performed=3 (one for each value in the first array)
By contrast, the exhaustive scanning would get:
no binary searches
total additions = 3*6=18
total successful equality tested = 2
total un-successful equality tested = 16
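Language: C++
Constraints: A[i][j] >= 0
Complexity: O(N^2 * k) time (N * (k+1) memoized states, each scanning N elements), O(N * k) memory
// MAX_N and MAX_K are assumed compile-time bounds; count and fill come from
// <algorithm> (namespace std). A holds N arrays of N elements each.
int A[MAX_N][MAX_N], memo[MAX_N][MAX_K+1];

// Number of ways for arrays 0..m to contribute one element each and add up to sum.
int countWays2(int m, int sum, int N){
    if(memo[m][sum] != -1)
        return memo[m][sum];
    if(m == 0)
        return memo[m][sum] = count(A[0], A[0]+N, sum);
    int ways = 0;
    for(int i = 0; i < N; ++i)
        if(sum >= A[m][i])
            ways += countWays2(m-1, sum - A[m][i], N);
    return memo[m][sum] = ways;
}

int countWays(int N, int k){
    if(k < 0) return 0;
    for(int i = 0; i < N; ++i)
        fill(memo[i], memo[i] + k + 1, -1); //Initialize memoization table
    return countWays2(N-1, k, N);
}
The answer is countWays(N, k)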
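A related way to count the tuples, sketched in Python 3, is to carry a table of "number of ways to reach each partial sum" from one array to the next. This is not the memoized recursion above but a bottom-up variant; the function name is an assumption, and unlike the C++ version it does not need non-negative elements. Its cost depends on how many distinct partial sums occur, so it is only a sketch for moderate inputs.

```python
from collections import Counter


def count_tuples(arrays, k):
    """Count the ways to pick one element from each array so the picks add up to k."""
    ways = Counter({0: 1})          # ways[s] = number of partial tuples summing to s
    for arr in arrays:
        values = Counter(arr)       # collapse duplicates into occurrence counts
        nxt = Counter()
        for s, w in ways.items():
            for v, c in values.items():
                nxt[s + v] += w * c
        ways = nxt
    return ways[k]


print(count_tuples([[1, 2, 3], [42, 43, 44, 45, 46, 47]], 49))  # 2
```

The two tuples found are 3+46 and 2+47, matching the worked example in the earlier answer.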

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(K log K) solution using a max-heap, given here.
It is also one of the favorite Google interview questions, and they demand an O(k) solution.
I've also read somewhere that there exists an O(k) solution, which I am unable to figure out.
Can someone explain the correct solution with pseudocode?
P.S.
Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.
I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: pointer to the first array incremented when sum is too large and pointer to the second array decremented when sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search) we could find Kth largest sum in O(N log R) time, where N is size of the largest array and R is number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by small constant.
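The counting step of that first algorithm can be sketched in Python 3 as follows. This shows only the two-pointer count of pair sums above a chosen value, not the surrounding binary search; the function name is made up, and both arrays are assumed to be sorted in ascending order.

```python
def count_pairs_above(a, b, v):
    """Count pairs (x, y), x from a and y from b, with x + y > v, in O(len(a) + len(b))."""
    count = 0
    j = len(b)  # b[j:] are exactly the elements that push the current x above v
    for x in a:
        # As x grows, more of b qualifies, so j only ever moves left.
        while j > 0 and x + b[j - 1] > v:
            j -= 1
        count += len(b) - j
    return count


print(count_pairs_above([2, 3, 5, 8, 13], [4, 8, 12, 16], 20))  # 5
```

Comparing such counts against K for different values of v is what drives the binary search described above.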
Previous algorithm may be improved if we stop binary search as soon as number of pair sums in binary search range decreases from O(N^2) to O(N). Then we fill auxiliary array with these pair sums (this may be done with slightly modified two-pointers algorithm). And then we use quickselect algorithm to find Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get proper value range) we use something better than binary search?
We could estimate value range with the following trick: get every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for needed value range. And in fact slightly improved variant of this trick gives range containing only O(N) elements. This is proven in following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with Median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than other array (M < N) we could assume that this shorter array is extended to size N with some very small elements so that all calculations in the algorithm use size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes algorithm a little bit faster but does not improve asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to find K'th sum in linear time is based on priority queue (PQ). One variation is to insert largest pair to PQ, then repeatedly remove top of PQ and instead insert up to two pairs (one with decremented index in one array, other with decremented index in other array). And take some measures to prevent inserting duplicate pairs. Other variation is to insert all possible pairs containing largest element of first array, then repeatedly remove top of PQ and instead insert pair with decremented index in first array and same index in second array. In this case there is no need to bother about duplicates.
OP mentions O(K log K) solution where PQ is implemented as max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
Advantage of this approach is a possibility to provide first K elements in sorted order. Disadvantages are limited choice of array element type, more complex and slower algorithm, worse asymptotic complexity: O(K) > O(N).
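To make the second priority-queue variation concrete, here is a Python 3 sketch built on the standard heapq module (a binary heap, so this is the O(K log K) flavour, not the O(1)-queue one). The function name is an assumption, both arrays are assumed sorted in ascending order, and sums are negated because heapq is a min-heap.

```python
import heapq


def kth_largest_pair_sum(a, b, k):
    """Return (sum, (x, y)) for the k-th largest x + y, with x from a and y from b."""
    if not 1 <= k <= len(a) * len(b):
        raise ValueError("k must be between 1 and len(a) * len(b)")
    top = len(a) - 1
    # Seed with the largest element of a paired with the (at most k) largest of b.
    heap = [(-(a[top] + b[j]), top, j) for j in range(max(0, len(b) - k), len(b))]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        # Replace the removed pair by the next smaller one in the same column j.
        if i > 0:
            heapq.heappush(heap, (-(a[i - 1] + b[j]), i - 1, j))
    neg_sum, i, j = heap[0]
    return -neg_sum, (a[i], b[j])


print(kth_largest_pair_sum([2, 3, 5, 8, 13], [4, 8, 12, 16], 1))  # (29, (13, 16))
```

Because every column of b is represented by at most one heap entry at a time, no duplicate pairs can be inserted. When several pairs share the same sum, which of them is reported depends on the heap's tie-breaking.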
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greatest sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, consider you have S(k) = x[i] + y[j]. Then, S(k+1) is the greatest sum which is lower than (or equal to) S(k), and such that at least one element (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greatest sum you can construct which is lower than S(k) is obtained by setting i=i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you have found the K-th greatest sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions and two comparisons for each transition (three, because we need to deal with the case where the sums are equal); let's call that cost Q.
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M],b[N]}, {a[M-1],b[N-1]}}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
    M = len(a)
    N = len(b)
    if k > M*N:
        raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % (M*N,k))
    (ia,ib) = M-1,N-1 #0 based arrays
    # we need this for lookback
    nottakenindices = (0,0) # could be any value
    nottakensum = float('-inf')
    for i in range(k-1):
        optionone = a[ia]+b[ib-1]
        optiontwo = a[ia-1]+b[ib]
        biggest = max((optionone,optiontwo))
        #first deal with look behind
        if nottakensum > biggest:
            if optionone == biggest:
                newnottakenindices = (ia,ib-1)
            else: newnottakenindices = (ia-1,ib)
            ia,ib = nottakenindices
            nottakensum = biggest
            nottakenindices = newnottakenindices
        #deal with case where indices hit 0
        elif ia <= 0 and ib <= 0:
            ia = ib = 0
        elif ia <= 0:
            ib-=1
            ia = 0
            nottakensum = float('-inf')
        elif ib <= 0:
            ia-=1
            ib = 0
            nottakensum = float('-inf')
        #lookahead cases
        elif optionone > optiontwo:
            #then choose the first option as our next pair
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        elif optionone < optiontwo: # choose the second
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #next two cases apply if options are equal
        elif a[ia] > b[ib]:# drop the smallest
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        else: # might be equal or not - we can choose arbitrarily if equal
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #+2 - one for zero-based, one for skipping the 1st largest
        data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
        narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
        print (narrative) #this will work in both versions of python
        if ia <= 0 and ib <= 0:
            raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
    return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case with cases where pairs have equal sums):
import itertools
def refkth(a,b,k):
    # all (index, element) pairs, sorted by sum with the largest first
    allpairs = sorted(itertools.product(enumerate(a),enumerate(b)),
                      key=lambda pair: pair[0][1]+pair[1][1], reverse=True)
    (rightia,righta),(rightib,rightb) = allpairs[k-1]
    data = k,righta,rightb,righta+rightb,rightia,rightib
    narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
    print (narrative) #this will work in both versions of python
    return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
It might be that there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is the index array looks like (starting from 1 not 0)
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from end and make tuples. sum the elements in the tuple and pick the kth largest sum.
// Needs java.util.* and java.util.stream.Collectors; meant to sit inside a class.
public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
    // 2 * O(n log(n))
    Arrays.sort(nums1);
    Arrays.sort(nums2);
    List<List<Integer>> results = new ArrayList<>(k);
    int endIndex = 0;
    // Find the first number whose square is at least k
    for (int i = 1; i <= k; i++) {
        if (i * i >= k) {
            endIndex = i;
            break;
        }
    }
    // The following iteration provides at most endIndex^2 elements, and both arrays are in ascending order,
    // so the k smallest pairs can be found in this iteration. To flatten the nested loop, refer to
    // 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
    for (int i = 0; i < endIndex * endIndex; i++) {
        int m = i / endIndex;
        int n = i % endIndex;
        List<Integer> item = new ArrayList<>(2);
        item.add(nums1[m]);
        item.add(nums2[n]);
        results.add(item);
    }
    results.sort(Comparator.comparing(pair->pair.get(0) + pair.get(1)));
    return results.stream().limit(k).collect(Collectors.toList());
}
Keys to eliminating O(n^2):
Avoid the full Cartesian product (or a 'cross join'-like operation) of both arrays, which here means flattening the nested loop.
Downsize the iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to the Java documentation).
Limit the iteration to a range that is just large enough to contain the k smallest pairs.
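How small the range can be made is worth spelling out. The Python sketch below rests on one stated assumption: with both arrays ascending, the pair at positions (m, n) has at least (m + 1) * (n + 1) - 1 other pairs whose sum is no larger, so only positions with (m + 1) * (n + 1) <= k need to be considered. The function name `k_smallest_pairs` is hypothetical and the code is an illustration, not a translation of the Java above.

```python
def k_smallest_pairs(nums1, nums2, k):
    """Return k pairs (x, y), x from nums1 and y from nums2, with the smallest sums."""
    a, b = sorted(nums1), sorted(nums2)
    candidates = []
    for m in range(min(k, len(a))):
        # (m + 1) * (n + 1) <= k is equivalent to n < k // (m + 1)
        for n in range(min(k // (m + 1), len(b))):
            candidates.append((a[m], b[n]))
    # Roughly k * ln(k) candidates survive, so this sort dominates the cost.
    candidates.sort(key=lambda pair: pair[0] + pair[1])
    return candidates[:k]
```

For example, with `[1, 100]`, `[1, 2, 3, 4]` and `k = 4`, all four returned pairs use the element 1 from the first array, which shows that the candidate region is not a square.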

Reduce a sequence in most optimal way

We are given a sequence a of n numbers. The reduction of sequence a is defined as replacing the elements a[i] and a[i+1] with max(a[i],a[i+1]).
Each reduction operation has a cost defined as max(a[i],a[i+1]). After n-1 reductions a sequence of length 1 is obtained.
Our goal is to print the cost of the optimal reduction of the given sequence a, i.e. the minimum total cost of reducing it to a sequence of length 1.
e.g.:
1
2
3
Output :
5
An O(N^2) solution is trivial. Any ideas?
EDIT1:
People are asking about my idea: traverse the sequence pairwise, check the cost of each pair, and then reduce the pair with the least cost.
1 2 3
2 3 <=== Cost is 2
So reduce above sequence to
2 3
now again traverse through sequence, we get cost as 3
2 3
3 <=== Cost is 3
So total cost is 2+3=5
The above algorithm is O(N^2). That is why I was asking for a more optimized idea.
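As a baseline, the pairwise idea described in this edit can be sketched in Python as follows. The function name `reduce_cost_pairwise` is made up for illustration; the quadratic running time comes from rescanning the whole sequence before every merge.

```python
def reduce_cost_pairwise(seq):
    """Repeatedly merge the cheapest adjacent pair and return the total cost."""
    a = list(seq)
    total = 0
    while len(a) > 1:
        # Scan every adjacent pair and pick the one whose merge costs least.
        i = min(range(len(a) - 1), key=lambda j: max(a[j], a[j + 1]))
        merged = max(a[i], a[i + 1])
        total += merged
        a[i:i + 2] = [merged]  # replace the pair with its maximum
    return total
```

On the example above, `reduce_cost_pairwise([1, 2, 3])` merges (1, 2) for a cost of 2 and then (2, 3) for a cost of 3.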
O(n) solution:
High-level:
The basic idea is to repeatedly merge any element e that is smaller than both its neighbours, ns (the smaller one) and nl (the larger one), with its smaller neighbour ns. This produces the minimal cost because both the cost and the result of a merge are max(a[i],a[i+1]), which means no merge can make an element smaller than it currently is. Thus the cheapest possible merge for e is with ns, and that merge can't increase the cost of any other possible merge.
This can be done with a one-pass algorithm by keeping a stack of elements from our array in decreasing order. We compare the current element to both its neighbours (one being the top of the stack) and perform the appropriate merges until we're done.
Pseudo-code:
stack = empty
for pos = 0 to length
    // stack.top > arr[pos] is implicitly true because of the previous iteration of the loop
    if stack.top > arr[pos] > arr[pos+1]
        stack.push(arr[pos])
    else if stack.top > arr[pos+1] > arr[pos]
        merge(arr[pos], arr[pos+1])
    else while arr[pos+1] > stack.top > arr[pos]
        merge(arr[pos], stack.pop)
Java code:
// Requires: import java.util.Stack;
Stack<Integer> stack = new Stack<Integer>();
int cost = 0;
int arr[] = {10,1,2,3,4,5};
for (int pos = 0; pos < arr.length; pos++)
    if (pos < arr.length-1 && (stack.empty() || stack.peek() >= arr[pos+1]))
        if (arr[pos] > arr[pos+1])
            stack.push(arr[pos]);
        else
            cost += arr[pos+1]; // merge pos and pos+1
    else
    {
        int last = Integer.MAX_VALUE; // required otherwise a merge may be missed
        while (!stack.empty() && (pos == arr.length-1 || stack.peek() < arr[pos+1]))
        {
            last = stack.peek();
            cost += stack.pop(); // merge stack.pop() and pos or the last popped item
        }
        // Merge the last popped item with its smaller remaining neighbour. Skip this when
        // no neighbour is left (empty stack at the last position); otherwise
        // Integer.MAX_VALUE would be added to the cost.
        if (last != Integer.MAX_VALUE && (!stack.empty() || pos != arr.length-1))
        {
            int costTemp = Integer.MAX_VALUE;
            if (!stack.empty())
                costTemp = stack.peek();
            if (pos != arr.length-1)
                costTemp = Math.min(arr[pos+1], costTemp);
            cost += costTemp;
        }
    }
System.out.println(cost); // prints 24 for the array above
I am not sure whether by the "cost" of a reduction you mean computational cost, i.e. an operation taking time max(a[i],a[i+1]), or simply a quantity you want to calculate. If it is the latter, then the following algorithm is better than O(n^2):
Sort the list or, more precisely, define b[i] s.t. a[b[i]] is the sorted list: O(n) if you can use radix sort, O(n log n) otherwise.
Starting from the second-lowest item i in the sorted list: if the left/right neighbour is lower than i, then perform the reduction: O(1) for each item, update list from 2, O(n) in total.
I have no idea if that is the optimal solution, but it's O(n) for integers and O(n log n) otherwise.
edit: Realized that removing a precomputing step made it much simpler
If you don't consider it cheating to sort the list, then do it in n log n time and then merge the first two entries recursively. The total cost in this case will be the sum of the entries minus the smallest entry. This is optimal since
the cost will be the sum of n-1 entries (with repeats allowed)
the ith smallest entry can appear at most i-1 times in the cost function
The same fundamental idea works even if the list isn't sorted. An optimal solution is to merge the smallest element with its smaller neighbor. To see that this is optimal, note that
the cost will be the sum of n-1 entries (with repeats allowed)
entry a_i can appear at most j-1 times in the cost function, where j is the length of the longest consecutive subsequence containing a_i such that a_i is the maximum element of the subsequence
In the worst case, the sequence is decreasing and the time is O(n^2).
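A direct, unoptimized Python sketch of this greedy rule follows. The function name `reduce_cost_greedy` is hypothetical, and the plain list scan is what makes it quadratic; it is meant for cross-checking faster implementations, not as a final solution.

```python
def reduce_cost_greedy(seq):
    """Merge the smallest element with its smaller neighbour until one element remains."""
    a = list(seq)
    total = 0
    while len(a) > 1:
        i = min(range(len(a)), key=a.__getitem__)  # position of the smallest element
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(a)]
        j = min(neighbours, key=a.__getitem__)     # its smaller neighbour
        total += a[j]  # max(a[i], a[j]) is a[j], because a[i] is the minimum
        del a[i]       # the merged element keeps the neighbour's value
    return total
```

Because the merged value equals the neighbour's value, the merge reduces to deleting the smallest element and paying the neighbour's value.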
Greedy approach indeed works.
You can always reduce the smallest number with its smaller neighbor.
Proof: we have to reduce the smallest number at some point. Any reduction involving a neighbor leaves the neighbor's value the same or makes it bigger, so the operation that reduces the minimal element a[i] will always have cost c >= min(a[i-1], a[i+1]).
Now we need to
quickly find/remove smallest number
find its neighbors
I'd go with 2 RMQs on that, doing operation 2 as a binary search, which gives us O(N * log^2(N)).
EDIT: first RMQ - values. When you remove an element, put some big value there.
Second RMQ - "presence": 0 or 1 (value is there/isn't there). To find, for example, the left neighbor of a[i], you need to find the greatest l such that sum[l, i-1] = 1.
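The "presence" structure can be sketched in Python as below. Note the substitution: this uses a Fenwick (binary indexed) tree for the 0/1 sums instead of a second RMQ, and the class and method names are made up for illustration. The binary search calls the prefix sum O(log N) times, each costing O(log N), which matches the log^2 factor mentioned above.

```python
class PresenceTree:
    """Fenwick tree over 0/1 flags: 1 while position i is still in the sequence."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)
        for i in range(n):
            self._add(i, 1)

    def _add(self, i, delta):
        i += 1  # the tree is 1-based internally
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Number of positions still present in [0, i)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def remove(self, i):
        """Mark position i as removed (it must currently be present)."""
        self._add(i, -1)

    def left_neighbour(self, i):
        """Greatest l < i with sum[l, i-1] == 1, or None if nothing is left there."""
        upto = self.prefix(i)
        if upto == 0:
            return None
        lo, hi = 0, i - 1
        # sum[l, i-1] = upto - prefix(l) never increases as l grows, so binary
        # search for the greatest l at which the sum is still at least 1.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if upto - self.prefix(mid) >= 1:
                lo = mid
            else:
                hi = mid - 1
        return lo
```

A right neighbour can be found symmetrically by searching for the smallest r such that sum[i+1, r] = 1.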

Resources