Reduce a sequence in most optimal way - algorithm

We are given a sequence a of n numbers. The reduction of sequence a is defined as replacing the elements a[i] and a[i+1] with max(a[i],a[i+1]).
Each reduction operation has a cost defined as max(a[i],a[i+1]). After n-1 reductions a sequence of length 1 is obtained.
Now our goal is to print the cost of the optimal reduction of the given sequence a such that the resulting sequence of length 1 has the minimum cost.
e.g.:
1
2
3
Output :
5
An O(N^2) solution is trivial. Any ideas?
EDIT1:
People are asking about my idea, so my idea was to traverse through the sequence pairwise and for each pair check cost and in the end reduce the pair with least cost.
1 2 3
2 3 <=== Cost is 2
So reduce above sequence to
2 3
now again traverse through sequence, we get cost as 3
2 3
3 <=== Cost is 3
So total cost is 2+3=5
Above algorithm is of O(N^2). That is why I was asking for some more optimized idea.

O(n) solution:
High-level:
The basic idea is to repeatedly merge any element e smaller than both its neighbours ns and nl with its smallest neighbour ns. This produces the minimal cost because both the cost and result of merging is max(a[i],a[i+1]), which means no merge can make an element smaller than it currently is, thus the cheapest possible merge for e is with ns, and that merge can't increase the cost of any other possible merges.
This can be done with a one pass algorithm by keeping a stack of elements from our array in decreasing order. We compare the current element to both its neighbours (one being the top of the stack) and perform appropriate merges until we're done.
Pseudo-code:
stack = empty
for pos = 0 to length
// stack.top > arr[pos] is implicitly true because of the previous iteration of the loop
if stack.top > arr[pos] > arr[pos+1]
stack.push(arr[pos])
else if stack.top > arr[pos+1] > arr[pos]
merge(arr[pos], arr[pos+1])
else while arr[pos+1] > stack.top > arr[pos]
merge(arr[pos], stack.pop)
Java code:
Stack<Integer> stack = new Stack<Integer>();
int cost = 0;
int arr[] = {10,1,2,3,4,5};
for (int pos = 0; pos < arr.length; pos++)
if (pos < arr.length-1 && (stack.empty() || stack.peek() >= arr[pos+1]))
if (arr[pos] > arr[pos+1])
stack.push(arr[pos]);
else
cost += arr[pos+1]; // merge pos and pos+1
else
{
int last = Integer.MAX_VALUE; // required otherwise a merge may be missed
while (!stack.empty() && (pos == arr.length-1 || stack.peek() < arr[pos+1]))
{
last = stack.peek();
cost += stack.pop(); // merge stack.pop() and pos or the last popped item
}
if (last != Integer.MAX_VALUE)
{
int costTemp = Integer.MAX_VALUE;
if (!stack.empty())
costTemp = stack.peek();
if (pos != arr.length-1)
costTemp = Math.min(arr[pos+1], costTemp);
cost += costTemp;
}
}
System.out.println(cost);

I am confused if you mean by "cost" of reduction "computational cost" i.e. an operation taking time max(a[i],a[i+1]) or simply something you want to calculate. If it is the latter, then the following algorithm is better than O(n^2):
sort the list, or more precise, define b[i] s.t. a[b[i]] is the sorted list: O(n) if you can use RADIX sort, O(n log n) otherwise.
starting from the second-lowest item i in the sorted list: if left/right is lower than i, then perform reduction: O(1) for each item, update list from 2, O(n) in total.
I have no idea if that is the optimal solution, but it's O(n) for integers and O(n log n), otherwise.
edit: Realized that removing a precomputing step made it much simpler

If you don't consider it cheating to sort the list, then do it in n log n time and then merge the first two entries recursively. The total cost in this case will be the sum of the entries minus the smallest entry. This is optimal since
the cost will be the sum of n-1 entries (with repeats allowed)
the ith smallest entry can appear at most i-1 times in the cost function
The same fundamental idea works even if the list isn't sorted. An optimal solution is to merge the smallest element with its smallest neighbor. To see that this is optimal, note that
the cost will be the sum of n-1 entries (with repeats allowed)
entry a_i can appear at most j-1 times in the cost function, where j is the length of the longest consecutive subsequence containing a_i such that a_i is the maximum element of the subsequence
In the worst case, the sequence is decreasing and the time is O(n^2).

Greedy approach indeed works.
You can always reduce the smallest number with its smaller neighbor.
Proof: we have to reduce smallest number at some point. Any reduction of a neighbor will make the value of neighbor at least the same(possibly) bigger, so operation that reduces minimal element a[i] will always have cost c>=min(a[i-1], a[i+1])
Now we need to
quickly find/remove smallest number
find its neigbors
I'd go with 2 RMQs on that. Doing operation 2 as a binary search. Which gives us O(N * log^2(N))
EDIT: first RMQ - values. When you remove an element put some big value there
second RMQ - "presence". 0 or 1 (value is there/isn't there). To find a [for example] left neighbor of a[i], you need to find the greatest l, that sum[l,i-1] = 1.

Related

Q: Count array pairs with bitwise AND > k ~ better than O(N^2) possible?

Given an array nums
Count no. of pairs (two elements) where bitwise AND is greater than K
Brute force
for i in range(0,n):
for j in range(i+1,n):
if a[i]&a[j] > k:
res += 1
Better version:
preprocess to remove all elements ≤k
and then brute force
But i was wondering, what would be the limit in complexity here?
Can we do better with a trie, hashmap approach like two-sum?
( I did not find this problem on Leetcode so I thought of asking here )
Let size_of_input_array = N. Let the input array be of B-bit numbers
Here is an easy to understand and implement solution.
Eliminate all values <= k.
The above image shows 5 10-bit numbers.
Step 1: Adjacency Graph
Store a list of set bits. In our example, 7th bit is set for numbers at index 0,1,2,3 in the input array.
Step 2: The challenge is to avoid counting the same pairs again.
To solve this challenge we take help of union-find data structure as shown in the code below.
//unordered_map<int, vector<int>> adjacency_graph;
//adjacency_graph has been filled up in step 1
vector<int> parent;
for(int i = 0; i < input_array.size(); i++)
parent.push_back(i);
int result = 0;
for(int i = 0; i < adjacency_graph.size(); i++){ // loop 1
auto v = adjacency_graph[i];
if(v.size() > 1){
int different_parents = 1;
for (int j = 1; j < v.size(); j++) { // loop 2
int x = find(parent, v[j]);
int y = find(parent, v[j - 1]);
if (x != y) {
different_parents++;
union(parent, x, y);
}
}
result += (different_parents * (different_parents - 1)) / 2;
}
}
return result;
In the above code, find and union are from union-find data structure.
Time Complexity:
Step 1:
Build Adjacency Graph: O(BN)
Step 2:
Loop 1: O(B)
Loop 2: O(N * Inverse of Ackermann’s function which is an extremely slow-growing function)
Overall Time Complexity
= O(BN)
Space Complexity
Overall space complexity = O(BN)
First, prune everything <= k. Also Sort the value list.
Going from the most significant bit to the least significant we are going to keep track of the set of numbers we are working with (initially all ,s=0, e=n).
Let p be the first position that contains a 1 in the current set at the current position.
If the bit in k is 0, then everything that would yield a 1 world definetly be good and we need to investigate the ones that get a 0. We have (end - p) * (end-p-1) /2 pairs in the current range and (end-p) * <total 1s in this position larger or equal to end> combinations with larger previously good numbers, that we can add to the solution. To continue we update end = p. We want to count 1s in all the numbers above, because we only counted them before in pairs with each other, not with the numbers this low in the set.
If the bit in k is 1, then we can't count any wins yet, but we need to eliminate everything below p, so we update start = p.
You can stop once you went through all the bits or start==end.
Details:
Since at each step we eliminate either everything that has a 0 or everything that has a 1, then everything between start and end will have the same bit-prefix. since the values are sorted we can do a binary search to find p.
For <total 1s in this position larger than p>. We already have the values sorted. So we can compute partial sums and store for every position in the sorted list the number of 1s in every bit position for all numbers above it.
Complexity:
We got bit-by-bit so L (the bit length of the numbers), we do a binary search (logN), and lookup and updates O(1), so this is O(L logN).
We have to sort O(NlogN).
We have to compute partial bit-wise sums O(L*N).
Total O(L logN + NlogN + L*N).
Since N>>L, L logN is subsummed by NlogN. Since L>>logN (probably, as in you have 32 bit numbers but you don't have 4Billion of them), then NlogN is subsummed by L*N. So complexity is O(L * N). Since we also need to keep the partial sums around the memory complexity is also O(L * N).

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array and the for each possible sub array if did query into segment tree but that's not efficient. How do I do it in O(n)?
P.S n <= 10 ^7
For eg. arr[]= { 1, 2, 3 }; // the array need not to be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes possible to answer, in O(1) the query "how many subsets are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subsets are there for some A[i], you could employ a simple O(n) algorithm that counts how many elements to the left and to the right of the array that are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 up has 3 elements to the left and 2 to the right lesser than it. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum. 4*3 because there are 0..3 subarrays to the left and 0..2 to the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Put every element index in the stack. But before putting it, remove all the indexes whose value are lesser than the current value. The remaining index before the current one is the nearest larger element.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque
def make_bounds(A, fallback, arange, op):
stack = deque()
bound = [fallback] * len(A)
for i in arange:
while stack and op(A[stack[-1]], A[i]):
stack.pop()
if stack:
bound[i] = stack[-1]
stack.append(i)
return bound
def optimized_version(A):
T = zip(make_bounds(A, -1, xrange(len(A)), lambda x, y: x<=y),
make_bounds(A, len(A), reversed(xrange(len(A))), lambda x, y: x<y))
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left, right = T[i]
answer[x] += (i-left) * (right-i)
return dict(answer)
def naive_version(A):
answer = defaultdict(lambda: 0)
for i, x in enumerate(A):
left = next((j for j in range(i-1, -1, -1) if A[j]>A[i]), -1)
right = next((j for j in range(i+1, len(A)) if A[j]>=A[i]), len(A))
answer[x] += (i-left) * (right-i)
return dict(answer)
A = [choice(xrange(32)) for i in xrange(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print 'Array: ', A
print 'Naive: ', MA1
print 'Optimized:', MA2
print 'OK: ', MA1 == MA2
I don't think it is possible to it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
You could, on the other hand, when initialising the subarrays, instead of making them normal arrays, you could build heaps, specifically min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating your super-array and storing a reference to the largest element. Or building a heap as nbro had said. Problem 2 also has a similar solution. However a linear scan is through n arrays of length m is not going to be linear. So you will have to keep your class invariants such that the maximum/minimum is known after every operation. Maybe with the help of some data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums where Yi = SUM(i=0..i)Xi, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the value of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = 1+4+2+3 - 1= 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
I think the question you are asking is to find the Maximum of a subarry.
bleow is the code that cand do that in O(n) time.
int maxSumSubArr(vector<int> a)
{
int maxsum = *max_element(a.begin(), a.end());
if(maxsum < 0) return maxsum;
int sum = 0;
for(int i = 0; i< a.size; i++)
{
sum += a[i];
if(sum > maxsum)maxsum = sum;
if(sum < 0) sum = 0;
}
return maxsum;
}
Note: This code is not tested please add comments if found some issues.

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(Klogk) solution , using Max-Heap given here .
It also is one of the favorite Google interview question , and they demand a O(k) solution .
I've also read somewhere that there exists a O(k) solution, which i am unable to figure out .
Can someone explain the correct solution with a pseudocode .
P.S.
Please DON'T post this link as answer/comment.It DOESN'T contain the answer.
I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: pointer to the first array incremented when sum is too large and pointer to the second array decremented when sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search) we could find Kth largest sum in O(N log R) time, where N is size of the largest array and R is number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by small constant.
Previous algorithm may be improved if we stop binary search as soon as number of pair sums in binary search range decreases from O(N2) to O(N). Then we fill auxiliary array with these pair sums (this may be done with slightly modified two-pointers algorithm). And then we use quickselect algorithm to find Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get proper value range) we use something better than binary search?
We could estimate value range with the following trick: get every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for needed value range. And in fact slightly improved variant of this trick gives range containing only O(N) elements. This is proven in following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with Median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than other array (M < N) we could assume that this shorter array is extended to size N with some very small elements so that all calculations in the algorithm use size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes algorithm a little bit faster but does not improve asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to find K'th sum in linear time is based on priority queue (PQ). One variation is to insert largest pair to PQ, then repeatedly remove top of PQ and instead insert up to two pairs (one with decremented index in one array, other with decremented index in other array). And take some measures to prevent inserting duplicate pairs. Other variation is to insert all possible pairs containing largest element of first array, then repeatedly remove top of PQ and instead insert pair with decremented index in first array and same index in second array. In this case there is no need to bother about duplicates.
OP mentions O(K log K) solution where PQ is implemented as max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
Advantage of this approach is a possibility to provide first K elements in sorted order. Disadvantages are limited choice of array element type, more complex and slower algorithm, worse asymptotic complexity: O(K) > O(N).
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greater sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, consider you have S(k) = x[i] + y[j]. Then, S(k+1) is the greatest sum which is lower (or equal) to S(k), and such as at least one element (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greater sum you can construct which is lower than S(k) is by setting i=i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you found the K-th greater sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three because we need to deal with the case where the sums are equal);let's call that cost Q).
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M],b[N]}, {a[M-1],b[N-1]}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
M = len(a)
N = len(b)
if k > M*N:
raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % M*N,k)
(ia,ib) = M-1,N-1 #0 based arrays
# we need this for lookback
nottakenindices = (0,0) # could be any value
nottakensum = float('-inf')
for i in range(k-1):
optionone = a[ia]+b[ib-1]
optiontwo = a[ia-1]+b[ib]
biggest = max((optionone,optiontwo))
#first deal with look behind
if nottakensum > biggest:
if optionone == biggest:
newnottakenindices = (ia,ib-1)
else: newnottakenindices = (ia-1,ib)
ia,ib = nottakenindices
nottakensum = biggest
nottakenindices = newnottakenindices
#deal with case where indices hit 0
elif ia <= 0 and ib <= 0:
ia = ib = 0
elif ia <= 0:
ib-=1
ia = 0
nottakensum = float('-inf')
elif ib <= 0:
ia-=1
ib = 0
nottakensum = float('-inf')
#lookahead cases
elif optionone > optiontwo:
#then choose the first option as our next pair
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
elif optionone < optiontwo: # choose the second
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#next two cases apply if options are equal
elif a[ia] > b[ib]:# drop the smallest
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
else: # might be equal or not - we can choose arbitrarily if equal
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#+2 - one for zero-based, one for skipping the 1st largest
data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
if ia <= 0 and ib <= 0:
raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case with cases where pairs have equal sums):
import itertools
def refkth(a,b,k):
(rightia,righta),(rightib,rightb) = sorted(itertools.product(enumerate(a),enumerate(b)), key=lamba((ia,ea),(ib,eb):ea+eb)[k-1]
data = k,righta,rightb,righta+rightb,rightia,rightib
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Might be there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is the index array looks like (starting from 1 not 0)
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from end and make tuples. sum the elements in the tuple and pick the kth largest sum.
public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
// 2 * O(n log(n))
Arrays.sort(nums1);
Arrays.sort(nums2);
List<List<Integer>> results = new ArrayList<>(k);
int endIndex = 0;
// Find the number whose square is the first one bigger than k
for (int i = 1; i <= k; i++) {
if (i * i >= k) {
endIndex = i;
break;
}
}
// The following Iteration provides at most endIndex^2 elements, and both arrays are in ascending order,
// so k smallest pairs must can be found in this iteration. To flatten the nested loop, refer
// 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
for (int i = 0; i < endIndex * endIndex; i++) {
int m = i / endIndex;
int n = i % endIndex;
List<Integer> item = new ArrayList<>(2);
item.add(nums1[m]);
item.add(nums2[n]);
results.add(item);
}
results.sort(Comparator.comparing(pair->pair.get(0) + pair.get(1)));
return results.stream().limit(k).collect(Collectors.toList());
}
Key to eliminate O(n^2):
Avoid cartesian product(or 'cross join' like operation) of both arrays, which means flattening the nested loop.
Downsize iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to Java doc)
Limit the iteration range to the size which is just big enough to support k smallest pairs searching.

Find triplets in better than linear time such that A[n-1] >= A[n] <= A[n+1]

A sequence of numbers was given in an interview such that A[0] >= A[1] and A[N-1] >= A[N-2]. I was asked to find at-least one triplet such that A[n-1] >= A[n] <= A[n+1].
I tried to solve in iterations. Interviewer expected better than linear time solution. How should I approach this question?
Example: 9 8 5 4 3 2 6 7
Answer: 3 2 6
We can solve this in O(logn) time using divide & conquer aka. binary search. Better than linear time. So we need to find a triplet such that A[n-1] >= A[n] <= A[n+1].
First find the mid of the given array. If mid is smaller than its left and greater than its right. then return, thats your answer. Incidentally this would be a basecase in your recursion. Also if len(arr) < 3 then too return. another basecase.
Now comes the recursion scenarios. When to recurse, we would need to inspect further right. For that, If mid is greater than the element on its left then consider start to left of the array as a subproblem and recurse with this new array. i.e. in tangible terms at this point we would have ...26... with index n being 6. So we move left to see if the element to the left of 2 completes the triplet.
Otherwise if mid is greater than element on its right subarray then consider mid+1 to right of the array as a subproblem and recurse.
More Theory: The above should be sufficient to understand the problem but read on. The problem essentially boils down to finding local minima in a given set of elements. A number in the array is called local minima if it is smaller than both its left and right numbers which precisely boils down to A[n-1] >= A[n] <= A[n+1].
A given array such that its first 2 elements are decreasing and last 2 elements are increasing HAS to have a local minima. Why is that? Lets prove this by negation. If first two numbers are decreasing, and there is no local minima, that means 3rd number is less than 2nd number. otherwise 2nd number would have been local minima. Following the same logic 4th number will have to be less than 3rd number and so on and so forth. So the numbers in the array will have to be in decreasing order. Which violates the constraint of last two numbers being in increasing order. This proves by negation that there need to be a local minima.
The above theory suggests a O(n) linear approach but we definitely can do better. But the theory definitely gives us a different perspective about the problem.
Code: Here's python code (fyi - was typed in stackoverflow text editor freehand, it might misbheave).
def local_minima(arr, start, end):
mid = (start+end)/2
if mid-2 < 0 and mid+1 >= len(arr):
return -1;
if arr[mid-2] > arr[mid-1] and arr[mid-1] < arr[mid]: #found it!
return mid-1;
if arr[mid-1] > arr[mid-2]:
return local_minima(arr, start, mid);
else:
return local_minima(arr, mid, end);
Note that I just return the index of the n. To print out the triple just do -1 and +1 to the returned index. source
It sounds like what you're asking is this:
You have a sequence of numbers. It starts decreasing and continues to decrease until element n, then it starts increasing until the end of the sequence. Find n.
This is a (non-optimal) solution in linear time:
for (i = 1; i < length(A) - 1; i++)
{
if ((A[i-1] >= A[i]) && (A[i] <= A[i+1]))
return i;
}
To do better than linear time, you need to use the information that you get from the fact that the series decreases then increases.
Consider the difference between A[i] and A[i+1]. If A[i] > A[i+1], then n > i, since the values are still decreasing. If A[i] <= A[i+1], then n <= i, since the values are now increasing. In this case you need to check the difference between A[i-1] and A[i].
This is a solution in log time:
int boundUpper = length(A) - 1;
int boundLower = 1;
int i = (boundUpper + boundLower) / 2; //initial estimate
while (true)
{
if (A[i] > A[i+1])
boundLower = i + 1;
else if (A[i-1] >= A[i])
return i;
else
boundUpper = i;
i = (boundLower + boundUpper) / 2;
}
I'll leave it to you to add in the necessary error check in the case that A does not have an element satisfying the criteria.
Linear you could just do by iterating through the set, comparing them all.
You could also check the slope of the first two, then do a kind of binary chop/in order traversal comparing pairs until you find one of the opposite slope. That would amortize to a better than n time, I think, though it's not guaranteed.
edit: just realised what your ordering meant. The binary chop method is guaranteed to do this in <n time, as there is guaranteed to be a point of change (assuming that your N-1, N-2 are the last two elements of the list).
This means you just need to find it/one of them, in which case binary chop will do it in order log(n)

Algorithm for counting specific type of inversions in an array

I need an algorithm that counts the inversions of type:
Inversion between a and b exists if a has lower index and a > 2b.
Can you think of an algorithm that would do it in O(n logn)?
It can be done via a small tweak in merge sort algorithm. Counting inversions in an array
In the normal standard algorithm during the merge phase you compare elements from left and right half and increase inversions by number of elements remaining in Left portion. Here we increment not by the number of elements remaining in the left half but rather by the number of elements remaining in the left half which are more than twice as large.
A[1..n]
B[1..n] = copy(A)
sort(B) //n*log(n)
for i = 1 to n-1
//log(n)
exists = specialBinarySearch(B, A[i], 1, n)
//log(n)
setHighest(B, A[i], 1, n)
if exists
count++
specialBinarySearch(a, key, from, to)
if from <= to
mid = from + (to-from)/2
if a[mid] < floor(key/2)
return true
else //must go to left of it to get even smaller value
specialBinarySearch(a, key, from, mid-1)
else
return false
setHighest(a, key, from, to)
if from <= to
mid = from + (to-from)/2
if a[mid] == key
a[mid] = INT_MAX
else if a[mid] < key
setHighest(a, key, mid+1, to)
else
setHighest(a, key, from, mid-1)
OK. So, basically here are the steps.
Copy to an auxiliary array B. This O(n)
Sort with any n*logn algorithm
For each element a in A, perform a binary search in B for any element B[i] such that a > 2*B[i]. O(logn). (the algorithm I have written to avoid overflow)
Since we do not have to take B[i] into account, make it disqualify for comparison by setting B[i] = infinity. Another binary search. O(logn)
Repeat 3 and 4 till it exhausts.
So, lets calculate we have
O(n) + O(n*log(n)) + n*O(log(n))
=> O(n*log(n)) asymptotically
This may be solved using dynamic order statistics data structure. I know two alternatives for such a structure:
Order statistic tree
Indexable skiplist
For each element of the array (b) in order, find rank of the value 2b in the order statistics data structure. Then insert b into the order statistics data structure.
Rank of the value 2b gives number of elements a, that have lower index and are less than 2b. Sum of these numbers gives number of "inversions".

Resources