I have an array, let's say a = {1, 4, 5, 6, 2, 23, 4, 2};
Now I have to find the median of the array from position 2 to 6 (an odd number of terms). What I have done is copy a[1] to a[5] into arr[0] to arr[4], sort that, and take arr[2] as the median.
But every time I copy values from one array to another so that the values of my initial array remain the same, and then I sort, so this procedure is taking a lot of **time**.
So I want to know if there is any way I can do this differently to reduce my computation time.
Any websites or material to help me understand what to do and how?
Use std::nth_element from <algorithm>, which runs in O(N) on average:
nth_element(a, a + size / 2, a + size);
median = a[size/2];
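Note that nth_element rearranges the range you give it, so if the original array must stay intact (as in the question), run it on a copy of just the subrange. A minimal sketch, assuming an odd-length range (the helper name rangeMedian is mine):

#include <algorithm>
#include <vector>

double rangeMedian(const std::vector<int>& a, int lo, int hi) // inclusive range [lo, hi]
{
    std::vector<int> tmp(a.begin() + lo, a.begin() + hi + 1); // copy only the subrange
    std::nth_element(tmp.begin(), tmp.begin() + tmp.size() / 2, tmp.end());
    return tmp[tmp.size() / 2]; // middle element; exact for odd-length ranges
}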
It is possible to find the median without sorting in O(n) time; algorithms that do this are called selection algorithms.
If you are doing multiple queries on the same array, then you could use a Segment Tree. Segment trees are generally used for range minimum/maximum and range sum queries, but you can adapt one to do range medians.
A segment tree for a set with n intervals uses O(n log n) storage and can be built in O(n log n) time. A range query can be done in O(log n).
Example of median in range segment tree:
You build the segment tree from the bottom up (update from the top down):
[5]
[3] [7]
[1,2] [4] [6] [8]
1 2 3 4 5 6 7 8
Indices covered by node:
[4]
[2] [6]
[0,1] [3] [5] [7]
0 1 2 3 4 5 6 7
A query for the median of the range of indices 4-6 would go down this path of values:
[4]
[6]
[5]
0 1 2 3 4 5 6 7
Doing a search for the median, you know the total number of elements in the query (3), and the median in that range would be the 2nd element, which lives at index 5. So you are essentially doing a search for the first node which contains that index, which is the node with value [6] (index 5).
Doing a search for the median of the range 3-6 is a bit more complicated, because you have to search for two indices (4, 5), which happen to lie along the same path.
[4]
[6]
[5]
0 1 2 3 4 5 6 7
Further reading: Segment tree; Range minimum query on Segment Tree.
To find the median of an array of fewer than 9 elements, I think the most efficient approach is to use a sorting algorithm like insertion sort. Its asymptotic complexity is bad, but for such a small array it beats algorithms like quicksort because of the constant factors hidden in their complexity. Do your own benchmark, but I can tell you that you will get better results with insertion sort than with shell sort or quicksort.
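To make that concrete, here is a minimal sketch of the idea (the helper name tinyMedian is mine; an odd-length input is assumed):

#include <vector>

double tinyMedian(std::vector<int> v) // taken by value, so the caller's data is untouched
{
    for (std::size_t i = 1; i < v.size(); ++i) { // plain insertion sort
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
    return v[v.size() / 2]; // middle element of the sorted copy
}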
I think the best way is to use the median of medians algorithm for computing the k-th largest element of an array. You can find the overall idea of the algorithm here: Median of Medians in Java, on Wikipedia: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm, or just browse the internet. Some general improvements can be made during implementation (avoid sorting when choosing the medians of the small subarrays). However, note that for an array of fewer than 50 elements it's more efficient to use insertion sort than the median of medians algorithm.
All existing answers have some downsides in certain situations:
Sorting the entire subrange is not very efficient because one does not need to sort the entire array to get the median, and one needs an additional array if multiple subrange medians are to be found.
Using std::nth_element is more efficient but it still mutates the subrange, so one still needs an additional array.
Using a segment tree gets you an efficient solution, but you need to either implement the structure yourself or use a third-party library.
For this reason, I am posting my approach which uses std::map and is inspired by selection sort algorithm:
First collect the frequencies of elements in first subrange into an object of std::map<int, int>.
With this object, we can efficiently find the median of a subrange whose length is subrangeLength:
#include <map>
#include <optional>

double median(const std::map<int, int> &histogram, int subrangeLength)
{
    const int middle{subrangeLength / 2};
    int count{0};
    /* We use the fact that keys in std::map are sorted, so by simply iterating
       and adding up the frequencies, we can find the median. */
    if (subrangeLength % 2 == 1) {
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In the case where subrangeLength is odd, "middle" is the lower integer
               bound of subrangeLength / 2, so as soon as we cross it, we have found
               the median. */
            if (count > middle) {
                return freq.first;
            }
        }
    } else {
        std::optional<double> medLeft;
        for (const auto &freq : histogram) {
            count += freq.second;
            /* In the case where subrangeLength is even, we need to pay attention to
               the case when the elements at positions middle and middle + 1 differ. */
            if (count == middle) {
                medLeft = freq.first;
            } else if (count > middle) {
                if (!medLeft) {
                    medLeft = freq.first;
                }
                return (*medLeft + freq.first) / 2.0;
            }
        }
    }
    return -1;
}
Now when we want the median of the next subrange, we simply update the histogram: decrease the frequency of the element that leaves the subrange and add/increase it for the new element (with std::map, each of these updates takes logarithmic time). Then we compute the median again and continue this way until all subranges are handled.
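To illustrate the update step, here is a hypothetical driver (the name slidingMedians is mine, and I assume the subranges are consecutive windows of a fixed length w) that reuses the histogram between adjacent windows and calls the median() function above:

#include <vector>

std::vector<double> slidingMedians(const std::vector<int>& a, int w)
{
    std::map<int, int> histogram;
    for (int i = 0; i < w; ++i) ++histogram[a[i]];                 // histogram of the first window
    std::vector<double> out{median(histogram, w)};
    for (std::size_t i = w; i < a.size(); ++i) {
        if (--histogram[a[i - w]] == 0) histogram.erase(a[i - w]); // element leaving the window
        ++histogram[a[i]];                                         // element entering the window
        out.push_back(median(histogram, w));
    }
    return out;
}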
Related
How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array, and then for each possible sub-array I did a query into the segment tree, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
For example, arr[] = {1, 2, 3}; // the array need not be sorted
sub-array min max
{1} 1 1
{2} 2 2
{3} 3 3
{1,2} 1 2
{2,3} 2 3
{1,2,3} 1 3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes it possible to answer, in O(1), the query "how many subarrays are there where A[i] is the maximum element?".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could employ a simple O(n) algorithm that counts how many consecutive elements to the left and to the right of A[i] are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 has 3 elements to its left and 2 to its right that are less than it. From this we know there are 4*3 = 12 subarrays for which that very 5 is the maximum: 4 because there are 0..3 extra elements to include on the left, and 3 because there are 0..2 on the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) overall. Wouldn't it be nice if we could compute all these lengths in a single O(n) pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Push every element's index onto the stack, but before pushing it, pop all the indexes whose values are less than the current value. The index then remaining on top of the stack is the nearest larger element.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented also the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]
        stack.append(i)
    return bound

def optimized_version(A):
    T = list(zip(make_bounds(A, -1, range(len(A)), lambda x, y: x <= y),
                 make_bounds(A, len(A), reversed(range(len(A))), lambda x, y: x < y)))
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)
I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
On the other hand, when initialising the subarrays, instead of making them normal arrays, you could build heaps: specifically min-heaps when you want to find the minimum and max-heaps when you want to find the maximum.
Building a heap is a linear-time operation, and retrieving the maximum or minimum (for a max-heap or min-heap respectively) is a constant-time operation, since that element is at the front of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
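A minimal C++ sketch of this idea (the helper name subrangeMax is mine; std::make_heap builds a max-heap over a copy of the subrange in linear time, after which the maximum sits at the front):

#include <algorithm>
#include <functional>
#include <vector>

int subrangeMax(const std::vector<int>& a, int lo, int hi) // inclusive range [lo, hi]
{
    std::vector<int> sub(a.begin() + lo, a.begin() + hi + 1);
    std::make_heap(sub.begin(), sub.end()); // O(n) max-heap construction
    return sub.front();                     // O(1) retrieval of the maximum
}
// For the minimum, build a min-heap instead:
// std::make_heap(sub.begin(), sub.end(), std::greater<int>{});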
I do not understand what exactly you mean by the maximum of sub-arrays, so I will assume you are asking for one of the following:
The subarray of maximum/minimum length or some other criterion (in which case the problem reduces to finding the max element in a one-dimensional array)
The maximum elements of all your sub-arrays, either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating over your super-array and storing a reference to the largest element. Or by building a heap, as nbro said. Problem 2 has a similar solution. However, a linear scan through n arrays of length m is not going to be linear, so you will have to maintain your class invariants such that the maximum/minimum is known after every operation, perhaps with the help of a data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums, where Y_i = X_0 + X_1 + ... + X_i; so from 1, 4, 2, 3 create 0, 1, 1+4=5, 1+4+2=7, 1+4+2+3=10. You can create this from left to right in linear time, and the sum of any contiguous subarray is one partial sum subtracted from another: 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
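Here is a sketch of that scan (the function name maxSubarraySum is mine; it returns the largest sum over non-empty contiguous subarrays, and a non-empty input is assumed):

#include <algorithm>
#include <limits>
#include <vector>

long long maxSubarraySum(const std::vector<int>& x)
{
    long long prefix = 0;  // current partial sum
    long long minSeen = 0; // smallest partial sum so far, including the initial zero
    long long best = std::numeric_limits<long long>::min();
    for (int v : x) {
        prefix += v;
        best = std::max(best, prefix - minSeen); // best subarray ending at this element
        minSeen = std::min(minSeen, prefix);
    }
    return best;
}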
I think the question you are asking is how to find the maximum sum of a subarray.
Below is code that can do that in O(n) time:
#include <algorithm>
#include <vector>

int maxSumSubArr(const std::vector<int>& a)
{
    int maxsum = *std::max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum; // all elements negative: the answer is the single largest element
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); i++)
    {
        sum += a[i];
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0;
    }
    return maxsum;
}
Note: this code is not tested; please add a comment if you find any issues.
Suppose we are sorting an array of ten integers using some quadratic sorting algorithm. After four iterations of the algorithm's main loop, the array elements are ordered as shown here:
1 2 3 4 5 0 6 7 8 9
Which statement is correct? (Note: Our selection sort picks largest items first.)
A. The algorithm might be either selection sort or insertion sort.
B. The algorithm might be selection sort, but could not be insertion sort.
C. The algorithm might be insertion sort, but could not be selection sort.
D. The algorithm is neither selection sort nor insertion sort.
I think the answer should be A (both insertion sort and selection sort), but I found on some websites that the correct answer is C. I don't know the reason; could someone please explain it to me? Correct me if I am wrong.
Source: Google Books
This output is possible for a largest-first selection sort.
It cannot be a smallest-first selection sort because selection sort keeps the invariant that, after n iterations, the first n items in the list are fully-sorted. So, if this was a selection sort, you would expect that the element with the value 0 would have been sorted to index 0 in the first iteration. Thus, the list would look like 0 1 2 3 ... after the first 4 iterations.
The output is possible for both a largest-first and smallest-first insertion sort because insertion sort has a different invariant. In an insertion sort, after n iterations, the first n items in the list (last n for a largest-first sort) are sorted with respect to each other, but they're not necessarily in their final position in the list.
For a smallest-first insertion sort, after 4 iterations, it's possible to see the "out of place" 0 because the sort hasn't yet iterated enough to re-position the 0 in its correct index.
Insertion sort is possible, since the snapshot shows the array just after it has been sorted from index 0 up to 4.
A (smallest-first) selection sort would pick the minimum value (0 in your example) first and put it at the left end. So the answer is C.
This could be the output of either a largest-first selection sort or an insertion sort. If the array originally looked like that, after 4 iterations of either algorithm, it would still look like that. It couldn't be a smallest-first selection sort, since after 4 iterations of that, the first 4 items of the array would be 0 1 2 3.
In my opinion, the answer is C.

Selection sort:

for(int x = 0; x < n; x++){
    int index_of_min = x;
    for(int y = x; y < n; y++){
        if(array[index_of_min] > array[y]){
            index_of_min = y;
        }
    }
    int temp = array[x];
    array[x] = array[index_of_min];
    array[index_of_min] = temp;
}
This is because in selection sort you traverse the array to the right of x, find the minimum element in each pass, and swap it into position x.
In the above case
1 2 3 4 5 0 6 7 8 9
the minimum element is 0, and it would be the leftmost element if selection sort had been used, because with a (smallest-first) selection sort the minimum of the whole array would be at a[0] after the first pass.
Insertion sort is possible because in insertion sort we pick up an element and keep comparing it with the elements to its left until it reaches its correct sorted location. What I am trying to say is:

> Here is an example for sorting the array 52314. First, 2 is inserted
> before 5, resulting in 25314. Then, 3 is inserted between 2 and 5,
> resulting in 23514. Next, 1 is inserted at the start, 12354. Finally,
> 4 is inserted between 3 and 5, 12345.

So insertion sort is possible, as it has finished sorting up to its fourth pass.
First of all, let's recall the definition of an inversion.
An inversion in a sequence S of numbers is a situation where S[i] > S[j] and i < j; frankly speaking, it's a situation where we have disordered elements. For instance, for the sequence:
1 4 3 7 5 6 2
we have the following inversions: (4,3), (4,2), (3,2), (7,5), etc.
We state the problem as follows: the distance of an inversion is the maximum (in terms of indexing) distance between two values that form an inversion. For our example, a human-brain search gives us the pair (4,2) <=> (S[1], S[6]), and therefore the index distance is 6-1 = 5, which is the maximum possible for this case.
This problem can be solved in a trivial way in O(n^2) by finding all inversions and keeping the max distance (updating it whenever we find a better option).
We can also perform better inversion searching using merge sort and therefore do the same in O(n log n). Is there any possibility of an O(n) algorithm? Bear in mind that we just want the maximum distance; we don't want to find all inversions. Please elaborate.
Yes, an O(n) algorithm is possible.
We could extract strictly increasing subsequence with greedy algorithm:
source: 1 4 3 7 5 6 2
strictly increasing subsequence: 1 4 7
Then we could extract strictly decreasing subsequence going backwards:
source: 1 4 3 7 5 6 2
strictly decreasing subsequence: 1 2
Note that once this strictly decreasing subsequence is found, we can interpret it as an increasing sequence (in the normal direction).
For each element of these subsequences we need to store their index in source sequence.
Now "inversion distance" could be found by merging these two subsequences (similar to merge sort mentioned in OP, but only one merge pass is needed):
merge 1 & 1 ... no inversion, advance both indices
merge 4 & 2 ... inversion found, distance=5, should advance second index,
but here is end of subsequence, so we are done, max distance = 5
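A sketch of the whole procedure in C++ (the names are mine): collect the strict prefix maxima and strict suffix minima together with their source indices, then do the single merge pass; pairs whose index distance comes out negative are simply ignored by the max.

#include <algorithm>
#include <utility>
#include <vector>

int maxInversionDistance(const std::vector<int>& s)
{
    std::vector<std::pair<int, int>> inc, dec;   // (value, original index)
    for (int i = 0; i < (int)s.size(); ++i)      // strictly increasing subsequence
        if (inc.empty() || s[i] > inc.back().first)
            inc.push_back({s[i], i});
    for (int j = (int)s.size() - 1; j >= 0; --j) // strictly decreasing, taken backwards
        if (dec.empty() || s[j] < dec.back().first)
            dec.push_back({s[j], j});
    std::reverse(dec.begin(), dec.end());        // reinterpret as increasing
    int best = 0;
    std::size_t i = 0, j = 0;                    // the single merge pass
    while (i < inc.size() && j < dec.size()) {
        if (inc[i].first > dec[j].first) {       // inversion found
            best = std::max(best, dec[j].second - inc[i].second);
            ++j;                                 // try the next (larger) right value
        } else {
            ++i;                                 // need a larger left value
        }
    }
    return best;                                 // 5 for {1, 4, 3, 7, 5, 6, 2}
}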
Maybe my idea is the same as @Evgeny's.
Here is the explanation:
Make a strictly increasing array from the beginning; call it array1.
Make a strictly decreasing array from the end; call it array2 (but keep its values in increasing order).
Keep track of the original indexes of the values in both arrays.
Now start from the beginning of both arrays, and repeat the following loop until the scan of array1 or array2 is complete:
while( array1[index] > array2[index] )
{
    check the original distance between the array1 index and the array2 index;
    update the result accordingly;
    increase the array2 index;
}
increase both array indexes
continue with the loop
At the end of this process you will have the maximum result. The proof of this solution is not that complex; you can try it yourself.
Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and have sizes M and N.
I am already aware of the O(K log K) solution using a max-heap given here.
It is also one of the favorite Google interview questions, and they demand an O(k) solution.
I've also read somewhere that an O(k) solution exists, but I am unable to figure it out.
Can someone explain the correct solution with pseudocode?
P.S.
Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.
I'll start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many are less. This may be done by iterating over the arrays with two pointers: the pointer into the first array is incremented when the sum is too large, and the pointer into the second array is decremented when the sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search), we can find the Kth largest sum in O(N log R) time, where N is the size of the largest array and R is the number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
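A sketch of that counting step (the name countPairSumsAbove is mine; both arrays sorted ascending): as a[i] grows, the threshold index into the second array only moves one way, so the whole count takes O(N).

#include <vector>

long long countPairSumsAbove(const std::vector<int>& a,
                             const std::vector<int>& b, long long v)
{
    long long count = 0;
    std::size_t t = b.size();                    // first index with a[i] + b[t] > v
    for (std::size_t i = 0; i < a.size(); ++i) { // a[i] grows, so t only moves left
        while (t > 0 && (long long)a[i] + b[t - 1] > v) --t;
        count += b.size() - t;                   // b[t..] all give sums above v
    }
    return count;
}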
The previous algorithm may be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N^2) to O(N). Then we fill an auxiliary array with these pair sums (this may be done with a slightly modified two-pointer algorithm). And then we use the quickselect algorithm to find the Kth largest sum in this auxiliary array. All this does not improve worst-case complexity, because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get a proper value range) we use something better than binary search?
We can estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this gives some approximation of the needed value range, and in fact a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. The paper contains a detailed explanation of the algorithm, a proof, a complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with the median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we can assume that the shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve the asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to finding the Kth sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and instead insert up to two pairs (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and instead insert the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
The OP mentions the O(K log K) solution where the PQ is implemented as a max-heap. But in some cases (when array elements are evenly distributed integers with limited range, and linear complexity is needed only on average, not worst-case) we can use an O(1)-time priority queue, for example as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
An advantage of this approach is the possibility of producing the first K elements in sorted order. Disadvantages are the limited choice of array element type, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).
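For concreteness, a C++17 sketch of the second PQ variation described above (the function name is mine; arrays sorted ascending, k is 1-based): seed the PQ with every pair that uses the largest element of the first array, then replace each popped pair (i, j) by (i-1, j).

#include <queue>
#include <utility>
#include <vector>

long long kthLargestPairSum(const std::vector<int>& a,
                            const std::vector<int>& b, int k)
{
    using Entry = std::pair<long long, std::pair<int, int>>; // (sum, (i, j))
    std::priority_queue<Entry> pq;                           // max-heap ordered by sum
    int m = (int)a.size();
    for (int j = 0; j < (int)b.size(); ++j)                  // all pairs using a[m-1]
        pq.push({(long long)a[m - 1] + b[j], {m - 1, j}});
    while (--k > 0) {                                        // discard the k-1 larger sums
        auto [i, j] = pq.top().second;
        pq.pop();
        if (i > 0)                                           // successor pair; no duplicates arise
            pq.push({(long long)a[i - 1] + b[j], {i - 1, j}});
    }
    return pq.top().first; // e.g. 21 (= 13 + 8) for the OP's arrays with k = 4
}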
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greater sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
    if x[i+1] + y[j] > x[i] + y[j+1], then i := i+1 (and j does not change); else j := j+1
To see that this works, suppose you have S(k) = x[i] + y[j]. Then S(k+1) is the greatest sum which is lower than (or equal to) S(k) such that at least one index (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greatest sum you can construct which is lower than S(k) is obtained by setting i := i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you have found the K-th greatest sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, arranged so that the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It will contain perhaps one of {a[M], b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1], b[N-1]}. BUT we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so the second largest pair will have exactly one number from the last column and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three, because we need to deal with the case where the sums are equal); let's call that cost Q.
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M], b[N]}, {a[M-1], b[N-1]}}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
    M = len(a)
    N = len(b)
    if k > M*N:
        raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % (M*N, k))
    (ia,ib) = M-1,N-1 #0 based arrays
    # we need this for lookback
    nottakenindices = (0,0) # could be any value
    nottakensum = float('-inf')
    for i in range(k-1):
        optionone = a[ia]+b[ib-1]
        optiontwo = a[ia-1]+b[ib]
        biggest = max((optionone,optiontwo))
        #first deal with look behind
        if nottakensum > biggest:
            if optionone == biggest:
                newnottakenindices = (ia,ib-1)
            else:
                newnottakenindices = (ia-1,ib)
            ia,ib = nottakenindices
            nottakensum = biggest
            nottakenindices = newnottakenindices
        #deal with case where indices hit 0
        elif ia <= 0 and ib <= 0:
            ia = ib = 0
        elif ia <= 0:
            ib-=1
            ia = 0
            nottakensum = float('-inf')
        elif ib <= 0:
            ia-=1
            ib = 0
            nottakensum = float('-inf')
        #lookahead cases
        elif optionone > optiontwo:
            #then choose the first option as our next pair
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        elif optionone < optiontwo: # choose the second
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #next two cases apply if options are equal
        elif a[ia] > b[ib]: # drop the smallest
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        else: # might be equal or not - we can choose arbitrarily if equal
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #+2 - one for zero-based, one for skipping the 1st largest
        data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
        narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
        print (narrative) #this will work in both versions of python
        if ia <= 0 and ib <= 0:
            raise ValueError("Both arrays exhausted before Kth (%sth) pair reached" % data[0])
    return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case where pairs have equal sums):
import itertools

def refkth(a, b, k):
    pairs = sorted(itertools.product(enumerate(a), enumerate(b)),
                   key=lambda p: p[0][1] + p[1][1], reverse=True)
    (rightia, righta), (rightib, rightb) = pairs[k-1]
    data = (k, righta, rightb, righta+rightb, rightia, rightib)
    narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
    print (narrative) #this will work in both versions of python
    return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Maybe there isn't any O(k) algorithm. That's okay; O(k log k) is almost as fast.
If the last two solutions were at (a1, b1) and (a2, b2), then it seems to me there are only four candidate solutions: (a1-1, b1), (a1, b1-1), (a2-1, b2), (a2, b2-1). This intuition could be wrong. Certainly there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1, a2, a1-1, a2-1}, b in {b1, b2, b1-1, b2-1}). That would be O(k).
(No it's not; I'm still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is what the index array looks like (starting from 1, not 0):
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from the end and make tuples: sum the elements in each tuple and pick the kth largest sum.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
    // 2 * O(n log(n))
    Arrays.sort(nums1);
    Arrays.sort(nums2);
    List<List<Integer>> results = new ArrayList<>(k);
    int endIndex = 0;
    // Find the number whose square is the first one bigger than k
    for (int i = 1; i <= k; i++) {
        if (i * i >= k) {
            endIndex = i;
            break;
        }
    }
    // The following iteration provides at most endIndex^2 elements, and both arrays are in
    // ascending order, so the k smallest pairs must be found in this iteration. To flatten
    // the nested loop, refer to
    // 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
    for (int i = 0; i < endIndex * endIndex; i++) {
        int m = i / endIndex;
        int n = i % endIndex;
        List<Integer> item = new ArrayList<>(2);
        item.add(nums1[m]);
        item.add(nums2[n]);
        results.add(item);
    }
    results.sort(Comparator.comparingInt((List<Integer> pair) -> pair.get(0) + pair.get(1)));
    return results.stream().limit(k).collect(Collectors.toList());
}
Keys to eliminating O(n^2):
Avoid the cartesian product (a 'cross join'-like operation) of the two arrays, which means flattening the nested loop.
Downsize the iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to the Java doc).
Limit the iteration range to a size just big enough to support searching for the k smallest pairs.
Given an array, how can we find the sum of the elements in an index interval (i, j) in constant time? You are allowed to use extra space.
Example:
A: 3 2 4 7 1 -2 8 0 -4 2 1 5 6 -1
length = 14
int getsum(int* arr, int i, int j, int len);
// suppose int array "arr" is initialized here
int sum = getsum(arr, 2, 5, 14);
sum should be 10 in constant time.
If you can spend O(n) time to "prepare" auxiliary information, based on which you would be able to calculate sums in O(1), you can easily do it.
Preparation (O(n)):
aux[0] = 0;
foreach i in (1..LENGTH) {
    aux[i] = aux[i-1] + arr[i];
}
Query (O(1)), with arr numbered from 1 to LENGTH:
sum(i,j) = aux[j] - aux[i-1];
I think that was the intent, because otherwise it's impossible: to calculate sum(0, length-1) for an arbitrary length, you would have to scan the whole array, and that takes linear time, at least.
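For completeness, a sketch that fits the question's exact signature (the prepare step and the global aux array are my additions; prepare must be called once before any queries):

#include <vector>

static std::vector<long long> aux; // aux[t] = arr[0] + ... + arr[t-1], with aux[0] = 0

void prepare(const int* arr, int len) // O(n), done once
{
    aux.assign(len + 1, 0);
    for (int t = 0; t < len; ++t)
        aux[t + 1] = aux[t] + arr[t];
}

int getsum(int* arr, int i, int j, int len) // O(1) per query, inclusive range [i, j]
{
    (void)arr; (void)len;                   // the answer comes from aux alone
    return (int)(aux[j + 1] - aux[i]);      // e.g. getsum(arr, 2, 5, 14) == 10 for the example above
}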
It cannot be done in constant time unless you store the information.
You would have to do something like specially modifying the array to store, for each index, the sum of all values between the start of the array and that index, then using subtraction on the range ends to get the difference in sums.
However, nothing in your code sample seems to allow this. The array is created by the user (and can change at any time) and you have no control over it.
Any algorithm that needs to scan a group of elements in a sequential unsorted list will be O(n).
The previous answers are absolutely fine for the question asked. I am just adding a point: what if the question is changed a bit, like this:
Find the sum of the interval if the array gets changed dynamically.
If array elements get changed, then we have to recompute whatever sums we have stored in the auxiliary array, as mentioned in @Pavel Shved's approach.
Recomputing is an O(n) operation, so we reduce the cost by making use of a Segment Tree, which brings each update and range-sum query down to O(log n).
http://www.geeksforgeeks.org/segment-tree-set-1-sum-of-given-range/
There are three well-known algorithms for range-based queries on [l, r]:
1. Segment tree: total query time O(N log N) for N queries
2. Fenwick tree: total query time O(N log N) for N queries
3. Mo's algorithm (square root decomposition)
The first two algorithms can deal with modifications to the list/array given to you. The third, Mo's algorithm, is an offline algorithm, meaning all the queries must be given to you in advance; modifications to the list/array are not allowed. For implementation, runtime, and further reading on this algorithm, you can check out this Medium blog, which explains it with code. Very few people actually know about this method.
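To make the dynamic case concrete, here is a minimal Fenwick (binary indexed) tree sketch (my own; point update and prefix query are both O(log n)):

#include <vector>

struct Fenwick {
    std::vector<long long> t;          // 1-based internal tree
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, long long delta) { // arr[i] += delta, i is 0-based
        for (++i; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    long long prefix(int i) const {    // sum of arr[0..i]
        long long s = 0;
        for (++i; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    long long range(int l, int r) const { // sum of arr[l..r]
        return prefix(r) - (l > 0 ? prefix(l - 1) : 0);
    }
};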
This question can be solved in O(n^2) time and O(n) space, or in O(n) time and O(n) space. The best solution in this case is the latter: O(n) time, O(n) space.
Suppose a[] = {1, 3, 5, 2, 6, 4, 9} is given.
If we create an array sum[] in which we keep the sum of the elements from index 0 up to each particular index, then for a[] the sum array will be sum[] = {1, 4, 9, 11, 17, 21, 30}, i.e. {1, 3+1, 3+1+5, ...}. Building it takes O(n) time and O(n) space.
When we are given indexes, the answer is fetched directly from the sum array: sum(i, j) = sum[j] - sum[i-1], which takes O(1) time and O(1) space per query.
int[] sum = new int[l]; // l is the length of a[]
sum[0] = a[0];
System.out.print(sum[0] + " ");
for (int i = 1; i < l; i++)
{
    sum[i] = sum[i-1] + a[i];
    System.out.print(sum[i] + " ");
}
/* this gives 1, 4, 9, 11, 17, 21, 30 and takes O(n) time and O(n) space */
sum(i, j) = sum[j] - sum[i-1] /* this gives the sum of indexes from i to j and takes O(1) time and O(1) space */
So this program takes O(n) time and O(n) space.