"Lengthless" fast sorting algorithm - algorithm

I've been looking into sorting algorithms. So far, every sorting algorithm I've found either relies on a known length (which rules out pretty much all of them, because computing the "proper" length is O(n)) or is slower than quicksort (e.g. insertion sort).
In Lua, there are 2 concepts of length:

Proper sequence length:
- is O(n)
- used by ipairs etc.

Sequence length (the # operator):
- is O(log n)
- may involve holes (nil values)
- used by table.insert etc.
I've looked into heapsort, but heapsort needs to build a heap, then sort. It doesn't do both as a single operation, which means it still suffers from the O(n) length problem.
With insertion sort, you just run the insertion sort algorithm until you hit the first nil. This sorts only the "proper sequence" part of a table (that is, the keys from 1 to n without any nil values), but insertion sort is slower than quicksort.
Are there any in-place sorting algorithms that, like insertion sort, don't depend on length, but with performance comparable to that of quicksort?
Example insertion sort code (with some help from Wikipedia):
function isort(t)
  -- In-place insertion sort that never uses the length operator.
  -- Stops at the first nil, as expected. Breaks if you use "for i = 1, #t do"
  for i in ipairs(t) do
    local j = i
    while j > 1 and t[j-1] > t[j] do
      t[j], t[j-1] = t[j-1], t[j]
      j = j - 1
    end
  end
end
local t = {6, 5, 3, 1, 7, 2, 4, nil, 1, 1, 8, 3, 4, nil, nil, 1}
isort(t)
io.write("{")
if #t > 0 then
  io.write(tostring(t[1]))
  for i = 2, #t do
    io.write(", ")
    io.write(tostring(t[i]))
  end
end
io.write("}\n")
-- stdout:
-- {1, 2, 3, 4, 5, 6, 7, nil, 1, 1, 8, 3, 4, nil, nil, 1}

Since the sort itself must take at least O(n log n), an extra O(n) scan doesn't seem like it would invalidate the algorithm. Using quadratic algorithms such as insertion or bubble sort is false economy.
You could use the heapsort variant where you simply iteratively insert into a growing heap, rather than using the O(n) buildheap algorithm. Heapsort is definitely O(n log n), even if you build the heap incrementally, but I doubt whether it is competitive with quicksort. (It's definitely competitive with insertion sort for large inputs, particularly large inputs in reverse order.)
You can see pseudocode for standard heapsort on Wikipedia. My pseudocode below differs in that it doesn't require the size of the array as a parameter, but instead returns it as the result. It also uses 1-based vectors rather than 0-based, since you are using Lua, so a is assumed to run from a[1] to a[count] for some value of count.
procedure heapsort(a):
  input: an array of comparable elements
  output: the number of elements in the array

  (Heapify successive prefixes of the array)
  count ← 1
  while a has an element indexed by count:
    siftup(a, count)
    count ← count + 1
  count ← count - 1

  (Extract the sorted list from the heap)
  i ← count
  while i > 1:
    swap(a, 1, i)
    i ← i - 1
    siftdown(a, i)
  return count
siftup and siftdown are the standard heap functions, here presented in the 1-based version. The code provided uses a standard optimization in which the sifting is done with a single rotation instead of a series of swaps; this cuts the number of array references significantly. (The swap in the heapsort procedure could be integrated into siftdown for a slight additional savings but it obscures the algorithm. If you wanted to use this optimization, change val ← a[1] to val ← a[count + 1]; a[count + 1] ← a[1] and remove the swap from heapsort.)
In a 1-based heap, the parent of node i is node floor(i/2) and the children of node i are nodes 2i and 2i+1. Recall that the heap constraint requires that every node be no less than its parent. (That produces a minheap, which is used to produce a descending sort. If you want an ascending sort, you need a maxheap, which means changing the three value comparisons below from > to <.)
procedure siftup(a, count):
  input: a vector of length count, of which the first count - 1
         elements satisfy the heap constraint.
  result: the first count elements of a satisfy the heap constraint.

  val ← a[count]
  loop:
    parent ← floor(count / 2)
    if parent == 0 or val > a[parent]:
      a[count] ← val
      return
    else:
      a[count] ← a[parent]
      count ← parent
procedure siftdown(a, count):
  input: a vector of length count which satisfies the heap constraint
         except for the first element.
  result: the first count elements of a satisfy the heap constraint.

  val ← a[1]
  parent ← 1
  loop:
    child ← 2 * parent
    if child < count and a[child] > a[child + 1]:
      child ← child + 1
    if count < child or not (val > a[child]):
      a[parent] ← val
      return
    else:
      a[parent] ← a[child]
      parent ← child
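For concreteness, here is a rough Python transcription of the pseudocode above (a sketch of mine, not the answerer's code): it uses 0-based indexing, lets None stand in for Lua's nil, and flips the comparisons to a max-heap so the result is ascending, as discussed above.

def heapsort_lengthless(a):
    # Sort the "proper sequence" prefix of a (everything before the first
    # None) in ascending order, growing the heap one element at a time
    # instead of asking for the length up front; returns the count sorted.
    count = 0
    while count < len(a) and a[count] is not None:  # Lua: a[count + 1] ~= nil
        siftup(a, count)
        count += 1
    for end in range(count - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current max to the end
        siftdown(a, end)
    return count

def siftup(a, i):
    # a[0 .. i-1] is a max-heap; float a[i] up to restore the property.
    val = a[i]
    while i > 0 and val > a[(i - 1) // 2]:
        a[i] = a[(i - 1) // 2]
        i = (i - 1) // 2
    a[i] = val

def siftdown(a, size):
    # a[1 .. size-1] is a max-heap; sink a[0] to restore the property.
    val, i = a[0], 0
    while 2 * i + 1 < size:
        child = 2 * i + 1
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1
        if val >= a[child]:
            break
        a[i] = a[child]
        i = child
    a[i] = val

t = [6, 5, 3, 1, 7, 2, 4, None, 1, 1, 8]
print(heapsort_lengthless(t), t)  # 7 [1, 2, 3, 4, 5, 6, 7, None, 1, 1, 8]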

Related

kth order statistic in range [i, j]

There is a problem for which I can't find an efficient algorithm.
Problem
Given an array of numbers a[1], ..., a[n], we get queries of the kind:
SELECT(i, j, k): find k-th smallest number in range [i, j] after sorting a[i], a[i+1], ..., a[j]
SET(i, value): perform a[i] = value
Example
Input:
5 5 // n = 5 (size of array), m = 5 (number of queries)
5 10 9 6 7
select 2 4 1
select 2 4 2
set 3 12
set 4 15
select 2 4 1
Output:
6
9
10
I think we can implement this with a merge sort tree (a special segment tree). I found this on the internet: merge sort tree for range order statistics.
But because we can change array values, this algorithm is not efficient.
Can someone help me? How can I implement it efficiently?
Thanks.
I don't know about the merge-sort tree, but I can think of a different data structure / algorithm that gives you the desired output in O(n) per query.
Notice that the right solution for this problem depends on the distribution of SET and SELECT queries. I assume there are more SELECTs, so I tried to lower that complexity. If you have more SETs, then I would use @miradham's answer:
         david    miradham
SET      O(n)     O(1)
SELECT   O(n)     O(n log n)
Space    O(n)     O(n)
Both solutions have O(n) space complexity.
In your question you used indexes that start from 1; I will modify them to start from 0.
Let's look at your example: a = array(5, 10, 9, 6, 7). As pre-processing, we create a sorted array that also contains the original index of each element: b = array(5(0), 6(3), 7(4), 9(2), 10(1)), where the number in brackets is the index in the original array a. This can be done in O(n log n).
How do we deal with the queries?
SELECT(i, j, k):
  cnt = 0
  for m in b (the sorted array):
    if i <= m(index) and m(index) <= j  // the index is in the given range
      cnt++
      if cnt == k
        return m                        // found the kth smallest
This is O(n), as you loop over b.
SET(i, value):
  Changing a is easy and can be done in O(1). Changing b:
    originalValue = a[i]                  // old value
    add [value(i)] to b as a new element  // O(log n) to locate, as b is sorted
    remove [originalValue(i)] from b      // b is sorted, but an array implementation may cost O(n)
  Total: O(n)
If further explanation is needed feel free to ask. Hope that helps!
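A rough Python sketch of this answer's scheme (names and details are mine; b is kept as a plain sorted list, so SET costs O(n) exactly as the answer warns):

import bisect

def preprocess(a):
    # b holds (value, index) pairs sorted by value; built once in O(n log n).
    return sorted((v, i) for i, v in enumerate(a))

def select(b, i, j, k):
    # Scan the sorted list, counting elements whose original index lies
    # in [i, j]; the kth such element is the answer. O(n) per query.
    cnt = 0
    for value, index in b:
        if i <= index <= j:
            cnt += 1
            if cnt == k:
                return value
    return None  # fewer than k elements in the range

def set_value(a, b, i, value):
    # Locating costs O(log n), but splicing the Python list costs O(n).
    b.remove((a[i], i))
    bisect.insort(b, (value, i))
    a[i] = value

a = [5, 10, 9, 6, 7]
b = preprocess(a)
print(select(b, 1, 3, 1))  # 0-based range [1, 3] -> 6
set_value(a, b, 2, 12)
set_value(a, b, 3, 15)
print(select(b, 1, 3, 1))  # -> 10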

Time complexity with insertion sort for 2^N array?

Consider an array of integers, which has a size of 2^N, where the element at index X (0 ≤ X < 2^N) is X xor 3 (that is, the two least significant bits of X are flipped). What is the running time of insertion sort on this array?
Examine the structure of what the list looks like:
For n = 2:
{3, 2, 1, 0}
For n = 3:
{3, 2, 1, 0, 7, 6, 5, 4}
For insertion sort, you're maintaining the invariant that the list from 1 up to your current index is sorted, so your task at each step is to place the current element into its correct place among the sorted elements before it. In the worst case, you will have to traverse all previous indices before you can insert the current value (think of the case where the list is in reverse sorted order).

But it's clear from the structure above that, for a list with the property that each value equals its index xor 3, the furthest back in the list that you would ever have to go from any given index is 3. So the possibly O(n) work at the insertion step is reduced to a constant. You still have to do O(n) work to examine each element of the list, so for this particular case the running time of insertion sort is linear in the size of the input, whereas in the worst case it is quadratic.
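If you want to convince yourself empirically, here is a small Python check (mine, not part of the original answer) that counts the shifts insertion sort performs on this input:

def insertion_sort_shifts(a):
    # Returns the total number of element shifts performed.
    shifts = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
            shifts += 1
    return shifts

for N in range(2, 12):
    a = [x ^ 3 for x in range(2 ** N)]
    # Each block of 4 reversed elements costs 6 shifts, so the total is
    # 1.5 * 2^N: linear in the input size, not quadratic.
    print(N, insertion_sort_shifts(a))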

Heap's algorithm for permutations

I'm preparing for interviews and I'm trying to memorize Heap's algorithm:
procedure generate(n : integer, A : array of any):
  if n = 1 then
    output(A)
  else
    for i := 0; i < n; i += 1 do
      generate(n - 1, A)
      if n is even then
        swap(A[i], A[n-1])
      else
        swap(A[0], A[n-1])
      end if
    end for
  end if
This algorithm is a pretty famous one to generate permutations. It is concise and fast and goes hand-in-hand with the code to generate combinations.
The problem is: I don't like to memorize things by heart and I always try to keep the concepts to "deduce" the algorithm later.
This algorithm is really not intuitive and I can't find a way to explain how it works to myself.
Can someone please tell me why and how this algorithm works as expected when generating permutations?
Heap's algorithm is probably not the answer to any reasonable interview question. There is a much more intuitive algorithm which will produce permutations in lexicographical order; although it is amortized O(1) per permutation rather than strictly O(1), it is not noticeably slower in practice, and it is much easier to derive on the fly.
The lexicographic order algorithm is extremely simple to describe. Given some permutation, find the next one by:
1. Find the rightmost element which is smaller than the element to its right.
2. Swap that element with the smallest element to its right which is larger than it.
3. Reverse the part of the permutation to the right of where that element was.
Both steps (1) and (3) are worst-case O(n), but it is easy to prove that the average time for those steps is O(1).
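For illustration, here is a short Python rendering of those three steps (a sketch of the standard next-permutation routine, not code supplied with the answer):

def next_permutation(a):
    # (1) Find the rightmost element smaller than the element to its right.
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False  # a was the last permutation
    # (2) Swap it with the smallest element to its right that is larger.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # (3) Reverse everything to the right of position i.
    a[i + 1:] = reversed(a[i + 1:])
    return True

a = [1, 2, 3]
print(a)
while next_permutation(a):
    print(a)  # prints all 3! permutations in lexicographic order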
An indication of how tricky Heap's algorithm is (in the details) is that your expression of it is slightly wrong because it does one extra swap; the extra swap is a no-op if n is even, but significantly changes the order of permutations generated when n is odd. In either case, it does unnecessary work. See https://en.wikipedia.org/wiki/Heap%27s_algorithm for the correct algorithm (at least, it's correct today) or see the discussion at Heap's algorithm permutation generator
To see how Heap's algorithm works, you need to look at what a full iteration of the loop does to the vector, in both even and odd cases. Given a vector of even length, a full iteration of Heap's algorithm will rearrange the elements according to the rule
[1,...n] → [(n-2),(n-1),2,3,...,(n-3),n,1]
whereas if the vector is of odd length, it will simply swap the first and last elements:
[1,...n] → [n,2,3,4,...,(n-2),(n-1),1]
You can prove that both of these facts are true using induction, although that doesn't provide any intuition as to why it's true. Looking at the diagram on the Wikipedia page might help.
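If induction feels unsatisfying, a few lines of Python (my sketch, using the corrected recursive algorithm from the Wikipedia article linked above) let you check both facts empirically:

def heaps(k, a):
    # Corrected recursive Heap's algorithm (see the Wikipedia link above);
    # here we only care about the net rearrangement, so no output() call.
    if k == 1:
        return
    for i in range(k - 1):
        heaps(k - 1, a)
        if k % 2 == 0:
            a[i], a[k - 1] = a[k - 1], a[i]
        else:
            a[0], a[k - 1] = a[k - 1], a[0]
    heaps(k - 1, a)

for n in range(2, 9):
    a = list(range(1, n + 1))
    heaps(n, a)
    print(n, a)
# even n (n >= 4): [1..n] -> [n-2, n-1, 2, 3, ..., n-3, n, 1]
# odd n:           [1..n] -> [n, 2, 3, ..., n-1, 1]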
I found an article that tries to explain it here: Why does Heap's algorithm work?
However, I think it is hard to understand, so I came up with an explanation that is hopefully easier:
Please just assume that these statements are true for a moment (I'll show that later):
Each invocation of the "generate" function
(I) where n is odd, leaves the elements in the exact same ordering when it is finished.
(II) where n is even, rotates the elements to the right, for example ABCD becomes DABC.
So in the "for i"-loop
when
n is even
The recursive call "generate(n - 1, A)" does not change the order.
So the for-loop can iteratively swap the element at i=0..(n-1) with the element at (n - 1) and will have called "generate(n - 1, A)" each time with another element missing.
n is odd
The recursive call "generate(n - 1, A)" has rotated the elements right.
So the element at index 0 will always be a different element automatically.
Just swap the elements at 0 and (n-1) in each iteration to produce a unique set of elements.
Finally, let's see why the initial statements are true:
Rotate-right
(III) This series of swaps results in a rotation to the right by one position:
  A[0] <-> A[n - 1]
  A[1] <-> A[n - 1]
  A[2] <-> A[n - 1]
  ...
  A[n - 2] <-> A[n - 1]
For example try it with sequence ABCD:
A[0] <-> A[3]: DBCA
A[1] <-> A[3]: DACB
A[2] <-> A[3]: DABC
No-op
(IV) This series of steps leaves the sequence in the exact same ordering as before:
Repeat n times:
  Rotate the sub-sequence a[0...(n-2)] to the right
  Swap: a[0] <-> a[n - 1]
Intuitively, this is true:
If you have a sequence of length 5, then rotate it 5 times, it ends up unchanged.
Taking the element at 0 out before the rotation, then after the rotation swapping it with the new element at 0 does not change the outcome (if rotating n times).
Induction
Now we can see why (I) and (II) are true:

If n is 1:
  Trivially, the ordering is unchanged after invoking the function.

If n is 2:
  The recursive calls "generate(n - 1, A)" leave the ordering unchanged (because they invoke generate with first argument 1), so we can ignore them.
  The swaps that get executed in this invocation result in a right-rotation, see (III).

If n is 3:
  The recursive calls "generate(n - 1, A)" result in a right-rotation.
  So the total steps in this invocation match (IV) => the sequence is unchanged.

Repeat for n = 4, 5, 6, ...
The reason Heap’s algorithm constructs all permutations is that it adjoins each element to each permutation of the rest of the elements. When you execute Heap's algorithm, recursive calls on even length inputs place elements n, (n-1), 2, 3, 4, ..., (n-2), 1 in the last position and recursive calls on odd length inputs place elements n, (n-3), (n-4), (n-5), ..., 2, (n-2), (n-1), 1 in the last position. Thus, in either case, all elements are adjoined with all permutations of n - 1 elements.
If you would like a more detailed and graphical explanation, have a look at this article.
function* permute<T>(array: T[], n = array.length): Generator<T[]> {
  if (n > 1) {
    for (let ix = 1; ix < n; ix += 1) {
      for (let _arr of permute(array, n - 1)) yield _arr
      let j = n % 2 ? 0 : ix - 1
      ;[array[j], array[n - 1]] = [array[n - 1], array[j]]
    }
    for (let _arr of permute(array, n - 1)) yield _arr
  } else yield array
}
Example use:
for (let arr of permute([1, 2, 3])) console.log(arr)
The trickiest part for me to understand (I am still studying it as well) was the recursive expression:

for i := 0; i < n; i += 1 do
  generate(n - 1, A)

I read it as: evaluate generate(n - 1, A) for every i up to n, with the termination condition at n = 1, returning after either the odd or the even swap. It calls and returns once for every i as n is passed back up recursively, so only a minimal change is made for each permutation produced.
Just a side tip: Heap's algorithm will generate n! permutations, i.e. if you pass [1, 2, 3] as input, the result will be 3! = 6 permutations.

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(k log k) solution using a max-heap, given here.
It is also a favorite Google interview question, and they demand an O(k) solution.
I've also read somewhere that there exists an O(k) solution, which I am unable to figure out.
Can someone explain the correct solution with pseudocode?
P.S.
Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.
I'll start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many are less. This may be done by iterating over the arrays with two pointers: the pointer into the first array is incremented when the sum is too large, and the pointer into the second array is decremented when the sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search), we can find the Kth largest sum in O(N log R) time, where N is the size of the largest array and R is the number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
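As an illustration, here is a rough Python sketch of that idea (my code, assuming integer elements and both arrays sorted ascending; this is the O(N log R) variant, not the refined linear one discussed below):

def count_pairs_at_least(a, b, v):
    # Number of pairs (x, y), x from a, y from b, with x + y >= v.
    # Two pointers, O(len(a) + len(b)): as x decreases, j only advances.
    count = 0
    j = 0
    for x in reversed(a):
        while j < len(b) and x + b[j] < v:
            j += 1
        count += len(b) - j
    return count

def kth_largest_sum(a, b, k):
    # The answer is the largest value v with at least k pair sums >= v.
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_pairs_at_least(a, b, mid) >= k:
            lo = mid
        else:
            hi = mid - 1
    return lo

print(kth_largest_sum([2, 3, 5, 8, 13], [4, 8, 12, 16], 4))  # 21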
The previous algorithm may be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N²) to O(N). Then we fill an auxiliary array with these pair sums (this may be done with a slightly modified two-pointers algorithm). And then we use the quickselect algorithm to find the Kth largest sum in this auxiliary array. All this does not improve worst-case complexity, because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get a proper value range) use something better than binary search?
We could estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation of the needed value range, and in fact a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. The paper contains a detailed explanation of the algorithm, a proof, complexity analysis, and pseudocode for all parts of the algorithm except quickselect. If linear worst-case complexity is required, quickselect may be augmented with the median-of-medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we can assume this shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve the asymptotic complexity.
If k < N we could ignore all array elements with index greater than k; in this case the complexity is O(k). If N < k < N(N-1), we just have better complexity than requested in the OP. If k > N(N-1), we'd better solve the opposite problem: the k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to finding the K'th sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and instead insert up to two pairs (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and instead insert the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
The OP mentions the O(K log K) solution where the PQ is implemented as a max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use an O(1)-time priority queue, for example as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
The advantage of this approach is the possibility of producing the first K elements in sorted order. The disadvantages are a limited choice of array element type, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).
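For reference, the second PQ variation looks roughly like this with an ordinary binary heap, i.e. the O(K log K) version rather than the O(1)-PQ one (a sketch of mine):

import heapq

def kth_largest_sum_pq(a, b, k):
    # a, b sorted ascending. Seed the heap with every pair that uses the
    # largest element of a; popping (i, j) then pushes (i-1, j), so no
    # duplicate pair is ever generated. Negate sums for a max-heap.
    heap = [(-(a[-1] + y), len(a) - 1, j) for j, y in enumerate(b)]
    heapq.heapify(heap)
    for n in range(1, k + 1):
        s, i, j = heapq.heappop(heap)
        if n == k:
            return -s
        if i > 0:
            heapq.heappush(heap, (-(a[i - 1] + b[j]), i - 1, j))

print(kth_largest_sum_pq([2, 3, 5, 8, 13], [4, 8, 12, 16], 4))  # 21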
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i, the index in the first array (element x[i]); j, the index in the second array (element y[j]); and k, the "order" of the sum (k in 1..K), in the sense that S(k) = x[i] + y[j] will be the k-th greatest sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, suppose you have S(k) = x[i] + y[j]. Then S(k+1) is the greatest sum which is lower than (or equal to) S(k) such that at least one index (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greatest sum you can construct which is lower than S(k) is obtained by setting i = i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you have found the K-th greatest sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions and two comparisons for each transition (three, because we need to deal with the case where the sums are equal); let's call that cost Q.
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {a[M], b[N]} and {a[M-1], b[N-1]}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
    M = len(a)
    N = len(b)
    if k > M*N:
        raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % (M*N, k))
    (ia,ib) = M-1,N-1 #0 based arrays
    # we need this for lookback
    nottakenindices = (0,0) # could be any value
    nottakensum = float('-inf')
    for i in range(k-1):
        optionone = a[ia]+b[ib-1]
        optiontwo = a[ia-1]+b[ib]
        biggest = max((optionone,optiontwo))
        #first deal with look behind
        if nottakensum > biggest:
            if optionone == biggest:
                newnottakenindices = (ia,ib-1)
            else:
                newnottakenindices = (ia-1,ib)
            ia,ib = nottakenindices
            nottakensum = biggest
            nottakenindices = newnottakenindices
        #deal with case where indices hit 0
        elif ia <= 0 and ib <= 0:
            ia = ib = 0
        elif ia <= 0:
            ib-=1
            ia = 0
            nottakensum = float('-inf')
        elif ib <= 0:
            ia-=1
            ib = 0
            nottakensum = float('-inf')
        #lookahead cases
        elif optionone > optiontwo:
            #then choose the first option as our next pair
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        elif optionone < optiontwo: # choose the second
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #next two cases apply if options are equal
        elif a[ia] > b[ib]: # drop the smallest
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        else: # might be equal or not - we can choose arbitrarily if equal
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #+2 - one for zero-based, one for skipping the 1st largest
        data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
        narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
        print (narrative) #this will work in both versions of python
        if ia <= 0 and ib <= 0:
            raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
    return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case where pairs have equal sums):
import itertools
def refkth(a, b, k):
    # Sort all pairs by sum, largest first, and take the kth.
    (rightia, righta), (rightib, rightb) = sorted(itertools.product(enumerate(a), enumerate(b)), key=lambda p: p[0][1] + p[1][1], reverse=True)[k-1]
    data = k, righta, rightb, righta+rightb, rightia, rightib
    narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
    print (narrative) #this will work in both versions of python
    return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Might be there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is what the index arrays look like (starting from 1, not 0):
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from the end and make tuples; sum the elements in each tuple and pick the kth largest sum.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
    // 2 * O(n log(n))
    Arrays.sort(nums1);
    Arrays.sort(nums2);
    List<List<Integer>> results = new ArrayList<>(k);
    int endIndex = 0;
    // Find the number whose square is the first one bigger than k
    for (int i = 1; i <= k; i++) {
        if (i * i >= k) {
            endIndex = i;
            break;
        }
    }
    // The following iteration provides at most endIndex^2 elements, and both arrays are in
    // ascending order, so the k smallest pairs can be found in this iteration. To flatten
    // the nested loop, refer to
    // 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
    for (int i = 0; i < endIndex * endIndex; i++) {
        int m = i / endIndex;
        int n = i % endIndex;
        List<Integer> item = new ArrayList<>(2);
        item.add(nums1[m]);
        item.add(nums2[n]);
        results.add(item);
    }
    results.sort(Comparator.comparing(pair -> pair.get(0) + pair.get(1)));
    return results.stream().limit(k).collect(Collectors.toList());
}
The keys to eliminating O(n^2):
Avoid cartesian product(or 'cross join' like operation) of both arrays, which means flattening the nested loop.
Downsize iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to Java doc)
Limit the iteration range to the size which is just big enough to support k smallest pairs searching.

Sorting Algorithm For Array with Integers of at most n spots away

Given an array with integers, with each integer being at most n positions away from its final position, what would be the best sorting algorithm?
I've been thinking for a while about this and I can't seem to get a good strategy to start dealing with this problem. Can someone please guide me?
I'd split the list (of size N) into 2n sublists (using zero-based indexing):
list 0: elements 0, 2n, 4n, ...
list 1: elements 1, 2n+1, 4n+1, ...
...
list 2n-1: elements 2n-1, 4n-1, ...
Each of these lists is necessarily sorted: two elements 2n positions apart cannot be out of order, since each can move at most n positions to reach its final place.
Now merge these lists (repeatedly merging 2 lists at a time, or using a min heap with one element of each of these lists).
That's all. Time complexity is O(N log(n)).
This is easy in Python:
>>> import heapq
>>> a = [1, 0, 5, 4, 3, 2, 6, 8, 9, 7, 12, 13, 10, 11]
>>> n = max(abs(i - x) for i, x in enumerate(a))
>>> n
3
>>> print(*heapq.merge(*(a[i::2 * n] for i in range(2 * n))))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Heap sort is very fast for an initially random array/collection of elements. In pseudocode the sort would be implemented as follows:
# heapify
for i = n/2:1, sink(a,i,n)
→ invariant: a[1,n] in heap order

# sortdown
for i = 1:n,
    swap a[1,n-i+1]
    sink(a,1,n-i)
    → invariant: a[n-i+1,n] in final position
end

# sink from i in a[1..n]
function sink(a,i,n):
    # {lc,rc,mc} = {left,right,max} child index
    lc = 2*i
    if lc > n, return # no children
    rc = lc + 1
    mc = (rc > n) ? lc : (a[lc] > a[rc]) ? lc : rc
    if a[i] >= a[mc], return # heap ordered
    swap a[i,mc]
    sink(a,mc,n)
For special cases like "Nearly Sorted" or "Few Unique", algorithms can behave differently and be more efficient. For a complete list of the algorithms, with animations of the various cases, see this brilliant site.
I hope this helps.
P.S. For nearly sorted sets (as commented above), insertion sort is your winner.
I'd recommend using a comb sort, just starting it with a gap size equal to the maximum distance away (or about there). It's expected O(n log n) (or in your case O(n log d), where d is the maximum displacement), easy to understand, easy to implement, and will work even when the elements are displaced more than you expect. If you need a guaranteed execution time you can use something like heap sort, but in the past I've found that the overhead in space or computation time usually isn't worth it, and I end up implementing nearly anything else.
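A sketch of that suggestion (my code; the usual shrink factor of 1.3, with the starting gap set to the maximum displacement instead of the array length):

def comb_sort(a, initial_gap=None):
    # Comb sort always finishes with gap-1 passes until no swaps occur,
    # so starting from a small gap is safe; it just skips the big passes.
    gap = initial_gap or len(a)
    while True:
        swapped = False
        for i in range(len(a) - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True
        if gap == 1 and not swapped:
            return
        gap = max(1, int(gap / 1.3))  # shrink the gap by ~1.3 each pass

a = [1, 0, 5, 4, 3, 2, 6, 8, 9, 7, 12, 13, 10, 11]
comb_sort(a, initial_gap=3)  # maximum displacement of this input is 3
print(a)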
Since each integer being at most n positions away from its final position:
1) for the smallest integer (aka. the 0th integer in the final sorted array), its current position must be in A[0...n] because the nth element is n positions away from the 0th position
2) for the second smallest integer (aka. the 1st integer in the final sorted array, zero based), its current position must be in A[0...n+1]
3) for the ith smallest integer, its current position must be in A[i-n...i+n]
We could use an (n+1)-size min-heap as a rolling window to get the array sorted (see the sketch after the link). You can find more details here:
http://www.geeksforgeeks.org/nearly-sorted-algorithm/
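Here is a minimal sketch of that rolling-window idea (my code, not from the linked article):

import heapq

def sort_nearly_sorted(a, n):
    # Every element is at most n positions from its final spot, so the
    # overall minimum always lies inside a window of the next n+1 elements.
    heap = a[:n + 1]
    heapq.heapify(heap)
    out = []
    for x in a[n + 1:]:
        out.append(heapq.heappop(heap))  # emit the current minimum
        heapq.heappush(heap, x)          # slide the window forward
    while heap:
        out.append(heapq.heappop(heap))
    return out

print(sort_nearly_sorted([1, 0, 5, 4, 3, 2, 6, 8, 9, 7], 3))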
