Interview Question: Reverse pairs - algorithm

I got this for my interview:
Numbers are said to be "reverse ordered" if N[i] > N[j] for i < j.
For example, in a list: 3 4 1 6 7 3, the reverse ordered items are (3,1) (4,1) (4,3) (6,3) (7,3).
How do you get the number of pairs of reverse ordered items in O(n log n) time?

It is possible to do this in O(n log n) time using a modified version of merge sort. Do the division as normal, but you can count inversions as you merge. Each time you select an item from the right list over an item from the left list increment the count of inversions by the number of items remaining in the left list. So at each level the number of inversions is the number of inversions found during the merge plus the inversions found by each recursive call.
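A minimal Python sketch of this counting merge sort (the function name is illustrative, not from the original answer):

def count_inversions(items):
    # Returns (sorted copy of items, number of inversions).
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, left_count = count_inversions(items[:mid])
    right, right_count = count_inversions(items[mid:])
    merged, merge_count = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:           # ties are not inversions
            merged.append(left[i])
            i += 1
        else:
            merge_count += len(left) - i  # right[j] inverts with all remaining left items
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_count + right_count + merge_count

# count_inversions([3, 4, 1, 6, 7, 3])[1] == 5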

Note: please read the bottom of this answer to see why the problem actually is possible. I read the question wrong.
It is not possible in the general case. Consider the list:
n, n-1, n-2 ... 4, 3, 2, 1
The pairs will be:
(n, n-1), (n, n-2) ... (n, 2), (n, 1), (n-1, n-2), (n-1, n-3) ... ... (3, 2), (3, 1), (2, 1)
Hence there are Θ(n^2) pairs, so the full list of pairs cannot be built in O(n log n) time.
However, you can do this with one pass of the list:
Start at the end of the list and work backwards.
While moving through the list, maintain a heap of the numbers you have seen (this is what makes the loop O(n log n)).
For every number you encounter, search the heap for all numbers which are less than the current number. Output the current number and the number in the heap as a pair. (It is O(log n) to find the first match in the heap, but O(n) to find all smaller numbers.)
For your example:
The list: 3 4 1 6 7 3
starting at the second item from the end
heap (3)
item (7)
Output (7, 3)
heap (3, 7)
item (6)
search and find 3; output (6, 3)
heap (3, 6, 7)
item (1)
search and find nothing
heap (1, 3, 6, 7)
item (4)
search and find 3 and 1. output (4, 3) (4, 1)
etc....
Edit: it is possible, actually
Since JoshD correctly noted that we are only looking for the number of pairs, you can use a B-tree instead of a heap: for each element, get the count of elements less than the current item and add it to a running counter.

This can be solved by creating a binary search tree in which each node stores the size of its left subtree.
Values are added to the BST in reverse order of the original array. A running sum is kept, and each time we go right while adding a node, the compared node's left-subtree size + 1 is added to the sum (since the value being added is greater than that node and every value in its left subtree).
Building the tree is O(n log n) (with a balanced tree), and once the tree is built, the sum is the number of pairs.
Special handling needs to be added for duplicate numbers, depending on the requirements (i.e. if (4, 3) shows up twice, should it be counted twice?).
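A compact sketch of this structure (unbalanced for brevity, so the worst case degrades to O(n^2); a self-balancing variant such as an AVL or red-black tree keeps it O(n log n)):

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.left_size = 0               # number of nodes in the left subtree

def insert_and_count(root, value):
    # Insert value; return (new root, count of existing values < value).
    if root is None:
        return Node(value), 0
    if value <= root.value:              # duplicates go left: (3, 3) is not a pair
        root.left_size += 1
        root.left, count = insert_and_count(root.left, value)
        return root, count
    root.right, count = insert_and_count(root.right, value)
    return root, count + root.left_size + 1

def reverse_pairs(items):
    root, total = None, 0
    for value in reversed(items):        # add values in reverse array order
        root, count = insert_and_count(root, value)
        total += count
    return total

# reverse_pairs([3, 4, 1, 6, 7, 3]) == 5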

Do a right-to-left traversal with a red-black tree in which each node is augmented by the size of its subtree. It then takes O(log n) to find the number of elements below a given one.

As #jlewis42 points out, you can use a modified version of merge sort. I just wanted to add, you could use any of the standard comparison sort algorithms, as long as the worst-case running time is n log n, by "instrumenting" it to count inversions as it works. See also this near dupe.

Related

Sort permutation of N using 1 free swap and adjacent swaps

Question:
Given an array of N numbers containing a permutation of 1..N. You have 2 types of swaps:
Swap any 2 numbers of the array (you can only do this once)
Swap adjacent numbers (you can do this many times)
What is the least number of swaps needed to sort the array?
Example:
arr[] = {5, 3, 4, 2, 1}
answer: 3
Explanation:
- Swap 5 and 1
- Swap 4 and 2
- Swap 3 and 2
P.S.:
I think that we need to use the "free swap" first and then use merge sort. But I don't know how to use the "free swap" so that the remaining number of swaps is minimal.
I think you can just swap the number that is furthest to the left of where it should be with the one that is furthest to the right of where it should be.
So left would be index i where arr[i] - i is the maximum, and right would be index j where arr[j] - j is the minimum. Then just swap element i with j. This is O(n).
Afterwards you can count the number of swaps you would have to do for bubble sort: for each element, count the elements which are smaller and to the right of it. You can do this in O(n log n) by going from right to left and inserting each element into a balanced sorted tree where you also store the number of nodes in each subtree (e.g. a modified AVL tree). This allows you to count the elements which are smaller and to the right of the current one in O(log n).
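A small sketch of this heuristic (it is a proposal, not a proven optimum; the inversion count here is quadratic for brevity, where the answer above describes an O(n log n) tree-based count):

def inversions(a):
    # adjacent swaps bubble sort needs == number of inversions
    return sum(a[i] > a[j] for i in range(len(a)) for j in range(i + 1, len(a)))

def min_swaps(arr):
    i = max(range(len(arr)), key=lambda k: arr[k] - k)  # furthest left of its target
    j = min(range(len(arr)), key=lambda k: arr[k] - k)  # furthest right of its target
    swapped = arr[:]
    swapped[i], swapped[j] = swapped[j], swapped[i]
    # use the free swap only if it actually helps
    return min(inversions(arr), 1 + inversions(swapped))

# min_swaps([5, 3, 4, 2, 1]) == 3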

Efficient approach to find co-prime subarrays

Given an array, is it possible to find the number of co-prime sub arrays of the array in better than O(N²) time? Co-prime arrays are defined as a contiguous subset of an array such that GCD of all elements is 1.
Consider adding one element to the end of the array. Now find the rightmost position, if any, such that the sub-array from that position to the element you have just added is co-prime. Since it is rightmost, no shorter array ending with the element added is co-prime. Since it is co-prime, every array that starts to its left and ends with the new element is co-prime. So you have worked out the number of co-prime sub-arrays that end with the new element. If you can find the rightmost position efficiently - say in O(log n) instead of O(n) - then you can count the number of co-prime sub-arrays in O(n log n) by extending the array one element at a time.
To make it possible to find rightmost positions, think of the full array as the leaves of a complete binary tree, padded out to make its length a power of two. At each node put the GCD of all of the elements below that node - you can do this from the bottom up in time O(n). Every contiguous interval within the array can be covered by a collection of nodes of size O(log n) such that the interval consists of the leaves underneath the nodes, so you can compute the GCD of the interval in time O(log n).
To find the rightmost position forming a co-prime subarray with your current element, start with the current element and check to see if it is 1. If it is, you are finished. If not, look at the element to its left, take a GCD with that, and push the result on a stack. If the result is 1, you are finished, if not, do the same, but look to see if there is a sub-tree of 2 elements you can use to add 2 elements at once. At each of the succeeding steps you double the size of the sub-tree you are trying to find. You won't always find a convenient sub-tree of the size you want, but because every interval can be covered by O(log n) subtrees you should get lucky often enough to go through this step in time O(log n).
Now you have either found that whole array to the current element is not co-prime or you have found a section that is co-prime, but may go further to the left than it needs. The value at the top of the stack was computed by taking the GCD of the value just below it on the stack and the GCD at the top of a sub-tree. Pop it off the stack and take the GCD of the value just below it and the right half of the sub-tree. If you are still co-prime then you didn't need the left half of the sub-tree. If not, then you needed it, but perhaps not all of it. In either case you can continue down to find the rightmost match in time O(log n).
So I think you can find the rightmost position forming a co-prime subarray with the current element in time O(log n) (admittedly with some very fiddly programming), so you can count the number of co-prime sub-arrays in time O(n log n).
Two examples:
List 1, 3, 5, 7. The next level is 1, 1 and the root is 1. If the current element is 13 then I check against 7 and find that gcd(7, 13) = 1. Therefore I immediately know that GCD(5, 7, 13) = GCD(3, 5, 7, 13) = GCD(1, 3, 5, 7, 13) = 1.
List 2, 4, 8, 16. The next level is 2, 8 and the root is 2. If the current number is 32 then I check against 16 and find that gcd(16, 32) = 16 != 1, so I check against 8 and find that GCD(8, 32) = 8, and then I check against 2 and find that GCD(2, 32) = 2, so there is no interval in the extended array which has GCD = 1.
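A runnable sketch of the counting idea, using a segment tree of GCDs with a plain binary search for the rightmost co-prime start. The binary search makes it O(n log^2 n) overall rather than the O(n log n) of the fiddlier tree descent described above; the function names are illustrative:

from math import gcd

def count_coprime_subarrays(a):
    n = len(a)
    size = 1
    while size < n:
        size *= 2
    tree = [0] * (2 * size)        # 1-based segment tree of GCDs; gcd(x, 0) == x
    tree[size:size + n] = a
    for i in range(size - 1, 0, -1):
        tree[i] = gcd(tree[2 * i], tree[2 * i + 1])

    def range_gcd(lo, hi):         # GCD of a[lo..hi], inclusive
        g, lo, hi = 0, lo + size, hi + size
        while lo <= hi:
            if lo % 2 == 1:
                g = gcd(g, tree[lo])
                lo += 1
            if hi % 2 == 0:
                g = gcd(g, tree[hi])
                hi -= 1
            lo //= 2
            hi //= 2
        return g

    total = 0
    for r in range(n):
        if range_gcd(0, r) != 1:   # no co-prime subarray ends at r
            continue
        lo, hi = 0, r              # gcd(a[l..r]) only shrinks as l moves left,
        while lo < hi:             # so binary search for the rightmost l with gcd 1
            mid = (lo + hi + 1) // 2
            if range_gcd(mid, r) == 1:
                lo = mid
            else:
                hi = mid - 1
        total += lo + 1            # subarrays [0..r], [1..r], ..., [lo..r]
    return total

# count_coprime_subarrays([2, 3, 4]) == 3  (the subarrays [2,3], [3,4], [2,3,4])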

Time complexity with insertion sort for 2^N array?

Consider an array of integers, which has a size of 2^N, where the element at index X (0 ≤ X < 2^N) is X xor 3 (that is, the two least significant bits of X are flipped). What is the running time of insertion sort on this array?
Examine the structure of what the lists looks like:
For n = 2:
{3, 2, 1, 0}
For n = 3 :
{3, 2, 1, 0, 7, 6, 5, 4}
For insertion sort, you're maintaining the invariant that the list from the start up to your current index is sorted, so your task at each step is to place the current element into its correct place among the sorted elements before it. In the worst case, you would have to traverse all previous indices before you could insert the current value (think of the case where the list is in reverse sorted order). But it's clear from the structure above that, for a list with the property that each value equals its index xor 3, the furthest back in the list you would ever have to go from any given index is 3. So the possibly O(n) work at each insertion step is reduced to a constant, but you still do O(n) work to examine each element of the list. So, for this particular case, the running time of insertion sort is linear in the size of the input, whereas in the worst case it is quadratic.
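A quick empirical check of this, assuming the textbook insertion sort below (the shift counter is mine):

def insertion_sort_shifts(a):
    # Sort a in place and return the total number of element shifts.
    shifts = 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]      # each shift moves a value one slot right
            j -= 1
            shifts += 1
        a[j + 1] = x
    return shifts

N = 4
arr = [x ^ 3 for x in range(2 ** N)]    # [3, 2, 1, 0, 7, 6, 5, 4, ...]
print(insertion_sort_shifts(arr))       # 24: six shifts per block of four, linear in 2**N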

Finding n-th biggest product in a large matrix of numbers, fast

I'm working on a sorting/ranking algorithm that works with quite large number of items and I need to implement the following algorithm in an efficient way to make it work:
There are two lists of numbers. They are equally long, about 100-500 thousand items. From this I need to find the n-th biggest product between these lists, i.e. if you create a matrix where on top you have one list and on the side you have the other, each cell is the product of the number above and the number on the side.
Example: The lists are A=[1, 3, 4] and B=[2, 2, 5]. Then the products are [2, 2, 5, 6, 6, 15, 8, 8, 20]. If I wanted the 3rd biggest from that it would be 8.
The naive solution would be to simply generate those numbers, sort them and then select the n-th biggest. But that is O(m^2 * log m^2) where m is the number of elements in the small lists, and that is just not fast enough.
I think what I need is to first sort the two small lists. That is O(m * log m). Then I know for sure that the biggest product is A[0]*B[0]. The second biggest one is either A[0]*B[1] or A[1]*B[0], ...
I feel like this could be done in O(f(n)) steps, independent of the size of the matrix. But I can't figure out an efficient way to do this part.
Edit: There was an answer that got deleted, which suggested to remember position in the two sorted sets and then look at A[a]*B[b+1] and A[a+1]*B[b], returning the bigger one and incrementing a/b. I was going to post this comment before it got deleted:
This won't work. Imagine two lists A=B=[3,2,1]. This will give you a matrix like [9,6,3 ; 6,4,2 ; 3,2,1]. So you start at (0,0)=9, go to (0,1)=6, and then the choice is (0,2)=3 or (1,1)=4. However, this will miss (1,0)=6, which is bigger than both. So you can't just look at the two neighbors; you have to backtrack.
I think it can be done in O(n log n + n log m). Here's a sketch of my algorithm, which I think will work. It's a little rough.
Sort A descending. (takes O(m log m))
Sort B descending. (takes O(m log m))
Let s be min(m, n). (takes O(1))
Create s lazy sequence iterators L[0] through L[s-1]. L[i] will iterate through the s values A[i]*B[0], A[i]*B[1], ..., A[i]*B[s-1]. (takes O(s))
Put the iterators in a priority queue q. The iterators will be prioritized according to their current value. (takes O(s) because initially they are already in order)
Pull n values from q. The last value pulled will be the desired result. When an iterator is pulled, it is re-inserted in q using its next value as the new priority. If the iterator has been exhausted, do not re-insert it. (takes O(n log s))
In all, this algorithm will take O(m log m + (s + n)log s), but s is equal to either m or n.
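A runnable Python sketch of this lazy-iterator scheme (the function name is mine, and it assumes non-negative numbers, as in the example; with negative values the row products are no longer monotone):

import heapq

def nth_biggest_product(A, B, n):
    A = sorted(A, reverse=True)
    B = sorted(B, reverse=True)
    s = min(len(A), n)
    # one entry per row i: (negated current product, row, column into B)
    heap = [(-A[i] * B[0], i, 0) for i in range(s)]
    heapq.heapify(heap)
    for _ in range(n):
        value, i, j = heapq.heappop(heap)
        if j + 1 < min(len(B), n):          # only the top n per row can matter
            heapq.heappush(heap, (-A[i] * B[j + 1], i, j + 1))
    return -value

# nth_biggest_product([1, 3, 4], [2, 2, 5], 3) == 8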
I don't think there is an algorithm of O(f(n)), which is independent of m.
But there is a relatively fast O(n*logm) algo:
First we sort the two arrays, getting A[0] > A[1] > ... > A[m-1] and B[0] > B[1] > ... > B[m-1]. (This is O(m log m), of course.)
Then we build a max-heap, whose elements are A[0]*B[0], A[0]*B[1], ... A[0]*B[m-1]. And we maintain a "pointer array" P[0], P[1], ... P[m-1]. P[i]=x means that B[i]*A[x] is in the heap currently. All the P[i] are zero initially.
In each iteration, we pop the max element from the heap, which is the next largest product. Assuming it comes from B[i]*A[P[i]] (we can record the elements in the heap come from which B[i]), we then move the corresponding pointer forward: P[i] += 1, and push the new B[i] * A[P[i]] into the heap. (If P[i] is moved to out-of-range (>=m), we simply push a -inf into the heap.)
After the n-th iteration, we get the n-th largest product.
There are n iterations, and each one is O(logm).
You don't need to sort the 500,000 elements to get the top 3.
Just take the first 3, put them in a SortedList, and iterate over the list, replacing the smallest of the 3 elements with the new value if that is higher, and re-sorting the resulting list.
Do this for both lists, and you'll end up with a 3*3 matrix, from which it should be easy to take the 3rd value.
Here is an implementation in Scala.
If we assume n is smaller than m, and A=[1, 3, 4] and B=[2, 2, 5], n=2:
You would take (3, 4) => sort them (4,3)
Then take (2,5) => sort them (5, 2)
You could now do a zipped search. Of course the biggest product now is 5*4. But the next one is either 4*2 or 5*3. For longer lists, you could keep in mind what the result of 4*2 was and compare it only with the next product taken the other way. That way you would only calculate one product too many.

Sorting Algorithm For Array with Integers of at most n spots away

Given an array with integers, with each integer being at most n positions away from its final position, what would be the best sorting algorithm?
I've been thinking for a while about this and I can't seem to get a good strategy to start dealing with this problem. Can someone please guide me?
I'd split the list (of size N) into 2n sublists (using zero-based indexing):
list 0: elements 0, 2n, 4n, ...
list 1: elements 1, 2n+1, 4n+1, ...
...
list 2n-1: elements 2n-1, 4n-1, ...
Each of these lists is sorted: any two of its elements sit at least 2n apart in the original array, and since each is at most n positions from its final place, they cannot be out of order.
Now merge these lists (repeatedly merging 2 lists at a time, or using a min heap with one element of each of these lists).
That's all. Time complexity is O(N log(n)).
This is easy in Python:
>>> import heapq
>>> a = [1, 0, 5, 4, 3, 2, 6, 8, 9, 7, 12, 13, 10, 11]
>>> n = max(abs(i - x) for i, x in enumerate(a))
>>> n
3
>>> print(*heapq.merge(*(a[i::2 * n] for i in range(2 * n))))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Heap sort is very fast for an initially random array/collection of elements. In pseudocode this sort would be implemented as follows:
# heapify
for i = n/2 : 1, sink(a, i, n)
→ invariant: a[1..n] in heap order

# sortdown
for i = 1 : n,
    swap a[1, n-i+1]
    sink(a, 1, n-i)
    → invariant: a[n-i+1..n] in final position
end

# sink from i in a[1..n]
function sink(a, i, n):
    # {lc, rc, mc} = {left, right, max} child index
    lc = 2*i
    if lc > n, return            # no children
    rc = lc + 1
    mc = (rc > n) ? lc : (a[lc] > a[rc]) ? lc : rc
    if a[i] >= a[mc], return     # heap ordered
    swap a[i, mc]
    sink(a, mc, n)
For different cases like "Nearly Sorted" or "Few Unique" the algorithms can work differently and be more efficient. For a complete list of the algorithms with animations for the various cases, see this brilliant site.
I hope this helps.
P.S. For nearly sorted sets (as commented above), insertion sort is your winner.
I'd recommend using a comb sort; just start it with a gap size equal to the maximum distance away (or about there). It's expected O(n log n) (or in your case O(n log d), where d is the maximum displacement), easy to understand, easy to implement, and will work even when the elements are displaced more than you expect. If you need guaranteed execution time you can use something like heap sort, but in the past I've found that the overhead in space or computation time usually isn't worth it, and I end up implementing nearly anything else.
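A minimal comb sort sketch with the starting gap set to the maximum displacement (the parameter d and function name are illustrative):

def comb_sort(a, d):
    # Comb sort, starting with gap d instead of len(a).
    gap = max(d, 1)
    swapped = True
    while gap > 1 or swapped:
        swapped = False
        for i in range(len(a) - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True
        gap = max(int(gap / 1.3), 1)   # the usual comb sort shrink factor
    return a

# comb_sort([1, 0, 5, 4, 3, 2], 3) == [0, 1, 2, 3, 4, 5]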
Since each integer is at most n positions away from its final position:
1) for the smallest integer (aka. the 0th integer in the final sorted array), its current position must be in A[0...n] because the nth element is n positions away from the 0th position
2) for the second smallest integer (aka. the 1st integer in the final sorted array, zero based), its current position must be in A[0...n+1]
3) for the ith smallest integer, its current position must be in A[i-n...i+n]
We could use an (n+1)-size min-heap as a rolling window over the array to get it sorted. You can find more details here:
http://www.geeksforgeeks.org/nearly-sorted-algorithm/
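A short sketch of the rolling-window heap (the function name is mine):

import heapq

def sort_k_sorted(a, n):
    # every element of a is at most n positions from its final spot
    window = a[:n + 1]
    heapq.heapify(window)              # the global minimum must be in this window
    result = []
    for x in a[n + 1:]:
        result.append(heapq.heappushpop(window, x))
    result.extend(sorted(window))      # drain the final n+1 elements
    return result

# sort_k_sorted([1, 0, 5, 4, 3, 2, 6, 8, 9, 7], 3) == list(range(10))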
