Inversion distance - algorithm

First of all, let's recall the definition of an inversion.
An inversion in a sequence S of numbers is a pair of positions i < j with S[i] > S[j]; in other words, a pair of elements that are out of order. For instance, for the sequence:
1 4 3 7 5 6 2
We have the following inversions: (4,3), (4,2), (3,2), (7,5), etc.
We state the problem as follows: the inversion distance is the maximum distance (in terms of indices) between two values that form an inversion. For our example, searching by hand gives the pair (4,2) <=> (S[1], S[6]), and therefore the index distance is 6-1 = 5, which is the maximum possible for this case.
This problem can be solved the trivial way in O(n^2) by finding all inversions and keeping the maximum distance (updating it whenever we find a better one).
We can also search for inversions more efficiently using merge sort and thereby do the same in O(n log n). Is an O(n) algorithm possible? Keep in mind that we just want the maximum distance; we don't need to find all inversions. Please elaborate.

Yes, an O(n) algorithm is possible.
We could extract a strictly increasing subsequence with a greedy algorithm:
source: 1 4 3 7 5 6 2
strictly increasing subsequence: 1 4 7
Then we could extract a strictly decreasing subsequence going backwards:
source: 1 4 3 7 5 6 2
strictly decreasing subsequence: 1 2
Note that once this strictly decreasing subsequence is found, we can read it as an increasing sequence (in the normal direction).
For each element of these subsequences we need to store its index in the source sequence.
Now the "inversion distance" can be found by merging these two subsequences (similar to the merge sort mentioned in the question, but only one merge pass is needed):
merge 1 & 1 ... no inversion, advance both indices
merge 4 & 2 ... inversion found, distance=5, should advance second index,
but here is end of subsequence, so we are done, max distance = 5

Maybe my idea is the same as Evgeny's.
Here is the explanation:
Make a strictly increasing array from the beginning; call it array1.
Make a strictly decreasing array from the end; call it array2 (but keep its values in increasing order).
*** Keep track of the original indexes of the values in both arrays.
Now start from the beginning of both arrays.
Repeat the following loop until array1 or array2 is exhausted:
While( array2 is not exhausted and array1[index1] > array2[index2] )
{
    check the original distance between the array1[index1] and array2[index2] elements.
    Update the result accordingly.
    increase index2.
}
increase index1 (array2[index2] may still form an inversion with a later, larger array1 value, so index2 stays)
Continue with the loop
At the end of this process you will have the maximum result. The proof of this solution is not that complex; you can try it yourself.
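The two-array merge can be sketched in Python as follows. This is only an illustration under some assumptions: the function name `max_inversion_distance` is made up here, the sketch returns 0 when there is no inversion, and it advances only the left pointer when the current pair is not an inversion, which is the conservative choice.

```python
def max_inversion_distance(s):
    """Return the largest j - i with i < j and s[i] > s[j], or 0 if none."""
    n = len(s)
    # Left candidates: indices of strict prefix maxima (greedy increasing run).
    left = []
    for i, value in enumerate(s):
        if not left or value > s[left[-1]]:
            left.append(i)
    # Right candidates: indices of strict suffix minima, collected backwards
    # and then reversed so that both indices and values increase.
    right = []
    for j in range(n - 1, -1, -1):
        if not right or s[j] < s[right[-1]]:
            right.append(j)
    right.reverse()

    best = 0
    a = b = 0
    while a < len(left) and b < len(right):
        i, j = left[a], right[b]
        if s[i] > s[j]:
            # Inversion by value; j - i may be negative, max() ignores that.
            best = max(best, j - i)
            b += 1  # a later right candidate is farther away
        else:
            a += 1  # only a larger left value can beat s[j]
    return best
```

Each pointer only moves forward, so the merge is linear after two linear scans.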

Related

Median and order statistics with O(n) time complexity

Describe an O(n)-time algorithm that, given a set S of n distinct numbers and a positive
integer k ≤ n, outputs the k numbers in S that are closest to the median of S (excluding the
median). Hint: The target numbers may not be evenly placed around the median in the sorted
version of the array. E.g., consider 1,2,3,8,10; the 2 numbers closest to the median 3 are 1,2,
excluding the median itself, but they are both less than the median. (Note: this is just an
illustration; don't assume that the array is sorted.)
Here is the answer that I found:
Answer: Find the n/2 − k/2 largest element in linear time. Partition on that element. Then, find the k largest element in the bigger subarray formed from the partition. Then, the elements in the smaller subarray from partitioning on this element are the desired k numbers.
My illustration:
Suppose I have an array with 11 elements and the array is an unsorted array
index_number 1 2 3 4 5 6 7 8 9 10 11
arr_elements 2 5 3 10 4 7 1 12 6 13 8
As there are 11 elements, the median position should be 11/2 = 5.5, approximately 6, so arr_element 7 is the median. Now the solution says "Find the n/2 − k/2 largest element in linear time." Suppose k = 4, so k/2 = 2; therefore I need to find the largest element from index 2 through index 6. The array elements from index 2 through 6 are {5,3,10,4,7}, so the largest element is 10. Next, the answer says "Partition on that element." So there will be two subarrays after partitioning around arr_element 10: {2,5,3} and {4,7,1,12,6,13,8}. Then the answer says "Then, find the k largest element in the bigger subarray formed from the partition." With k = 4, the kth largest element means the 4th largest element, and the 4th largest element in the bigger subarray is 8. Finally, the algorithm says "Then, the elements in the smaller subarray from partitioning on this element are the desired k numbers." I did not understand this statement.
The problem comes from Cormen's Introduction to Algorithms, Chapter 9: Medians and Order Statistics.
Any hints would be appreciated.
The problem is to find the median, then find the distance d such that exactly k or k+1 points are within that distance from the median, and then output those points.
Hint: Study quickselect.
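One way to read the hint is sketched below in Python. It is an illustration, not the textbook's solution: the names `quickselect` and `k_closest_to_median` are made up here, the selection uses a random pivot (so it is linear only in expectation, unlike the worst-case linear median-of-medians selection), and the lower median is used when n is even.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-based), expected linear time."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + equal:
            return pivot
        else:
            k -= len(lows) + equal
            items = highs

def k_closest_to_median(s, k):
    """Return k elements of s (distinct numbers) closest to its median."""
    median = quickselect(s, (len(s) - 1) // 2)
    others = [x for x in s if x != median]
    if k >= len(others):
        return others
    # d is the k-th smallest distance to the median.
    d = quickselect([abs(x - median) for x in others], k - 1)
    closer = [x for x in others if abs(x - median) < d]
    ties = [x for x in others if abs(x - median) == d]
    # Fill up with elements at distance exactly d until we have k.
    return closer + ties[:k - len(closer)]
```

For the array in the question with k = 4, this selects the median 6 and then the four values within distance 2 of it.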

Sequence increasing and decreasing by turns

Let's assume we've got a sequence of integers of given length n. We want to delete some elements (maybe none), so that the sequence is increasing and decreasing by turns in result. It means, that every element should have neighbouring elements either both bigger or both smaller than itself.
For example 1 3 2 7 6 and 5 1 4 2 10 are both sequences increasing and decreasing by turns.
We want to delete some elements to transform our sequence that way, but we also want to maximize the sum of elements left. So, for example, from sequence 2 18 6 7 8 2 10 we want to delete 6 and make it 2 18 7 8 2 10.
I am looking for an efficient solution to that problem. The example above shows that the most naive greedy algorithm (delete the first element that breaks the pattern) won't work: it would delete 7 instead of 6, which would not maximize the sum of the elements left.
Any ideas how to solve this efficiently (probably O(n) or O(n log n)) and correctly?
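For every element of the sequence with index i we will calculate F(i, high) and F(i, low), where F(i, high) equals the biggest sum of a subsequence with the wanted characteristics that ends with the i-th element and in which this element is a "high peak". (I'll explain mainly the "high" part; the "low" part can be done similarly.) We can calculate these functions using the following relations, where the inner maximum is taken as 0 when no such j exists:
F(i, high) = a[i] + max(F(j, low)) over all j < i with a[j] < a[i]
F(i, low) = a[i] + max(F(j, high)) over all j < i with a[j] > a[i]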
The answer is maximal among all F(i, high) and F(i, low) values.
That gives us a rather simple dynamic programming solution with O(n^2) time complexity. But we can go further.
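A minimal Python sketch of the quadratic dynamic program, assuming F(i, high) is a[i] plus the best F(j, low) over earlier j with a[j] < a[i] (or plus 0 if there is none), and symmetrically for F(i, low). The function name `max_alternating_sum_quadratic` is made up for this illustration.

```python
def max_alternating_sum_quadratic(a):
    """O(n^2) DP: best sum of an alternating subsequence of a (non-empty)."""
    n = len(a)
    high = [0] * n  # high[i]: best sum ending at i with a[i] as a high peak
    low = [0] * n   # low[i]: best sum ending at i with a[i] as a low point
    for i in range(n):
        best_low_before = 0   # best low[j] with j < i and a[j] < a[i]
        best_high_before = 0  # best high[j] with j < i and a[j] > a[i]
        for j in range(i):
            if a[j] < a[i]:
                best_low_before = max(best_low_before, low[j])
            elif a[j] > a[i]:
                best_high_before = max(best_high_before, high[j])
        high[i] = a[i] + best_low_before
        low[i] = a[i] + best_high_before
    return max(max(high), max(low))
```

On the sequence 2 18 6 7 8 2 10 from the question this gives 47, the sum of 2 18 7 8 2 10.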
We can optimize the calculation of the max(F(j, low)) part. What we need to do is find the biggest value among the previously calculated F(j, low) under the condition that a[j] < a[i]. This can be done with segment trees.
First of all, we'll "squeeze" our initial sequence. We need the real value of the element a[i] only when calculating the sum. But we need only the relative order of the elements when checking that a[j] is less than a[i]. So we'll map every element to its index in the sorted elements array without duplicates. For example, sequence a = 2 18 6 7 8 2 10 will be translated to b = 0 5 1 2 3 0 4. This can be done in O(n*log(n)).
The biggest element of b will be less than n. As a result, we can build a segment tree on the segment [0, n] in which every node contains the biggest sum within its segment (we need two segment trees, one for the "high" part and one for the "low" part). Now let's describe step i of the algorithm:
Find the biggest sum max_low on the segment [0, b[i]-1] using the "low" segment tree (initially all nodes of the tree contain zero).
F(i, high) is equal to max_low + a[i].
Find the biggest sum max_high on the segment [b[i]+1, n] using the "high" segment tree.
F(i, low) is equal to max_high + a[i].
Update the [b[i], b[i]] segment of the "high" segment tree with F(i, high) value recalculating maximums of the parent nodes (and [b[i], b[i]] node itself).
Do the same for "low" segment tree and F(i, low).
Complexity analysis: b sequence calculation is O(n*log(n)). Segment tree max/update operations have O(log(n)) complexity and there are O(n) of them. The overall complexity of this algorithm is O(n*log(n)).
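The steps above can be sketched in Python with a small iterative max segment tree. This is a sketch under assumptions: the names `MaxSegmentTree` and `max_alternating_sum` are made up here, queries use half-open ranges [lo, hi), and a leaf keeps the maximum of the values written to it so that repeated input values do not overwrite a better sum.

```python
class MaxSegmentTree:
    """Point update / range maximum, all values initially 0."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (2 * size)

    def update(self, pos, value):
        pos += self.size
        self.tree[pos] = max(self.tree[pos], value)
        pos //= 2
        while pos >= 1:
            self.tree[pos] = max(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def query(self, lo, hi):
        """Maximum over positions lo <= p < hi; 0 for an empty range."""
        result = 0
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                result = max(result, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = max(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result

def max_alternating_sum(a):
    """O(n log n) version of the DP; a must be non-empty."""
    # "Squeeze" the values: map each one to its rank among distinct values.
    rank = {v: r for r, v in enumerate(sorted(set(a)))}
    m = len(rank)
    high_tree = MaxSegmentTree(m)  # stores F(j, high) at position b[j]
    low_tree = MaxSegmentTree(m)   # stores F(j, low) at position b[j]
    best = a[0]
    for value in a:
        r = rank[value]
        f_high = value + low_tree.query(0, r)       # earlier lows below a[i]
        f_low = value + high_tree.query(r + 1, m)   # earlier highs above a[i]
        high_tree.update(r, f_high)
        low_tree.update(r, f_low)
        best = max(best, f_high, f_low)
    return best
```

The sort for the rank map costs O(n log n), and each of the n steps does two queries and two updates of O(log n) each.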

Find subsequence from array at every n interval

I would like to find the subsequence of a list of numbers that has the maximum sum. The restriction is that its elements must be exactly n positions apart. For example:
n = 4;
A = [1 4 3 2 9 8 7 6]
The optimal subsequence is therefore 4 + 8 = 12 at positions 1 & 5 (we assume position numbering starts at 0).
My idea:
I know this is a dynamic programming problem. However, I'm not sure how to think about it in terms of a smaller problem. Hope this makes sense. Thanks!
If all the numbers are non-negative, it is best to make the subsequence as long as possible to get the maximum sum. The interval restriction means that there are just n possible starting indices. In the example you get these four:
1 9
4 8
3 7
2 6
Calculate the sum for each and choose the largest.
You can group the elements by the remainder of their index modulo n, so that each group contains elements that are n positions apart. Then, summing the elements of each group, you can find the group with the highest sum.
The sequence (index in the original array, value) can then be found easily.
I mean something like this (pay attention to the indentation):
n = length of the interval;
group[0..n-1] = empty lists;
sum[0..n-1] = [0,....,0];
for i=0,...,array.length-1
    k=i%n;
    append the i-th element of the array to group[k];
for j=0,...,n-1
    sum[j]=sum of all elements in group[j];
max=0;
for k=1,...,n-1
    if(sum[max]<sum[k])
        max=k;
for u=0,...,group[max].length-1
    index=max+u*n;
    print (index, group[max][u])
I'm not sure this is the approach you are looking for, but maybe it can help you.
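The same grouping idea fits in a few lines of Python using slices with a step. This is a sketch that assumes a non-empty list of non-negative numbers, as discussed above; the function name `best_strided_subsequence` is made up for this illustration.

```python
def best_strided_subsequence(values, n):
    """Return (best_sum, indices) over the subsequences taking every n-th item."""
    # values[start::n] is the group of elements whose index % n == start.
    sums = [sum(values[start::n]) for start in range(min(n, len(values)))]
    best_start = max(range(len(sums)), key=sums.__getitem__)
    indices = list(range(best_start, len(values), n))
    return sums[best_start], indices
```

For A = [1, 4, 3, 2, 9, 8, 7, 6] and n = 4 this returns the sum 12 at positions 1 and 5.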

Closest equal numbers

Suppose you have numbers a1..an and some queries [l, k] (1 < l, k < n). The problem is to find the minimum distance between two equal numbers within the interval [l, k].
Examples: (interval l,k shown as |...|)
1 2 2 |1 0 1| 2 3 0 1 2 3
Answer 2 (101)
1 |2 2| 1 0 1 2 3 0 1 2 3
Answer 1 (22)
1 2 2 1 0 |1 2 3 0 3 2 3|
Answer 2 (303) or (323)
I have thought about a segment tree, but it is hard to join the results from each tree node when the query is shared between several nodes. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?
Clarification
Thanks for your answers.
The problem is that there are a lot of queries, so O(n) per query is not good enough. I did not mention a segment tree by accident: it answers an [l, r] SUM or [l, r] MIN query on an array in O(log n). Can we do some preprocessing to get O(log n) here?
Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.
In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.
Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval that it could contain and the last and then issue a range-minimum query of the lengths of the range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.
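A simplified Python sketch of this idea follows. It makes several assumptions that differ from the description: positions are 0-based, the minimal intervals are enumerated by tracking the largest left endpoint seen so far rather than with the two-index procedure, and the range-minimum query uses a sparse table plus binary search, so preprocessing is O(n log n) and a query is O(log n) rather than the theoretical linear and constant bounds. The names `ClosestEqual` and `minimal_intervals` are made up here.

```python
from bisect import bisect_left, bisect_right

def minimal_intervals(a):
    """Pairs (p, k) of equal neighbours-by-value with no equal pair nested inside."""
    last_seen = {}
    max_left = -1  # largest left endpoint of any equal pair seen so far
    lefts, rights = [], []
    for k, value in enumerate(a):
        p = last_seen.get(value)
        if p is not None:
            if p > max_left:  # nothing nested inside [p, k]
                lefts.append(p)
                rights.append(k)
            max_left = max(max_left, p)
        last_seen[value] = k
    return lefts, rights

class ClosestEqual:
    def __init__(self, a):
        self.lefts, self.rights = minimal_intervals(a)
        lengths = [r - l for l, r in zip(self.lefts, self.rights)]
        # Sparse table: table[j][i] = min(lengths[i : i + 2**j]).
        self.table = [lengths]
        span = 1
        while 2 * span <= len(lengths):
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + span])
                               for i in range(len(lengths) - 2 * span + 1)])
            span *= 2

    def query(self, l, r):
        """Minimum distance between equal values in a[l..r], or None."""
        lo = bisect_left(self.lefts, l)     # first interval starting at >= l
        hi = bisect_right(self.rights, r)   # one past last interval ending <= r
        if lo >= hi:
            return None
        level = (hi - lo).bit_length() - 1
        row = self.table[level]
        return min(row[lo], row[hi - (1 << level)])
```

Both endpoint lists come out sorted, which is what makes the binary search valid.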
We can do it in O(nlog(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then, sort the elements in [l,k], first based on value, and second based on original index, both ascending.
Then you can loop over the sorted list, keeping a currentValue variable, and checking adjacent values that are the same for distance and setting minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.
Suppose we have this [l,k] range from your third example:
1 2 3 0 3 2 3
We can mark them as
1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)
and sort them as
0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)
Looping over this, there are no ranges for 0 and 1. The minimum distance for 2s is 4, and the minimum distance for 3s is 2 ([3,5] or [5,7], depending on whether you reset minDistance when the new minimum distance is equal to the current minimum distance).
Thus we get
[3,5] in [l,k] or [5,7] in [l,k]
EDIT
Since you mention some queries, you can preprocess the list in O(nlog(n)) time, and then only use O(n) time for each individual query. You would just ignore indices that are not in [l,k] while looping over the sorted list.
EDIT 2
This is addressing the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then run each query in O(1) time.
First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.
We can imagine that:
[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)
We have one base case
[l,k] = infinity where l = k
If [l,k] is not min([l+1,k], [l,k-1]), then it either starts at l or ends at k. We can take each of these, look into the sorted list and look at the adjacent element in the correct direction and check the distances (making sure we're in bounds). We only have to check 2 elements, so it is a constant factor.
Using this algorithm, we can run the following
for l = n downto 1
    for k = l to n
        M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)
You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.
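A Python sketch of this table follows, with two assumptions: positions are 0-based, and the nearest equal neighbour of each position is found with next/previous occurrence arrays built from a hash map instead of links into the sorted list. The function name `build_distance_table` is made up here, and the table takes O(n^2) memory.

```python
def build_distance_table(a):
    """M[l][k] = minimum distance between equal values in a[l..k], or inf."""
    n = len(a)
    inf = float("inf")
    nxt = [n] * n   # nxt[i]: next position holding the same value, or n
    prv = [-1] * n  # prv[i]: previous position holding the same value, or -1
    last = {}
    for i, value in enumerate(a):
        if value in last:
            prv[i] = last[value]
            nxt[last[value]] = i
        last[value] = i

    # Base case: M[l][l] = inf (a single element has no equal pair).
    M = [[inf] * n for _ in range(n)]
    for l in range(n - 1, -1, -1):
        for k in range(l + 1, n):
            best = min(M[l + 1][k], M[l][k - 1])
            if nxt[l] <= k:            # closest pair starting at l
                best = min(best, nxt[l] - l)
            if prv[k] >= l:            # closest pair ending at k
                best = min(best, k - prv[k])
            M[l][k] = best
    return M
```

A query [l, k] is then a single lookup of M[l][k].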

divide and conquer - find median between two arrays of equal size that contain unique elements?

I am trying to solve a problem exactly like this:
nth smallest number among two databases of size n each using divide and conquer
From what I could figure out, the "comparing medians/median of medians" algorithm would give us the solution? My question is whether I am understanding this correctly.
array 1: [7 8 6 5 3]
array 2: [4 10 1 2 9]
First, find the median for each. we can do this by querying for k=n/2, where n is the size of that array. Being the 3rd smallest element in this case, this gives us 6 for the first array (call this m1), and 4 for the second array (call this m2).
Since m1 > m2, create 2 arrays using the elements that are less than m1 and greater than m2 in that array.
array 1: [5 3]
array 2: [10 9]
^ How would we find the elements that are less than m1 and greater than m2? Would we just take m1 and m2 and compare them with every element in their respective arrays? I know this works when the two arrays are both sorted, but would sorting them first allow us to still get O(log(n)) queries?
I'm assuming we can continue to use our special query (can we?) to get the k=n/2 smallest element (median) for that particular array. If this is the case, we query for k=n/2=1, leaving us with new m1 = 3, m2 = 9.
m1 < m2, so we make 2 arrays using elements that are greater than m1 and less than m2 in that array.
Since there are no elements in array 2 that are less than m2 = 9, we are only left with one array with one element greater than m1 = 3.
[5] <- this is the median
I am also interested in seeing the proof of correctness (that this finds the median) by induction.
The O(n) median-of-medians algorithm actually partitions the array so that the elements before the pivot are less than it and the elements after it are greater than it.
When you recurse with the median of medians as pivot, you are partitioning the array so that it looks like
(elements less than the median) - p - (elements greater than the median)
On correctness: when you first query for k = n/2, you get m1 and m2 (say m1 > m2). About half of the first array is less than m1 and, because m2 < m1, so is at least half of the second array. The elements greater than m1 in the first array therefore have roughly n or more elements below them, so they will never be candidates for the median.
Similarly for the elements before m2: there are roughly n or more elements ahead of them, so they will never be candidates for the median either. So the median must lie somewhere in the second half of the second array or the first half of the first array.
But when you recurse, keep in mind that the n/2 discarded elements of the second array are already accounted for, so you need to find the element that would occupy the (n/2)th position in the sorted union of the two remaining arrays (second half and first half).
This seems asymptotically optimal, since you always halve the size of the arrays you recurse on:
something like O(n) + O(n/2) + O(n/4) + ... = O(n).
For sorted arrays you can do this in O(log n).
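For the sorted case, one common way to get O(log n) is a binary search over how many of the n smallest elements come from the first array. The Python sketch below assumes both arrays are already sorted, have the same length n >= 1, and contain distinct elements; the function name `nth_smallest_of_two_sorted` is made up for this illustration.

```python
def nth_smallest_of_two_sorted(a, b):
    """Return the n-th smallest element of the union, where n = len(a) = len(b)."""
    n = len(a)
    inf = float("inf")
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2   # take i elements from a
        j = n - i            # and the remaining j elements from b
        a_left = a[i - 1] if i > 0 else -inf
        a_right = a[i] if i < n else inf
        b_left = b[j - 1] if j > 0 else -inf
        b_right = b[j] if j < n else inf
        if a_left > b_right:
            hi = i - 1       # took too many from a
        elif b_left > a_right:
            lo = i + 1       # took too few from a
        else:
            # Everything taken is below everything left out.
            return max(a_left, b_left)
```

On the sorted versions of the arrays from the question, [3, 5, 6, 7, 8] and [1, 2, 4, 9, 10], this returns 5.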
