Order statistic on intervals - algorithm

Given an array of numbers a[0], a[1], ..., a[n-1], we get queries of the kind:
output k-th largest number in the range a[i], a[i+1], ..., a[j]
Can these queries be answered in polylogarithmic time (in n) per query? If not, is it possible to average results and still get a good amortized complexity?
EDIT: this can be solved using persistent segment trees
http://blog.anudeep2011.com/persistent-segment-trees-explained-with-spoj-problems/

Yes, these queries can be answered in polylog time if O(n log n) space is available.
Preprocess the given array by constructing a segment tree of depth log(n), so that the leaf nodes are identical to the source array, the next level up contains sorted 2-element sub-arrays, the level above that consists of 4-element arrays produced by merging those 2-element arrays, and so on. In other words, perform a merge sort but keep the result of each merge step in a separate array. Here is an example:
root: | 1 2 3 5 5 7 8 9 |
| 1 2 5 8 | 3 5 7 9 |
| 1 5 | 2 8 | 7 9 | 3 5 |
source: | 5 | 1 | 2 | 8 | 7 | 9 | 5 | 3 |
To answer a query, split the given range into at most 2*log(n) subranges covered by tree nodes. For example, the range [0, 4] should be split into [0, 3] and [4], which gives the two sorted arrays [1 2 5 8] and [7]. Now the problem reduces to finding the k-th element in several sorted arrays. The easiest way to solve it is nested binary search: first use binary search to choose a candidate element from every array, starting from the largest one; then use binary search in the other (smaller) arrays to determine the rank of this candidate element. This allows us to get the k-th element in O(log(n)^4) time. Probably some optimization (like fractional cascading) or some other algorithm could do this faster...
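A minimal Python sketch of this structure (function names are mine). Instead of the nested binary search described above, it binary-searches over the sorted values of the whole array and counts ranks with `bisect`, which gives O(log^3 n) per query; it answers k-th smallest, and k-th largest follows as the (length-k+1)-th smallest:

```python
from bisect import bisect_right

def build(a):
    """Merge sort tree: node -> sorted contents of its segment."""
    tree = {}
    def rec(node, lo, hi):
        if hi - lo == 1:
            tree[node] = [a[lo]]
        else:
            mid = (lo + hi) // 2
            rec(2 * node, lo, mid)
            rec(2 * node + 1, mid, hi)
            # a real merge step would be linear; sorted() keeps the sketch short
            tree[node] = sorted(tree[2 * node] + tree[2 * node + 1])
    rec(1, 0, len(a))
    return tree

def kth_smallest(tree, n, i, j, k):
    """k-th smallest (1-based) in a[i..j] inclusive, O(log^3 n)."""
    segs = []
    def collect(node, lo, hi):          # decompose [i, j] into canonical nodes
        if j < lo or hi <= i:
            return
        if i <= lo and hi <= j + 1:
            segs.append(tree[node])
            return
        mid = (lo + hi) // 2
        collect(2 * node, lo, mid)
        collect(2 * node + 1, mid, hi)
    collect(1, 0, n)
    values = tree[1]                    # all values of the array, sorted
    lo, hi = 0, len(values) - 1
    while lo < hi:                      # binary search on the answer value
        mid = (lo + hi) // 2
        rank = sum(bisect_right(s, values[mid]) for s in segs)
        if rank >= k:
            hi = mid
        else:
            lo = mid + 1
    return values[lo]
```

On the example above, `kth_smallest(build([5,1,2,8,7,9,5,3]), 8, 0, 4, 3)` asks for the 3rd smallest of 5 1 2 8 7.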

There is an algorithm named QuickSelect, which is based on quicksort. It runs in O(n) in the average case. The algorithm's worst case is O(n^2), for example when the input is reverse-ordered and the pivot is chosen naively.
It gives the exact k-th largest number. If you want a range, you can write a wrapper method.
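A hedged sketch of QuickSelect in Python (the function name and the three-way partition are my choices; a randomized pivot avoids the reverse-ordered worst case). A range wrapper would simply run it on the slice a[i:j+1]:

```python
import random

def quickselect(a, k):
    """k-th largest (1-based) via QuickSelect; expected O(n) time."""
    a = list(a)
    k = len(a) - k            # position of the k-th largest in ascending order
    lo, hi = 0, len(a) - 1
    while True:
        pivot = a[random.randint(lo, hi)]   # random pivot avoids O(n^2) on sorted input
        # three-way partition of the active window around the pivot
        lt = [x for x in a[lo:hi + 1] if x < pivot]
        eq = [x for x in a[lo:hi + 1] if x == pivot]
        gt = [x for x in a[lo:hi + 1] if x > pivot]
        a[lo:hi + 1] = lt + eq + gt
        if k < lo + len(lt):                # answer lies in the "less than" part
            hi = lo + len(lt) - 1
        elif k < lo + len(lt) + len(eq):    # answer equals the pivot
            return pivot
        else:                               # answer lies in the "greater than" part
            lo = lo + len(lt) + len(eq)
```

Note, though, that this is O(n) per query, not the polylogarithmic time the question asks for.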

Related

Sort with maximum of given swaps

I'm solving a problem which requires sorting an array. An array of size n can contain elements from 1 to n.
We are given an array and m swaps. We have to sort the array using the given swaps plus our own swaps, in such a way that we use a minimum number of our own swaps...
Example:
Array: 3 1 4 2
Given swap: 1 2
Here we first perform the given swap; the array becomes 1 3 4 2. Now we can use our own swaps 2, 4 and 3, 4 (1, 2 and 3, 4 are indexes).
So the answer here is 2, because only our own swaps are counted and we need to minimize them.

Closest equal numbers

Suppose you have numbers a1..an and some queries [l, k] (1 ≤ l ≤ k ≤ n). The problem is to find, within the interval [l, k], the minimum distance between two equal numbers.
Examples: (interval l,k shown as |...|)
1 2 2 |1 0 1| 2 3 0 1 2 3
Answer 2 (101)
1 |2 2| 1 0 1 2 3 0 1 2 3
Answer 1 (22)
1 2 2 1 0 |1 2 3 0 3 2 3|
Answer 2 (303) or (323)
I have thought about a segment tree, but it is hard to join the results from the tree nodes when a query is shared between several nodes. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?
Clarification
Thanks for your answers.
The problem is that there are a lot of queries, so O(n) per query is not good enough. I did not mention a segment tree by accident: it performs [l, r] queries such as [l, r]-SUM or [l, r]-MIN on an array with O(log n) complexity. Can we do some preprocessing here to achieve O(log n) per query?
Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.
In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.
Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval that it could contain and the last, and then issue a range-minimum query over the lengths of that range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.
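The enumeration and query steps might look like this in Python (a sketch under my naming; for true O(1) queries the final `min` over the slice would be replaced by a sparse-table RMQ, and the endpoint lists would be precomputed once):

```python
from bisect import bisect_left, bisect_right

def minimal_intervals(a):
    """Enumerate minimal intervals: pair each position with the previous
    occurrence of its value, and drop any pair that strictly contains an
    earlier pair.  Left and right endpoints both come out increasing."""
    last = {}         # value -> most recent index
    pairs = []
    for k, x in enumerate(a):
        if x in last:
            l = last[x]
            if not pairs or l > pairs[-1][0]:   # otherwise a shorter pair is nested inside
                pairs.append((l, k))
        last[x] = k
    return pairs

def min_distance(pairs, l, r):
    """Minimum distance between equal values inside [l, r] (0-based, inclusive).
    Because both endpoint lists are sorted, binary search isolates the
    candidate pairs; an RMQ over the lengths would make the min O(1)."""
    lefts = [p[0] for p in pairs]     # precompute these once in a real implementation
    rights = [p[1] for p in pairs]
    lo = bisect_left(lefts, l)        # first pair starting at or after l
    hi = bisect_right(rights, r)      # first pair ending after r
    if lo >= hi:
        return None                   # no two equal values in the interval
    return min(k - i for i, k in pairs[lo:hi])
```

On the first example above (`1 2 2 1 0 1 2 3 0 1 2 3`), the query over the 0-based interval [3, 5] isolates the pair (3, 5), the `101` of distance 2.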
We can do it in O(nlog(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then sort the elements in [l,k], first by value and second by original index, both ascending.
Then you can loop over the sorted list, keeping a currentValue variable and checking adjacent entries with equal values for their index distance, updating minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.
Suppose we have this [l,k] range from your second example:
1 2 3 0 3 2 3
We can mark them as
1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)
and sort them as
0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)
Looping over this, there are no ranges for 0 and 1. The minimum distance for 2s is 4, and the minimum distance for 3s is 2 ([3,5] or [5,7], depending on whether you reset minDistance when the new minimum distance equals the current minimum distance).
Thus we get
[3,5] in [l,k] or [5,7] in [l,k]
EDIT
Since you mention that there are several queries, you can preprocess the list once in O(nlog(n)) time and then use only O(n) time per individual query: while looping over the sorted list, just ignore indices that are not in [l,k].
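A Python sketch of the per-query version (the function name is mine):

```python
def min_equal_distance(a, l, k):
    """O(m log m) per query, m = k - l + 1: sort (value, index) pairs for the
    slice; equal values become adjacent, so only neighbours need checking."""
    pairs = sorted((a[i], i) for i in range(l, k + 1))
    best = None
    for (v1, i1), (v2, i2) in zip(pairs, pairs[1:]):
        # adjacent entries with the same value and the same value's closest indices
        if v1 == v2 and (best is None or i2 - i1 < best):
            best = i2 - i1
    return best   # None means no value repeats inside [l, k]
```

For the slice `1 2 3 0 3 2 3` shown above, the whole-range query returns 2 (the 3s at indices 2 and 4).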
EDIT 2
This is addressing the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then run each query in O(1) time.
First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.
We can imagine that:
[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)
We have one base case
[l,k] = infinity where l = k
If [l,k] is not min([l+1,k], [l,k-1]), then it either starts at l or ends at k. We can take each of these, look into the sorted list and look at the adjacent element in the correct direction and check the distances (making sure we're in bounds). We only have to check 2 elements, so it is a constant factor.
Using this algorithm, we can run the following
for l = n downto 1
    for k = l to n
        M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)
You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.
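A Python sketch of this DP (my naming; `nxt`/`prv` hold the nearest same-value occurrences, which are exactly the "sequence starting at l / ending at k" candidates):

```python
def preprocess(a):
    """O(n^2) DP table M[l][k] = min distance between equal values in a[l..k].
    The only candidates not covered by M[l+1][k] and M[l][k-1] are pairs that
    use an endpoint: the next occurrence of a[l] and the previous of a[k]."""
    n = len(a)
    INF = float('inf')
    nxt = [n] * n    # nxt[i]: next index with the same value as a[i], or n
    prv = [-1] * n   # prv[i]: previous index with the same value, or -1
    last = {}
    for i, x in enumerate(a):
        if x in last:
            prv[i] = last[x]
            nxt[last[x]] = i
        last[x] = i
    M = [[INF] * n for _ in range(n)]
    for l in range(n - 1, -1, -1):
        for k in range(l + 1, n):
            best = min(M[l + 1][k], M[l][k - 1])
            if nxt[l] <= k:                  # pair starting at l, inside [l, k]
                best = min(best, nxt[l] - l)
            if prv[k] >= l:                  # pair ending at k, inside [l, k]
                best = min(best, k - prv[k])
            M[l][k] = best
    return M
```

After preprocessing, a query [l, k] is just the table lookup `M[l][k]` (infinity meaning no repeated value in the interval).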

Finding a path through a checkerboard which is closest to a given cost

I've been stuck on a problem for a while. I am in an algorithm course right now but this isn't a homework problem. We haven't gotten to dynamic programming in class yet, I'm just doing this on my own.
Given an NxN checkerboard where every square has a cost, and another integer M, find the cost of a path from the top left of the checkerboard to the bottom right (the only allowed moves are one square right or down) such that the total cost of the path is below M but as close to M as possible. All elements of the checkerboard and M are positive.
If this asked me to find the minimum or maximum path, I could use the standard dynamic programming algorithms, but since I'm bounded by M, I think I have to use another strategy. I've been trying to use memoization and construct an array where each element holds the set of costs of all possible paths from the start to that element. To construct the set for (i, j), I add the cost value of (i, j) to every element in the union of the sets for (i-1, j) and (i, j-1) (if they exist, else just use the set {0} in their place). Once I complete this for all elements in the checkerboard, choosing the right path is trivial: just pick the element in the set for (N, N) which is below M but closest to M.
For example:
+---+---+---+
| 0 | 1 | 3 |
| 3 | 2 | 1 |
| 5 | 2 | 1 |
+---+---+---+
Cost of paths to a given point:
+---+----------+----------------+
| 0 | 1 | 4 |
| 3 | 3, 5 | 4, 5, 6 |
| 8 | 5, 7, 10 | 5, 6, 7, 8, 11 |
+---+----------+----------------+
This is a really space-inefficient way of doing things. If I did the math right, the worst-case number of elements in the set of the (N, N) node is (N+1)!/((N+1)/2)!. Is there a faster (space or time) way of approaching this problem that I'm missing?
No. If all the costs are integers, at each cell you need to store at most O(M) elements, so you need O(MN^2) memory. Whenever a sum exceeds M, you just ignore it.
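A Python sketch of this bounded DP (names are mine; sums above M are dropped as described, so each cell holds at most M+1 values and the result is the largest achievable total not exceeding M):

```python
def best_path_cost(grid, M):
    """O(M*N^2): reach[i][j] = set of path sums <= M achievable at (i, j).
    Costs are positive, so pruning sums above M along the way is safe."""
    n = len(grid)
    reach = [[set() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                prev = {0}                     # start of the path
            else:
                prev = set()
                if i > 0:
                    prev |= reach[i - 1][j]    # arrived by moving down
                if j > 0:
                    prev |= reach[i][j - 1]    # arrived by moving right
            cell = grid[i][j]
            reach[i][j] = {s + cell for s in prev if s + cell <= M}
    # closest total not exceeding M; None if every path overshoots
    return max(reach[n - 1][n - 1], default=None)
```

On the 3x3 example above, whose full set of path sums is {5, 6, 7, 8, 11}, a bound of M=7 yields 7 and M=10 yields 8.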
In this paper there is a mention of a pseudo-polynomial algorithm to solve a similar problem (exact cost). You can either use the same algorithm multiple times with exact cost = M..1, or read the algorithm and find a variation that solves your problem directly.
Unfortunately that paper is paywalled :(

Inversion distance

First of all let's recall definition of inversion.
An inversion in a sequence S of numbers is a situation where S[i] > S[j] and i < j, or frankly speaking, a pair of disordered elements. For instance, for the sequence:
1 4 3 7 5 6 2
We have following inversions (4,3), (4,2), (3,2), (7,5), etc.
We state the problem as follows: the inversion distance is the maximum distance (in terms of indexing) between two values that form an inversion. For our example, a human-brain search gives us the pair (4,2) <=> (S[1], S[6]), and therefore the index distance is 6-1 = 5, which is the maximum possible for this case.
This problem can be solved in a trivial way in O(n^2) by finding all inversions and keeping the max distance (updating it if we find a better option).
We can also count inversions using merge sort and therefore do the same in O(nlogn). Is there any possibility of an O(n) algorithm? Keep in mind that we just want the maximum distance; we don't want to find all inversions. Please elaborate.
Yes, an O(n) algorithm is possible.
We can extract a strictly increasing subsequence with a greedy algorithm:
source: 1 4 3 7 5 6 2
strictly increasing subsequence: 1 4 7
Then we can extract a strictly decreasing subsequence going backwards:
source: 1 4 3 7 5 6 2
strictly decreasing subsequence: 1 2
Note that once this strictly decreasing subsequence is found, we can interpret it as an increasing sequence (in the normal direction).
For each element of these subsequences we need to store their index in source sequence.
Now the "inversion distance" can be found by merging these two subsequences (similar to the merge sort mentioned in the OP, but only one merge pass is needed):
merge 1 & 1 ... no inversion, advance both indices
merge 4 & 2 ... inversion found, distance=5, should advance second index,
but here is end of subsequence, so we are done, max distance = 5
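A Python sketch of the whole procedure (my naming; the two candidate lists are the prefix maxima and the suffix minima, and one merge-style pass computes the answer):

```python
def inversion_distance(a):
    """Maximum j - i with a[i] > a[j], in O(n).
    Left ends of a maximal inversion can be restricted to prefix maxima,
    right ends to suffix minima; both lists are increasing in value."""
    n = len(a)
    left = [0]
    for i in range(1, n):
        if a[i] > a[left[-1]]:
            left.append(i)              # strictly increasing prefix maxima
    right = [n - 1]
    for j in range(n - 2, -1, -1):
        if a[j] < a[right[-1]]:
            right.append(j)             # strictly decreasing, collected backwards
    right.reverse()                     # now increasing in both index and value
    best = 0
    i = j = 0
    while i < len(left) and j < len(right):   # single merge pass
        if a[left[i]] > a[right[j]]:
            best = max(best, right[j] - left[i])
            j += 1                      # keep the same left end, try a farther right end
        else:
            i += 1                      # need a larger left value
    return best
```

On the example `1 4 3 7 5 6 2` the lists are [1, 4, 7] and [1, 2] (as in the trace above), and the pass finds distance 5 for the pair (4, 2).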
Maybe my idea is the same as @Evgeny's.
Here is the explanation:
make a strictly increasing array from the beginning; we call it array1
make a strictly decreasing array from the end, which is array2 (but keep its values in increasing order)
***Keep track of the original indexes of the values in both arrays.
Now start from the beginning of both arrays.
Do the following loop until the scan of array1 or array2 is complete:
While( array1[index1] > array2[index2] )
{
    check the original distance between the array1 index and the array2 index;
    update the result accordingly;
    increase the array2 index;
}
increase both array indexes
Continue with the loop
At the end of this process you will have the maximum result. The proof of this solution is not that complex; you can try it yourself.

Longest Contiguous Subarray with Average Greater than or Equal to k

Consider an array of N integers. Find the longest contiguous subarray so that the average of its elements is greater (or equal) than a given number k.
The obvious answer has O(n^2) complexity. Can we do better?
We can reduce this problem to longest contiguous subarray with sum >= 0 by subtracting k from all values in O(n) time. Now let's calculate prefix sums:
index 0 1 2 3 4 5 6
array 2 -3 3 2 0 -1
prefix 0 2 -1 2 5 5 4
Now the problem is to find the two indices farthest apart with prefix_right - prefix_left >= 0. Let's create a new prefix-index array and sort it by prefix, then by index.
index 2 0 1 3 6 4 5
prefix -1 0 2 2 4 5 5
We can then do a right-to-left sweep to calculate, for each prefix, the maximum index with prefix greater than or equal to the current prefix:
index 2 0 1 3 6 4 5
prefix -1 0 2 2 4 5 5
maxind 6 6 6 6 6 5 5
Now, let's go back to the original prefix array. For each prefix-index pair, we do a binary search on our new array to find the smallest prefix >= the current prefix. We subtract, from maxind of the binary searched prefix, the index of the current prefix to retrieve the best possible sequence length starting at the current index. Take the sequence with the maximum length.
This algorithm is O(n log n) because of the sorting and the n binary searches.
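The steps above might be sketched in Python as follows (names are mine; it returns the length of the best subarray):

```python
from bisect import bisect_left

def longest_subarray_avg_at_least(a, k):
    """O(n log n): subtract k, build prefix sums, sort (prefix, index) pairs,
    take a suffix max of indices, then binary-search a partner per left end."""
    n = len(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + (x - k))   # reduce to "sum >= 0"
    order = sorted(range(n + 1), key=lambda i: (prefix[i], i))
    keys = [prefix[i] for i in order]
    # maxind[t] = largest original index among order[t:]
    maxind = [0] * (n + 1)
    cur = -1
    for t in range(n, -1, -1):
        cur = max(cur, order[t])
        maxind[t] = cur
    best = 0
    for i in range(n + 1):
        t = bisect_left(keys, prefix[i])      # smallest prefix >= prefix[i]
        if t <= n:
            best = max(best, maxind[t] - i)   # farthest right end for this left end
    return best
```

Run on the worked example (`2 -3 3 2 0 -1` with k already subtracted, i.e. k = 0), the sorted prefixes and `maxind` row match the tables above and the answer is the whole array, length 6.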
We can solve the problem in O(n) time and O(n) space complexity:
I have tried both the naive and the optimal approach.
In short, the problem involves two steps:
(1) Subtract k from each ar[i] and compute the cumulative sums in a new array. Let's call the new array cumArr[].
(2) Now the problem becomes finding the max (j - i) in cumArr[] such that j > i and cumArr[j] >= cumArr[i]. This step is a famous question and can be found in lots of places.
Here is the detail with running code:
http://codeshare.io/Y1Xc8
There might be small corner cases, which can be handled easily. Let me know your thoughts, friends.
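Step (2) can be done in O(n) with the classic decreasing-stack technique (a hedged sketch under my naming, not the code behind the link):

```python
def longest_avg_at_least(a, k):
    """O(n): max (j - i) with cum[j] >= cum[i], via a decreasing stack."""
    cum = [0]
    for x in a:
        cum.append(cum[-1] + (x - k))       # prefix sums of a[i] - k
    # candidate left ends: indices of strictly decreasing prefix values
    stack = []
    for i, v in enumerate(cum):
        if not stack or v < cum[stack[-1]]:
            stack.append(i)
    best = 0
    # scan right ends from the far right; popped left ends never need revisiting
    for j in range(len(cum) - 1, -1, -1):
        while stack and cum[stack[-1]] <= cum[j]:
            best = max(best, j - stack.pop())
    return best
```

Each index is pushed and popped at most once, so both passes are linear.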
