array median transformation minimum steps - algorithm

Given an array A of n integers. In one turn one can apply the following operation to any consecutive subarray A[l..r]: assign to every A_i (l <= i <= r) the median of subarray A[l..r].
Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A into an array of n integers, each with value max.
For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first on subarray A[2..3] (after that A equals [1, 3, 3]), then on A[1..3].
The median is defined for an array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B_m (1-based indexing), where m equals (n div 2) + 1. Here 'div' is integer division. So, for a sorted array with 5 elements, the median is the 3rd element, and for a sorted array with 6 elements, it is the 4th element.
Since the maximum value of N is 30, I thought of brute-forcing the result. Could there be a better solution?

You can double the size of the subarray containing the maximum element in each iteration. After the first iteration, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4 containing those 2 elements, giving you a subarray of size 4 filled with the maximum. Then apply it to a size 8 subarray, and so on. You fill the array in ceil(log2(N)) operations, which is optimal. If N is 30, five operations are enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing to be about brute forcing with the median operations, rather than in finding the minimum sequence of operations.
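The walkthrough above can be replayed mechanically. The following is a minimal Python sketch, assuming the median definition from the question (element m = (len div 2) + 1 of the sorted subarray, 1-based); the helper names `apply_median` and `run_example` are made up for this illustration.

```python
def apply_median(a, l, r):
    """Assign the median of a[l..r] (1-based, inclusive) to every element in that range."""
    segment = sorted(a[l - 1:r])
    # B_m with m = (len div 2) + 1 in 1-based terms is index len // 2 in 0-based terms.
    median = segment[len(segment) // 2]
    a[l - 1:r] = [median] * (r - l + 1)


def run_example():
    """Replay the four operations from the example and return the final array."""
    a = [6, 1, 5, 9, 3, 2, 0, 7, 4, 8]
    for l, r in [(4, 5), (2, 5), (1, 8), (1, 10)]:
        apply_median(a, l, r)
        print(a)
    return a


run_example()
```

Each printed line should match the corresponding array state listed in the example.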

This is a problem from a CodeChef Long Contest. Since the contest is already over, I am pasting the problem setter's approach (source: CodeChef contest editorial page).
"Any state of the array can be represented as a binary mask with each bit 1 means that corresponding number is equal to the max and 0 otherwise. You can run DP with state R[mask] and O(n) transits. You can proof (or just believe) that the number of statest will be not big, of course if you run good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it makes sense to use operation only for such subarray [l; r] that number of 1-bits is at least as much as number of 0-bits in submask [l; r], because otherwise nothing will change. Also you should notice that if the left bound of your operation is l it is good to make operation only with the maximal possible r (this gives number of transits equal to O(n)). It was also useful for C++ coders to use map structure to represent all states."
The C/C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;

int bc[1<<15];                  // bc[x] = number of 1-bits in x, for x < 2^15
const int M = (1<<15) - 1;

void setMin(int& ret, int c)
{
    if(c < ret) ret = c;
}

// Depth-first search over masks: bit i of mask is 1 if A[i] already equals max.
void doit(int n, int mask, int currentSteps, int& currentBest)
{
    int numMax = bc[mask>>15] + bc[mask&M];
    if(numMax == n) {
        setMin(currentBest, currentSteps);
        return;
    }
    // Even one more step cannot beat the best answer found so far.
    if(currentSteps + 1 >= currentBest)
        return;
    // Only a single extra step could still improve the answer; it finishes
    // the array exactly when at least half of the elements are already max.
    if(currentSteps + 2 >= currentBest)
    {
        if(numMax * 2 >= n) {
            setMin(currentBest, 1 + currentSteps);
        }
        return;
    }
    // Prune states with fewer than 2^currentSteps max elements.
    if(numMax < (1<<currentSteps)) return;
    for(int i=0;i<n;i++)
    {
        int a = 0, b = 0;       // a = zeros, b = ones inside [i..j]
        int c = mask;
        for(int j=i;j<n;j++)
        {
            c |= (1<<j);
            if(mask&(1<<j)) b++;
            else a++;
            if(b >= a) {        // median of [i..j] is max
                doit(n, c, currentSteps + 1, currentBest);
            }
        }
    }
}

int v[32];

void solveCase() {
    int n;
    scanf(" %d", &n);
    int maxElement = 0;
    for(int i=0;i<n;i++) {
        scanf(" %d", v+i);
        if(v[i] > maxElement) maxElement = v[i];
    }
    int mask = 0;
    for(int i=0;i<n;i++) if(v[i] == maxElement) mask |= (1<<i);
    // Upper bound: ceil(log2(n)) steps by doubling.
    int ret = 0, p = 1;
    while(p < n) {
        ret++;
        p *= 2;
    }
    doit(n, mask, 0, ret);
    printf("%d\n",ret);
}

int main() {
    for(int i=0;i<(1<<15);i++) {
        bc[i] = bc[i>>1] + (i&1);
    }
    int cases;
    scanf(" %d",&cases);
    while(cases--) solveCase();
    return 0;
}
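For checking small inputs by hand, the mask representation from the editorial can also be explored with a plain breadth-first search. This is only an illustrative Python sketch, not the setter's code: the function name `min_operations` is hypothetical, it tries every subarray whose median is the maximum (at least as many 1-bits as 0-bits), and it is practical only for small n because the number of masks grows exponentially.

```python
from collections import deque


def min_operations(values):
    """Minimum number of median operations to turn every element into max(values)."""
    n = len(values)
    top = max(values)
    start = sum(1 << i for i, v in enumerate(values) if v == top)
    full = (1 << n) - 1
    dist = {start: 0}
    queue = deque([start])
    while queue:
        mask = queue.popleft()
        if mask == full:
            return dist[mask]
        for l in range(n):
            ones = zeros = 0
            filled = mask
            for r in range(l, n):
                if (mask >> r) & 1:
                    ones += 1
                else:
                    zeros += 1
                filled |= 1 << r
                # The median of [l..r] is the maximum iff ones >= zeros.
                if ones >= zeros and filled not in dist:
                    dist[filled] = dist[mask] + 1
                    queue.append(filled)
    return None  # not reached: the full mask is always reachable


print(min_operations([1, 2, 3]))
print(min_operations([6, 1, 5, 9, 3, 2, 0, 7, 4, 8]))
```

Breadth-first order guarantees that the first time the full mask is dequeued, its distance is minimal.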

The problem setter's approach has exponential complexity. It is pretty good for N=30, but not so for larger sizes. I think it's more interesting to find a polynomial-time solution. And I found one, with O(N^4) complexity.
This approach uses the fact that optimal solution starts with some group of consecutive maximal elements and extends only this single group until whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge. The result is a group of size at least M * 4. Now instead of turn Y for group 2, make an additional turn X + 1 for group 1. In this case group sizes are at least M * 2 and at most M / 2 (even if we count initially maximal elements, that might be included in step Y). After this change, on turn X + Y + 1 the merged group size is at least M * 4 only as a result of the first group extension, add to this at least one element from second group. So extending a single group here produces larger group in same number of steps (and if Y > 1, it even requires less steps). Since this works for equal group sizes (M), it will work even better for non-equal groups. This proof may be extended to the case of several groups (more than two).
To work with single group of consecutive maximal elements, we need to keep track of only two values: starting and ending positions of the group. Which means it is possible to use a triangular matrix to store all possible groups, allowing to use a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in original array:
    Mark corresponding element in the matrix and clear other elements
    For each matrix diagonal, starting with one, containing this element:
        For each marked element in this diagonal:
            Retrieve current number of turns from this matrix element
            (use indexes of this matrix element to initialize p1 and p2)
            p2 = end of the group
            p1 = start of the group
            Decrease p1 while it is possible to keep median at maximum value
            (now all values between p1 and p2 are assumed as maximal)
            While p2 < N:
                Check if number of maximal elements in the array is >= N/2
                If this is true, compare current number of turns with the best result \
                    and update it if necessary
                (additional matrix with number of maximal values between each pair of
                    points may be used to count elements to the left of p1 and to the
                    right of p2)
                Look at position [p1, p2] in the matrix. Mark it and if it contains \
                    larger number of turns, update it
                Repeat:
                    Increase p1 while it points to maximal value
                    Increment p1 (to skip one non-maximum value)
                    Increase p2 while it is possible to keep median at maximum value
                while median is not at maximum value
To keep algorithm simple, I didn't mention special cases when group starts at position 0 or ends at position N, skipped initialization and didn't make any optimizations.

Find the kth largest element in an array after inserting the absolute difference back in the array

I recently found this question in a contest, though I can't remember where. The problem statement goes like this.
Given an unsorted positive integer array like [2,4,9], you can do an operation on the array to give it a new form. Find the kth largest element after you no longer can do the operation.
Operation is defined as follows. Absolute difference of any two elements should be re-inserted back in the array. For example for the above array, it could be [2,4,9,5,7], duplicates can't be inserted back, for example absolute diff(2,4) is 2, but 2 is already part of array.
Can anybody figure out the approach?
The answer is equal to m - k + 1 multiplied by the greatest common divisor (GCD) of the elements in the input array, where m is the number of elements in the final array, so long as this number is at least k.
To show this, we need to show that the array after applying the operation as many times as possible will always result in an array of the form [d, 2*d, 3*d, ..., m*d] in some order, where d is the GCD and m is some positive integer. There are three parts to the proof:
We need to show that d is constructible by some sequence of applying the operation. This is true because the operation allows us to do any subtractions we like where the smaller number is the one subtracted, and this is sufficient to perform Euclid's algorithm.
We need to show that all of the numbers in the claimed result are constructible. This is true because the largest number in the input array has d as a divisor by definition, so it must be m*d for some m; the smaller multiples can then be constructed by repeatedly subtracting d.
We need to show that no other numbers are constructible. This is true because the result of a subtraction always shares common divisors with the two operands, and because larger numbers cannot be constructed by subtraction.
So the algorithm works as follows:
Find the GCD of the input array (e.g. by repeatedly applying Euclid's algorithm). Call the result d.
Find the maximum element of the input array, and divide it by d. Call the result m.
If m >= k, then return (m - k + 1)*d, otherwise raise an error.
The m - k + 1 term is to get the kth largest element in the result; if the kth smallest element is required, this will be k*d.
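The three steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is a non-empty list of positive integers and k is 1-based; the function name `kth_largest_after_differences` is hypothetical.

```python
from functools import reduce
from math import gcd


def kth_largest_after_differences(values, k):
    """k-th largest element of the final array [d, 2*d, ..., m*d]."""
    d = reduce(gcd, values)   # GCD of the whole input
    m = max(values) // d      # number of elements in the final array
    if k > m:
        raise ValueError("the final array has fewer than k elements")
    return (m - k + 1) * d


print(kth_largest_after_differences([2, 4, 9], 3))
```

For [2, 4, 9] the GCD is 1, so the final array is 1..9 and the 3rd largest element is 7.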
Not a complete answer, just some thoughts that might help.
We have streams of numbers. For example, given [2, 4, 9], we know all numbers with the difference of 2 will be generated down from each number, m, higher than 2, as well as m mod 2, which starts another cycle.
9-2=7
7-2=5
5-2=3
etc.
We get [2, 3, 4, 5, 7, 9] and the remainder 1. But 1, using the same procedure, will generate all numbers in the range (1, max).
I would start with considering how to obtain the smallest such remainder (greater than zero) we can have. But we may also need to consider the full range of differences that each generate such a "stream."
import java.util.*;

class TestClass {
    public static void main(String args[]) throws Exception {
        Scanner s = new Scanner(System.in);
        int t = s.nextInt();
        while (t-- != 0) {
            int n = s.nextInt();
            int arr[] = new int[n];
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            int k;
            for (int i = 0; i < n; i++) {
                arr[i] = s.nextInt();
                if (arr[i] > max)
                    max = arr[i];
                if (arr[i] < min)
                    min = arr[i];
            }
            k = s.nextInt();
            int is_dev = 1; // 1 while every element is divisible by min
            for (int i = 0; i < n; i++) {
                if (arr[i] % min != 0) {
                    is_dev = 0; // treated as: we can reach [1, 2, 3, ..., max]
                    break;
                }
            }
            if (is_dev == 0) {
                n = max - (k - 1);
            } else {
                n = max - min * (k - 1);
            }
            if (n <= 0)
                System.out.println("-1");
            else
                System.out.println(n);
        }
    }
}
Example 1: 4 7 9
Here min = 4
and max = 9.
We check whether any number is not divisible by 4, and keep taking differences.
Iteration 1: 4 5 7 9 (from the pair (4, 9))
Iteration 2: 1 4 5 7 9 (from the pair (4, 5))
At this point we have 1, so we can easily go up to
1 2 3 4 5 6 7 8 9
and easily find the kth largest element.
So the point is: if any number is not divisible by the min, we can produce smaller numbers; here we reach 1.
Example 2: 3 9
min = 3
max = 9
Here every number is divisible by 3, so the final array looks like
3 6 9
The kth largest element will be max - 3(k-1),
e.g. 9 - 3(2-1) = 6.

Number of different marks

I came across an interesting problem and I can't solve it in a good complexity (better than O(qn)):
There are n persons in a row. Initially every person in this row has some value; let's say that the i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j then the j-th person can't get a worse mark than the i-th person.
If i < j then the j-th person can't get a worse mark than the i-th person (this condition tells us that the sequence of marks is non-decreasing).
There are q operations. In every operation two persons are swapped (they swap their values).
After each operation you have to tell the maximal number of different marks that these n persons can get.
Do you have any idea?
Consider any two groups, J and I (j < i and a_j < a_i for all j and i). In any swap scenario, a_i is the new max for J and a_j is the new min for I, and J gets extended to the right at least up to and including i.
Now if there was any group of i's to the right of i whose values were all greater than the values in the left segment of I up to i, this group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap would reduce the mark count by the count of groups between J and I and merge groups J up to I.
Now consider an in-group swap. The only time a mark would be added is if a_i and a_j (j < i), are the minimum and maximum respectively of two adjacent segments, leading to the group splitting into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking before the switch that the second smallest i is greater than the second largest j.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of min,max and number of groups, which correspond with a count of sequential maxes.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in log(n) along the intervals containing 2 and 5.
To add group counts in a larger interval the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6
Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the greatest value in LEFT — call it aLEFT_MAX — and the least value in the second half — call it aRIGHT_MIN.
If aLEFT_MAX < aRIGHT_MIN, then there's no need for any marks to overlap between the two, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from LEFT_MAX to RIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some further-rightward value. (This can actually be optimized a bit more, but I don't think we can improve the algorithmic complexity by doing so, so I won't worry about that.) And, conversely to find the rightmost extent of the segment.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone - otherwise you could use your algorithm to check if the array is sorted faster than O(n), and that's clearly impossible.
General idea: spend O(n log n) to create any additional data which would let you to compute number of marks in O(log n) time and said data can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e. finding how number of marks changed may be easier than to compute what it is).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person - otherwise taking an array 1 2 3 ... n and repeatedly swapping first and last element would require you to update this additional data for every element in the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
|8
+---+-
|7 |
| 6|
+-+---+
|5|
-------+-+
4 |
3 |
2|
1 |
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add less than n fictional persons who will never participate in exchanges), let 0 <= a_i < n. We can create O(n log n) objects - "index squares" - which are "responsible" for points (i, a_i) : a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with center on the diagonal line a_i=i). Every swap affects only O(log n) index squares.
The problem is, I can't find what information to store for each index square so that it would allow to find number of marks fast enough? all I have is a feeling that such approach may be effective.
Hope this helps.
Let's normalize the problem first, so that a_i is in the range of 0 to n-1 (this can be achieved in O(n*logn) by sorting a, but it only has to be done once, so we are fine).
function normalize(a) {
    let b = [];
    for (let i = 0; i < a.length; i++)
        b[i] = [i, a[i]];
    b.sort(function(x, y) {
        return x[1] < y[1] ? -1 : 1;
    });
    for (let i = 0; i < a.length; i++)
        a[b[i][0]] = i;
    return a;
}
To get the maximal number of marks we can count how many times
i + 1 == mex(a[0..i]) holds, for integer i in [0, n-1].
a[0..i] denotes the sub-array of all the values from index 0 to i.
mex() is the minimal excluded value, which is the smallest value missing in the sequence 0, 1, 2, 3, ...
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assuming values are normalized to be element [0,n-1]
function maxMarks(a) {
    let visited = new Array(a.length + 1);
    let smallestMissing = 0, marks = 0;
    for (let i = 0; i < a.length; i++) {
        visited[a[i]] = true;
        if (a[i] == smallestMissing) {
            smallestMissing++;
            while (visited[smallestMissing])
                smallestMissing++;
            if (i + 1 == smallestMissing)
                marks++;
        }
    }
    return marks;
}
If we swap the values at indices x and y (x < y) then the mex for all values i < x and i > y doesn't change, although it is an optimization, unfortunately that doesn't improve complexity and it is still O(qn).
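For reference, the O(qn)-style baseline can be sketched in Python as below. It relies on an equivalent form of the mex condition: once the values are normalized to 0..n-1, i + 1 == mex(a[0..i]) holds exactly when the largest rank among the first i + 1 elements equals i. The names `max_marks` and `marks_after_swaps` are hypothetical, and this sketch re-normalizes on every call for simplicity (which costs O(n log n) per query; normalizing once up front would avoid that).

```python
def max_marks(values):
    """Maximal number of different marks for pairwise distinct values."""
    # Replace each value by its rank 0..n-1.
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order):
        rank[i] = r
    marks = 0
    prefix_max = -1
    for i, r in enumerate(rank):
        prefix_max = max(prefix_max, r)
        if prefix_max == i:  # the prefix holds exactly the ranks 0..i
            marks += 1
    return marks


def marks_after_swaps(values, swaps):
    """Apply swaps of 0-based positions one by one, recounting after each."""
    values = list(values)
    results = []
    for x, y in swaps:
        values[x], values[y] = values[y], values[x]
        results.append(max_marks(values))
    return results


print(max_marks([4, 1, 3, 2, 5, 7, 6, 8]))
print(marks_after_swaps([6, 2, 3, 4, 5, 1], [(0, 5)]))
```

A brute-force recount like this is mainly useful as a correctness check for any faster data structure.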
We can observe that the hits (where mark is increased) are always at the beginning of an increasing sequence and all matches within the same sequence have to be a[i] == i, except for the first one, but couldn't derive an algorithm from it yet:
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*

Minimum sum that cant be obtained from a set

Given a set S of positive integers whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example: if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given one array A, consisting of N positive integers. There are M queries, each consisting of two integers Li and Ri that describe the i-th query: we need to find this sum that can't be obtained from the array elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it with a brute-force approach in O(2^n). But given 1 ≤ N, M ≤ 100,000, this can't be done.
So is there any efficient approach?
Concept
Suppose we had an array of bool representing which numbers have been found so far (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept could be improved by making a few observations. With an input of 1, 1, 3, the array becomes (in sequence):
(numbers represent true values)
1
1 2
1 2 3 4 5
An important observation can be made:
(3) For each next number, if the previous numbers had already been found it will be added to all those numbers. This implies that if there were no gaps before a number, there will be no gaps after that number has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, there will be no number less than 7
(5) If there is no number less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array, we only need a single value, max to represent the maximum number found so far.
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
static int Calculate(int[] S)
{
    int max = 0;
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;
    }
    return max + 1;
}
This should run pretty fast, since it's linear time (O(n)). Since the input to the function must be sorted, with quicksort this becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With an upper limit of 10^9 on the numbers, a radix sort could be used to approximate O(n) time for the sorting; however, this would still be way over 2 seconds because of the sheer number of sorts required.
But we can use the low probability of a 1 appearing in random input to eliminate subsets before sorting. At the start, check if 1 exists in S; if not, then every query's result is 1 because it cannot be obtained.
Statistically, if we draw from 10^9 numbers 10^5 times, we have a roughly 99.99% chance of not getting a single 1.
Before each sort, check if that subset contains 1; if not, then its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
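The per-query procedure described above (skip the sort when the range contains no 1, otherwise sort and run the greedy scan) can be sketched in Python as follows. This is a reconstruction from the description, not the code behind the pastebin link; the function names are hypothetical and queries are assumed to be 1-based inclusive (L, R) pairs.

```python
def smallest_unobtainable(sorted_values):
    """Greedy scan over values sorted in increasing order."""
    reach = 0  # every sum in 0..reach is obtainable so far
    for v in sorted_values:
        if v > reach + 1:
            break          # gap: reach + 1 cannot be formed
        reach += v
    return reach + 1


def answer_queries(a, queries):
    """Answer each (L, R) query independently on the slice a[L..R]."""
    results = []
    for left, right in queries:
        part = a[left - 1:right]
        if 1 not in part:
            results.append(1)  # without a 1, the sum 1 is unobtainable
        else:
            results.append(smallest_unobtainable(sorted(part)))
    return results


print(answer_queries([1, 1, 3, 7], [(1, 4), (3, 4), (1, 2)]))
```

The worst case is still one sort per query, so this shortcut only helps when most ranges lack a 1.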
This is a variation of the subset-sum problem, which is NP-Complete, but there is a pseudo-polynomial Dynamic Programming solution you can adopt here, based on the recursive formula:
f(S,i) = f(S-arr[i],i-1) OR f(S,i-1)
f(-n,i) = false
f(_,-n) = false
f(0,i) = true
The recursive formula is basically an exhaustive search, each sum can be achieved if you can get it with element i OR without element i.
The dynamic programming is achieved by building a SUM+1 x n+1 table (where SUM is the sum of all elements, and n is the number of elements), and building it bottom-up.
Something like:
table <- (SUM+1) x (n+1) table   // table[s][j]: can sum s be built from the first j elements?
//init:
for each j from 0 to n:
    table[0][j] = true
for each s from 1 to SUM:
    table[s][0] = false
//fill the table (arr is 1-indexed):
for each s from 1 to SUM:
    for each j from 1 to n:
        if s < arr[j]:
            table[s][j] = table[s][j-1]
        else:
            table[s][j] = table[s-arr[j]][j-1] OR table[s][j-1]
Once you have the table, you need the smallest s such that table[s][j] = false for all j (if there is no such s, the answer is SUM+1).
Complexity of solution is O(n*SUM), where SUM is the sum of all elements, but note that the algorithm can actually be trimmed after the required number was found, without the need to go on for the next rows, which are un-needed for the solution.
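As an illustration of the same recurrence, here is a compact Python sketch that keeps only the last column of the table, packed into an integer used as a bitset (bit s is set when sum s is obtainable). This is a swapped-in representation, not the table layout from the pseudocode above, and the name `smallest_unobtainable_dp` is hypothetical.

```python
def smallest_unobtainable_dp(values):
    """Smallest non-negative sum that no subset of values adds up to."""
    reachable = 1  # only bit 0 is set: the empty subset gives sum 0
    for v in values:
        # Each sum s reachable so far stays reachable and also yields s + v.
        reachable |= reachable << v
    s = 0
    while (reachable >> s) & 1:
        s += 1
    return s


print(smallest_unobtainable_dp([1, 1, 3, 7]))
```

The work is still proportional to n times SUM in bits, so this only suits inputs whose total sum is moderate.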

Counting bounded slice codility

I recently took a programming test on Codility, and the question was to find the number of bounded slices in an array.
Here is a brief explanation of the question.
A slice of an array is said to be a bounded slice if Max(SliceArray)-Min(SliceArray)<=K.
If the array [3,5,6,7,3] and K=2 are provided, the number of bounded slices is 9:
first slice (0,0) in the array: Min(0,0)=3, Max(0,0)=3, Max-Min<=K gives 0<=2, so it is a bounded slice
second slice (0,1) in the array: Min(0,1)=3, Max(0,1)=5, Max-Min<=K gives 2<=2, so it is a bounded slice
third slice (0,2) in the array: Min(0,2)=3, Max(0,2)=6, Max-Min<=K gives 3<=2, which is false, so it is not a bounded slice
In this way you can find that there are nine bounded slices:
(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3), (4, 4).
Following is the solution I provided:
// requires: using System.Collections.Generic;
private int FindBoundSlice(int K, int[] A)
{
    int BoundSlice = 0;
    Stack<int> MinStack = new Stack<int>();
    Stack<int> MaxStack = new Stack<int>();
    for (int p = 0; p < A.Length; p++)
    {
        MinStack.Push(A[p]);
        MaxStack.Push(A[p]);
        for (int q = p; q < A.Length; q++)
        {
            if (IsPairBoundedSlice(K, A[p], A[q], MinStack, MaxStack))
                BoundSlice++;
            else
                break;
        }
    }
    return BoundSlice;
}

private bool IsPairBoundedSlice(int K, int P, int Q, Stack<int> Min, Stack<int> Max)
{
    if (Min.Peek() > P)
    {
        Min.Pop();
        Min.Push(P);
    }
    if (Min.Peek() > Q)
    {
        Min.Pop();
        Min.Push(Q);
    }
    if (Max.Peek() < P)
    {
        Max.Pop();
        Max.Push(P);
    }
    if (Max.Peek() < Q)
    {
        Max.Pop();
        Max.Push(Q);
    }
    if (Max.Peek() - Min.Peek() <= K)
        return true;
    else
        return false;
}
But as per codility review the above mentioned solution is running in O(N^2), can anybody help me in finding the solution which runs in O(N).
Maximum Time Complexity allowed O(N).
Maximum Space Complexity allowed O(N).
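For orientation, one commonly used linear-time technique for this kind of counting is a sliding window with two monotonic deques, one tracking the window maximum and one the window minimum. The Python sketch below illustrates that general technique under the assumption K >= 0; it is not taken from any answer in this thread, and the name `count_bounded_slices` is hypothetical.

```python
from collections import deque


def count_bounded_slices(a, k):
    """Count slices (p, q), p <= q, with max(a[p..q]) - min(a[p..q]) <= k (k >= 0)."""
    max_q = deque()  # indices whose values are decreasing; front is the window max
    min_q = deque()  # indices whose values are increasing; front is the window min
    left = 0
    total = 0
    for right, value in enumerate(a):
        while max_q and a[max_q[-1]] <= value:
            max_q.pop()
        max_q.append(right)
        while min_q and a[min_q[-1]] >= value:
            min_q.pop()
        min_q.append(right)
        # Shrink from the left until the window is bounded again.
        while a[max_q[0]] - a[min_q[0]] > k:
            if max_q[0] == left:
                max_q.popleft()
            if min_q[0] == left:
                min_q.popleft()
            left += 1
        # Every slice ending at `right` and starting in [left, right] is bounded.
        total += right - left + 1
    return total


print(count_bounded_slices([3, 5, 6, 7, 3], 2))
```

Each index enters and leaves each deque at most once, which is where the O(N) time and O(N) space bounds come from.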
Disclaimer
It is possible and I demonstrate it here to write an algorithm that solves the problem you described in linear time in the worst case, visiting each element of the input sequence at a maximum of two times.
This answer is an attempt to deduce and describe the only algorithm I could find, and then gives a quick tour through an implementation written in Clojure. I will probably write a Java implementation as well and update this answer, but as of now that task is left as an exercise to the reader.
EDIT: I have now added a working Java implementation. Please scroll down to the end.
EDIT: Noticed that PeterDeRivaz provided a sequence ([0 1 2 3 4], k=2) that makes the algorithm visit certain elements three times, probably falsifying it. I will update the answer at a later time regarding that issue.
Unless I have overlooked something trivial, I can hardly imagine significant further simplification. Feedback is highly welcome.
(I found your question here when googling for codility like exercises as a preparation for a job test there myself. I set myself aside half an hour to solve it and didn't come up with a solution, so I was unhappy and spent some dedicated hammock time - now that I have taken the test I must say found the presented exercises significantly less difficult than this problem).
Observations
For any valid bounded slice of size n we can say that it is divisible into the triangular number of n bounded sub-slices, with their individual bounds lying within the slice's bounds (including itself).
Ex. 1: [3 1 2] is a bounded slice for k=2, has a size of 3 and thus can be divided into (3*4)/2=6 sub-slices:
[3 1 2] ;; slice 1
[3 1] [1 2] ;; slices 2-3
[3] [1] [2] ;; slices 4-6
Naturally, all those slices are bounded slices for k.
When you have two overlapping slices that are both bounded slices for k but differ in their bounds, the amount of possible bounded sub-slices in the array can be calculated as the sum of the triangular numbers of those slices minus the triangular number of the count of elements they share.
Ex. 2: The bounded slices [4 3 1] and [3 1 2] for k=2 differ in bounds and overlap in the array [4 3 1 2]. They share the bounded slice [3 1] (notice that overlapping bounded slices always share a bounded slice, otherwise they could not overlap). For both slices the triangular number is 6, the triangular number of the shared slice is (2*3)/2=3. Thus the array can be divided into 6+6-3=9 slices:
[4 3 1] [3 1 2] ;; 1-2 the overlapping slices
[4 3] 6 [3 1] 6 [1 2] ;; 3-5 two slices and the overlapping slice
[4] [3] 3 [1] [2] ;; 6-9 single-element slices
As observable, the triangle of the overlapping bounded slice is part of both triangles element count, so that is why it must be subtracted from the added triangles as it otherwise would be counted twice. Again, all counted slices are bounded slices for k=2.
Approach
The approach is to find the largest possible bounded slices within the input sequence until all elements have been visited, then to sum them up using the technique described above.
A slice qualifies as one of the largest possible bounded slices (in the following text often referred as one largest possible bounded slice which shall then not mean the largest one, only one of them) if the following conditions are fulfilled:
It is bounded
It may share elements with two other slices to its left and right
It can not grow to the left or to the right without becoming unbounded - meaning: If it is possible, it has to contain so many elements that its maximum-minimum=k
By implication a bounded slice does not qualify as one of the largest possible bounded slices if there is a bounded slice with more elements that entirely encloses this slice
As a goal our algorithm must be capable to start at any element in the array and determine one largest possible bounded slice that contains that element and is the only one to contain it. It is then guaranteed that the next slice constructed from a starting point outside of it will not share the starting element of the previous slice because otherwise it would be one largest possible bounded slice with the previously found slice together (which now, by definition, is impossible). Once that algorithm has been found it can be applied sequentially from the beginning building such largest possible slices until no more elements are left. This would guarantee that each element is traversed two times in the worst case.
Algorithm
Start at the first element and find the largest possible bounded slice that includes said first element. Add the triangular number of its size to the counter.
Continue exactly one element after found slice and repeat. Subtract the triangular number of the count of elements shared with the previous slice (found searching backwards), add the triangular number of its total size (found searching forwards and backwards) until the sequence has been traversed. Repeat until no more elements can be found after a found slice, return the result.
Ex. 3: For the input sequence [4 3 1 2 0] with k=2 find the count of bounded slices.
Start at the first element, find the largest possible bounded slice:
[4 3], count=2, overlap=0, result=3
Continue after that slice, find the largest possible bounded slice:
[3 1 2], size=3, overlap=1, result=3-1+6=8
...
[1 2 0], size=3, overlap=2, result=8-3+6=11
result=11
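A brute-force check is handy for verifying small examples like this one. The following is only a sketch in Python; the name count_bounded_slices_brute is made up for illustration and is not part of the algorithm described here. It tries every start index and extends the slice for as long as it stays bounded, so it is quadratic in the worst case:

```python
def count_bounded_slices_brute(a, k):
    """Count slices a[i..j] with max - min <= k by direct enumeration."""
    total = 0
    for i in range(len(a)):
        lo = hi = a[i]
        for j in range(i, len(a)):
            lo = min(lo, a[j])
            hi = max(hi, a[j])
            if hi - lo > k:
                # Extending further can only widen the range, so stop here.
                break
            total += 1
    return total


print(count_bounded_slices_brute([4, 3, 1, 2, 0], 2))  # 11, as in Ex. 3
```

Because it makes no clever assumptions, it is useful as a reference when testing faster implementations on random input.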
Process behavior
In the worst case the process grows linearly in time and space. As proven above, elements are traversed at most twice, and each search for a largest possible bounded slice needs to store some locals.
However, the process becomes dramatically faster as the array contains fewer largest possible bounded slices. For example, the array [4 4 4 4] with k>=0 has only one largest possible bounded slice (the array itself). The array will be traversed once and the triangular number of the count of its elements is returned as the correct result. Notice how this is complementary to solutions with worst case growth O((n * (n+1)) / 2). While they reach their worst case with only one largest possible bounded slice, for this algorithm such input is the best case (one visit per element in one pass from start to end).
Implementation
The most difficult part of the implementation is to find a largest bounded slice from one element scanning in two directions. When we search in one direction, we track the minimum and maximum bounds of our search and see how they compare to k. Once an element has been found that stretches the bounds so that maximum-minimum <= k does not hold anymore, we are done in that direction. Then we search into the other direction but use the last valid bounds of the backwards scan as starting bounds.
Ex. 4: We start in the array [4 3 1 2 0] at the third element (1) after we have successfully found the largest bounded slice [4 3]. At this point we only know that our starting value 1 is the minimum, the maximum (of the searched largest bounded slice) or between those two. We scan backwards (exclusive) and stop after the second element (as 4 - 1 > k=2). The last valid bounds were 1 and 3. When we now scan forwards, we use the same algorithm but use 1 and 3 as bounds. Notice that even though in this example our starting element is one of the bounds, that is not always the case. Consider the same scenario with a 2 instead of the 3: neither that 2 nor the 1 could be determined to be a bound yet, as we could find a 0 but also a 3 while scanning forwards. Only then could it be decided whether 1 and 2 are lower or upper bounds.
To solve that problem here is a special counting algorithm. Don't worry if you don't understand Clojure yet, it does just what it says.
(defn scan-while-around
  "Count numbers in `coll` until a number doesn't pass an (inclusive)
  interval filter where said interval is guaranteed to contain
  `around` and grows with each number to a maximum size of `size`.
  Return count and the lower and upper bounds (inclusive) that were not
  passed as [count lower upper]."
  ([around size coll]
   (scan-while-around around around size coll))
  ([lower upper size coll]
   (letfn [(step [[count lower upper :as result] elem]
             (let [lower (min lower elem)
                   upper (max upper elem)]
               (if (<= (- upper lower) size)
                 [(inc count) lower upper]
                 (reduced result))))]
     (reduce step [0 lower upper] coll))))
Using this function we can search backwards, from before the starting element passing it our starting element as around and using k as the size.
Then we start a forward scan from the starting element with the same function, by passing it the previously returned bounds lower and upper.
We add their returned counts to get the total count of the found largest possible slice, use the count of the backwards scan as the length of the overlap, and subtract its triangular number.
Notice that in any case the forward scan is guaranteed to return a count of at least one. This is important for the algorithm for two reasons:
We use the resulting count of the forward scan to determine the starting point of the next search (and would loop infinitely with it being 0)
Otherwise the algorithm would not be correct: for any starting element, a largest possible bounded slice of at least size 1 (containing just the starting element) always exists.
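For readers who do not know Clojure, here is a rough Python port of the scanning function, offered as a sketch and not as the original implementation. The Python name scan_while_around and the tuple return value are my own choices. It is followed by the two scans from Ex. 4:

```python
def scan_while_around(lower, upper, size, coll):
    """Count elements of coll while the interval [lower, upper], widened
    to include each element, stays within `size`.
    Returns (count, lower, upper) with the last valid bounds."""
    count = 0
    for elem in coll:
        new_lower = min(lower, elem)
        new_upper = max(upper, elem)
        if new_upper - new_lower > size:
            break  # this element would stretch the bounds too far
        lower, upper = new_lower, new_upper
        count += 1
    return count, lower, upper


a, k, start = [4, 3, 1, 2, 0], 2, 2   # start at the third element (1)
# Backward scan: elements before the start, nearest first.
backw, lower, upper = scan_while_around(a[start], a[start], k, reversed(a[:start]))
# Forward scan: from the start element on, reusing the backward bounds.
forw, _, _ = scan_while_around(lower, upper, k, a[start:])
print(backw, lower, upper, forw)  # 1 1 3 2
```

The backward count 1 is the overlap and 1 + 2 = 3 is the size of the found slice [3 1 2], matching the second step of Ex. 3. Note that a[:start] and a[start:] copy the list in Python, unlike Clojure's subvec, so this port only illustrates the logic.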
Assuming that triangular is a function returning the triangular number, here is the final algorithm:
(defn bounded-slice-linear
  "Linear implementation"
  [s k]
  (loop [start-index 0
         acc 0]
    (if (< start-index (count s))
      (let [start-elem (nth s start-index)
            [backw lower upper] (scan-while-around start-elem
                                                   k
                                                   (rseq (subvec s 0
                                                                 start-index)))
            [forw _ _] (scan-while-around lower upper k
                                          (subvec s start-index))]
        (recur (+ start-index forw)
               (-> acc
                   (+ (triangular (+ forw
                                     backw)))
                   (- (triangular backw)))))
      acc)))
(Notice that the creation of subvectors and their reverse sequences happens in constant time and that the resulting vectors share structure with the input vector so no "rest-size" depending allocation is happening (although it may look like it). This is one of the beautiful aspects of Clojure, that you can avoid tons of index-fiddling and usually work with elements directly.)
Here is a triangular implementation for comparison:
(defn bounded-slice-triangular
  "O(n*(n+1)/2) implementation for testing."
  [s k]
  (reduce (fn [c [elem :as elems]]
            (+ c (first (scan-while-around elem k elems))))
          0
          (take-while seq
                      (iterate #(subvec % 1) s))))
Both functions only accept vectors as input.
I have extensively tested their behavior for correctness using various strategies. Please try to prove them wrong anyway. Here is a link to a full file to hack on: https://www.refheap.com/32229
Here is the algorithm implemented in Java (not tested as extensively but seems to work, Java is not my first language. I'd be happy about feedback to learn)
public class BoundedSlices {
    private static int triangular (int i) {
        return ((i * (i+1)) / 2);
    }
    public static int solve (int[] a, int k) {
        int i = 0;
        int result = 0;
        while (i < a.length) {
            int lower = a[i];
            int upper = a[i];
            int countBackw = 0;
            int countForw = 0;
            for (int j = (i-1); j >= 0; --j) {
                if (a[j] < lower) {
                    if (upper - a[j] > k)
                        break;
                    else
                        lower = a[j];
                }
                else if (a[j] > upper) {
                    if (a[j] - lower > k)
                        break;
                    else
                        upper = a[j];
                }
                countBackw++;
            }
            for (int j = i; j < a.length; j++) {
                if (a[j] < lower) {
                    if (upper - a[j] > k)
                        break;
                    else
                        lower = a[j];
                }
                else if (a[j] > upper) {
                    if (a[j] - lower > k)
                        break;
                    else
                        upper = a[j];
                }
                countForw++;
            }
            result -= triangular(countBackw);
            result += triangular(countForw + countBackw);
            i += countForw;
        }
        return result;
    }
}
Codility has now released its golden solution, which runs in O(N) time and space.
https://codility.com/media/train/solution-count-bounded-slices.pdf
If you are still confused after reading the PDF, like me, here is a very nice explanation.
The solution from the pdf:
def boundedSlicesGolden(K, A):
    # Note: maxINT is not defined in this snippet; it is the cap on the
    # result and has to be defined elsewhere.
    N = len(A)
    maxQ = [0] * (N + 1)
    posmaxQ = [0] * (N + 1)
    minQ = [0] * (N + 1)
    posminQ = [0] * (N + 1)
    firstMax, lastMax = 0, -1
    firstMin, lastMin = 0, -1
    j, result = 0, 0
    for i in xrange(N):
        while (j < N):
            # added new maximum element
            while (lastMax >= firstMax and maxQ[lastMax] <= A[j]):
                lastMax -= 1
            lastMax += 1
            maxQ[lastMax] = A[j]
            posmaxQ[lastMax] = j
            # added new minimum element
            while (lastMin >= firstMin and minQ[lastMin] >= A[j]):
                lastMin -= 1
            lastMin += 1
            minQ[lastMin] = A[j]
            posminQ[lastMin] = j
            if (maxQ[firstMax] - minQ[firstMin] <= K):
                j += 1
            else:
                break
        result += (j - i)
        if result >= maxINT:
            return maxINT
        if posminQ[firstMin] == i:
            firstMin += 1
        if posmaxQ[firstMax] == i:
            firstMax += 1
    return result
HINTS
Others have explained the basic algorithm which is to keep 2 pointers and advance the start or the end depending on the current difference between maximum and minimum.
It is easy to update the maximum and minimum when moving the end.
However, the main challenge of this problem is how to update when moving the start. Most heap or balanced tree structures will cost O(logn) to update, and will result in an overall O(nlogn) complexity which is too high.
To do this in time O(n):
1. Advance the end until you exceed the allowed threshold
2. Then loop backwards from this critical position storing a cumulative value in an array for the minimum and maximum at every location between the current end and the current start
3. You can now advance the start pointer and immediately lookup from the arrays the updated min/max values
4. You can carry on using these arrays to update start until start reaches the critical position. At this point return to step 1 and generate a new set of lookup values.
Overall this procedure will work backwards over every element exactly once, and so the total complexity is O(n).
EXAMPLE
For the sequence with K of 4:
4,1,2,3,4,5,6,10,12
Step 1 advances the end until we exceed the bound
start,4,1,2,3,4,5,end,6,10,12
Step 2 works backwards from end to start computing array MAX and MIN.
MAX[i] is maximum of all elements from i to end
Data = start,4,1,2,3,4,5,end,6,10,12
MAX = start,5,5,5,5,5,5,critical point=end -
MIN = start,1,1,2,3,4,5,critical point=end -
Step 3 can now advance start and immediately lookup the smallest values of max and min in the range start to critical point.
These can be combined with the max/min in the range critical point to end to find the overall max/min for the range start to end.
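Step 2 on its own can be sketched as follows. This is only an illustration in Python with a made-up helper name (backward_min_max); it takes the elements between start and the critical point and fills the two lookup arrays from right to left:

```python
def backward_min_max(segment):
    """For each position i, return the min and max of segment[i:]."""
    n = len(segment)
    mins = [None] * n
    maxs = [None] * n
    lo = hi = None
    for i in range(n - 1, -1, -1):
        # Fold the current element into the running min/max of the suffix.
        lo = segment[i] if lo is None else min(lo, segment[i])
        hi = segment[i] if hi is None else max(hi, segment[i])
        mins[i] = lo
        maxs[i] = hi
    return mins, maxs


mins, maxs = backward_min_max([4, 1, 2, 3, 4, 5])
print(maxs)  # [5, 5, 5, 5, 5, 5]
print(mins)  # [1, 1, 2, 3, 4, 5]
```

Advancing start by one then amounts to reading the next entry of each array, which is what makes the update constant time.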
PYTHON CODE
def count_bounded_slices(A,k):
    if len(A)==0:
        return 0
    t=0
    inf = max(abs(a) for a in A)
    left=0
    right=0
    left_lows = [inf]*len(A)
    left_highs = [-inf]*len(A)
    critical = 0
    right_low = inf
    right_high = -inf
    # Loop invariant
    #  t counts number of bounded slices A[a:b] with a<left
    #  left_lows[i] is defined for values in range(left,critical)
    #    and contains the min of A[left:critical]
    #  left_highs[i] contains the max of A[left:critical]
    #  right_low is the minimum of A[critical:right]
    #  right_high is the maximum of A[critical:right]
    while left<len(A):
        # Extend right as far as possible
        while right<len(A) and max(left_highs[left],max(right_high,A[right]))-min(left_lows[left],min(right_low,A[right]))<=k:
            right_low = min(right_low,A[right])
            right_high = max(right_high,A[right])
            right+=1
        # Now we know that any slice starting at left and ending before right will satisfy the constraints
        t += right-left
        # If we are at the critical position we need to extend our left arrays
        if left==critical:
            critical=right
            left_low = inf
            left_high = -inf
            for x in range(critical-1,left,-1):
                left_low = min(left_low,A[x])
                left_high = max(left_high,A[x])
                left_lows[x] = left_low
                left_highs[x] = left_high
            right_low = inf
            right_high = -inf
        left+=1
    return t

A = [3,5,6,7,3]
print(count_bounded_slices(A,2))
Here is my attempt at solving this problem:
- you start with p and q from position 0, min = max = 0;
- loop until p = q = N-1
- as long as max-min<=k advance q and increment the number of bounded slices.
- if max-min>k advance p
- you need to keep track of 2x min/max values because when you advance p, you might remove one or both of the min/max values
- each time you advance p or q update min/max
I can write the code if you want, but I think the idea is explicit enough...
Hope it helps.
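One way to make the "keep track of min/max while p advances" part concrete is to keep two monotonic deques of indices. That is a substitution on my part (it is the queue technique the golden solution uses), not necessarily what this answer had in mind, and it assumes k >= 0. A minimal Python sketch with hypothetical names:

```python
from collections import deque


def count_bounded_slices_deque(a, k):
    """Two-pointer count of slices with max - min <= k (assumes k >= 0)."""
    max_q = deque()  # indices, values decreasing: front is the window max
    min_q = deque()  # indices, values increasing: front is the window min
    total = 0
    p = 0
    for q, x in enumerate(a):
        # Add a[q], dropping entries that can never be max/min again.
        while max_q and a[max_q[-1]] <= x:
            max_q.pop()
        max_q.append(q)
        while min_q and a[min_q[-1]] >= x:
            min_q.pop()
        min_q.append(q)
        # Advance p until the window a[p..q] is bounded again.
        while a[max_q[0]] - a[min_q[0]] > k:
            if max_q[0] == p:
                max_q.popleft()
            if min_q[0] == p:
                min_q.popleft()
            p += 1
        # Every slice ending at q and starting in p..q is bounded.
        total += q - p + 1
    return total


print(count_bounded_slices_deque([3, 5, 6, 7, 3], 2))  # 9
```

Each index enters and leaves each deque at most once, so the whole pass is O(n).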
Finally a code that works according to the below mentioned idea. This outputs 9.
(The code is in C++. You can change it for Java)
#include <iostream>
using namespace std;
int main()
{
    int A[] = {3,5,6,7,3};
    int K = 2;
    int i = 0;
    int j = 0;
    int minValue = A[0];
    int maxValue = A[0];
    int minIndex = 0;
    int maxIndex = 0;
    int length = sizeof(A)/sizeof(int);
    int count = 0;
    bool stop = false;
    int prevJ = 0;
    while ( (i < length || j < length) && !stop ) {
        if ( maxValue - minValue <= K ) {
            if ( j < length-1 ) {
                j++;
                if ( A[j] > maxValue ) {
                    maxValue = A[j];
                    maxIndex = j;
                }
                if ( A[j] < minValue ) {
                    minValue = A[j];
                    minIndex = j;
                }
            } else {
                count += j - i + 1;
                stop = true;
            }
        } else {
            if ( j > 0 ) {
                int range = j - i;
                int count1 = range * (range + 1) / 2; // Choose 2 from range with repetition.
                int rangeRep = prevJ - i; // We have to subtract already counted ones.
                int count2 = rangeRep * (rangeRep + 1) / 2;
                count += count1 - count2;
                prevJ = j;
            }
            if ( A[j] == minValue ) {
                // first reach the first maxima
                while ( A[i] - minValue <= K )
                    i++;
                // then come down to correct level.
                while ( A[i] - minValue > K )
                    i++;
                maxValue = A[i];
            } else {//if ( A[j] == maxValue ) {
                while ( maxValue - A[i] <= K )
                    i++;
                while ( maxValue - A[i] > K )
                    i++;
                minValue = A[i];
            }
        }
    }
    cout << count << endl;
    return 0;
}
Algorithm (minor tweaking done in code):
Keep two pointers i & j and maintain two values minValue and maxValue..
1. Initialize i = 0, j = 0, and minValue = maxValue = A[0];
2. If maxValue - minValue <= K,
- Increment count.
- Increment j.
- if new A[j] > maxValue, maxValue = A[j].
- if new A[j] < minValue, minValue = A[j].
3. If maxValue - minValue > K, this can only happen if
- the new A[j] is either maxValue or minValue.
- Hence keep incrementing i until abs(A[j] - A[i]) <= K.
- Then update the minValue and maxValue and proceed accordingly.
4. Goto step 2 if ( i < length-1 || j < length-1 )
I have provided the answer for the same question in different SO Question
(1) For an input A[n] you will certainly have n slices, so add those first.
For example, for {3,5,4,7,6,3} you will certainly have (0,0) (1,1) (2,2) (3,3) (4,4) (5,5).
(2) Then find P and Q based on the min/max comparison.
(3) Apply the arithmetic series formula to find the number of combinations between P and Q, with X = Q-P. That would be X(X+1)/2, but we have already counted the n single-element slices, so the formula becomes (X(X+1)/2) - X, which is X(X-1)/2 after basic arithmetic.
For example, in the above example, if P is 0 (value 3) and Q is 3 (value 7), Q-P is 3. Applying the formula gives 3(3-1)/2 = 3. Now add 6 (the length) + 3. Then take care of the Q-min or Q-max records.
Then check the min and max index. In this case min is 0 and max is 3 (obviously one of them would match currentIndex, whichever is used to loop). Here we took care of (0,1) (0,2) (1,2) but we have not taken care of (1,3) (2,3). Rather than starting the whole process from index 1, save this number (positions 2,3 = 2), then start the same process from currentIndex (assume min and max as A[currentIndex], as we did when starting). Finally multiply the number with the preserved one, in our case 2 * 2 (A[7], A[6]).
It runs in O(N) time with O(N) space.
I came up with a solution in Scala:
package test
import scala.collection.mutable.Queue
object BoundedSlice {
  def apply(k:Int, a:Array[Int]):Int = {
    var c = 0
    var q:Queue[Int] = Queue()
    a.map(i => {
      if(!q.isEmpty && Math.abs(i-q.last) > k)
        q.clear
      else
        q = q.dropWhile(j => (Math.abs(i-j) > k)).toQueue
      q += i
      c += q.length
    })
    c
  }
  def main(args: Array[String]): Unit = {
    val a = Array[Int](3,5,6,7,3)
    println(BoundedSlice(2, a))
  }
}

nth smallest element in a union of an array of intervals with repetition

I want to know if there is a more efficient solution than what I came up with(not coded it yet but described the gist of it at the bottom).
Write a function calcNthSmallest(n, intervals) which takes as input a non-negative int n, and a list of intervals [[a_1; b_1]; ...; [a_m; b_m]] and calculates the nth smallest number (0-indexed) when taking the union of all the intervals with repetition. For example, if the intervals were [1; 5]; [2; 4]; [7; 9], their union with repetition would be [1; 2; 2; 3; 3; 4; 4; 5; 7; 8; 9] (note 2; 3; 4 each appear twice since they're in both the intervals [1; 5] and [2; 4]). For this list of intervals, the 0th smallest number would be 1, and the 3rd and 4th smallest would both be 3. Your implementation should run quickly even when the a_i; b_i can be very large (like, one trillion), and there are several intervals.
The way I thought to go about it is the straightforward solution which is to make the union array and traverse it.
This problem can be solved in O(N log N) where N is the number of intervals in the list, regardless of the actual values of the interval endpoints.
The key to solving this problem efficiently is to transform the list of possibly-overlapping intervals into a list of intervals which are either disjoint or identical. In the given example, only the first interval needs to be split:
{       [1,5],        [2,4], [7,9]} =>
 +-----------------+  +---+  +---+
{[1,1], [2,4], [5,5], [2,4], [7,9]}
(This doesn't have to be done explicitly, though: see below.) Now, we can sort the new intervals, replacing duplicates with a count. From that, we can compute the number of values each (possibly-duplicated) interval represents. Now, we simply need to accumulate the values to figure out which interval the solution lies in:
interval  count  size  values        cumulative
                       in interval   values
[1,1]       1      1       1         [0, 1)
[2,4]       2      3       6         [1, 7)   (eg. from n=1 to n=6 will be here)
[5,5]       1      1       1         [7, 8)
[7,9]       1      3       3         [8, 11)
I wrote the cumulative values as a list of half-open intervals, but obviously we only need the end-points. We can then find which interval holds value n by, for example, binary-searching the cumulative values list, and we can figure out which value in the interval we want by subtracting the start of the interval from n and then integer-dividing by the count.
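To make the table lookup concrete, here is a small Python 3 sketch of it. The names build_table and nth_smallest_from_table are invented for this illustration. It builds the rows from half-open endpoints, keeps only the end-points of the cumulative ranges, and binary-searches them with bisect:

```python
from bisect import bisect_right


def build_table(intervals):
    """Return (rows, ends): rows are (start, count) blocks in sorted order,
    ends[i] is the exclusive cumulative rank at which block i ends."""
    delta = {}
    for lo, hi in intervals:
        delta[lo] = delta.get(lo, 0) + 1          # interval opens at lo
        delta[hi + 1] = delta.get(hi + 1, 0) - 1  # and closes after hi
    points = sorted(delta)
    rows, ends = [], []
    active = total = 0
    for start, nxt in zip(points, points[1:]):
        active += delta[start]
        if active > 0:
            total += active * (nxt - start)  # count * size values in block
            rows.append((start, active))
            ends.append(total)
    return rows, ends


def nth_smallest_from_table(n, intervals):
    rows, ends = build_table(intervals)
    if not ends or not 0 <= n < ends[-1]:
        raise IndexError("n is outside the union")
    i = bisect_right(ends, n)            # block whose cumulative range holds n
    start, count = rows[i]
    begin = ends[i - 1] if i else 0      # start of that cumulative range
    return start + (n - begin) // count  # integer-divide by the count


print(build_table([(1, 5), (2, 4), (7, 9)])[1])  # [1, 7, 8, 11]
```

For the example intervals this reproduces the cumulative end-points of the table, and n=3 and n=4 both land in the [2,4] row and give 3.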
It should be clear that the maximum size of the above table is twice the number of original intervals, because every row must start and end at either the start or end of some interval in the original list. If we'd written the intervals as half-open instead of closed, this would be even clearer; in that case, we can assert that the precise size of the table will be the number of unique values in the collection of end-points. And from that insight, we can see that we don't really need the table at all; we just need the sorted list of end-points (although we need to know which endpoint each value represents). We can simply iterate through that list, maintaining the count of the number of active intervals, until we reach the value we're looking for.
Here's a quick python implementation. It could be improved.
def combineIntervals(intervals):
    # endpoints will map each endpoint to a count
    endpoints = {}
    # These two lists represent the start and (1+end) of each interval
    # Each start adds 1 to the count, and each limit subtracts 1
    for start in (i[0] for i in intervals):
        endpoints[start] = endpoints.setdefault(start, 0) + 1
    for limit in (i[1]+1 for i in intervals):
        endpoints[limit] = endpoints.setdefault(limit, 0) - 1
    # Filtering is a possibly premature optimization but it was easy
    return sorted(filter(lambda kv: kv[1] != 0,
                         endpoints.items()))

def nthSmallestInIntervalList(n, intervals):
    limits = combineIntervals(intervals)
    cumulative = 0
    count = 0
    index = 0
    here = limits[0][0]
    while index < len(limits):
        size = limits[index][0] - here
        if n < cumulative + count * size:
            # [here, next) contains the value we're searching for
            return here + (n - cumulative) // count
        # advance
        cumulative += count * size
        count += limits[index][1]
        here += size
        index += 1
    # We didn't find it. We could throw an error
So, as I said, the running time of this algorithm is independent of the actual values of the intervals; it depends only on the length of the interval list. This particular solution is O(N log N) because of the cost of the sort (in combineIntervals). If we used a priority queue instead of a full sort, we could construct the heap in O(N), but the scan would then cost O(log N) for each scanned endpoint. Unless N is really big and the expected value of the argument n is relatively small, this would be counter-productive. There might be other ways to reduce complexity, though.
Edit2:
Here's yet another take on your question.
Let's consider the intervals graphically:
          1  1   1 2  2  2  3
0-2-4--7--0--3---7-0--4--7--0
  [-------]
    [-----------------]
       [---------]
             [--------------]
                   [-----]
When sorted in increasing order on the lower bound, we could get something that looks like the above for the interval list ([2;10];[4;24];[7;17];[13;30];[20;27]). Each lower bound indicates the start of a new interval, and also marks the beginning of one more "level" of duplication of the numbers. Conversely, upper bounds mark the end of that level, and decrease the duplication level by one.
We could therefore convert the above into the following list:
[2;+];[4;+];[7;+];[10;-];[13;+];[17;-];[20;+];[24;-];[27;-];[30;-]
Where the first value indicates the rank of the bound, and the second value whether the bound is lower (+) or upper (-). The computation of the nth element is done by following the list, raising or lowering the duplication level when encountering a lower or upper bound, and using the duplication level as a counting factor.
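The conversion and the running duplication level can be sketched in a few lines of Python. The helper names to_bounds and duplication_levels are hypothetical and only serve this illustration:

```python
def to_bounds(intervals):
    """Turn [lower; upper] intervals into a sorted list of (rank, sign)."""
    bounds = []
    for lower, upper in intervals:
        bounds.append((lower, '+'))
        bounds.append((upper, '-'))
    # '+' sorts before '-', so at equal rank an opening bound comes first.
    return sorted(bounds)


def duplication_levels(bounds):
    """Duplication level in effect after each bound has been processed."""
    level = 0
    levels = []
    for rank, sign in bounds:
        level += 1 if sign == '+' else -1
        levels.append((rank, level))
    return levels


intervals = [(2, 10), (4, 24), (7, 17), (13, 30), (20, 27)]
bounds = to_bounds(intervals)
print(bounds)
print(duplication_levels(bounds))
```

For the example list the level rises to 3 at ranks 7, 13 and 20 and drops back to 0 at rank 30.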
Let's consider the list graphically again, but as a histogram:
       3333  44444 5555
    2222222333333344444555
  111111111222222222222444444
          1  1   1 2  2  2  3
0-2-4--7--0--3---7-0--4--7--0
The view above is the same as the first one, with all the intervals packed vertically.
1 being the elements of the 1st one, 2 the second one, etc. In fact, what matters here
is the height at each index, corresponding to the number of times each index is duplicated in the union of all intervals.
       3333  55555 7777
    2223333445555567777888
  112223333445555567777888999
          1  1   1 2  2  2  3
0-2-4--7--0--3---7-0--4--7--0
| | | | | | || | |
We can see that histogram blocks start at lower bounds of intervals, and end either on upper bounds, or one unit before lower bounds, so the new notation must be modified accordingly.
With a list containing n intervals, as a first step, we convert the list into the notation above (O(n)), and sort it in increasing bound order (O(nlog(n))). The second step of computing the number is then in O(n), for a total average time in O(nlog(n)).
Here's a simple implementation in OCaml, using 1 and -1 instead of '+' and '-'.
(* transform the list in the correct notation *)
let rec convert = function
    [] -> []
  | (l,u)::xs -> (l,1)::(u+1,-1)::convert xs;;
(* the counting function *)
let rec count r f = function
    [] -> raise Not_found
  | [a,x] -> (match f + x with
                0 -> if r = 0 then a else raise Not_found
              | _ -> a + (r / f))
  | (a,x)::(b,y)::l ->
      if a = b
      then count r f ((b,x+y)::l)
      else
        let f = f + x in
        if f > 0 then
          let range = (b - a) * f in
          if range > r
          then a + (r / f)
          else count (r - range) f ((b,y)::l)
        else count r f ((b,y)::l);;
(* the compute function *)
let compute l =
  let compare (x,_) (y,_) = compare x y in
  let l = List.sort compare (convert l) in
  fun m -> count m 0 l;;
Notes:
- the function above will raise an exception if the sought number is above the intervals. This corner case isn't taken in account by the other methods below.
- the list sorting function used in OCaml is merge sort, which effectively performs in O(nlog(n)).
Edit:
Seeing that you might have very large intervals, the solution I gave initially (see down below) is far from optimal.
Instead, we could make things much faster by transforming the list:
we try to compress the interval list by searching for overlapping ones and replace them by prefixing intervals, several times the overlapping one, and suffixing intervals. We can then directly compute the number of entries covered by each element of the list.
Looking at the splitting above (prefix, infix, suffix), we see that the optimal structure to do the processing is a binary tree. A node of that tree may optionally have a prefix and a suffix. So the node must contain :
an interval i in the node
an integer giving the number of repetition of i in the list,
a left subtree of all the intervals below i
a right subtree of all the intervals above i
with this structure in place, the tree is automatically sorted.
Here's an example of an ocaml type embodying that tree.
type tree = Empty | Node of int * interval * tree * tree
Now the transformation algorithm boils down to building the tree.
This function creates a tree out of its components:
let cons k r lt rt =
  the tree made of count k, interval r, left tree lt and right tree rt
This function recursively inserts an interval in a tree.
let rec insert i it =
  let r = root of it
  let lt = the left subtree of it
  let rt = the right subtree of it
  let k = the count of r
  let prf, inf, suf = the prefix, infix and suffix of i according to r
  return cons (k+1) inf (insert prf lt) (insert suf rt)
Once the tree is built, we do an in-order traversal of the tree (left subtree, node, right subtree, so that intervals are visited in increasing order), using the count of the node to accelerate the computation of the nth element.
Below is my previous answer.
Here are the steps of my solution:
you need to sort the interval list in increasing order on the lower bound of each interval
you need a deque dq (or a list which will be reversed at some point) to store the intervals
here's the code:
let lower i = lower bound of interval i
let upper i = upper bound of i
let il = sort of interval list
i <- 0
j <- lower (head of il)
loop on il:
i <- i + 1
let h = the head of il
let il = the tail of il
if upper h > j then push h to dq
if lower h > j then
il <- concat dq and il
j <- j + 1
dq <- empty
loop
if i = k then return j
loop
This algorithm works by simply iterating through the intervals, only taking into account the relevant intervals, and counting both the rank i of the element in the union, and the value j of that element. When the targeted rank k has been reached, the value is returned.
The complexity is roughly in O(k) + O(sort(l)).
If I have understood your question correctly, you want to find the kth smallest element in the union of a list of intervals.
If we assume that the number of lists is 2, the question is:
Find the kth smallest element in the union of two sorted arrays (where an interval [2,5] is nothing but the elements from 2 to 5, {2,3,4,5}). This can be solved in (n+m)log(n+m) time, where n and m are the sizes of the lists and i and j are the list iterators.
Maintaining the invariant
i + j = k – 1,
If Bj-1 < Ai < Bj, then Ai must be the k-th smallest,
or else if Ai-1 < Bj < Ai, then Bj must be the k-th smallest.
For details click here
Now the problem is if you have no of lists=3 lists then
Maintaining the invariant
i + j+ x = k – 1,
i + j=k-x-1
The value k-x-1 can take y (size of third list, because x iterates from start point of list to end point) .
problem of 3 lists size can be reduced to y*(problem of size 2 list). So complexity is `y*((n+m)log(n+m))`
If Bj-1 < Ai < Bj, then Ai must be the k-th smallest,
or else if Ai-1 < Bj < Ai, then Bj must be the k-th smallest.
So for problem of size n list the complexity is NP .
But yes, we can make a minor improvement: if we know that k < sizeof(some lists), we can chop the elements from the (k+1)th element to the end out of our search space in those lists whose size is bigger than k (I think it doesn't help for large k). If there is any mistake please let me know.
Let me explain with an example:
Assume we are given these intervals [5,12],[3,9],[8,13].
The union of these intervals is:
number : 3 4 5 5 6 6 7 7 8 8 8 9 9 9 10 10 11 11 12 12 13.
indices: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
The lowest function will return 11 when 9 is passed as input.
The highest function will return 14 when 9 is passed as input.
The lowest and highest functions just check whether x is present in an interval. If it is, they add x-a (a being the lower bound of that interval; highest adds one more, since it includes x itself) to the return value for that one particular interval. If an interval lies completely below x, they add the total number of elements in that interval to the return value.
The find function will return 9 when 13 is passed.
The find function uses binary search to find the kth smallest element. In the given range [0,N] (if the range is not given, we can find the upper end of the range in O(n)), find the mid and calculate lowest and highest for mid. If the given k falls between lowest and highest, return mid; else if k is less than or equal to lowest, search in the lower half (0,mid-1); else search in the upper half (mid+1,high).
If the number of intervals is n and the range is N, then the running time of this algorithm is n*log(N): we compute lowest and highest (each of which runs in O(n)) log(N) times.
//Function call will be `find(0,N,k,in)`
//Retrieves the no. of elements smaller than the first x (excluding x) in the union
public static int lowest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst: in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0)+1;
        else if(x >= lst.get(0)){
            // x lies inside [lst.get(0), lst.get(1)]
            sum += x - lst.get(0);
        }
    }
    return sum;
}
//Retrieves the no. of elements up to the last x (including x) in the union.
public static int highest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst: in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0)+1;
        else if(x >= lst.get(0)){
            // x lies inside the interval; this also covers single-point
            // intervals [x,x], which the original condition skipped
            sum += x - lst.get(0)+1;
        }
    }
    return sum;
}
//Do binary search on the range.
public static int find(int low, int high, int k, List<List<Integer>> in){
    if(low > high)
        return -1;
    int mid = low + (high-low)/2;
    int lowIdx = lowest(in,mid);
    int highIdx = highest(in,mid);
    //k lies between the current number's high and low indices
    if(k > lowIdx && k <= highIdx) return mid;
    //k less than lower index. go on to left side
    if(k <= lowIdx) return find(low,mid-1,k,in);
    // k greater than higher index go to right
    if(k > highIdx) return find(mid+1,high,k,in);
    else
        return -1; // catch statement
}
It's possible to count how many numbers in the list are less than some chosen number X (by iterating through all of the intervals). Now, if this number is greater than n, the solution is certainly smaller than X. Similarly, if this number is less than or equal to n, the solution is greater than or equal to X. Based on these observations we can use binary search.
Below is a Java implementation :
public int nthElement( int[] lowerBound, int[] upperBound, int n )
{
    int lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
    while ( lo < hi ) {
        int X = (int)( ((long)lo+hi+1)/2 );
        long count = 0;
        for ( int i=0; i<lowerBound.length; ++i ) {
            if ( X >= lowerBound[i] && X <= upperBound[i] ) {
                // part of interval i is less than X
                count += (long)X - lowerBound[i];
            }
            if ( X >= lowerBound[i] && X > upperBound[i] ) {
                // all numbers in interval i are less than X
                count += (long)upperBound[i] - lowerBound[i] + 1;
            }
        }
        if ( count <= n ) lo = X;
        else hi = X-1;
    }
    return lo;
}
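For comparison, here is the same binary search on the answer as a Python 3 sketch. It is not a line-by-line port: the names are hypothetical, the search range is narrowed to the smallest lower bound and largest upper bound instead of the full int range, and it raises an error when n is out of range. Python integers do not overflow, so endpoints around one trillion need no special handling.

```python
def nth_smallest_by_value(n, intervals):
    """0-indexed nth smallest value in the union with repetition."""
    total = sum(hi - lo + 1 for lo, hi in intervals)
    if not 0 <= n < total:
        raise IndexError("n is outside the union")

    def count_less(x):
        # How many values (with repetition) are strictly less than x.
        return sum(min(x, hi + 1) - lo for lo, hi in intervals if x > lo)

    lo = min(a for a, _ in intervals)
    hi = max(b for _, b in intervals)
    while lo < hi:
        mid = (lo + hi + 1) // 2   # upper middle, so lo = mid makes progress
        if count_less(mid) <= n:
            lo = mid               # the answer is mid or larger
        else:
            hi = mid - 1           # the answer is smaller than mid
    return lo


print(nth_smallest_by_value(3, [(1, 5), (2, 4), (7, 9)]))  # 3
```

The loop finds the largest value X for which at most n values are smaller than X, which is exactly the nth smallest element.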

Resources