Number of Ranges that lie completely in a particular Range - algorithm

Let us say we have n ranges each specified by
[l_i,r_i] where 1<=i<=n
And we have a query of type [L,R] in which we have to find the number of ranges among those n ranges that lie completely in the given range i.e. [L,R]
Example:
The n ranges are (here n = 2):
2 4
3 3
for query 3 5, output should be 1.
for query 2 5 output should be 2.
I know an O(m*n) method, where n is the number of ranges and m is the number of queries, but it feels like there must be a more efficient approach.

Yes, there is. The data structure that you want is called an interval tree.

With the following solution each query has complexity O(log(n)), but it needs to store about n^2/2 range entries, and the preprocessing has to sort the n subsets (complexity O(n*n log(n)) using quicksort):
Preprocessing:
Sort the n ranges by L_i
For each range [L_i,R_i]:
a) find the subset S_i of all ranges [L_j,R_j] for which L_j >= L_i is true.
b) sort S_i by R_j
Query [L,R]:
Find the S_i with the smallest L_i such that L_i >= L using binary search (complexity O(log(n))). S_i contains all candidate ranges.
Find in S_i the largest R_j such that R_j <= R, again using binary search. The 1-based position of this entry in S_i is the number of ranges that satisfy your condition (0 if no such entry exists).
Lets get back to your example:
S_1: [3 3] [2 4]
S_2: [3 3]
Query 3 5 ends up in S_2. There is one entry that also satisfies the second condition, so the result is 1.
Query 2 5 ends up in S_1. The second entry is the largest one satisfying the R condition, and thus the result is 2.
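The preprocessing and query steps above can be sketched in Python as follows. This is only an illustration under assumptions: the names `preprocess` and `count_inside` are made up here, ranges are given as `(l, r)` tuples, and the quadratic storage of the answer is kept on purpose.

```python
from bisect import bisect_left, bisect_right

def preprocess(ranges):
    """Build the S_i lists: O(n^2) space, O(n^2 log n) time."""
    by_left = sorted(ranges)                      # sort ranges by left endpoint
    lefts = [l for l, _ in by_left]
    # suffix_rights[i] holds the sorted right endpoints of all ranges
    # whose left endpoint is >= lefts[i] (this plays the role of S_i).
    suffix_rights = [sorted(r for _, r in by_left[i:])
                     for i in range(len(by_left))]
    return lefts, suffix_rights

def count_inside(index, L, R):
    """Number of stored ranges [l, r] with L <= l and r <= R."""
    lefts, suffix_rights = index
    i = bisect_left(lefts, L)                     # first range with l >= L
    if i == len(lefts):
        return 0                                  # no candidate ranges at all
    return bisect_right(suffix_rights[i], R)      # how many of them have r <= R

index = preprocess([(2, 4), (3, 3)])
print(count_inside(index, 3, 5))
print(count_inside(index, 2, 5))
```

For the example from the question, the two calls print 1 and 2.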

Related

How to find length of largest contiguous segment which forms an arithmetic progression in range [L R] of a given array of length N?

Consider Array 1 2 3 5 5
for query [L R D]=[1 5 1], output is 3
for query [1 1 1], output is 1
There are also Q queries of this kind, where 0 < Q < 10^6, so brute force does not work.
Note: indexing starts at 1
Note2: D represents common difference of AP
Your question can be simplified to finding the length of the longest arithmetic progression with the given common difference D in the subarray from index L to index R, whose size is n = R - L + 1.
You can then find the solution on GeeksforGeeks:
Naive Approach: For each element calculate the length of the longest AP it could form and print the maximum among them. It involves O(n^2) time complexity.
An efficient approach is to use Hashing.
Create a map where the key is the starting element of an AP and its value is the number of elements in that AP. The idea is to update the value at key 'a' (which is at index i and whose starting element is not yet there in the map) by 1 whenever the element at index j (> i) could be in the AP of 'a' (as the starting element). Then we print the maximum value among all values in the map.
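A minimal sketch of the hashing idea in Python, under a few assumptions: the function name is hypothetical, the map is keyed by the last element of a progression rather than its starting element (a common variant of the same idea), and the progression is treated as a subsequence of the slice, which is what the quoted approach describes.

```python
def longest_ap_with_difference(arr, left, right, d):
    """Longest AP with common difference d inside arr[left..right] (1-based, inclusive)."""
    best_ending_at = {}   # value -> length of the longest AP seen so far ending in that value
    longest = 0
    for x in arr[left - 1:right]:
        # x extends the best progression that currently ends at x - d.
        length = best_ending_at.get(x - d, 0) + 1
        best_ending_at[x] = max(best_ending_at.get(x, 0), length)
        longest = max(longest, best_ending_at[x])
    return longest

print(longest_ap_with_difference([1, 2, 3, 5, 5], 1, 5, 1))
print(longest_ap_with_difference([1, 2, 3, 5, 5], 1, 1, 1))
```

On the array from the question this prints 3 and 1. Each query still costs time linear in the slice length, so this alone does not solve the many-queries case.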

Preprocess-Query to find number of pairs containing a number X

Formally, we are given N pairs of rational numbers. We want to preprocess this data so as to answer queries like "Find the number of pairs which contain a given rational number X".
By "a pair contains X" I mean that [2,5] contains 3, and so on.
At worst, the expected time for each query should be O(log N) or O(sqrt(N)) (or anything similar that is better than O(N)), and preprocessing should be at worst O(N^2).
My approach:
I tried sorting the pairs by their first number, breaking ties by the second number (the first number in a pair is less than the second). Applying a lower_bound form of binary search then reduces the search space, but I can't apply another binary search within that space, since the pairs are sorted by their first number. So after reducing the search space I have to check linearly, which is again O(N) per query in the worst case.
First you should try to make the ranges disjoint. For example ranges [1 5],[2 6],[3 7] will result in disjoint ranges of [1 2],[2 3],[3 5],[5 6],[6 7] and for each range you should save in how many original ranges it was present. Like this
1-------5                    // original ranges
  2-------6
    3-------7
1-2, 2-3, 3-5, 5-6, 6-7      // disjoint ranges
 1    2    3    2    1       // number of original ranges covering each disjoint range
You can do this with a sweep line algorithm in O(N log N). After that you can use the method you described: sort the disjoint ranges by their start, and for each query find the range containing X with a lower_bound style binary search and print its presence count. For example, in this case, if the query is 4, you find the range 3-5 by binary search and the result is 3, because the presence count of range 3-5 is 3.
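A compact variant of the same sweep idea can be sketched in Python. Note that this is not the disjoint-range table itself: it keeps two sorted endpoint arrays and counts how many pairs have opened and how many have already closed at X. The function names are assumptions, and the pairs are treated as closed intervals.

```python
from bisect import bisect_left, bisect_right

def build(pairs):
    """O(N log N) preprocessing: sort the start and end points separately."""
    starts = sorted(a for a, _ in pairs)
    ends = sorted(b for _, b in pairs)
    return starts, ends

def count_containing(index, x):
    """Number of pairs [a, b] with a <= x <= b, in O(log N)."""
    starts, ends = index
    opened = bisect_right(starts, x)   # pairs whose start is <= x
    closed = bisect_left(ends, x)      # pairs whose end is < x
    return opened - closed

index = build([(1, 5), (2, 6), (3, 7)])
print(count_containing(index, 4))
```

For the ranges above, the query 4 prints 3. Python's `fractions.Fraction` values can be used in place of integers if the endpoints are rational.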

Closest equal numbers

Suppose you have numbers a1..an and some queries [l, k] (1 < l, k < n). The problem is to find the minimum distance between two equal numbers within the interval [l, k].
Examples: (interval l,k shown as |...|)
1 2 2 |1 0 1| 2 3 0 1 2 3
Answer 2 (101)
1 |2 2| 1 0 1 2 3 0 1 2 3
Answer 1 (22)
1 2 2 1 0 |1 2 3 0 3 2 3|
Answer 2 (303) or (323)
I have thought about segment tree, but it is hard to join results from each tree node, when query is shared between several nodes. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?
Clarification
Thanks for your answers.
The problem is that there are a lot of queries, so O(n) per query is not good enough. I did not mention a segment tree by accident: it answers an [l, r] SUM or [l, r] MIN query on an array with log(n) complexity. Can we do some preprocessing to get O(log n) here as well?
Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.
In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.
Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval that it could contain and the last and then issue a range-minimum query of the lengths of the range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.
We can do it in O(nlog(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then, sort the elements in [l,k], first based on value, and second based on original index, both ascending.
Then you can loop over the sorted list, keeping a currentValue variable, and checking adjacent values that are the same for distance and setting minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.
Suppose we have this [l,k] range from your third example:
1 2 3 0 3 2 3
We can mark them as
1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)
and sort them as
0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)
Looping over this, there are no pairs for 0 and 1. The minimum distance for the 2s is 4, and the minimum distance for the 3s is 2 ([3,5] or [5,7], depending on whether you reset minDistance when the new minimum distance equals the current one).
Thus we get
[3,5] in [l,k] or [5,7] in [l,k]
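The sort-based scan for a single query could look roughly like this in Python. The function name and the choice to return `None` when no value repeats are assumptions made for the sketch; indices are 1-based as in the question.

```python
def min_equal_distance(a, l, k):
    """Minimum distance between two equal numbers in a[l..k] (1-based, inclusive)."""
    # Tag each element with its original index, then sort by (value, index).
    tagged = sorted((a[i - 1], i) for i in range(l, k + 1))
    best = None
    # Equal values are now adjacent, ordered by index, so only neighbours matter.
    for (v1, i1), (v2, i2) in zip(tagged, tagged[1:]):
        if v1 == v2 and (best is None or i2 - i1 < best):
            best = i2 - i1
    return best

print(min_equal_distance([1, 2, 2, 1, 0, 1, 2, 3, 0, 3, 2, 3], 6, 12))
```

For the range used in the example above this prints 2.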
EDIT
Since you mention some queries, you can preprocess the list in O(nlog(n)) time, and then only use O(n) time for each individual query. You would just ignore indices that are not in [l,k] while looping over the sorted list.
EDIT 2
This is addressing the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then run each query in O(1) time.
First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.
We can imagine that:
[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)
We have one base case
[l,k] = infinity where l = k
If [l,k] is not min([l+1,k], [l,k-1]), then it either starts at l or ends at k. We can take each of these, look into the sorted list and look at the adjacent element in the correct direction and check the distances (making sure we're in bounds). We only have to check 2 elements, so it is a constant factor.
Using this algorithm, we can run the following
for l = n downto 1
    for k = l to n
        M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)
You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.
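A rough Python sketch of this table construction follows. It is written under assumptions: instead of links into the sorted list, it uses a hash map to find the nearest equal element on each side (`prv` and `nxt` are names introduced here), and it stores a full (n+2) x (n+2) matrix for simplicity, so it is only practical for small n.

```python
def build_table(a):
    """M[l][k] = minimum distance between equal numbers in a[l..k] (1-based), inf if none."""
    n = len(a)
    INF = float("inf")
    prv = [0] * (n + 2)        # prv[i]: nearest index < i with the same value, 0 if none
    nxt = [n + 1] * (n + 2)    # nxt[i]: nearest index > i with the same value, n+1 if none
    last = {}
    for i in range(1, n + 1):
        v = a[i - 1]
        if v in last:
            prv[i] = last[v]
            nxt[last[v]] = i
        last[v] = i
    M = [[INF] * (n + 2) for _ in range(n + 2)]   # base case: M[l][l] = inf
    for l in range(n, 0, -1):
        for k in range(l + 1, n + 1):
            best = min(M[l + 1][k], M[l][k - 1])
            if nxt[l] <= k:                        # a pair starting at l
                best = min(best, nxt[l] - l)
            if prv[k] >= l:                        # a pair ending at k
                best = min(best, k - prv[k])
            M[l][k] = best
    return M

M = build_table([1, 2, 2, 1, 0, 1, 2, 3, 0, 3, 2, 3])
print(M[6][12])
```

After the O(n^2) build, each query is a single lookup such as `M[6][12]`, which prints 2 here.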

Sum of product of distinct numbers and their count

Given an array of N elements where N is up to 200000. Array elements are at max 100000. Now we are providing Q queries of form [a b]. For each query we need to tell the sum of:
((Count of each distinct number in range a to b)^2)*(Value of that distinct number)
Example Let N=8 and array be [1 1 2 2 1 3 1 1], and let Q=1. That means just one query. Let a=2 and b=7, then the answer is 20
Explanation :
occurrence of 1-> 3
occurrence of 2-> 2
occurrence of 3-> 1
cost=3*3*1 + 2*2*2 + 1*1*3= 20
If there were fewer queries, this would not be such a difficult question, but Q can be up to 200000. So what is the best-suited data structure for this problem?
Here is an offline O((n + q) * sqrt(n)) solution:
Let's divide the given array into sqrt(n) consecutive blocks with sqrt(n) elements each.
Let's group all queries by the number of the block that contains their left border.
Now we will answer queries from each group individually:
Inside one group, we should sort the queries by their right border(in increasing order).
Let's iterate over all queries from this group in the sorted order and maintain the following invariant: all numbers that lie inside a block covered by this query except, maybe, the first and the last blocks, are already processed. We can maintain it by processing the next block when we need it.
Given this invariant, we can get the answer to this query by looking only at numbers in the first and the last block(that contain borders of this query). There are at most O(sqrt(n)) such numbers, so we can simply iterate over them.
Clarification: we maintain an array count of size MAX_VALUE, where count[i] is the number of occurrences of i among the processed numbers and curSum - the sum of the target function for the processed numbers. We can add or remove one number in O(1): increment or decrement count[i] and adjust curSum. The number was processed means that it has been taken into account in the count array and the curSum variable.
Time complexity: for each group, we traverse the array from left to right once to process the numbers in inner blocks. That is O(n) per group, so O(n * sqrt(n)) over all sqrt(n) groups. Each query adds O(sqrt(n)) time for processing the numbers in its first and last blocks. Thus, the total time complexity is O((n + q) * sqrt(n)).
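For illustration, here is a Python sketch of the closely related, standard formulation of this idea (often called Mo's algorithm): queries are sorted by the block of their left border and then by right border, and a sliding window is moved with O(1) add and remove steps. It is not a line-by-line transcription of the steps above, and the function name is an assumption.

```python
from math import isqrt

def answer_queries(arr, queries):
    """queries are 1-based inclusive (a, b); returns sum of count^2 * value per query."""
    block = max(1, isqrt(len(arr)))
    order = sorted(range(len(queries)),
                   key=lambda i: ((queries[i][0] - 1) // block, queries[i][1]))
    count = {}      # count[v]: occurrences of v inside the current window
    cur_sum = 0     # target function for the current window

    def add(v):
        nonlocal cur_sum
        c = count.get(v, 0)
        cur_sum += (2 * c + 1) * v      # (c+1)^2 * v - c^2 * v
        count[v] = c + 1

    def remove(v):
        nonlocal cur_sum
        c = count[v]
        cur_sum -= (2 * c - 1) * v      # c^2 * v - (c-1)^2 * v
        count[v] = c - 1

    lo, hi = 1, 0                       # current window [lo, hi], initially empty
    out = [0] * len(queries)
    for qi in order:
        a, b = queries[qi]
        while hi < b:                   # grow first, so counts never go negative
            hi += 1
            add(arr[hi - 1])
        while lo > a:
            lo -= 1
            add(arr[lo - 1])
        while hi > b:
            remove(arr[hi - 1])
            hi -= 1
        while lo < a:
            remove(arr[lo - 1])
            lo += 1
        out[qi] = cur_sum
    return out

print(answer_queries([1, 1, 2, 2, 1, 3, 1, 1], [(2, 7)]))
```

For the example in the question this prints [20].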

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int K[].
The task at hand is to generate a uniformly distributed random number between 0 and N-1 that doesn't exist in K.
N is strictly an integer >= 0.
And K.length is < N-1. And 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1. Assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get an O(N) solution for this: first generate a random index r with uniformRand(N - K.length), then map r to the r-th number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the number of values missing from K in the range 0 to K[i].
For example K[2] = 5. So 3 elements are missing from K[0] to K[2] (2,3,4)
Hence you can decide whether you have to conduct the remaining search in the first part of array K or the next part. This is because you know r.
This search will give you a complexity of log(K.length)
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N-K.length = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search in the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
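The weighted descent from this example can be sketched in Python as below. The function name is hypothetical, `random.randrange` stands in for the question's uniformRand, and the midpoint is chosen as `(lo + hi - 1) // 2` so that the visited elements match the walk-through (4, then 5, then 8).

```python
import random

def pick_excluding(K, N):
    """Uniform random value in 0..N-1 that is not in the sorted, duplicate-free list K."""
    lo, hi = 0, len(K)      # current subarray is K[lo:hi]
    low, high = -1, N       # exclusive value bounds surrounding that subarray
    while lo < hi:
        mid = (lo + hi - 1) // 2
        # Pickable values strictly between low and K[mid], and between K[mid] and high.
        left = (K[mid] - low - 1) - (mid - lo)
        right = (high - K[mid] - 1) - (hi - mid - 1)
        # Go left with probability left / (left + right), otherwise go right.
        if random.randrange(left + right) < left:
            hi, high = mid, K[mid]
        else:
            lo, low = mid + 1, K[mid]
    # The subarray is empty: every value strictly between low and high is pickable.
    return low + 1 + random.randrange(high - low - 1)

print(pick_excluding([0, 1, 4, 5, 8], 10))
```

A side with zero pickable values is never chosen, so the final gap is always non-empty as long as at least one value in 0..N-1 is missing from K.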
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = -1, which is consistent with D[-1] = K[-1] - (-1) = 0.
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find right most occurrence of r' in D (which occurs at position defined by j), where r' is the largest number in D, which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r-r' + K[j].
Proof: Basically the definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and these missing numbers are at least r-r' in number), and the number we seek is among them, given by K[j] + r-r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log |K|) algorithm.
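A short Python sketch of this binary search, with hypothetical function names and r counted from 1. It uses the fact that if j is the rightmost index with K[j] - j < r, then K[j] + r - (K[j] - j) simplifies to r + j, so only the number of such indices is needed.

```python
import random

def rth_missing(K, r):
    """r-th smallest (1-based) non-negative integer not in the sorted, duplicate-free list K."""
    lo, hi = 0, len(K)
    # Binary search for the number of indices i with D[i] = K[i] - i < r.
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid < r:
            lo = mid + 1
        else:
            hi = mid
    # Exactly lo elements of K lie below the answer, so the answer is (r - 1) + lo.
    return r - 1 + lo

def random_not_in(K, N):
    """Uniform random value in 0..N-1 that is not in K."""
    r = random.randrange(N - len(K)) + 1   # uniform in 1..N-|K|
    return rth_missing(K, r)

print([rth_missing([0, 1, 5], r) for r in range(1, 5)])
```

For N = 7 and K = {0, 1, 5} the four missing numbers come out as [2, 3, 4, 6].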
If you are running this many times, it may pay to speed up the generation step, in case O(log N) per draw is too slow.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K add it to G. If it is in K don't add it and progress your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
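import random

class PRNG:
    def __init__(self, K, N):
        # G collects every value in 0..N-1 that is not in the sorted list K.
        self.G = []
        kptr = 0
        for i in range(N):
            if kptr < len(K) and K[kptr] == i:
                kptr += 1  # i is excluded; move on to the next entry of K
            else:
                self.G.append(i)

    def getRand(self):
        # randint is inclusive on both ends, hence len(self.G) - 1.
        rn = random.randint(0, len(self.G) - 1)
        return self.G[rn]

prng = PRNG([0, 1, 5], 7)
for i in range(20):
    print(prng.getRand())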

Resources