I'm having trouble with the following problem.
Given an N x S grid and m segments parallel to the horizontal axis (each of them a tuple (x', x'', y)), answer Q online queries of the form (x', x''). The answer to such a query is the smallest y (if there is one) such that we can place a segment (x', x'', y). All segments are non-overlapping, yet the beginning of one segment can be the end of another, i.e. segments (x', x'', y) and (x'', x''', y) are allowed. Being able to place a segment means that a segment (x', x'', y) could exist without violating the stated rules; the segment isn't actually placed (the board with the initial segments isn't modified), we only state that there could be one.
Constraints
1 ≤ Q, S, m ≤ 10^5
1 ≤ N ≤ 10^9
Time: 1s
Memory: 256 Mb
Here is an example from the link below. Input segments are (2, 5, 1), (1, 2, 2), (4, 5, 2), (2, 3, 3), (3, 4, 3).
Answer for queries
1) (1, 2) → 1
2) (2, 3) → 2
3) (3, 4) → 2
4) (4, 5) → 3
5) (1, 3) → can't place a segment
A visualized answer for the third query (blue segment):
I don't quite understand how to approach the problem. It is supposed to be solved with a persistent segment tree, but I am still unable to come up with something. Could you help me, please?
This is not my homework. The source problem can be found here: http://informatics.mccme.ru/mod/statements/view3.php?chapterid=111614 . There's no English version of the statement available, and the test case presents the input data in a different way, so don't mind the source.
Here is an O(N log N) time solution.
Preliminaries (a good tutorial available here): segment tree, persistent segment tree.
Part 0. Original problem statement
I briefly describe the original problem statement, as later I'm going to speak in its terms rather than in terms of abstract segments.
There is a train with S seats (S <= 10^5). It is known that seat s_i is occupied from time l_i to time r_i (there are no more than 10^5 such constraints, or passengers). Then we have to answer 10^5 queries of the kind "find the lowest-numbered seat which is free from time l_i to time r_i, or say that there is none". All queries must be answered online, that is, you have to answer the previous query before seeing the next one.
Throughout the text I denote by N the number of seats, the number of passengers, and the number of queries alike, assuming they are of the same order of magnitude. You can do a more accurate analysis if needed.
Part 1. Answering a single query
Let's answer a query [L, R] assuming that there are no occupied places after time R. For each seat we maintain the last time when it is occupied. Call it last(S). Now the answer for the query is minimum S such that last(S) <= L. If we build a segment tree on seats then we'll be able to answer this query in O(log^2 N) time: binary search the value of S, check if range minimum on segment [0, S] is at most L.
However, it might not be enough to get Accepted; we need O(log N). Recall that each node of a segment tree stores the minimum over its range. We start at the root. If the minimum there is > L then there is no available seat for this query. Otherwise the minimum in the left child or in the right child (or in both) is <= L. In the first case we descend to the left child, in the second to the right, and repeat until we reach a leaf. This leaf corresponds to the minimum seat with last(S) <= L.
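If it helps, here is how I would sketch that descent in Python (all names are mine; the tree size is padded to a power of two, and padded leaves get +infinity so they are never reported as seats):

```python
class SeatTree:
    """Min segment tree over last(s); finds the lowest seat with last(s) <= L."""
    def __init__(self, n):
        size = 1
        while size < n:
            size *= 2
        self.n, self.size = n, size
        # seats start free (last = 0); padding leaves get +inf so they are never chosen
        self.t = [0] * (2 * size)
        for i in range(size + n, 2 * size):
            self.t[i] = float('inf')
        for i in range(size - 1, 0, -1):
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def update(self, seat, last):
        i = seat + self.size
        self.t[i] = last
        i //= 2
        while i:
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])
            i //= 2

    def first_free(self, L):
        if self.t[1] > L:
            return None                  # no seat is free since time <= L
        i = 1
        while i < self.size:             # descend, preferring the left child
            i = 2 * i if self.t[2 * i] <= L else 2 * i + 1
        return i - self.size
```

Each query touches one root-to-leaf path, hence O(log N) instead of the O(log^2 N) binary search.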
Part 2. Solving the problem
We maintain a persistent tree on seats, storing last(S) for each seat (same as in the previous part). Let's process initial passengers one by one sorted by their left endpoint in increasing order. For a passenger (s_i, l_i, r_i) we update the segment tree at position s_i with value r_i. The tree is persistent, so we store the new copy somewhere.
To answer a query [L, R], find the latest version of the segment tree whose update came from a passenger with left endpoint < R. With a binary search over the versions this takes O(log N) time.
In this version of the segment tree only passengers with left endpoint < R are taken into account (moreover, exactly those passengers are). So we can use the algorithm from Part 1 to answer the query on this tree.
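To make the whole pipeline concrete, here is a rough Python sketch of the above (pointer-based persistence; seats are 0-indexed and all names are mine, so treat it as an illustration rather than a reference implementation):

```python
import bisect

class Node:
    __slots__ = ("mn", "l", "r")
    def __init__(self, mn, l=None, r=None):
        self.mn, self.l, self.r = mn, l, r

def build(lo, hi):
    if lo + 1 == hi:
        return Node(0)                       # seat initially free: last = 0
    mid = (lo + hi) // 2
    return Node(0, build(lo, mid), build(mid, hi))

def update(node, lo, hi, seat, r):
    """Return a new version with last(seat) raised to r; O(log S) new nodes."""
    if lo + 1 == hi:
        return Node(max(node.mn, r))
    mid = (lo + hi) // 2
    if seat < mid:
        left, right = update(node.l, lo, mid, seat, r), node.r
    else:
        left, right = node.l, update(node.r, mid, hi, seat, r)
    return Node(min(left.mn, right.mn), left, right)

def first_free(node, lo, hi, L):
    """Part 1 descent: lowest seat with last(seat) <= L, or None."""
    if node.mn > L:
        return None
    while node.l is not None:
        mid = (lo + hi) // 2
        if node.l.mn <= L:
            node, hi = node.l, mid
        else:
            node, lo = node.r, mid
    return lo

def preprocess(S, passengers):               # passengers: (seat, l, r)
    passengers = sorted(passengers, key=lambda p: p[1])
    root = build(0, S)
    versions, lefts = [root], [float('-inf')]   # version 0: empty train
    for s, l, r in passengers:
        root = update(root, 0, S, s, r)
        versions.append(root)
        lefts.append(l)
    return versions, lefts

def answer(versions, lefts, S, L, R):
    v = bisect.bisect_left(lefts, R) - 1     # latest version with left endpoint < R
    return first_free(versions[v], 0, S, L)
```

Running it on the example from the question (with seats shifted to 0-based) reproduces the answers 1, 2, 2, 3 and "can't place".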
Statement:
Input: list<x', x'', Y>
Query input: (X', X'')
Output: Ymin
Constraints :
1 ≤ Q, S, m ≤ 10^5
1 ≤ N ≤ 10^9
Time: 1s
Memory: 256 Mb
Answer:
Data structure methods you can use:
1. Brute force: directly iterate through the list and perform the check.
2. Sort: sort the list on Y (lowest to highest) and then iterate through it.
Note: sorting a large list will be time-consuming.
Sort on Y
Ymin = -1                                   // maintain Ymin
for i in [0, len(input)):                   // iterate through the tuples
    if Ymin != -1 && Y(i-1) != Y(i): return Ymin   // end case: the previous Y works
    else if x'(i) >= X'': Ymin = Y(i)       // tuple lies to the right of the query (touching allowed)
    else if x'(i) + length(i) <= X': Ymin = Y(i)   // tuple lies to the left of the query
    else: continue                          // overlap
3. Hashmap: use Map<Y, list<tuple<x', length>>> to store the list of segments for each Y, and iterate through them to find the least Y.
Note: building the map takes additional time.
Iterate through the list and build the Map
Iterate through the Map keys in increasing order of Y:
    Iterate through the list of tuples; for each tuple:
        if x' >= X'': continue                      // to the right of the query tuple
        else if x' + length <= X': continue         // to the left of the query tuple
        else: move on to the next Y                 // overlap
    return Y                                        // no tuple at this Y overlapped
4. Matrix: you can build a matrix with 1 for an occupied point and 0 for an empty one.
Note: building the matrix takes additional time and iterating through it is slow, so this is not useful (with N up to 10^9 the full matrix would not even fit in memory).
Example :
0 1 1 1 0 0
1 1 0 1 0 0
0 1 1 1 1 0
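For concreteness, the brute force (method 1) could look like this in Python; the helper name and the open-interval overlap test are my own choices, with touching endpoints allowed as the problem statement permits:

```python
def smallest_row(segments, x1, x2, n_rows):
    """Brute force: smallest y in 1..n_rows where (x1, x2, y) fits.
    Touching endpoints are allowed, so (a, b) and (b, c) don't conflict."""
    blocked = set()
    for sx1, sx2, y in segments:
        if sx1 < x2 and x1 < sx2:   # open-interval overlap test
            blocked.add(y)
    for y in range(1, n_rows + 1):
        if y not in blocked:
            return y
    return None                     # no row can take the segment
```

This is O(m + N) per query, which is far too slow for the stated constraints, but it is a useful correctness oracle for testing the faster approaches.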
Related
I came across an interesting problem and I can't solve it in a good complexity (better than O(qn)):
There are n persons in a row. Initially every person in this row has some value; let's say the i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j then the j-th person can't get a worse mark than the i-th person.
If i < j then the j-th person can't get a worse mark than the i-th person (this condition tells us that the sequence of marks is non-decreasing).
There are q operations. In every operation two persons are swapped (they swap their values).
After each operation you have to tell the maximal number of different marks that these n persons can get.
Do you have any idea?
Consider any two groups, J and I (j < i and a_j < a_i for all j in J and i in I), and suppose we swap a_j and a_i. Then a_i is the new max for J and a_j is the new min for I, and J gets extended to the right at least up to and including i.
Now if there were any group of positions to the right of i whose values were all greater than the values in the left part of I up to i, this group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap reduces the mark count by the number of groups between J and I and merges the groups from J up to I.
Now consider an in-group swap. The only time a mark would be added is if a_i and a_j (j < i) are the minimum and maximum, respectively, of two adjacent segments, leading to the group splitting into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking, before the switch, that the second smallest i is greater than the second largest j.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of min,max and number of groups, which correspond with a count of sequential maxes.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in log(n) along the intervals containing 2 and 5.
To add group counts in a larger interval the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6
Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the greatest value in LEFT (call it aLEFT_MAX) and the least value in RIGHT (call it aRIGHT_MIN).
If aLEFT_MAX < aRIGHT_MIN, then there's no need for any marks to overlap between the two, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from the position of aLEFT_MAX to the position of aRIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some value further to the right. (This can be optimized a bit more, but I don't think it improves the algorithmic complexity, so I won't worry about that.) The rightmost extent of the segment is found symmetrically.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
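The recursion above can also be flattened: a cut between positions i and i+1 is feasible exactly when the max of the left part is smaller than the min of the right part, and the number of marks is the number of resulting blocks. A linear-time sketch of that reformulation (my naming, not the recursive version itself):

```python
from itertools import accumulate

def max_marks(a):
    """Number of distinct marks, one per block between valid cut points.
    A cut after position i is valid iff max(a[:i+1]) < min(a[i+1:])."""
    n = len(a)
    if n <= 1:
        return n
    prefix_max = list(accumulate(a, max))
    suffix_min = list(accumulate(reversed(a), min))[::-1]
    return 1 + sum(prefix_max[i] < suffix_min[i + 1] for i in range(n - 1))
```

For example, 4 1 3 2 5 7 6 8 splits into the blocks {4,1,3,2}, {5}, {7,6}, {8}, giving 4 marks.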
I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone; otherwise you could use your algorithm to check whether the array is sorted faster than in O(n) time, and that's clearly impossible.
General idea: spend O(n log n) creating some additional data which lets you compute the number of marks in O(log n) time and which can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e. finding how the number of marks changed may be easier than computing it from scratch).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person; otherwise taking the array 1 2 3 ... n and repeatedly swapping the first and last elements would require you to update this additional data for every element of the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
|8
+---+-
|7 |
| 6|
+-+---+
|5|
-------+-+
4 |
3 |
2|
1 |
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add less than n fictional persons who will never participate in exchanges), let 0 <= a_i < n. We can create O(n log n) objects - "index squares" - which are "responsible" for points (i, a_i) : a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with center on the diagonal line a_i=i). Every swap affects only O(log n) index squares.
The problem is, I can't find what information to store for each index square so that it would allow finding the number of marks fast enough; all I have is a feeling that such an approach may be effective.
Hope this helps.
Let's normalize the problem first, so that a_i is in the range 0 to n-1 (this can be achieved in O(n log n) by sorting a, but it only has to be done once, so we are fine).
function normalize(a) {
    let b = [];
    for (let i = 0; i < a.length; i++)
        b[i] = [i, a[i]];
    b.sort(function(x, y) {
        return x[1] < y[1] ? -1 : 1;
    });
    for (let i = 0; i < a.length; i++)
        a[b[i][0]] = i;
    return a;
}
To get the maximal number of marks we can count how many times
i + 1 == mex(a[0..i]), for integer i in [0, n-1]
holds. a[0..i] denotes the sub-array of all the values from index 0 to i.
mex() is the minimum excludant, i.e. the smallest value missing from the sequence 0, 1, 2, 3, ...
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assuming values are normalized to be elements of [0, n-1]
function maxMarks(a) {
    let visited = new Array(a.length + 1);
    let smallestMissing = 0, marks = 0;
    for (let i = 0; i < a.length; i++) {
        visited[a[i]] = true;
        if (a[i] == smallestMissing) {
            smallestMissing++;
            while (visited[smallestMissing])
                smallestMissing++;
            if (i + 1 == smallestMissing)
                marks++;
        }
    }
    return marks;
}
If we swap the values at indices x and y (x < y), then the mex of the prefixes a[0..i] doesn't change for i < x or i >= y. Although this is an optimization, it unfortunately doesn't improve the complexity, which is still O(qn).
We can observe that the hits (where the mark count increases) are always at the beginning of an increasing sequence, and all matches within the same sequence have to satisfy a[i] == i, except for the first one; but I couldn't derive an algorithm from this yet:
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*
Input: S = {p1, . . . , pn}, n points on 2D plane each point is given by its x and y-coordinate.
For simplicity, we assume:
The origin (0, 0) is NOT in S.
Any line L passing through (0, 0) contains at most one point in S.
No three points in S lie on the same line.
If we pick any three points from S, we can form a triangle. So the total number of triangles that can be formed this way is Θ(n^3).
Some of these triangles contain (0, 0), some do not.
Problem: Calculate the number of triangles that contain (0, 0).
You may assume we have an O(1) time function Test(pi, pj , pk) that, given three points pi, pj , pk in S, returns 1, if the triangle formed by {pi, pj , pk} contains (0, 0), and returns 0 otherwise. It’s trivial to solve the problem in Θ(n^3) time (just enumerate and test all triangles).
Describe an algorithm for solving this problem with O(n log n) run time.
My analysis of the above problem leads to the following conclusion
There are 4 quadrants: (+,+), (+,-), (-,-), (-,+) (according to whether the x and y coordinates are > 0 or not).
Let
s1 = coordinate x < 0 and y > 0
s2 = x > 0 , y > 0
s3 = x < 0 , y < 0
s4 = x > 0 , y < 0
Now we need to do the testing only for points drawn from the following combinations of sets:
S1 S2 S3
S1 S1 S4
S2 S2 S3
S3 S3 S2
S1 S4 S4
S1 S3 S4
S1 S2 S4
S2 S3 S4
I now need to test points from the above combinations of sets only (e.g. one point from s1, one point from s2 and one point from s3 for the first combination) and see whether the points contain (0,0) by calling the Test function (which is assumed to be a constant-time function here).
Can someone guide me on this ?
An image is added below for clarification on why some subsets (s1, s2, s4) can contain (0,0) and some (s1, s1, s3) cannot.
I'm guessing we're in the same class (based on the strange wording of the question), so now that the due date is past, I feel alright giving out my solution. I managed to find the n log n algorithm, which, as the question stated, is more a matter of cleverly transforming the problem, and less of a Dynamic Programming / DaC solution.
Note: This is not an exhaustive proof, I leave that to you.
First, some visual observations. Take some triangle that obviously contains the origin.
Then, convert the points to vectors.
Convince yourself that any selection of three points, one from each vector, describes a triangle that also contains the origin.
It also follows that, if you perform the above steps on a triangle that doesn't enclose the origin, any combination of points along those vectors will also not contain the origin.
The main point to get from this is, the magnitude of the vector does not matter, only the direction. Additionally, a hint to the question says that "any line crossing (0,0) only contains one point in S", from which we can extrapolate that the direction of each vector is unique.
So, if only the angle matters, it would follow that there is some logic that determines what range of points, given two points, could possibly form a triangle that encloses the origin. For simplicity, we'll assume we've taken all the points in S and converted them to vectors, then normalized them, effectively making all points lie on the unit circle.
So, take two points along this circle.
Then, draw a line from each point through the origin and to the opposite side of the circle.
It follows that, given the two points, any point that lies along the red arc can form a triangle.
So our algorithm should do the following:
Take each point in S. Make a secondary array A, and for each point add its angle along the unit circle (atan2(x,y), shifted into [0, 2π)) to A. Let's assume this is O(n).
Sort A in increasing order: O(n log n), assuming we use merge sort.
Count the number of triangles possible for each pair (Ai,Aj). This means that we count the number of Ai + π ≤ Ak ≤ Aj + π. Since the array is sorted, we can use a Binary Search to find the indices of Ai + π and Aj + π, which is O(2 log n) = O(log n)
However, we run into a problem, there are n^2 points, and if we have to do an O(log n) search for each, we have O(n^2 log n). So, we need to make one more observation.
Given some Ai < Aj, we'll say Tij describes the number of triangles possible, as calculated by the above method. Then, given a third Ak > Aj, we know that Tij ≤ Tik, as the number of points between Ai + π and Ak + π must be at least as many as there are between Ai + π and Aj + π. In fact, it is exactly the count between Ai + π and Aj + π, plus the count between Aj + π and Ak + π. Since we already know the count between Ai + π and Aj + π, we don't need to recalculate it; we only need to calculate the number between Aj + π and Ak + π, then add the previous count. It follows that:
A(n) = count(A(n),A(n-1)) + count(A(n-1),A(n-2)) + ... + count(A(1),A(0))
And this means we don't need to check all n^2 pairs, we only need to check consecutive pairs - so, only n-1.
So, all the above can give us the following pseudocode solution.
int triangleCount(point P[], int n)
    int A[n], C[n], totalCount = 0;
    for (i = 0...n)
        A[i] = atan2(P[i].x, P[i].y);
    mergeSort(A);
    int midPoint = binarySearch(A, π);
    for (i = 0...midPoint-1)
        int left = A[i] + π, right = A[i+1] + π;
        C[i] = binarySearch(A, right) - binarySearch(A, left);
        for (j = 0...i)
            totalCount += C[j];
    return totalCount;
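Here is a runnable sketch in the same spirit; rather than the prefix-count trick above, it counts the complement (a triangle misses the origin exactly when its three points fit in an open half-plane through the origin), which is a standard reformulation. Names are mine:

```python
import math
from bisect import bisect_left

def triangles_containing_origin(points):
    """Count triangles containing (0, 0): total minus, for each point,
    the pairs of points lying within the open half-turn after it."""
    n = len(points)
    angles = sorted(math.atan2(y, x) for x, y in points)
    ext = angles + [a + 2 * math.pi for a in angles]   # unrolled circle
    not_containing = 0
    for i, a in enumerate(angles):
        # points with angle strictly inside (a, a + pi), wrapping around
        j = bisect_left(ext, a + math.pi, i + 1)
        c = j - i - 1
        not_containing += c * (c - 1) // 2
    return n * (n - 1) * (n - 2) // 6 - not_containing
```

Each origin-missing triangle is counted exactly once, at its angularly first vertex, so a single sort plus n binary searches gives O(n log n) overall.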
It seems that in the worst case there are Θ(n^3) triangles containing the origin, and since you need them all, the answer is no, there is no better algorithm.
For a worst case consider a regular polygon with an odd number n of vertices, centered at the origin.
Here is an outline of the calculations. A chord connecting two vertices which are k < n/2 vertices apart is a base for Θ(k) triangles. Fix a vertex: its contribution is a sum over all chords coming from it, yielding Θ(n^2), and the total (the contribution of all n vertices) is Θ(n^3) (each triangle is counted 3 times, which doesn't affect the asymptotics).
Suppose you have numbers a1..an and some queries [l, k] (1 ≤ l ≤ k ≤ n). The problem is to find, within the interval [l, k], the minimum distance between two equal numbers.
Examples: (interval l,k shown as |...|)
1 2 2 |1 0 1| 2 3 0 1 2 3
Answer 2 (101)
1 |2 2| 1 0 1 2 3 0 1 2 3
Answer 1 (22)
1 2 2 1 0 |1 2 3 0 3 2 3|
Answer 2 (303) or (323)
I have thought about a segment tree, but it is hard to join results from the tree nodes when a query spans several of them. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?
Clarification
Thanks for your answers.
The problem is that there are a lot of queries, so O(n) per query is not good. I did not mention a segment tree accidentally: it performs an [l, r] query for [l, r]SUM or [l, r]MIN on an array with O(log n) complexity. Can we do some preprocessing to fit in O(log n) here?
Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.
In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.
Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval it could contain and the last one, and then issue a range-minimum query over the lengths of that range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.
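A Python sketch of this enumeration plus the query lookup (names mine; for brevity the range-minimum over the interval lengths is a plain min over the slice rather than an RMQ structure, which would make each query O(1)):

```python
from bisect import bisect_left, bisect_right

def minimal_intervals(a):
    """Minimal intervals (j, k): a[j] == a[k] and every element strictly
    between them occurs exactly once inside the interval."""
    last, l, res = {}, 0, []
    for k, v in enumerate(a):
        if v in last and last[v] >= l:     # previous occurrence still in window
            res.append((last[v], k))
            l = last[v] + 1                # advance l past the matched symbol
        last[v] = k
    return res

def min_equal_distance(intervals, L, R):
    """Minimum distance between equal values inside [L, R] (0-indexed).
    Left and right endpoints are both sorted, so the intervals contained
    in [L, R] form one contiguous slice."""
    lefts = [j for j, _ in intervals]
    rights = [k for _, k in intervals]
    lo = bisect_left(lefts, L)
    hi = bisect_right(rights, R)
    return min((k - j for j, k in intervals[lo:hi]), default=None)
```

On the first example array, the query covering the sub-array 1 0 1 returns distance 2, as expected.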
We can do it in O(n log(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then sort the elements in [l,k], first by value and second by original index, both ascending.
Then you can loop over the sorted list, keeping a currentValue variable, and checking adjacent values that are the same for distance and setting minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.
Suppose we have this [l,k] range from your second example:
1 2 3 0 3 2 3
We can mark them as
1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)
and sort them as
0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)
Looping over this, there are no ranges for 0 and 1. The minimum distance for 2s is 4, and the minimum distance for 3s is 2 ([3,5] or [5,7], depending on whether you reset minDistance when the new minimum distance equals the current minimum distance).
Thus we get
[3,5] in [l,k] or [5,7] in [l,k]
EDIT
Since you mention some queries, you can preprocess the list in O(n log(n)) time, and then use only O(n) time for each individual query: just ignore indices that are not in [l,k] while looping over the sorted list.
EDIT 2
This addresses the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then answer each query in O(1) time.
First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.
We can imagine that:
[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)
We have one base case
[l,k] = infinity where l = k
If [l,k] is not min([l+1,k], [l,k-1]), then the optimal pair either starts at l or ends at k. For each of these two cases, we look into the sorted list at the adjacent element in the appropriate direction and check the distances (making sure we stay in bounds). We only have to check 2 elements, so it is a constant factor.
Using this algorithm, we can run the following
for l = n downto 1
    for k = l to n
        M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)
You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.
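A sketch of that recurrence in Python (0-indexed; the "sequence starting at l" term reduces to the distance to the next occurrence of a[l], and the symmetric "ending at k" term turns out to be covered by the other cases, so it is omitted):

```python
def preprocess_min_dist(a):
    """M[l][k] = min distance between two equal values inside [l, k].
    O(n^2) time and memory; each query is then a table lookup."""
    n = len(a)
    INF = float('inf')
    nxt = [INF] * n                 # nxt[i]: next index j > i with a[j] == a[i]
    seen = {}
    for i in range(n - 1, -1, -1):
        if a[i] in seen:
            nxt[i] = seen[a[i]]
        seen[a[i]] = i
    M = [[INF] * n for _ in range(n)]
    for l in range(n - 2, -1, -1):
        for k in range(l + 1, n):
            best = min(M[l + 1][k], M[l][k - 1])
            if nxt[l] <= k:         # best pair using index l itself
                best = min(best, nxt[l] - l)
            M[l][k] = best
    return M
```

The pair ending at k with its partner strictly inside the interval is already counted by M[l+1][k], and the pair (l, k) is dominated by (l, nxt[l]), which is why one extra term suffices.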
You are given N and an int K[].
The task at hand is to generate a uniformly random number between 0 and N-1 which doesn't exist in K.
N is an integer >= 0.
K.length < N-1, and 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates a uniform random number in the range 0 to M-1. Assume this function's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get an O(N) solution for this: first generate a random number between 0 and N - K.length - 1, then map the generated number to a number not in K. The second step takes the complexity to O(N). Can it be done better, maybe in O(log N)?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the number of numbers missing from the array below K[i].
For example K[2] = 5, so 3 numbers below K[2] are missing (2, 3, 4).
Hence you can decide whether you have to conduct the remaining search in the first part of array K or the next part. This is because you know r.
This search will give you a complexity of O(log K.length).
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
A random number is generated between 0 and N - K.length - 1 = random{0..3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing up to i = 1. Hence we search in the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th missing element, out of which we have already discarded 2. Hence we will choose the second remaining element, which is 6.
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and its index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
This can be solved by basically solving this:
Find the r-th smallest number not in the given array K, subject to the conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = -1 (so that D[-1] = K[-1] - (-1) = 0).
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the r-th smallest number not in K[], we need to find the rightmost occurrence in D (say at position j) of r', where r' is the largest number in D which is < r. Such an r' exists because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r - r' + K[j].
Proof: The definition of r' and j tells us that there are exactly r' numbers missing from 0 to K[j], and at least r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and there are at least r - r' of them), and the number we seek is among them, given by K[j] + (r - r').
Algorithm:
In order to find (r', j), all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log |K|) algorithm.
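In Python the whole claim collapses into a short binary search on D, computed on the fly (names are mine; r is 0-indexed here):

```python
import random

def kth_missing(K, r):
    """0-indexed r-th smallest value not in sorted, distinct K (values >= 0).
    D[i] = K[i] - i counts the missing values below K[i] and is sorted,
    so we can binary-search it without materializing D."""
    lo, hi = 0, len(K)
    while lo < hi:
        mid = (lo + hi) // 2
        if K[mid] - mid <= r:      # at most r values missing below K[mid]
            lo = mid + 1
        else:
            hi = mid
    return r + lo                  # lo elements of K precede the answer

def rand_not_in(N, K):
    return kth_missing(K, random.randrange(N - len(K)))
```

Each of the N - K.length missing values is produced with equal probability, in O(log |K|) per call.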
If you are running this many times, it probably pays to speed up your generation operation: O(log N) time just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K, add it to G. If it is in K, don't add it and advance your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random
class PRNG:
def __init__(self, K,N):
self.G = []
kptr = 0
for i in range(N):
if kptr<len(K) and K[kptr]==i:
kptr+=1
else:
self.G.append(i)
def getRand(self):
rn = random.randint(0,len(self.G)-1)
return self.G[rn]
prng=PRNG( [0,1,5], 7)
for i in range(20):
print prng.getRand()
Suppose that we wish to keep track of a point of maximum overlap in a set of intervals—a point that has the largest number of intervals in the database overlapping it.
a. Show that there will always be a point of maximum overlap which is an endpoint of one of the segments.
b. Design a data structure that efficiently supports the operations INTERVAL-INSERT, INTERVAL-DELETE, and FIND-POM, which returns a point of maximum overlap. (Hint: Keep a red-black tree of all the endpoints. Associate a value of +1 with each left endpoint, and associate a value of -1 with each right endpoint. Augment each node of the tree with some extra information to maintain the point of maximum overlap.)
This problem is in the book Introduction to Algorithms, but I have no idea how to solve the second part. If a greater mind has an elegant solution, please share your idea with me! Thanks.
Quoted from: http://ripcrixalis.blog.com/2011/02/08/clrs-chapter-14/
Keep a RB-tree of all the endpoints.
We insert endpoints one by one, as a sweep line scanning from left to right. With each left endpoint e, associate a value p[e] = +1 (increasing the overlap by 1). With each right endpoint e, associate a value p[e] = -1 (decreasing the overlap by 1).
When multiple endpoints have the same value, insert all the left endpoints with that value before inserting any of the right endpoints with that value.
Here is some intuition. Let e1, e2, ..., en be the sorted sequence of endpoints corresponding to our intervals. Let s(i, j) denote the sum p[ei] + p[ei+1] + ... + p[ej] for 1 ≤ i ≤ j ≤ n.
We wish to find an i maximizing s(1, i).
Each node x stores three new attributes.
We store v[x] = s(l[x], r[x]), the sum of the values of all nodes in x's subtree.
We also store m[x], the maximum value obtained by the expression s(l[x], i) for any i.
We store o[x] as the value of i for which m[x] achieves its maximum. For the sentinel, we define v[nil[T]] = m[nil[T]] = 0.
We can compute these attributes in a bottom-up fashion so as to satisfy the requirements of Theorem 14.1:
v[x] = v[left[x]] + p[x] + v[right[x]]
m[x] = max {
    m[left[x]]                        (max is in x's left subtree),
    v[left[x]] + p[x]                 (max is at x),
    v[left[x]] + p[x] + m[right[x]]   (max is in x's right subtree) }
Once we understand how to compute m[x], it is straightforward to compute o[x] from the information in x and its two children.
FIND-POM: return the interval whose endpoint is represented by o[root[T]].
Because of how we have defined the new attributes, Theorem 14.1 says that each operation runs in O(lg n) time. In fact, FIND-POM takes only O(1) time.
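Since the (v, m, o) merge is associative, it can be tried out independently of the red-black tree; here is a toy Python rendering over an already-sorted endpoint list (the tree only makes the same merge dynamic under inserts and deletes; names are mine):

```python
from functools import reduce

def combine(left, right):
    """Merge (v, m, o) summaries: v = total sum of p-values, m = best prefix
    sum, o = endpoint at which m is attained."""
    v = left[0] + right[0]
    if left[1] >= left[0] + right[1]:      # best prefix stays in the left part
        m, o = left[1], left[2]
    else:                                  # best prefix reaches into the right part
        m, o = left[0] + right[1], right[2]
    return (v, m, o)

def find_pom(events):
    """events: (coordinate, +1 or -1) sorted by coordinate, with +1 before -1
    on ties. Returns (max overlap, a coordinate where it is attained)."""
    leaves = [(p, p, x) for x, p in events]
    v, m, o = reduce(combine, leaves)
    return m, o
```

For the intervals [1,3], [2,4], [5,6] the maximum overlap is 2, attained at endpoint 2, which is exactly what m[root] and o[root] would report.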