How to calculate the lexicographical rank of a given permutation - algorithm

For example, there are 6 chairs in the room and there are 4 girls and 2 boys. There are 15 unique ways they can sit on these chairs: 6!/(4!*2!) = 15.
My problem is to find an efficient way to calculate the position of the arrangement they choose. By position I mean the following:
BBGGGG - possible position #1
BGBGGG - possible position #2
BGGBGG - possible position #3
BGGGBG - possible position #4
BGGGGB - possible position #5
GBBGGG - possible position #6
GBGBGG - possible position #7
GBGGBG - possible position #8
GBGGGB - possible position #9
GGBBGG - possible position #10
GGBGBG - possible position #11
GGBGGB - possible position #12
GGGBBG - possible position #13
GGGBGB - possible position #14
GGGGBB - possible position #15
For example, suppose they choose the arrangement GBBGGG. For now, my solution for calculating the number of this position (#6) is to loop over all possible positions, compare each of them to the selected order, and return the current position number when they are equal.
In the range from the example above, it's not a big deal to loop over 15 possible combinations, but if you increase the number of chairs and people, this method is far from efficient.
Is there a formula or a more efficient way I can use to determine the position of the selected arrangement? Feel free to use any programming language in your examples.
UPDATE: I know exactly how many chairs, boys and girls are in the room. The only problem is finding the position number of the arrangement they choose.
The sort order I'm using in my example is only for readability. Answers with any type of sort order are welcome.
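For reference, the brute-force approach described in the question can be sketched in JavaScript as below. The name `bruteForceRank` is hypothetical, and the sketch assumes the B-before-G lexicographic order of the list above; because it may visit every arrangement, it is only practical for small rooms or for checking a faster method.

```javascript
// Hypothetical brute-force baseline: generate every arrangement of the same
// boys and girls in lexicographic order (B before G) and return the 1-based
// position at which the target arrangement appears.
function bruteForceRank(target) {
  const boys = [...target].filter((c) => c === "B").length;
  const girls = target.length - boys;
  let position = 0;
  let found = -1;

  function generate(prefix, b, g) {
    if (found !== -1) return; // stop early once the target has been seen
    if (b === 0 && g === 0) {
      position++;
      if (prefix === target) found = position;
      return;
    }
    if (b > 0) generate(prefix + "B", b - 1, g); // B sorts before G
    if (g > 0) generate(prefix + "G", b, g - 1);
  }

  generate("", boys, girls);
  return found;
}

console.log(bruteForceRank("GBBGGG")); // 6
```

The recursion tries B before G at every chair, so complete arrangements are produced in exactly the order of the list above.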

Finding the rank of a permutation by position of G's
The permutations in the example are in lexicographical order; the first permutation has all the B's on the left and the G's on the right; the other permutations are made by gradually moving G's to the left. (Similar to a rising sequence of binary numbers: 0011, 0101, 0110, 1001, 1010, 1100)
To count how far into this process a given permutation is, look at the characters one by one from left to right: whenever you encounter a G, the number of permutations needed to move it there is (N choose K) where N is the number of positions to the right of the current position, and K is the number of G's left, including the current G.
123456 ← positions
BBGGGG ← rank 0 (or 1)
BGBGGG ← rank 1 (or 2)
BGGBGG ← rank 2 (or 3)
BGGGBG ← rank 3 (or 4)
BGGGGB ← rank 4 (or 5)
GBBGGG ← rank 5 (or 6)
GBGBGG ← rank 6 (or 7)
GBGGBG ← rank 7 (or 8)
E.g. for GBGGBG in your example, there are 4 G's in 6 possible positions, and the first G is at position 1, so we count (6-1 choose 4) = 5; the second G is at position 3, so we add (6-3 choose 3) = 1; the third G is at position 4, so we add (6-4 choose 2) = 1; the last G is at position 6, so it's in its original position and can be ignored. This adds up to 7, which means the permutation has rank 7 (or 8 if you start counting from 1, like you do in the question).
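The same counting rule can be written as a single formula. In this restatement (the notation is introduced here for illustration, not taken from the answer), n is the number of chairs, g the number of G's, and p_1 < p_2 < ... < p_g are the 1-based positions of the G's from the left. A binomial whose lower index exceeds its upper index counts as 0, which covers a G that is already in its original position.

```latex
\operatorname{rank}_0 = \sum_{j=1}^{g} \binom{n - p_j}{g - j + 1},
\qquad
\text{GBGGBG}:\;
\binom{5}{4} + \binom{3}{3} + \binom{2}{2} + \binom{0}{1} = 5 + 1 + 1 + 0 = 7.
```

Adding 1 gives the 1-based rank used in the question.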
Calculating (N choose K) with Pascal's Triangle
You can use e.g. Pascal's Triangle to calculate (N choose K). This is a triangular array where each number is the sum of the two numbers above it:
      K=0  K=1  K=2  K=3  K=4  K=5  K=6
N=0     1
N=1     1    1
N=2     1    2    1
N=3     1    3    3    1
N=4     1    4    6    4    1
N=5     1    5   10   10    5    1
N=6     1    6   15   20   15    6    1
Code example
Below is a simple JavaScript implementation; the calls at the end print a few examples. The main loop's execution time is linear in the number of chairs (building the Pascal's triangle table takes quadratic time), rather than proportional to the number of possible permutations, which could be huge. (Update: the code now iterates over the characters from right to left, so that it doesn't have to count the number of G's first.)
function permutationRank(perm) {
    var chairs = perm.length, girls = 0, rank = 1; // count permutations from 1
    var triangle = PascalsTriangle(chairs - 1);   // triangle[n][k] = (n choose k)
    // i is the 1-based position counted from the right end.
    for (var i = 1; i <= chairs; i++) {
        // girls counts this G plus every G to its right; if they fill all i chairs,
        // the G is still in its original position and adds nothing.
        if (perm.charAt(chairs - i) == 'G' && ++girls < i) {
            rank += triangle[i - 1][girls];
        }
    }
    return rank;

    // Function declarations are hoisted, so the helper can follow the return.
    function PascalsTriangle(size) {
        var tri = [[1]];
        for (var n = 1; n <= size; n++) {
            tri[n] = [1];
            for (var k = 1; k < n; k++) {
                tri[n][k] = tri[n - 1][k - 1] + tri[n - 1][k];
            }
            tri[n][n] = 1;
        }
        return tri;
    }
}
console.log(permutationRank("BBGGGG"));
console.log(permutationRank("GBGGBG"));
console.log(permutationRank("GGGGBB"));
console.log(permutationRank("GGBGBBGBBBGBBBBGGGGGBBBBBGGGGBGGGBGGBGBB"));
Inverse algorithm: generate permutation
This algorithm does the inverse: given the number of B's, the number of G's, and the rank of the permutation, it returns the permutation. Again, this is done without having to generate all the permutations. (Note: I have not included any validity checking of the input.)
function permutationGenerator(boys, girls, rank) {
    var chairs = boys + girls, perm = "";
    var triangle = PascalsTriangle(chairs - 1); // triangle[n][k] = (n choose k)
    // i is the number of chairs still to fill, including the current one.
    for (var i = chairs; i > 0; i--) {
        if (i > girls) {
            // Arrangements that put a B in this chair: (i - 1 choose girls).
            var choose = triangle[i - 1][girls];
            if (rank > choose) { // > if counting from 1, >= if counting from 0
                rank -= choose;
                perm += 'G';
                --girls;
            }
            else perm += 'B';
        }
        else perm += 'G'; // only girls left
    }
    return perm;

    function PascalsTriangle(size) {
        var tri = [[1]];
        for (var n = 1; n <= size; n++) {
            tri[n] = [1];
            for (var k = 1; k < n; k++) {
                tri[n][k] = tri[n - 1][k - 1] + tri[n - 1][k];
            }
            tri[n][n] = 1;
        }
        return tri;
    }
}
console.log(permutationGenerator(2, 4, 1));
console.log(permutationGenerator(2, 4, 8));
console.log(permutationGenerator(2, 4, 15));
console.log(permutationGenerator(20, 20, 114581417274));

My problem is to find an efficient way to calculate the position of the arrangement they choose. Answers with any type of sorting are welcome. Is there any formula or more efficient way I can use to determine the position of the selected arrangement?
I will pick the mapping of configuration to binary: B is 1 and G is 0.
For 7 boys and 3 girls there are 10!/(7! 3!) = 120 combinations; here are a few of them with their binary codes:
GGGBBBBBBB <--> 0001111111
BBGBBGBBGB <--> 1101101101
BBBBBBBGGG <--> 1111111000
You can convert to decimal if you need to, but in any case it's a one-to-one mapping, which allows you to determine the position almost immediately.
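A minimal JavaScript sketch of this mapping (the function name `toBinaryKey` is hypothetical) reads the arrangement left to right as a binary number with B = 1 and G = 0. The integers it produces are distinct for distinct arrangements, but they are not consecutive: for the 120 arrangements of 7 boys and 3 girls they range from 127 to 1016 rather than from 1 to 120.

```javascript
// Hypothetical helper: read the arrangement as a binary number, B = 1 and G = 0.
// Distinct arrangements give distinct keys, but the keys are not consecutive ranks.
// Plain numbers stay exact for up to 53 chairs.
function toBinaryKey(arrangement) {
  let key = 0;
  for (const c of arrangement) {
    key = key * 2 + (c === "B" ? 1 : 0);
  }
  return key;
}

console.log(toBinaryKey("GGGBBBBBBB")); // 127
console.log(toBinaryKey("BBGBBGBBGB")); // 877
console.log(toBinaryKey("BBBBBBBGGG")); // 1016
```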

Branch and bound (BB or B&B) is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as general real valued problems. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.
The essence of the branch-and-bound approach is the following observation: in the total enumeration tree, at any node, if I can show that the optimal solution cannot occur in any of its descendants, then there is no need for me to consider those descendant nodes. Hence, I can "prune" the tree at that node. If I can prune enough branches of the tree in this way, I may be able to reduce it to a computationally manageable size. Note that I am not ignoring those solutions in the leaves of the branches that I have pruned; I have left them out of consideration after I have made sure that the optimal solution cannot be at any one of these nodes. Thus, the branch-and-bound approach is not a heuristic, or approximating, procedure, but an exact, optimizing procedure that finds an optimal solution.

Here is an efficient O(n) algorithm. No Pascal's triangle: it computes the combinations on the fly.
I have tested it against large values, generating the combinations and matching the ranks, but if you find an example where it does not work, let me know.
http://dev-task.blogspot.com/2015/12/rank-of-n-bit-numbers-with-exactly-k.html
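The linked post is not reproduced here. As an independent sketch of the same idea (not the author's code; the names `binomial` and `permutationRankOnTheFly` are hypothetical), the left-to-right counting rule from the first answer can update a single binomial coefficient per chair instead of building Pascal's triangle. BigInt keeps every intermediate product exact, since each update multiplies and then divides by a factor that is known to divide evenly.

```javascript
// Exact binomial coefficient C(n, k) as a BigInt, built with the multiplicative formula.
function binomial(n, k) {
  if (k < 0 || k > n) return 0n;
  let result = 1n;
  for (let i = 1; i <= k; i++) {
    // After this step result == C(n - k + i, i), so the division is exact.
    result = (result * BigInt(n - k + i)) / BigInt(i);
  }
  return result;
}

// 1-based lexicographic rank (B before G) of a B/G arrangement.
// c always holds C(remaining, girlsLeft), where `remaining` counts the current
// chair and every chair to its right; it is updated in O(1) per chair.
function permutationRankOnTheFly(perm) {
  let girlsLeft = 0;
  for (const ch of perm) if (ch === "G") girlsLeft++;

  let rank = 1n;
  let c = binomial(perm.length, girlsLeft);
  for (let p = 0; p < perm.length; p++) {
    const remaining = BigInt(perm.length - p);
    const k = BigInt(girlsLeft);
    // C(remaining - 1, k): arrangements that place a B in this chair instead.
    const withB = (c * (remaining - k)) / remaining;
    if (perm[p] === "G") {
      rank += withB;           // skip every arrangement with a B here
      c = (c * k) / remaining; // C(remaining - 1, k - 1)
      girlsLeft--;
    } else {
      c = withB;               // C(remaining - 1, k)
    }
  }
  return rank;
}

console.log(permutationRankOnTheFly("GBGGBG")); // 8n
```

Counting the G's and computing the starting coefficient both take O(n) steps, and each chair then costs a constant number of BigInt operations.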

I would recommend you use a binary search tree. Every time you add a chair, each side of the tree is cloned and the new choice of either B or G is the only difference. Basically, you clone what you have and then add B or G to each entry on that side.
EDIT: Note that this can be used for an O(log N) search of the positioning as well.

Related

Number of different marks

I came across an interesting problem and I can't solve it with a good complexity (better than O(qn)):
There are n persons in a row. Initially every person in this row has some value; let's say that the i-th person has value a_i. These values are pairwise distinct.
Every person gets a mark. There are two conditions:
If a_i < a_j then the j-th person can't get a worse mark than the i-th person.
If i < j then the j-th person can't get a worse mark than the i-th person (this condition tells us that the sequence of marks is non-decreasing).
There are q operations. In every operation two persons are swapped (they swap their values).
After each operation you have to tell the maximal number of different marks that these n persons can get.
Do you have any idea?
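As a baseline for the O(qn) bound mentioned above, here is a JavaScript sketch (function names hypothetical) that recomputes the answer from scratch after every swap. It relies on the observation that if i < j and a_i > a_j, the two conditions force persons i and j to share a mark; so a new mark can start right after index i exactly when every value up to i is smaller than every value after it.

```javascript
// Baseline sketch: O(n) per query, O(q * n) overall.
// Assumption: a new mark may start right after index i exactly when
// max(a[0..i]) < min(a[i+1..n-1]); the answer is the number of such cuts plus one.
function countMarks(a) {
  const n = a.length;
  if (n === 0) return 0;
  // suffixMin[i] = smallest value in a[i..n-1]
  const suffixMin = new Array(n);
  suffixMin[n - 1] = a[n - 1];
  for (let i = n - 2; i >= 0; i--) {
    suffixMin[i] = Math.min(a[i], suffixMin[i + 1]);
  }
  let marks = 1;
  let prefixMax = -Infinity;
  for (let i = 0; i + 1 < n; i++) {
    prefixMax = Math.max(prefixMax, a[i]);
    if (prefixMax < suffixMin[i + 1]) marks++;
  }
  return marks;
}

// Apply each swap [x, y] (0-based indices) and report the answer after it.
function marksAfterSwaps(a, swaps) {
  const values = a.slice();
  return swaps.map(([x, y]) => {
    [values[x], values[y]] = [values[y], values[x]];
    return countMarks(values);
  });
}

console.log(countMarks([4, 1, 3, 2, 5, 7, 6, 8])); // 4
```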
Consider any two groups, J and I (j < i and a_j < a_i for all j and i). In any swap scenario, a_i is the new max for J and a_j is the new min for I, and J gets extended to the right at least up to and including i.
Now if there were any group of i's to the right of i whose values were all greater than the values in the left segment of I up to i, this group would not have been part of I, but rather its own group or part of another group denoting a higher mark.
So this kind of swap would reduce the mark count by the number of groups between J and I, and merge groups J up to I.
Now consider an in-group swap. The only time a mark would be added is if a_i and a_j (j < i) are the minimum and maximum, respectively, of two adjacent segments, leading to the group splitting into those two segments. Banana123 showed in a comment below that this condition is not sufficient (e.g., 3,6,4,5,1,2 => 3,1,4,5,6,2). We can address this by also checking before the switch that the second smallest i is greater than the second largest j.
Banana123 also showed in a comment below that more than one mark could be added in this instance, for example 6,2,3,4,5,1. We can handle this by keeping in a segment tree a record of min, max and number of groups, which corresponds to a count of sequential maxes.
Example 1:
(1,6,1) // (min, max, group_count)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 2 and 5. Updates happen in log(n) along the intervals containing 2 and 5.
To add group counts in a larger interval the left group's max must be lower than the right group's min. But if it's not, as in the second example, we must check one level down in the tree.
(1,6,1)
(2,6,1) (1,5,1)
(6,6,1) (2,3,2) (4,4,1) (1,5,1)
6 2 3 4 5 1
Swap 1 and 6:
(1,6,6)
(1,3,3) (4,6,3)
(1,1,1) (2,3,2) (4,4,1) (5,6,2)
1 2 3 4 5 6
Example 2:
(1,6,1)
(3,6,1) (1,4,1)
(6,6,1) (3,5,1) (4,4,1) (1,2,1)
6 5 3 4 2 1
Swap 1 and 6. On the right side, we have two groups where the left group's max is greater than the right group's min, (4,4,1) (2,6,2). To get an accurate mark count, we go down a level and move 2 into 4's group to arrive at a count of two marks. A similar examination is then done in the level before the top.
(1,6,3)
(1,5,2) (2,6,2)
(1,1,1) (3,5,1) (4,4,1) (2,6,2)
1 5 3 4 2 6
Here's an O(n log n) solution:
If n = 0 or n = 1, then there are n distinct marks.
Otherwise, consider the two "halves" of the list, LEFT = [1, n/2] and RIGHT = [n/2 + 1, n]. (If the list has an odd number of elements, the middle element can go in either half, it doesn't matter.)
Find the greatest value in LEFT, call it a_LEFT_MAX (it sits at index LEFT_MAX), and the least value in RIGHT, call it a_RIGHT_MIN (at index RIGHT_MIN).
If a_LEFT_MAX < a_RIGHT_MIN, then no marks need to be shared between the two halves, so you can just recurse into each half and return the sum of the two results.
Otherwise, we know that there's some segment, extending at least from LEFT_MAX to RIGHT_MIN, where all elements have to have the same mark.
To find the leftmost extent of this segment, we can scan leftward from RIGHT_MIN down to 1, keeping track of the minimum value we've seen so far and the position of the leftmost element we've found to be greater than some value further to the right. (This can actually be optimized a bit more, but I don't think we can improve the algorithmic complexity by doing so, so I won't worry about that.) Conversely, we can find the rightmost extent of the segment.
Suppose the segment in question extends from LEFTMOST to RIGHTMOST. Then we just need to recursively compute the number of distinct marks in [1, LEFTMOST) and in (RIGHTMOST, n], and return the sum of the two results plus 1.
I wasn't able to get a complete solution, but here are a few ideas about what can and can't be done.
First: it's impossible to find the number of marks in O(log n) from the array alone; otherwise you could use your algorithm to check whether the array is sorted faster than O(n), and that's clearly impossible.
General idea: spend O(n log n) to create additional data that lets you compute the number of marks in O(log n) time and that can be updated after a swap in O(log n) time. One possibly useful piece to include is the current number of marks (i.e. finding how the number of marks changed may be easier than computing what it is).
Since update time is O(log n), you can't afford to store anything mark-related (such as "the last person with the same mark") for each person; otherwise, taking the array 1 2 3 ... n and repeatedly swapping the first and last elements would require you to update this additional data for every element in the array.
Geometric interpretation: taking your sequence 4 1 3 2 5 7 6 8 as an example, we can draw points (i, a_i):
             |8
         +---+-
         |7  |
         |  6|
       +-+---+
       |5|
-------+-+
4      |
    3  |
      2|
  1    |
In other words, you need to cover all points by a maximal number of squares. Corollary: exchanging points from different squares a and b reduces total number of squares by |a-b|.
Index squares approach: let n = 2^k (otherwise you can add fewer than n fictional persons who never participate in exchanges), and let 0 <= a_i < n. We can create O(n log n) objects, "index squares", which are "responsible" for the points (i, a_i) with a*2^b <= i < (a+1)*2^b or a*2^b <= a_i < (a+1)*2^b (on our plane, this would look like a cross with its center on the diagonal line a_i = i). Every swap affects only O(log n) index squares.
The problem is that I can't find what information to store for each index square so that the number of marks can be found fast enough; all I have is a feeling that such an approach may be effective.
Hope this helps.
Let's normalize the problem first, so that a_i is in the range 0 to n-1 (this can be done in O(n log n) by sorting a, but it only has to be done once, so we are fine).
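function normalize(a) {
    // Pair each value with its index, then sort the pairs by value.
    let b = [];
    for (let i = 0; i < a.length; i++)
        b[i] = [i, a[i]];
    b.sort(function(x, y) {
        return x[1] < y[1] ? -1 : 1;
    });
    // Replace each value by its 0-based rank, in place.
    for (let i = 0; i < a.length; i++)
        a[b[i][0]] = i;
    return a;
}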
To get the maximal number of marks, we can count how many times
i + 1 == mex(a[0..i]) for an integer i in [0, n-1],
where a[0..i] denotes the sub-array of all the values from index 0 to i,
and mex() is the minimum excluded value, i.e. the smallest value missing from the sequence 0, 1, 2, 3, ...
This allows us to solve a single instance of the problem (ignoring the swaps for the moment) in O(n), e.g. by using the following algorithm:
// assumes the values have been normalized to [0, n-1]
function maxMarks(a) {
    let visited = new Array(a.length + 1);
    let smallestMissing = 0, marks = 0;
    for (let i = 0; i < a.length; i++) {
        visited[a[i]] = true;
        if (a[i] == smallestMissing) {
            // advance the mex past every value seen so far
            smallestMissing++;
            while (visited[smallestMissing])
                smallestMissing++;
            if (i + 1 == smallestMissing)
                marks++;
        }
    }
    return marks;
}
If we swap the values at indices x and y (x < y), then the mex for all i < x and all i >= y doesn't change. Although that is an optimization, unfortunately it doesn't improve the complexity, which is still O(qn).
We can observe that the hits (where the mark count increases) are always at the beginning of an increasing sequence, and all matches within the same sequence have to satisfy a[i] == i, except for the first one, but I couldn't derive an algorithm from it yet:
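A JavaScript sketch of that optimization follows (the class name `MarkCounter` is hypothetical). It assumes the values are already normalized to a permutation of 0..n-1, for example by `normalize` above, and it uses the fact that for such an array i + 1 == mex(a[0..i]) holds exactly when max(a[0..i]) == i, because a[0..i] then holds i + 1 distinct values none of which exceeds i. After swapping indices x < y only the prefixes ending in x..y-1 can change, so each swap costs O(y - x), which is still O(n) in the worst case.

```javascript
// Assumes `a` is a permutation of 0..n-1 (for example, the output of normalize).
// For such arrays, i + 1 == mex(a[0..i]) exactly when max(a[0..i]) == i.
class MarkCounter {
  constructor(a) {
    this.a = a.slice();
    this.prefixMax = new Array(a.length);
    this.hit = new Array(a.length).fill(false);
    this.marks = 0;
    this.recompute(0, a.length - 1);
  }

  // Refresh prefixMax and hit for indices lo..hi; prefixMax[lo - 1] must be valid.
  recompute(lo, hi) {
    let max = lo > 0 ? this.prefixMax[lo - 1] : -1;
    for (let i = lo; i <= hi; i++) {
      max = Math.max(max, this.a[i]);
      this.prefixMax[i] = max;
      const isHit = max === i;
      if (isHit !== this.hit[i]) this.marks += isHit ? 1 : -1;
      this.hit[i] = isHit;
    }
  }

  // Swap the values at indices x and y and return the new number of marks.
  swap(x, y) {
    if (x > y) [x, y] = [y, x];
    [this.a[x], this.a[y]] = [this.a[y], this.a[x]];
    // Prefixes ending before x, or at or after y, contain the same values as before.
    if (x < y) this.recompute(x, y - 1);
    return this.marks;
  }
}

const counter = new MarkCounter([5, 4, 2, 3, 1, 0]);
console.log(counter.marks);      // 1
console.log(counter.swap(1, 4)); // 1
console.log(counter.swap(0, 5)); // 6
```

The usage lines replay the "6 5 3 4 2 1" example from another answer after normalization (each value minus one).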
0 6 2 3 4 5 1 7
*--|-------|*-*
3 0 2 1 4 6 5 7
-|---|*-*--|*-*

Pyramids dynamic programming

I encountered this question in an interview and could not figure it out. I believe it has a dynamic programming solution but it eludes me.
Given a number of bricks, output the total number of 2D pyramids possible, where a pyramid is defined as any structure where each row of bricks has strictly fewer bricks than the row below it. You do not have to use all the bricks.
A brick is simply a square; the number of bricks in a row is the only important piece of information.
I'm really stuck with this one. I thought it would be easy to solve each problem 1...n iteratively and sum, but coming up with the number of pyramids possible with exactly i bricks is evading me.
Example, n = 6:
X

XX

X
XX    XXX

X
XXX   XXXX

XX    X
XXX   XXXX   XXXXX

X
XX    XX     X
XXX   XXXX   XXXXX   XXXXXX
So the answer is 13 possible pyramids from 6 bricks.
edit
I am positive this is a dynamic programming problem, because it makes sense (once you've determined the first row) to simply look up your remaining number of bricks in your memoized array to see how many pyramids fit on top.
It also makes sense to consider only bottom rows of width at least n/2, because we can't have more bricks on top than on the bottom row, EXCEPT (and this is where I lose it and my mind falls apart) in certain (few) cases you can, e.g. N = 10:
X
XX
XXX
XXXX
Now the bottom row has 4 bricks, but there are 6 left to place on top.
But with n = 11 we cannot have a bottom row with fewer than n/2 bricks. There is another weird inconsistency like that with n = 4, where we cannot have a bottom row of n/2 = 2 bricks.
Let's choose a suitable definition:
f(n, m) = # pyramids out of n bricks with base of size < m
The answer you are looking for now is (given that N is your input number of bricks):
f(N, N+1) - 1
Let's break that down:
The first N is obvious: that's your number of bricks.
Your bottom row will contain at most N bricks (because that's all you have), so m = N+1 is large enough to allow every possible base.
Finally, the - 1 is there because technically the empty pyramid is also a pyramid (and will thus be counted) but you exclude that from your solutions.
The base cases are simple:
f(n, 0) = 1 for any n >= 0
f(0, m) = 1 for any m >= 0
In both cases, it's the empty pyramid that we are counting here.
Now, all we need still is a recursive formula for the general case.
Let's assume we are given n and m and choose to have i bricks on the bottom layer. What can we place on top of this layer? A smaller pyramid, for which we have n - i bricks left and whose base has size < i. This is exactly f(n - i, i).
What is the range for i? We can choose an empty row so i >= 0. Obviously, i <= n because we have only n bricks. But also, i <= m - 1, by definition of m.
This leads to the recursive expression:
f(n, m) = sum f(n - i, i) for 0 <= i <= min(n, m - 1)
You can compute f recursively, but dynamic programming will of course be faster. Storing the results matrix is straightforward, so I leave that up to you.
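One way to store the results is a bottom-up table. This JavaScript sketch (the name `countPyramids` is hypothetical) fills f[n][m] row by row using the base cases and the recurrence above; it takes O(N^3) time, which is enough to illustrate the idea.

```javascript
// f[n][m] = number of pyramids (including the empty one) using at most n bricks
// whose base has fewer than m bricks, following the recurrence above.
function countPyramids(N) {
  const f = [];
  for (let n = 0; n <= N; n++) {
    f.push(new Array(N + 2).fill(0));
    for (let m = 0; m <= N + 1; m++) {
      if (n === 0 || m === 0) {
        f[n][m] = 1; // only the empty pyramid
        continue;
      }
      let total = 0;
      // i = size of the bottom row; i = 0 stands for the empty pyramid.
      for (let i = 0; i <= Math.min(n, m - 1); i++) {
        total += f[n - i][i];
      }
      f[n][m] = total;
    }
  }
  return f[N][N + 1] - 1; // exclude the empty pyramid
}

console.log(countPyramids(6)); // 13
```

Every entry f[n - i][i] it reads lies in an earlier row, or is f[n][0] from the current row, which is filled first.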
Coming back to the original claim that f(N, N+1)-1 is the answer you are looking for, it doesn't really matter which value to choose for m as long as it is > N. Based on the recursive formula it's easy to show that f(N, N + 1) = f(N, N + k) for every k >= 1:
f(N, N + k) = sum f(N - i, i) for 0 <= i <= min(N, N + k - 1)
= sum f(N - i, i) for 0 <= i <= N
= sum f(N - i, i) for 0 <= i <= min(N, N + 1 - 1)
In how many ways can you build a pyramid of width n? By putting any pyramid of width n-1 or less anywhere atop the layer of n bricks. So if p(n) is the number of pyramids of width n, then p(n) = sum [m=1 to n-1] (p(m) * c(n, m)), where c(n, m) is the number of ways you can place a layer of width m atop a layer of width n (I trust that you can work that one out yourself).
This, however, doesn't place a limitation on the number of bricks. Generally, in DP, any resource limitation must be modeled as a separate dimension. So your problem is now p(n, b): "How many pyramids can you build of width n with a total of b bricks"? In the recursive formula, for each possible way of building a smaller pyramid atop your current one, you need to refer to the correct amount of remaining bricks. I leave it as a challenge for you to work out the recursive formula; let me know if you need any hints.
You can think of your recursion as: given x bricks left, where you used n bricks on the last row, how many pyramids can you build? You can fill the rows either from top to bottom or from bottom to top. I will explain the former case.
Here the recursion might look something like this (left is the number of bricks left and last is the number of bricks used on the last row):
f(left,last)=sum (1+f(left-i,i)) for i in range [last+1,left] inclusive.
This is because when you use i bricks on the current row, you have left-i bricks left and i becomes the number of bricks used on the last row.
Code:
int calc(int left, int last) {
    int total = 0;
    if (left <= 0) return 0; // terminal case: no pyramid without bricks
    for (int i = last + 1; i <= left; i++) {
        total += 1 + calc(left - i, i);
    }
    return total;
}
I will leave it to you to implement a memoized or bottom-up DP version. You may also want to start from the bottom row and fill the upper rows of the pyramid.
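A memoized version of `calc` could look like the JavaScript sketch below (the wrapper name `countPyramidsTopDown` is hypothetical). Starting with last = 0 lets the top row have any width, and each (left, last) pair is computed only once.

```javascript
// Memoized form of calc(left, last): rows are filled from top to bottom,
// and each new row must be wider than the previous one.
function countPyramidsTopDown(n) {
  const memo = new Map();

  function calc(left, last) {
    if (left <= 0) return 0; // no bricks left: no further pyramid
    const key = left + "," + last;
    if (memo.has(key)) return memo.get(key);
    let total = 0;
    for (let i = last + 1; i <= left; i++) {
      // 1 for the pyramid that ends with this row as its base,
      // plus every way to add wider rows below it with the remaining bricks.
      total += 1 + calc(left - i, i);
    }
    memo.set(key, total);
    return total;
  }

  return calc(n, 0);
}

console.log(countPyramidsTopDown(6)); // 13
```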
Since we are asked to count pyramids of any cardinality less than or equal to n, we may consider each cardinality in turn (pyramids of 1 element, 2 elements, 3...etc.) and sum them up. But in how many different ways can we compose a pyramid from k elements? The same number as the count of distinct partitions of k (for example, for k = 6, we can have (6), (1,5), (2,4), and (1,2,3)). A generating function/recurrence for the count of distinct partitions is described in Wikipedia and a sequence at OEIS.
Recurrence, based on the Pentagonal number Theorem:
q(k) = a_k + q(k − 1) + q(k − 2) − q(k − 5) − q(k − 7) + q(k − 12) + q(k − 15) − q(k − 22) − ...
where a_k is (−1)^|m| if k = 3m^2 − m for some integer m, and 0 otherwise.
(The offsets 1, 2, 5, 7, 12, 15, 22, ... are the generalized pentagonal numbers.)
Since the recurrence described in Wikipedia requires calculating all preceding q(n)'s to arrive at a larger q(n), we can simply sum the results along the way to obtain our result.
JavaScript code:
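function numPyramids(n){
    // distinctPartitions[k] = q(k), the number of partitions of k into distinct parts
    var distinctPartitions = [1, 1],
        pentagonals = {},  // maps m(3m - 1) (twice a generalized pentagonal number) to |m|
        m = 1,
        _m = 1,
        pentagonal_m = 2,
        result = 1;        // q(1)
    // Precompute the keys m(3m - 1) for m = 1, -1, 2, -2, ... up to 2n
    while (pentagonal_m / 2 <= n){
        pentagonals[pentagonal_m] = Math.abs(_m);
        m++;
        _m = m % 2 == 0 ? -m / 2 : Math.ceil(m / 2);
        pentagonal_m = _m * (3 * _m - 1);
    }
    for (var k = 2; k <= n; k++){
        // a_k = (-1)^|m| if k = 3m^2 - m, else 0
        distinctPartitions[k] = pentagonals[k] ? Math.pow(-1, pentagonals[k]) : 0;
        var cs = [1, 1, -1, -1],  // signs repeat +, +, -, -
            c = 0;
        // integer-like keys are iterated in ascending numeric order
        for (var i in pentagonals){
            if (i / 2 > k)
                break;
            distinctPartitions[k] += cs[c] * distinctPartitions[k - i / 2];
            c = c == 3 ? 0 : c + 1;
        }
        result += distinctPartitions[k];
    }
    return result;
}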
console.log(numPyramids(6)); // 13

nth smallest element in a union of an array of intervals with repetition

I want to know if there is a more efficient solution than what I came up with (I haven't coded it yet, but I describe the gist of it at the bottom).
Write a function calcNthSmallest(n, intervals) which takes as input a non-negative int n and a list of intervals [[a_1, b_1], ..., [a_m, b_m]] and calculates the nth smallest number (0-indexed) when taking the union of all the intervals with repetition. For example, if the intervals were [1, 5], [2, 4], [7, 9], their union with repetition would be [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9] (note that 2, 3, 4 each appear twice since they're in both intervals [1, 5] and [2, 4]). For this list of intervals, the 0th smallest number would be 1, and the 3rd and 4th smallest would both be 3. Your implementation should run quickly even when the a_i, b_i can be very large (like one trillion) and there are several intervals.
The way I thought to go about it is the straightforward solution, which is to build the union array and traverse it.
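The straightforward approach can be sketched in JavaScript as follows (the name `calcNthSmallestNaive` is hypothetical). It is useful only as a reference for small inputs, since time and memory grow with the total length of the intervals, which rules out endpoints near one trillion.

```javascript
// Sketch of the straightforward approach: materialize the union with
// repetition, sort it, and index into it.
function calcNthSmallestNaive(n, intervals) {
  const values = [];
  for (const [a, b] of intervals) {
    for (let v = a; v <= b; v++) values.push(v);
  }
  values.sort((x, y) => x - y);
  return values[n]; // undefined when n is out of range
}

console.log(calcNthSmallestNaive(3, [[1, 5], [2, 4], [7, 9]])); // 3
```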
This problem can be solved in O(N log N) where N is the number of intervals in the list, regardless of the actual values of the interval endpoints.
The key to solving this problem efficiently is to transform the list of possibly-overlapping intervals into a list of intervals which are either disjoint or identical. In the given example, only the first interval needs to be split:
{       [1,5],        [2,4], [7,9]} =>
 +-----------------+  +---+  +---+
{[1,1], [2,4], [5,5], [2,4], [7,9]}
(This doesn't have to be done explicitly, though: see below.) Now, we can sort the new intervals, replacing duplicates with a count. From that, we can compute the number of values each (possibly-duplicated) interval represents. Now, we simply need to accumulate the values to figure out which interval the solution lies in:
interval  count  size  values in interval  cumulative values
[1,1]     1      1     1                   [0, 1)
[2,4]     2      3     6                   [1, 7)   (e.g. n=1 to n=6 land here)
[5,5]     1      1     1                   [7, 8)
[7,9]     1      3     3                   [8, 11)
I wrote the cumulative values as a list of half-open intervals, but obviously we only need the end-points. We can then find which interval holds value n by, for example, binary-searching the cumulative values list, and we can figure out which value in the interval we want by subtracting the start of its cumulative range from n, integer-dividing by the count, and adding the result to the interval's lower bound.
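A JavaScript sketch of this table-plus-binary-search idea follows; the names `buildCumulativeTable` and `nthFromTable` are hypothetical. It assumes closed integer intervals [a, b] with a <= b and totals below Number.MAX_SAFE_INTEGER. Building the table is dominated by sorting the O(N) boundaries, and each query is a binary search over the rows, so repeated queries against the same intervals are cheap.

```javascript
// Build the cumulative table once: each row covers values lo..hi-1, each
// repeated `count` times, and the row's first value has 0-based rank cumStart.
function buildCumulativeTable(intervals) {
  const delta = new Map(); // boundary -> change in the number of active intervals
  for (const [a, b] of intervals) {
    delta.set(a, (delta.get(a) || 0) + 1);
    delta.set(b + 1, (delta.get(b + 1) || 0) - 1);
  }
  const bounds = [...delta.keys()].sort((x, y) => x - y);
  const rows = [];
  let active = 0;
  let cumulative = 0;
  for (let j = 0; j + 1 < bounds.length; j++) {
    active += delta.get(bounds[j]);
    if (active > 0) {
      rows.push({ lo: bounds[j], hi: bounds[j + 1], count: active, cumStart: cumulative });
      cumulative += active * (bounds[j + 1] - bounds[j]);
    }
  }
  return { rows, total: cumulative };
}

// Answer one query with a binary search for the last row whose cumStart <= n.
function nthFromTable(table, n) {
  if (n < 0 || n >= table.total) throw new RangeError("n is out of range");
  let left = 0;
  let right = table.rows.length - 1;
  while (left < right) {
    const mid = (left + right + 1) >> 1;
    if (table.rows[mid].cumStart <= n) left = mid;
    else right = mid - 1;
  }
  const row = table.rows[left];
  return row.lo + Math.floor((n - row.cumStart) / row.count);
}

const table = buildCumulativeTable([[1, 5], [2, 4], [7, 9]]);
console.log(nthFromTable(table, 3)); // 3
```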
It should be clear that the maximum size of the above table is twice the number of original intervals, because every row must start and end at either the start or end of some interval in the original list. If we'd written the intervals as half-open instead of closed, this would be even clearer; in that case, we can assert that the precise size of the table will be the number of unique values in the collection of end-points. And from that insight, we can see that we don't really need the table at all; we just need the sorted list of end-points (although we need to know which endpoint each value represents). We can simply iterate through that list, maintaining the count of the number of active intervals, until we reach the value we're looking for.
Here's a quick Python implementation. It could be improved.
def combineIntervals(intervals):
    # endpoints maps each endpoint to the net change in the number of active intervals
    endpoints = {}
    # Each start adds 1 to the count, and each limit (end + 1) subtracts 1
    for start in (i[0] for i in intervals):
        endpoints[start] = endpoints.get(start, 0) + 1
    for limit in (i[1] + 1 for i in intervals):
        endpoints[limit] = endpoints.get(limit, 0) - 1
    # Filtering is a possibly premature optimization but it was easy
    return sorted(kv for kv in endpoints.items() if kv[1] != 0)

def nthSmallestInIntervalList(n, intervals):
    limits = combineIntervals(intervals)
    cumulative = 0
    count = 0
    index = 0
    here = limits[0][0]
    while index < len(limits):
        size = limits[index][0] - here
        if n < cumulative + count * size:
            # [here, next) contains the value we're searching for
            return here + (n - cumulative) // count
        # advance
        cumulative += count * size
        count += limits[index][1]
        here += size
        index += 1
    # We didn't find it: n is beyond the end of the union
    raise IndexError("n is out of range")
So, as I said, the running time of this algorithm is independent of the actual values of the intervals; it only depends on the length of the interval list. This particular solution is O(N log N) because of the cost of the sort (in combineIntervals); if we used a priority queue instead of a full sort, we could construct the heap in O(N), but the scan would then cost O(log N) for each scanned endpoint. Unless N is really big and the expected value of the argument n is relatively small, this would be counter-productive. There might be other ways to reduce the complexity, though.
Edit2:
Here's yet another take on your question.
Let's consider the intervals graphically:
          1  1   1  2   2  2  3
0-2-4--7--0--3---7--0---4--7--0
  [-------]
    [-------------------]
       [---------]
             [----------------]
                    [------]
When sorted in increasing order on the lower bound, the interval list ([2;10];[4;24];[7;17];[13;30];[20;27]) looks like the picture above. Each lower bound indicates the start of a new interval and also marks the beginning of one more "level" of duplication of the numbers. Conversely, upper bounds mark the end of that level and decrease the duplication level by one.
We could therefore convert the above into the following list:
[2;+];[4;+];[7;+];[10;-];[13;+];[17;-];[20;+];[24;-];[27;-];[30;-]
where the first value indicates the value of the bound, and the second value whether the bound is lower (+) or upper (-). The nth element is computed by following the list, raising or lowering the duplication level when encountering a lower or upper bound, and using the duplication level as a counting factor.
Let's consider the list graphically again, but as a histogram:
       3333  44444  55555
    222222233333334444444555
  11111111122222222222222444444
          1  1   1  2   2  2  3
0-2-4--7--0--3---7--0---4--7--0
The view above is the same as the first one, with all the intervals packed vertically: 1 marks the elements of the first interval, 2 those of the second one, etc. In fact, what matters here is the height at each index, corresponding to the number of times each index is duplicated in the union of all intervals.
       3333  55555  77777
    222333344555556677777888
  11222333344555556677777888999
          1  1   1  2   2  2  3
0-2-4--7--0--3---7--0---4--7--0
  | |  |   | |    | |    |  |
We can see that histogram blocks start at lower bounds of intervals, and end either on upper bounds, or one unit before lower bounds, so the new notation must be modified accordingly.
With a list containing n intervals, as a first step, we convert the list into the notation above (O(n)), and sort it in increasing bound order (O(nlog(n))). The second step of computing the number is then in O(n), for a total average time in O(nlog(n)).
Here's a simple implementation in OCaml, using 1 and -1 instead of '+' and '-'.
(* transform the list into the (bound, +1/-1) notation; upper bounds become u + 1 *)
let rec convert = function
  | [] -> []
  | (l, u) :: xs -> (l, 1) :: (u + 1, -1) :: convert xs;;
(* the counting function: r is the remaining rank, f the current duplication level *)
let rec count r f = function
  | [] | [_] ->
      (* the last bound closes every interval, so r is beyond the union *)
      raise Not_found
  | (a, x) :: (b, y) :: l ->
      if a = b
      then count r f ((b, x + y) :: l)   (* merge bounds at the same position *)
      else
        let f = f + x in
        if f > 0 then
          let range = (b - a) * f in     (* values in [a, b), each repeated f times *)
          if range > r
          then a + (r / f)
          else count (r - range) f ((b, y) :: l)
        else count r f ((b, y) :: l);;
(* the compute function *)
let compute l =
  let compare (x, _) (y, _) = compare x y in
  let l = List.sort compare (convert l) in
  fun m -> count m 0 l;;
Notes:
- the function above will raise an exception if the sought number is above the intervals. This corner case isn't taken into account by the other methods below.
- the list sorting function used in OCaml is merge sort, which effectively performs in O(nlog(n)).
Edit:
Seeing that you might have very large intervals, the solution I gave initially (see down below) is far from optimal.
Instead, we could make things much faster by transforming the list:
we try to compress the interval list by searching for overlapping intervals and replacing each overlap by a prefix interval, the overlapping part (repeated as many times as it overlaps), and a suffix interval. We can then directly compute the number of entries covered by each element of the list.
Looking at the splitting above (prefix, infix, suffix), we see that the optimal structure for the processing is a binary tree. A node of that tree may optionally have a prefix and a suffix. So the node must contain:
an interval i in the node
an integer giving the number of repetitions of i in the list,
a left subtree of all the intervals below i
a right subtree of all the intervals above i
with this structure in place, the tree is automatically sorted.
Here's an example of an OCaml type embodying that tree (intervals are pairs of bounds):
type interval = int * int
type tree = Empty | Node of int * interval * tree * tree
Now the transformation algorithm boils down to building the tree.
This function creates a tree out of its components (count k, interval r, left tree lt and right tree rt):
let cons k r lt rt = Node (k, r, lt, rt)
This function recursively inserts an interval in a tree (pseudocode):
let rec insert i it =
  let r = root of it
  let lt = the left subtree of it
  let rt = the right subtree of it
  let k = the count of r
  let prf, inf, suf = the prefix, infix and suffix of i according to r
  return cons (k+1) inf (insert prf lt) (insert suf rt)
Once the tree is built, we do an in-order traversal of the tree (left subtree, node, right subtree, which visits the intervals in increasing order), using the count of each node to accelerate the computation of the nth element.
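The tree code itself is not spelled out here. As a hedged alternative that keeps the same compression idea, the sketch below stores the disjoint pieces in a sorted Python list with cumulative counts instead of a tree, so each query becomes a binary search with `bisect`. The names `build_index` and `nth_from_index` are hypothetical, and ranks are 0-based.

```python
from bisect import bisect_right
from itertools import accumulate


def build_index(intervals):
    """Compress closed integer intervals into disjoint pieces (start, end, count)
    plus cumulative element counts, so that each query costs O(log n)."""
    events = {}
    for lo, hi in intervals:
        events[lo] = events.get(lo, 0) + 1
        events[hi + 1] = events.get(hi + 1, 0) - 1
    points = sorted(events)
    pieces, cover = [], 0
    for a, b in zip(points, points[1:]):
        cover += events[a]
        if cover > 0:
            # values a..b-1 each appear `cover` times in the union
            pieces.append((a, b - 1, cover))
    totals = list(accumulate((end - start + 1) * c for start, end, c in pieces))
    return pieces, totals


def nth_from_index(index, r):
    """Return the r-th (0-based) smallest element of the multiset union."""
    pieces, totals = index
    i = bisect_right(totals, r)  # first piece whose cumulative total exceeds r
    if i == len(pieces):
        raise IndexError("rank is beyond the union of the intervals")
    start, _, cover = pieces[i]
    before = totals[i - 1] if i > 0 else 0
    return start + (r - before) // cover
```

Building the index costs O(n log n) once; after that, repeated queries on the same interval list no longer pay for a full sweep.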
Below is my previous answer.
Here are the steps of my solution:
you need to sort the interval list in increasing order on the lower bound of each interval
you need a deque dq (or a list which will be reversed at some point) to store the intervals
Here's the pseudocode:
let lower i = lower bound of interval i
let upper i = upper bound of i
let il = sort of interval list
i <- 0
j <- lower (head of il)
loop on il:
i <- i + 1
let h = the head of il
let il = the tail of il
if upper h > j then push h to dq
if lower h > j then
il <- concat dq and il
j <- j + 1
dq <- empty
loop
if i = k then return j
loop
This algorithm works by simply iterating through the intervals, taking into account only the relevant intervals, and counting both the rank i of the element in the union and the value j of that element. When the targeted rank k has been reached, the value is returned.
The complexity is roughly in O(k) + O(sort(l)).
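The deque pseudocode above is hard to run as written. As a swapped-in technique with the same "walk the union in order until rank k" idea, the sketch below merges the intervals lazily with a heap (`heapq.merge` over one `range` per interval), which takes O(k log n) after an O(n) setup. The function name is hypothetical and k is 1-based.

```python
import heapq
from itertools import islice


def kth_by_enumeration(intervals, k):
    """Return the k-th (1-based) smallest element of the multiset union of
    closed integer intervals, by walking the union in sorted order."""
    # heapq.merge lazily interleaves the already-sorted ranges
    merged = heapq.merge(*(range(lo, hi + 1) for lo, hi in intervals))
    for value in islice(merged, k - 1, None):
        return value
    raise IndexError("k is beyond the union of the intervals")
```

Like the original, this is only attractive when k is small compared with the interval sizes.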
If I have understood your question correctly, you want to find the k-th smallest element in the union of a list of intervals.
If we assume there are only two lists, the question becomes:
find the k-th smallest element in the union of two sorted arrays (where an interval [2,5] is nothing but the elements from 2 to 5, {2,3,4,5}). This can be solved in (n+m)log(n+m) time, where n and m are the sizes of the lists, and i and j are iterators over the two lists.
Maintaining the invariant
$i + j = k - 1$,
if $B_{j-1} < A_i < B_j$, then $A_i$ must be the k-th smallest,
or else if $A_{i-1} < B_j < A_i$, then $B_j$ must be the k-th smallest.
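The two-list case is the classic "k-th smallest of two sorted arrays" search. As a hedged sketch (hypothetical function name, 1-based k, plain sorted lists; an interval would first have to be expanded into its elements), it discards up to k/2 candidates from one list per step, so it needs roughly O(log k) steps.

```python
def kth_of_two_sorted(a, b, k):
    """Return the k-th (1-based) smallest element of the union of sorted lists a and b."""
    if not 1 <= k <= len(a) + len(b):
        raise IndexError("k out of range")
    i, j = 0, 0  # a[:i] and b[:j] are already known to rank below the answer
    while True:
        if i == len(a):
            return b[j + k - 1]
        if j == len(b):
            return a[i + k - 1]
        if k == 1:
            return min(a[i], b[j])
        # Probe about k // 2 elements ahead in each list.
        step = k // 2
        ni = min(i + step, len(a)) - 1
        nj = min(j + step, len(b)) - 1
        # The smaller probe and everything before it cannot be the k-th smallest.
        if a[ni] <= b[nj]:
            k -= ni - i + 1
            i = ni + 1
        else:
            k -= nj - j + 1
            j = nj + 1
```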
Now suppose there are three lists. Maintaining the invariant
$i + j + x = k - 1$, i.e. $i + j = k - x - 1$,
the value $k - x - 1$ can take y different values (y is the size of the third list, because x iterates from the start of that list to its end).
So a problem with three lists reduces to y instances of the two-list problem, and the complexity is `y*((n+m)log(n+m))`.
Generalizing, for n lists this reduction multiplies the cost by the size of every extra list, so the running time grows exponentially with the number of lists.
One minor improvement: if we know that k is smaller than the size of some lists, we can drop the elements from the (k+1)-th onward in every list longer than k, since they can never be the k-th smallest (I think this doesn't help for large k). If there is any mistake, please let me know.
Let me explain with an example:
Assume we are given these intervals [5,12],[3,9],[8,13].
The union of these intervals (keeping duplicates) is:
number: 3  4  5  5  6  6  7  7  8  8  8  9  9  9  10 10 11 11 12 12 13
index:  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21
The lowest function returns 11 when 9 is passed as input.
The highest function returns 14 when 9 is passed as input.
The lowest and highest functions check, for each interval [a, b], whether x lies inside it; if it does, they add x - a (lowest) or x - a + 1 (highest) to the return value. If an interval lies completely below x, they add the total number of elements in that interval.
The find function returns 9 when k = 13 is passed.
The find function uses binary search to find the k-th smallest element. In the given range [0, N] (if the range is not given, the upper bound can be found in O(n)), take the middle value mid and compute lowest and highest for mid. If k is greater than lowest and at most highest, return mid; else, if k is less than or equal to lowest, search the lower half (low, mid-1); otherwise search the upper half (mid+1, high).
If there are n intervals and the range is N, the running time of this algorithm is O(n log N): we compute lowest and highest (each O(n)) log N times.
import java.util.List;
// Hypothetical wrapper class so that the methods compile on their own
public class IntervalUnion {
    // Function call will be find(0, N, k, in)
    // Retrieves the number of elements in the union strictly smaller than x
    public static int lowest(List<List<Integer>> in, int x) {
        int sum = 0;
        for (List<Integer> lst : in) {
            if (x > lst.get(1))
                sum += lst.get(1) - lst.get(0) + 1; // whole interval lies below x
            else if (x >= lst.get(0))
                sum += x - lst.get(0);              // values lst.get(0) .. x-1
        }
        return sum;
    }
    // Retrieves the number of elements in the union smaller than or equal to x
    public static int highest(List<List<Integer>> in, int x) {
        int sum = 0;
        for (List<Integer> lst : in) {
            if (x > lst.get(1))
                sum += lst.get(1) - lst.get(0) + 1;
            else if (x >= lst.get(0))
                sum += x - lst.get(0) + 1;          // values lst.get(0) .. x
        }
        return sum;
    }
    // Binary search on the value range [low, high]; k is 1-based
    public static int find(int low, int high, int k, List<List<Integer>> in) {
        if (low > high)
            return -1;
        int mid = low + (high - low) / 2;
        int lowIdx = lowest(in, mid);
        int highIdx = highest(in, mid);
        // k lies between the current number's low and high indices
        if (k > lowIdx && k <= highIdx) return mid;
        // k less than or equal to the lower index: go to the left side
        if (k <= lowIdx) return find(low, mid - 1, k, in);
        // k greater than the higher index: go to the right side
        return find(mid + 1, high, k, in);
    }
}
It's possible to count how many numbers in the list are less than some chosen number X (by iterating through all of the intervals). Now, if this count is greater than n (with n counted from 0), the solution is certainly smaller than X. Similarly, if the count is less than or equal to n, the solution is greater than or equal to X. Based on these observations, we can use binary search.
Below is a Java implementation :
public int nthElement( int[] lowerBound, int[] upperBound, int n )
{
int lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
while ( lo < hi ) {
int X = (int)( ((long)lo+hi+1)/2 );
long count = 0;
for ( int i=0; i<lowerBound.length; ++i ) {
if ( X >= lowerBound[i] && X <= upperBound[i] ) {
// part of interval i is less than X
count += (long)X - lowerBound[i];
}
if ( X >= lowerBound[i] && X > upperBound[i] ) {
// all numbers in interval i are less than X
count += (long)upperBound[i] - lowerBound[i] + 1;
}
}
if ( count <= n ) lo = X;
else hi = X-1;
}
return lo;
}

Find subset with elements that are furthest apart from each other

I have an interview question that I can't seem to figure out. Given an array of size N, find the subset of size k such that the elements in the subset are the furthest apart from each other. In other words, maximize the minimum pairwise distance between the elements.
Example:
Array = [1,2,6,10]
k = 3
answer = [1,6,10]
The brute-force approach requires finding all subsets of size k, which takes exponential time.
One idea I had was to take values evenly spaced from the array. What I mean by this is:
Take the 1st and last element
find the difference between them (in this case 10-1) and divide that by k ((10-1)/3=3)
move 2 pointers inward from both ends, picking out elements that are +/- 3 from your previous pick. So in this case, you start from 1 and 10 and find the closest elements to 4 and 7. That would be 6.
This is based on the intuition that the elements should be as evenly spread as possible. I have no idea how to prove it works/doesn't work. If anyone knows how or has a better algorithm please do share. Thanks!
This can be solved in polynomial time using DP.
The first step is, as you mentioned, to sort the list A. Let X[i,j] be the best minimum distance when selecting j elements from the first i elements of A, with A[i] being the last selected element.
Now, $X[i+1, j+1] = \max_{k \le i} \min(X[k, j],\ A[i+1] - A[k])$.
I will leave the initialization step and the reconstruction of the chosen subset for you to work on.
In your example (1,2,6,10) it works as follows (rows: number of selected elements j; columns: last selected element A[i]):
j \ A[i]   1   2   6   10
1          -   -   -   -
2          -   1   5   9
3          -   -   1   4
4          -   -   -   1
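A minimal Python sketch of this recurrence, including the initialization and subset reconstruction that are left as an exercise above. It is an illustration under stated assumptions: the function name is hypothetical, indices are 0-based, and the entry for a single selected element is treated as +infinity so the first gap is never capped. It runs in O(n^2 k).

```python
def max_min_distance(nums, k):
    """Return (best minimum gap, chosen subset) when picking k elements of nums.
    X[i][j] is the best minimum gap when picking j elements from a[0..i]
    with a[i] as the last pick; None means "not possible"."""
    a = sorted(nums)
    n = len(a)
    X = [[None] * (k + 1) for _ in range(n)]
    parent = [[None] * (k + 1) for _ in range(n)]
    for i in range(n):
        X[i][1] = float("inf")  # a single element has no pairwise gap yet
        for j in range(2, min(i + 1, k) + 1):
            for p in range(i):
                if X[p][j - 1] is None:
                    continue
                cand = min(X[p][j - 1], a[i] - a[p])
                if X[i][j] is None or cand > X[i][j]:
                    X[i][j] = cand
                    parent[i][j] = p  # remember the previous pick
    # the best subset may end at any index
    end = max((i for i in range(n) if X[i][k] is not None), key=lambda i: X[i][k])
    subset, i, j = [], end, k
    while i is not None:
        subset.append(a[i])
        i, j = parent[i][j], j - 1
    return X[end][k], subset[::-1]
```

For the example (1,2,6,10) with k = 3 it returns a best gap of 4 with the subset [1, 6, 10], matching the last column of row 3.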
The basic idea is right, I think. You should start by sorting the array, then take the first and the last elements, then determine the rest.
I cannot think of a polynomial algorithm to solve this, so I would suggest one of the two options.
One is to use a search algorithm, branch-and-bound style, since you have a nice heuristic at hand: the upper bound for any solution is the minimum size of the gap between the elements picked so far, so the first guess (evenly spaced cells, as you suggested) can give you a good baseline, which will help prune most of the branches right away. This will work fine for smaller values of k, although the worst case performance is O(N^k).
The other option is to start with the same baseline, calculate the minimum pairwise distance for it and then try to improve it. Say you have a subset with minimum distance of 10, now try to get one with 11. This can be easily done by a greedy algorithm -- pick the first item in the sorted sequence such that the distance between it and the previous item is bigger-or-equal to the distance you want. If you succeed, try increasing further, if you fail -- there is no such subset.
The latter solution can be faster when the array is large and k is relatively large as well, but the elements in the array are relatively small. If they are bounded by some value M, this algorithm will take O(N*M) time, or, with a small improvement (binary search on the distance), O(N*log(M)), where N is the size of the array.
As Evgeny Kluev suggests in his answer, there is also a good upper bound on the achievable minimum pairwise distance, which can be used in either one of these algorithms. So the complexity of the latter is actually O(N*log(M/k)).
You can do this in O(n*(log n) + n*log(M)), where M is max(A) - min(A).
The idea is to use binary search to find the maximum separation possible.
First, sort the array. Then, we just need a helper function that takes in a distance d and greedily builds the longest subsequence possible with consecutive elements separated by at least d. We can do this in O(n) time.
If the generated array has length at least k, then the maximum separation possible is >=d. Otherwise, it's strictly less than d. This means we can use binary search to find the maximum value. With some cleverness, you can shrink the 'low' and 'high' bounds of the binary search, but it's already so fast that sorting would become the bottleneck.
Python code:
from typing import List
def maximize_distance(nums: List[int], k: int) -> List[int]:
    """Given an array of numbers and size k, uses binary search
    to find a subset of size k with maximum min-pairwise-distance"""
    assert len(nums) >= k
    if k == 1:
        return [nums[0]]
    nums.sort()
    def longest_separated_array(desired_distance: int) -> List[int]:
        """Given a distance, greedily picks elements of nums (in sorted order)
        whose consecutive differences are at least that distance, stopping
        once k elements have been picked."""
        answer = [nums[0]]
        for x in nums[1:]:
            if x - answer[-1] >= desired_distance:
                answer.append(x)
                if len(answer) == k:
                    break
        return answer
    low, high = 0, (nums[-1] - nums[0])
    while low < high:
        mid = (low + high + 1) // 2
        if len(longest_separated_array(mid)) == k:
            low = mid
        else:
            high = mid - 1
    return longest_separated_array(low)
I assume your set is sorted. If not, my answer changes slightly.
Let's suppose you have an array $X = (X_1, X_2, \dots, X_n)$.
$\mathrm{Energy}(X_i) = \min(|X_{i-1} - X_i|, |X_{i+1} - X_i|)$, for $1 < i < n$
while n > k do
  remove from X the element X_i (1 < i < n) with the minimum Energy(X_i)
  n <- n - 1
end while
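A short Python sketch of this removal loop, assuming k >= 2 so that the two endpoints can always be kept (the function name is hypothetical). Like the pseudocode, it is a greedy heuristic, and no optimality proof is given for it here.

```python
def greedy_energy_removal(xs, k):
    """Repeatedly drop the interior element whose nearest neighbour is
    closest (lowest "energy") until only k elements remain."""
    assert k >= 2, "the endpoints are always kept"
    x = sorted(xs)
    while len(x) > k:
        # energy of an interior element = distance to its nearest neighbour
        i = min(range(1, len(x) - 1),
                key=lambda t: min(x[t] - x[t - 1], x[t + 1] - x[t]))
        del x[i]
    return x
```

On [1, 2, 6, 10] with k = 3, element 2 has the lowest energy (1), so the result is [1, 6, 10].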
$length = count($array);
sort($array); //sorts the list in ascending order
$differences = array(); //the difference between each value and the next largest value
for ($i = 0; $i < $length - 1; $i++) {
    $differences[] = $array[$i + 1] - $array[$i];
}
sort($differences); //sorts the list in ascending order
$max = ($array[$length - 1] - $array[0]) / $M; //this is the theoretical max of how large the result can be
$result = array();
$count = 0;
for ($i = 0; $i < $length - 1; $i++) {
    $count += $differences[$i];
    if ($length - $i == $M - 1 || $count >= $max) { //if there are either no more elements that can be taken or we have gone above or equal to the theoretical max, add a point
        $result[] = $count;
        $count = 0;
        $M--;
    }
}
return min($result);
For non-programmers: sort the list, find the differences between each two consecutive elements, and sort that list in ascending order. Then loop through it, summing consecutive values until you either reach the theoretical max or there aren't enough elements remaining; at that point, add the sum to a new array and continue until you hit the end of the array. Finally, return the minimum of the newly created array.
This is just a quick draft though. At a quick glance any operation here can be done in linear time (radix sort for the sorts).
For example, with 1, 4, 7, 100, and 200 and M=3, we get:
$differences = 3, 3, 93, 100
$max = (200-1)/3 ≈ 66.3
then we loop:
$count = 3, 3+3=6, 6+93=99 > 66.3 so we push 99
$count = 100 > 66.3 so we push 100
min(99,100) = 99
It is a simple exercise to convert this to the set solution that I leave to the reader (P.S. after all the times reading that in a book, I've always wanted to say it :P)

array median transformation minimum steps

You are given an array A of n integers. In one turn one can apply the following operation to any consecutive subarray A[l..r]: assign to all A[i] (l <= i <= r) the median of subarray A[l..r].
Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A to an array of n integers each with value max.
For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first on subarray A[2..3] (after that A equals [1, 3, 3]), then on A[1..3].
Also, the median is defined for some array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B[m] (1-based indexing), where m equals (n div 2) + 1. Here 'div' is an integer division operation. So, for a sorted array with 5 elements, the median is the 3rd element, and for a sorted array with 6 elements, it is the 4th element.
Since the maximum value of N is 30, I thought of brute-forcing the result.
Could there be a better solution?
You can double the size of the subarray containing the maximum element in each iteration. After the first iteration, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4 containing those 2 elements, giving you a subarray of size 4 containing the maximum. Then apply it to a size 8 subarray, and so on. You fill the array in ceil(log2(N)) operations, which is optimal. If N is 30, five operations are enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
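A small Python simulation of the doubling strategy, useful for checking the operation count on concrete arrays. The helper names are hypothetical, and the sketch starts from the first occurrence of the maximum and grows the window to the right before the left, which is only one of many valid choices. Each new window is at most twice the old one, so at least half of it is already maximal and its median (element (len div 2) + 1 of the sorted window) is the maximum.

```python
def median(vals):
    """Median as defined in the problem: element (len div 2) + 1 (1-based) of the sorted list."""
    return sorted(vals)[len(vals) // 2]


def fill_by_doubling(a):
    """Grow a window of maximal values around one maximum, doubling its size
    with each median operation. Returns (final array, number of operations)."""
    a = list(a)
    n = len(a)
    p = a.index(max(a))
    lo, hi = p, p  # current window a[lo..hi] holds only the maximum
    ops = 0
    while hi - lo + 1 < n:
        size = min(2 * (hi - lo + 1), n)
        # extend to the right first; shift left if we run past the end
        hi = min(n - 1, lo + size - 1)
        lo = hi - size + 1
        a[lo:hi + 1] = [median(a[lo:hi + 1])] * size
        ops += 1
    return a, ops
```

On the size-10 example below it also needs 4 operations, and for N = 30 it needs 5. As the third update notes, this bounds the worst case but is not the minimum for every input.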
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing to be about brute forcing with the median operations, rather than in finding the minimum sequence of operations.
This problem is from the CodeChef Long Contest. Since the contest is already over, awkwardiom, I am pasting the problem setter's approach (source: CodeChef contest editorial page).
"Any state of the array can be represented as a binary mask with each bit 1 means that corresponding number is equal to the max and 0 otherwise. You can run DP with state R[mask] and O(n) transits. You can prove (or just believe) that the number of states will be not big, of course if you run good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it makes sense to use operation only for such subarray [l; r] that number of 1-bits is at least as much as number of 0-bits in submask [l; r], because otherwise nothing will change. Also you should notice that if the left bound of your operation is l it is good to make operation only with the maximal possible r (this gives number of transits equal to O(n)). It was also useful for C++ coders to use map structure to represent all states."
The C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;
int bc[1<<15];
const int M = (1<<15) - 1;
void setMin(int& ret, int c)
{
if(c < ret) ret = c;
}
void doit(int n, int mask, int currentSteps, int& currentBest)
{
int numMax = bc[mask>>15] + bc[mask&M];
if(numMax == n) {
setMin(currentBest, currentSteps);
return;
}
if(currentSteps + 1 >= currentBest)
return;
if(currentSteps + 2 >= currentBest)
{
if(numMax * 2 >= n) {
setMin(currentBest, 1 + currentSteps);
}
return;
}
if(numMax < (1<<currentSteps)) return;
for(int i=0;i<n;i++)
{
int a = 0, b = 0;
int c = mask;
for(int j=i;j<n;j++)
{
c |= (1<<j);
if(mask&(1<<j)) b++;
else a++;
if(b >= a) {
doit(n, c, currentSteps + 1, currentBest);
}
}
}
}
int v[32];
void solveCase() {
int n;
scanf(" %d", &n);
int maxElement = 0;
for(int i=0;i<n;i++) {
scanf(" %d", v+i);
if(v[i] > maxElement) maxElement = v[i];
}
int mask = 0;
for(int i=0;i<n;i++) if(v[i] == maxElement) mask |= (1<<i);
int ret = 0, p = 1;
while(p < n) {
ret++;
p *= 2;
}
doit(n, mask, 0, ret);
printf("%d\n",ret);
}
int main() {
for(int i=0;i<(1<<15);i++) {
bc[i] = bc[i>>1] + (i&1);
}
int cases;
scanf(" %d",&cases);
while(cases--) solveCase();
}
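The editorial's mask state space can also be explored directly for small n. Below is a hedged Python sketch that uses plain breadth-first search over masks instead of the pruned depth-first search in the C++ code, and it tries every useful [l, r] rather than only the maximal r, so it is slower but simpler to check. The function name is hypothetical and the input is assumed non-empty.

```python
from collections import deque


def min_ops_bfs(a):
    """Exact minimum number of median operations for small n, by BFS over
    bitmasks where bit i is set when a[i] already equals the maximum."""
    n = len(a)
    top = max(a)
    start = sum(1 << i for i, v in enumerate(a) if v == top)
    goal = (1 << n) - 1
    dist = {start: 0}
    queue = deque([start])
    while queue:
        mask = queue.popleft()
        if mask == goal:
            return dist[mask]
        for l in range(n):
            maxes = others = 0
            for r in range(l, n):
                if (mask >> r) & 1:
                    maxes += 1
                else:
                    others += 1
                # the median of a[l..r] is the maximum iff maxes >= others
                if maxes >= others:
                    nxt = mask | (((1 << (r + 1)) - 1) ^ ((1 << l) - 1))
                    if nxt not in dist:
                        dist[nxt] = dist[mask] + 1
                        queue.append(nxt)
    return -1  # not reached for non-empty input
```

For A = [1, 2, 3] it finds the two operations from the problem statement.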
The problem setter's approach has exponential complexity. It is pretty good for N=30, but not for larger sizes. I think it's more interesting to find a polynomial-time solution, and I found one, with O(N^4) complexity.
This approach uses the fact that an optimal solution starts with some group of consecutive maximal elements and extends only this single group until the whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge. The result is a group of size at least M * 4. Now instead of turn Y for group 2, make an additional turn X + 1 for group 1. In this case group sizes are at least M * 2 and at most M / 2 (even if we count initially maximal elements, that might be included in step Y). After this change, on turn X + Y + 1 the merged group size is at least M * 4 only as a result of the first group extension, add to this at least one element from second group. So extending a single group here produces larger group in same number of steps (and if Y > 1, it even requires less steps). Since this works for equal group sizes (M), it will work even better for non-equal groups. This proof may be extended to the case of several groups (more than two).
To work with a single group of consecutive maximal elements, we need to keep track of only two values: the starting and ending positions of the group. This means it is possible to use a triangular matrix to store all possible groups, which allows a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in original array:
  Mark corresponding element in the matrix and clear other elements
  For each matrix diagonal, starting with the one containing this element:
    For each marked element in this diagonal:
      Retrieve current number of turns from this matrix element
      (use indexes of this matrix element to initialize p1 and p2)
      p2 = end of the group
      p1 = start of the group
      Decrease p1 while it is possible to keep median at maximum value
      (now all values between p1 and p2 are assumed as maximal)
      While p2 < N:
        Check if number of maximal elements in the array is >= N/2
        If this is true, compare current number of turns with the best result \
          and update it if necessary
        (additional matrix with number of maximal values between each pair of
          points may be used to count elements to the left of p1 and to the
          right of p2)
        Look at position [p1, p2] in the matrix. Mark it and if it contains \
          larger number of turns, update it
        Repeat:
          Increase p1 while it points to maximal value
          Increment p1 (to skip one non-maximum value)
          Increase p2 while it is possible to keep median at maximum value
        while median is not at maximum value
To keep algorithm simple, I didn't mention special cases when group starts at position 0 or ends at position N, skipped initialization and didn't make any optimizations.

Resources