We are given a permutation of the integers from 1 to N, inclusive. Initially the permutation is 1, 2, 3, ..., N. We are also given M pairs of integers, where the i-th is (Li, Ri). In a single turn we can choose any of these pairs (say the one with index j) and arbitrarily shuffle the elements of our permutation at the positions from Lj to Rj, inclusive (positions are 1-based). We are not limited in the number of turns, and we can pick any pair more than once.
The goal is to obtain a given permutation P. If it's possible, output "Possible", otherwise output "Impossible".
Example: let N = 7 and M = 4, let the target permutation P be [3 1 2 4 5 7 6], and let the pairs be:
1 2
4 4
6 7
2 3
Here the answer is Possible.
Treat each pair as an interval, compute the union of intervals as a list of non-overlapping intervals, and then test, for each i, whether the value at position i of the permutation either is i or is in the same non-overlapping interval as i.
This works because, if we have a <= b <= c <= d with pairs (a, c) and (b, d), then by repeatedly invoking (a, c) and (b, d), we can get any permutation that we could get with (a, d). Conversely, (a, d) enables any permutation that we could get with (a, c) and (b, d). Once the list of pairs is non-overlapping, it's clear that we can move element i to position j != i if and only if i and j are in the same interval.
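For concreteness, here is a small Python sketch of this check (the function and variable names are my own); it merges the given intervals and then tests each position:

def possible(n, pairs, perm):
    # merge overlapping [L, R] pairs into disjoint intervals
    merged = []
    for l, r in sorted(pairs):
        if merged and l <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], r)
        else:
            merged.append([l, r])
    # owner[p] = index of the merged interval covering position p (or None)
    owner = [None] * (n + 1)
    for idx, (l, r) in enumerate(merged):
        for p in range(l, r + 1):
            owner[p] = idx
    # the value at position i must be i itself, or lie in the same interval as i
    return all(perm[i - 1] == i or
               (owner[i] is not None and owner[i] == owner[perm[i - 1]])
               for i in range(1, n + 1))

print(possible(7, [(1, 2), (4, 4), (6, 7), (2, 3)], [3, 1, 2, 4, 5, 7, 6]))  # True -> "Possible"

Note that only genuinely overlapping pairs are merged; adjacent but disjoint intervals such as (1, 2) and (3, 4) cannot be combined, since they share no position.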
My data structure is a path represented by a list of cities. If, for example, the cities are
A, B, C, D
A possible configuration could be: A, B, D, C or D, C, A, B.
I need to compare two paths in order to find the differences between them, in such a way that the procedure returns the set of swap operations necessary to transform the second path into the first one.
For example, given the following paths:
X = {A, B, D, C}
Y = {D, C, A, B}
indexes = {0, 1, 2, 3}
A possible way to transform the path Y into X would be the set of the following swaps: {0-2, 1-3}.
{D, C, A, B} --> [0-2] --> {A, C, D, B} --> [1-3] --> {A, B, D, C}
Is there any known (and fast) algorithm that computes this set?
Your problem looks like the problem of counting the minimal number of swaps needed to transform one permutation into another.
In fact it's a well-known problem. The key idea is to create a new permutation P such that P[i] is the index of city X[i] in Y. Then you just calculate the total number of cycles C in P. The answer is len(X) - C, where len(X) is the size of X.
In your case P looks like: 3, 4, 1, 2 (using 1-based indices). It has two cycles: (3, 1) and (4, 2). So the answer is 4 - 2 = 2.
The total complexity is linear.
For more details see this answer. It explains this algorithm in more detail.
EDIT
Okay, but how can we get the swaps themselves, and not only their number? Note that in this solution we reorder each cycle independently, doing N - 1 swaps for a cycle of length N. So, if you have a cycle v(0), v(1), ..., v(N - 1), you just perform the swaps (v(N - 1), v(N - 2)), (v(N - 2), v(N - 3)), ..., (v(1), v(0)); that is, you swap the cycle's elements in reverse order.
Also, if you have C cycles with lengths L(1), L(2), ..., L(C), the number of swaps is (L(1) - 1) + (L(2) - 1) + ... + (L(C) - 1) = L(1) + L(2) + ... + L(C) - C = LEN - C, where LEN is the length of the permutation.
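As a small Python sketch of both steps (the function name is my own, and the swaps are reported as 0-based index pairs like in the question):

def swaps_to_transform(X, Y):
    # pos_in_Y[city] is the current position of city in Y;
    # before any swaps, pos_in_Y[X[i]] is exactly the permutation P[i] above.
    pos_in_Y = {city: idx for idx, city in enumerate(Y)}
    Y = list(Y)                      # work on a copy
    swaps = []
    for i in range(len(X)):
        # settle position i by swapping the wanted city into place
        if Y[i] != X[i]:
            j = pos_in_Y[X[i]]
            swaps.append((i, j))
            Y[i], Y[j] = Y[j], Y[i]
            pos_in_Y[Y[i]], pos_in_Y[Y[j]] = i, j
    return swaps

print(swaps_to_transform(list("ABDC"), list("DCAB")))   # [(0, 2), (1, 3)]

Each swap settles one position, and the last position of every cycle falls into place for free, so len(swaps) is exactly LEN - C as derived above.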
After the discussion in the comments, I should note that the terms used here come from a math context, not a programming context.
How can I uniformly generate random tuples of natural numbers A and B, so that A + B <= C, where C is constant?
Each possible tuple that meets the criteria should have an equal chance of being generated. For the purposes of this question, a natural number means a positive integer greater than or equal to 1.
Wrong solution (just to explain the question): take a random A from 1 to C - 1, then a random B from 1 to C - A. This way you're as likely to end up with A = 1 as with A = C - 1, but there are C - 1 tuples of the first kind and only 1 tuple of the second kind, so individual tuples of these types don't appear with the same probability.
For a given natural number C, there are (C-1) * C / 2 possible natural number* tuples where A + B <= C
e.g. C = 5, the 10 possible natural number tuples are:
(1,1), (1,2), (1,3), (1,4)
(2,1), (2,2), (2,3)
(3,1), (3,2)
(4,1)
So you could choose a random value between [1, (C-1) * C / 2] and find the tuples based on that.
To make it easier to find the tuple, imagine the list doubled with the triangle flipped around and fitted to itself:
(1,1), (1,2), (1,3), (1,4), (4,1)
(2,1), (2,2), (2,3), (3,2), (3,1)
(3,1), (3,2), (2,3), (2,2), (2,1)
(4,1), (1,4), (1,3), (1,2), (1,1)
Now you just need one random number for the row in the range [1, C-1] and one for the column in the range [1, C]
If the row + column <= C then A = row, B = column
Otherwise A = C - row, B = C + 1 - column
(*) Going by the definition of a natural number as a "positive integer starting with 1" given by the OP, which is not the only possible definition of a natural number.
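A quick Python sketch of that folding trick (the function name is my own); it draws a row and a column and maps the out-of-triangle cells back as described:

import random
from collections import Counter

def random_tuple(C):
    row = random.randint(1, C - 1)
    col = random.randint(1, C)
    if row + col <= C:
        return (row, col)
    return (C - row, C + 1 - col)    # the "flipped" copy of the triangle

# rough uniformity check for C = 5: each of the 10 tuples should show up about 10% of the time
print(Counter(random_tuple(5) for _ in range(100_000)))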
Edited for updated question
Generate two numbers:
X ~ U(1, C)
Y ~ U(1, C - X)
Now toss a coin:
with probability 1/2, A <- X, B <- Y
with probability 1/2, A <- Y, B <- X
I need a little help with my assignment in pseudocode:
Take input of 4 numbers and print the sum of the largest 3 numbers.
For example:
inputs: 14, 1, 9, 3
output: 14+9+3 => 26
How can I write an algorithm in pseudocode for the above task?
So far I have come to this:
input a, b, c, d
declare h1, h2, h3
if(a>=b && a>=c && a>=d) h1 = a
if(b>=a && b>=c && b>=d) h2 = b
if(c>=a && c>=b && c>=d) h3 = c
if(d>=a && d>=b && d>=c) h4 = d
print h1+h2+h3
Is this any good?
1. Let's say that inputs are in array t.
2. Let sum = t[0].
3. Let min = t[0].
4. For i from 1 to 3, repeat steps 5 and 6:
5. sum += t[i].
6. if (min > t[i]) min = t[i].
7. Return sum - min.
Another approach, which you and Brian presented, boils down to sorting (for n = 4 you can do it "manually" like you did, but for larger n it's not a good idea) and then taking the sum.
I prefer the approach shown above because it has guaranteed linear time complexity, scales nicely and is easy to implement. It does exactly one pass through the input data and can be used when the input is streamed one value at a time and we don't want to store it in memory (we want to process it "on the fly"). Sorting can be more expensive than linear (it can be O(n log n)) if there are no assumptions whatsoever about the input data.
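To make the one-pass idea concrete, here is a tiny Python version (the function name is my own); it accepts any iterable, so it also works on a stream without storing the values:

def sum_of_largest_three(values):
    it = iter(values)
    total = smallest = next(it)          # the first value initialises both
    for v in it:
        total += v
        smallest = min(smallest, v)
    return total - smallest              # sum of all the inputs minus the minimum

print(sum_of_largest_three([14, 1, 9, 3]))   # 26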
Your pseudocode is off to a good start. But right now you only find the single largest of your four numbers. You would need to repeat the process two more times (ignoring the numbers already found) to get the 2nd and 3rd largest.
Another clever idea, though: if you need the 3 largest of 4 numbers, the remaining 4th number must be the smallest. Find the smallest number, then add the others.
input a, b, c, d
declare min
// find the smallest
min = a
if (b < min) min = b
if (c < min) min = c
if (d < min) min = d
// the sum of the largest 3 = the sum of all 4 minus the minimum
print a + b + c + d - min
Use recursion. If the first number in the input is smallest, sum the other three, otherwise rotate the inputs and call recursively. Eventually the smallest number will be a, so the other three are the largest, and you can sum them and return the answer. In pseudocode:
function sum3max(a, b, c, d)
    if a == min(a, b, c, d)
        return b + c + d
    return sum3max(b, c, d, a)
The problem statement:
Given a list of pairs {A|B}
Find the minimum sum where you must take 'm' values from 'A' and 'n' values from 'B'
you may not use the same 'pair' for both A and B
the size of the list will be between 2 and 500 items
the number of items you take (m & n) can also vary
the numbers in the pair (A & B) are in the range 0-9.
There of course can be multiple pair combinations that give you the correct minimum.
For example, given:
1 - {4,5}
2 - {3,2}
3 - {3,1}
4 - {1,0}
and desiring 2 from A, 1 from B
the correct answer is 5
taking 2A(3), 4A(1) and 3B(1).
Another example is:
1 - {5,4}
2 - {2,1}
3 - {6,6}
4 - {2,1}
5 - {5,5}
and desiring 2 from A, 2 from B, the correct answer is 12,
taking 1A(5), 5A(5), 2B(1), 4B(1).
I have solved this using a brute force approach, but of course as the list grows larger, and m/n increase, the performance suffers greatly.
How can I improve on this brute force approach?
What is this class of problem called?
Believe it or not, this is not homework!
It can be formulated as a minimum-cost flow problem. Let the pairs be (a_i, b_i). Create vertices s, t, a, b, u_i and arcs from s to a (capacity m, cost 0), from s to b (capacity n, cost 0), from a to u_i (capacity 1, cost a_i), from b to u_i (capacity 1, cost b_i), from u_i to t (capacity 1, cost 0). Send m + n units of flow as cheaply as possible from s to t. Flow on a->u_i means that a_i is chosen. Flow on b->u_i means that b_i is chosen.
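As a sketch of this construction, here is how it could be set up with the networkx library (the function name and node labels are my own; I'm assuming networkx's min-cost-flow routines rather than any particular flow code you might use):

import networkx as nx

def min_sum_flow(pairs, m, n):
    G = nx.DiGraph()
    G.add_edge('s', 'a', capacity=m, weight=0)
    G.add_edge('s', 'b', capacity=n, weight=0)
    for i, (a_i, b_i) in enumerate(pairs):
        G.add_edge('a', ('u', i), capacity=1, weight=a_i)   # take A from pair i
        G.add_edge('b', ('u', i), capacity=1, weight=b_i)   # take B from pair i
        G.add_edge(('u', i), 't', capacity=1, weight=0)     # each pair used at most once
    flow = nx.max_flow_min_cost(G, 's', 't')   # sends m + n units when m + n <= len(pairs)
    return nx.cost_of_flow(G, flow)

print(min_sum_flow([(4, 5), (3, 2), (3, 1), (1, 0)], 2, 1))   # 5, as in the first example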
It also can be solved by dynamic programming. Let Cost[i, j, k] be the minimum sum for choosing i A's and j B's from the first k pairs. Then we have recurrence relations
Cost[0, 0, 0] = 0
Cost[i, j, 0] = infinity
(for all i, j such that i > 0 or j > 0)
Cost[i, j, k] = min {Cost[i, j, k-1],
Cost[i-1, j, k-1] + a_k,
Cost[i, j-1, k-1] + b_k}
(for all i, j, k such that k > 0, where any term with a negative index is taken to be infinity).
Trace back the minimum arguments to reconstruct the optimal choices.
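A small Python sketch of this DP (names are my own); it rolls the k dimension by processing one pair at a time and returns only the minimum sum, without the trace-back:

INF = float('inf')

def min_sum_dp(pairs, m, n):
    # cost[(i, j)] = minimum sum choosing i A's and j B's from the pairs seen so far
    cost = {(i, j): INF for i in range(m + 1) for j in range(n + 1)}
    cost[(0, 0)] = 0
    for a_k, b_k in pairs:
        new_cost = dict(cost)                           # option 1: skip this pair
        for (i, j), c in cost.items():
            if c == INF:
                continue
            if i < m:                                   # option 2: take its A value
                new_cost[(i + 1, j)] = min(new_cost[(i + 1, j)], c + a_k)
            if j < n:                                   # option 3: take its B value
                new_cost[(i, j + 1)] = min(new_cost[(i, j + 1)], c + b_k)
        cost = new_cost
    return cost[(m, n)]

print(min_sum_dp([(4, 5), (3, 2), (3, 1), (1, 0)], 2, 1))          # 5
print(min_sum_dp([(5, 4), (2, 1), (6, 6), (2, 1), (5, 5)], 2, 2))  # 12

The table has (m+1) * (n+1) states and is updated once per pair, so the running time is O(m * n * len(pairs)).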
My input is three numbers: a number s and the beginning b and end e of a range, with 0 <= s, b, e <= 10^1000. The task is to find the minimal Levenshtein distance between s and all numbers in the range [b, e]. It is not necessary to find the number minimizing the distance; the minimal distance is sufficient.
Obviously I have to read the numbers as strings, because standard C++ types will not handle such large numbers. Calculating the Levenshtein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break the problem up into a series of subproblems in which b and e have the same number of digits. Suppose e has k >= 0 more digits than b: break the problem up into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
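Here is a small Python sketch of just this splitting step (the helper name is my own; Python's big integers handle the sizes, though in practice b and e would arrive as strings):

def split_by_digit_count(b, e):
    # split [b, e] into subranges whose two endpoints have the same number of digits
    subranges = []
    lo = b
    while lo <= e:
        hi = 10 ** len(str(lo)) - 1        # largest number with as many digits as lo
        subranges.append((lo, min(hi, e)))
        lo = hi + 1
    return subranges

print(split_by_digit_count(5, 14032))
# [(5, 9), (10, 99), (100, 999), (1000, 9999), (10000, 14032)]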
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m+1-n if n < m+1, n-(u+1) if n > u+1, and 0 otherwise.
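Written out as a tiny function (names are my own), with n the number of digits of s:

def easy_cases_min_distance(n, m, u):
    # minimum Levenshtein distance from an n-digit s to any number in the union
    # of the "easy" ranges [10^m, 10^(m+1)-1], ..., [10^u, 10^(u+1)-1]
    if n < m + 1:
        return (m + 1) - n     # s is shorter than every number in these ranges
    if n > u + 1:
        return n - (u + 1)     # s is longer than every number in these ranges
    return 0                   # some range contains numbers with exactly n digits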
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
1. Set x to the first q digits of b.
2. Append a randomly-chosen digit d between b[q+1] and e[q+1] to x.
3. If d == b[q+1], we "hug" the lower bound:
   - For i from q+2 to m:
     - If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger than b[i], and there is no digit larger than 9!]
     - Otherwise, flip a coin:
       - Heads: Append b[i].
       - Tails: Append a randomly-chosen digit d > b[i], then go to step 6.
   - Stop.
4. Else if d == e[q+1], we "hug" the upper bound:
   - For i from q+2 to m:
     - If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller than e[i], and there is no digit smaller than 0!]
     - Otherwise, flip a coin:
       - Heads: Append e[i].
       - Tails: Append a randomly-chosen digit d < e[i], then go to step 6.
   - Stop.
5. Otherwise (if d is strictly between b[q+1] and e[q+1]), drop through to step 6.
6. Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q+1) or (if j > q+1) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q+1 then shc(i, j) + abs(n-i-m+j)
           else infinity
shc(i, j) = if j == q+1 then
                min(hb(i, j-1)+1, hb(i-1, j-1) + 1 - inrange(s[i], (b[j]+1)..(e[j]-1)))
            else
                min(hb(i, j-1)+1, hb(i-1, j-1) + 1 - inrange(s[i], (b[j]+1)..9),
                    he(i, j-1)+1, he(i-1, j-1) + 1 - inrange(s[i], 0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levenshtein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit distance of s = 12007 and n = 12296, you already have almost the whole table for n = 12297; you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300, and you need to update the last 3 columns of the previous table. But this happens just once every 100 iterations.
In general, you have to:
- update the last column on every iteration (so, length(s) cells)
- update the second-to-last column too, once every 10 iterations
- update the third-to-last column too, once every 100 iterations
- and so on.
So let L = length(s) and D = e - b. First you compute the edit-distance table between s and b, which takes about L^2 operations. Then you find the minimum Levenshtein distance over [b, e] by looping over every integer in the interval, updating the table as you go. There are D of them, and each step touches on average about L * (1 + 1/10 + 1/100 + ...) cells, so the execution time is about

L^2 + D * L * (1 + 1/10 + 1/100 + ...)

Now since 1 + 1/10 + 1/100 + ... = 10/9 is just a constant, we have an algorithm which is O(L^2 + D * L).
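To make the idea concrete, here is a rough Python sketch of the incremental update (all names are my own). It assumes, as stated at the top of this answer, that the range is small enough to enumerate:

def distance_columns(s, t):
    # cols[j][i] = edit distance between s[:i] and t[:j], stored column by column
    cols = [list(range(len(s) + 1))]
    for j in range(1, len(t) + 1):
        prev, cur = cols[-1], [j]
        for i in range(1, len(s) + 1):
            cur.append(min(prev[i] + 1, cur[i - 1] + 1,
                           prev[i - 1] + (s[i - 1] != t[j - 1])))
        cols.append(cur)
    return cols

def min_distance_in_range(s, b, e):
    t = str(b)
    cols = distance_columns(s, t)
    best = cols[-1][-1]
    for num in range(b + 1, e + 1):
        t2 = str(num)
        if len(t2) != len(t):
            cols = distance_columns(s, t2)           # digit count grew: rebuild everything
        else:
            k = next(i for i in range(len(t)) if t[i] != t2[i])
            cols = cols[:k + 1]                      # columns for the unchanged prefix stay valid
            for j in range(k + 1, len(t2) + 1):      # recompute only the trailing columns
                prev, cur = cols[-1], [j]
                for i in range(1, len(s) + 1):
                    cur.append(min(prev[i] + 1, cur[i - 1] + 1,
                                   prev[i - 1] + (s[i - 1] != t2[j - 1])))
                cols.append(cur)
        t = t2
        best = min(best, cols[-1][-1])
    return best

print(min_distance_in_range("12007", 12290, 12300))   # 2 (e.g. against 12297)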