I have faced the following problem recently:
We have a sequence A of M consecutive integers, beginning at A[1] = 1:
1,2,...M (example: M = 8 , A = 1,2,3,4,5,6,7,8 )
We have the set T consisting of all possible subsequences made from L_T consecutive terms of A.
(example L_T = 3 , subsequences are {1,2,3},{2,3,4},{3,4,5},...). Let's call the elements of T "tiles".
We have the set S consisting of all possible subsequences of A that have length L_S. ( example L_S = 4, subsequences like {1,2,3,4} , {1,3,7,8} ,...{4,5,7,8} ).
We say that an element s of S can be "covered" by K "tiles" of T if there exist K tiles in T such that the union of their sets of terms contains the terms of s as a subset. For example, the subsequence {1,2,3} can be covered by 2 tiles of length 2 ({1,2} and {3,4}), while the subsequence {1,3,5} cannot be covered by 2 tiles of length 2, but can be covered by 2 tiles of length 3 ({1,2,3} and {4,5,6}).
Let C be the subset of elements of S that can be covered by K tiles of T.
Find the cardinality of C given M, L_T, L_S, K.
Any ideas on how to tackle this problem would be appreciated.
Assume M is divisible by L_T, so that an integer number of aligned tiles covers all elements of the initial set (otherwise the statement is currently unclear).
First, let us count F(P): it will be almost the number of subsequences of length L_S which can be covered by no more than P tiles, but not exactly that.
Formally, F(P) = choose(M/L_T, P) * choose(P*L_T, L_S).
We start by choosing exactly P covering tiles: the number of ways is choose(M/L_T, P).
When the tiles are fixed, we have exactly P*L_T distinct elements available, and there are choose(P*L_T, L_S) ways to choose a subsequence.
Well, this approach has a flaw.
Note that, when we chose a tile but did not use its elements at all, we in fact counted some subsequences more than once.
For example, if we fixed three tiles numbered 2, 6 and 7, but used only 2 and 7, we counted the same subsequences again and again when we fixed three tiles numbered 2, 7 and whatever.
The problem described above can be countered by a variation of the inclusion-exclusion principle.
Indeed, a subsequence which uses only Q tiles out of the P selected tiles is counted choose(M/L_T - Q, P - Q) times instead of only once: Q of the P choices are fixed, but the other P - Q are arbitrary among the remaining M/L_T - Q tiles.
Define G(P) as the number of subsequences of length L_S which can be covered by exactly P tiles.
Then F(P) is the sum for Q from 0 to P of the products G(Q) * choose(M/L_T - Q, P - Q).
Working from P = 0 upwards, we can calculate all the values of G by calculating the values of F.
For example, we get G (2) from knowing F (2), G (0) and G (1), and also the equation connecting F (2) with G (0), G (1) and G (2).
After that, the answer is simply sum for P from 0 to K of the values G (P).
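Under this answer's assumptions (M divisible by L_T, and the tiles taken to be the M/L_T aligned non-overlapping blocks), the recurrence can be sketched in Python as follows; the function name is mine:

```python
from math import comb

def count_coverable(M, L_T, L_S, K):
    """Count length-L_S subsequences coverable by at most K aligned tiles.

    Assumes M is divisible by L_T and the tiles are the M/L_T
    non-overlapping blocks [1..L_T], [L_T+1..2*L_T], ...
    """
    n_tiles = M // L_T

    # F(P): choose P tiles, then any L_S of their P*L_T elements (overcounts)
    def F(P):
        return comb(n_tiles, P) * comb(P * L_T, L_S)

    # Recover G(P), the count covered by EXACTLY P tiles, working upwards from
    # P = 0 using F(P) = sum_{Q<=P} G(Q) * comb(n_tiles - Q, P - Q)
    G = []
    for P in range(min(K, n_tiles) + 1):
        g = F(P) - sum(G[Q] * comb(n_tiles - Q, P - Q) for Q in range(P))
        G.append(g)
    return sum(G)
```

For M = 4, L_T = 2, L_S = 2, K = 1 this gives 2: only {1,2} and {3,4} can each be covered by a single tile.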
How can I calculate the total number of topological sorts without finding all the orders?
In general this is #P-complete. This particular graph happens to be series–parallel, however, which makes it easy. Graphs composed in series cause their numbers of possibilities to be multiplied. For the particular graph you show, there are three diamonds in series, each of which has two valid extensions, so there are eight possibilities.
Check the image here
For each of the rectangles [u,v], either u can appear first or v can appear first.
So for the pairs [a,b], [c,d], [e,f], we have two choices each.
The remaining elements p, q, r, s have only one choice, because we have to start with p and end with s:
p ->(a or b)-> q -> (c or d) -> r -> (e or f) -> s.
Total = 1 * 2 * 1 * 2 * 1 * 2 * 1 = 8
Hence a total of 8 topological orderings are possible.
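For cross-checking, the count of topological orderings can be computed exactly with a standard dynamic program over subsets (exponential in the number of nodes, but instant for 10 of them); the encoding of the diamond chain below is my own:

```python
def count_linear_extensions(nodes, edges):
    """Count topological orderings by DP over subsets of already-placed nodes."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    preds = [0] * n                       # bitmask of each node's predecessors
    for u, v in edges:
        preds[idx[v]] |= 1 << idx[u]
    ways = [0] * (1 << n)
    ways[0] = 1
    for subset in range(1 << n):
        if ways[subset] == 0:
            continue
        for i in range(n):
            # node i may be placed next if all its predecessors are placed
            if not (subset >> i) & 1 and (preds[i] & ~subset) == 0:
                ways[subset | (1 << i)] += ways[subset]
    return ways[(1 << n) - 1]

# My encoding of the chain p -> (a|b) -> q -> (c|d) -> r -> (e|f) -> s:
nodes = list("pabqcdrefs")
edges = [("p", "a"), ("p", "b"), ("a", "q"), ("b", "q"),
         ("q", "c"), ("q", "d"), ("c", "r"), ("d", "r"),
         ("r", "e"), ("r", "f"), ("e", "s"), ("f", "s")]
```

Running count_linear_extensions(nodes, edges) on this encoding confirms the count of 8.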
I have a permutation of the numbers 1 to n. In each turn a permutation function is used to map the current permutation to a new one.
The function is defined by F(i) = p[i], which maps each element of the current permutation to a position in the new permutation. Since this function is injective and surjective, it can be proved that we always get back to the first permutation again. (It's actually a cycle in the permutation graph.)
For example [2,3,1] -> [3,1,2] -> [1,2,3] -> [2,3,1], so the cycle length is 3, since the first and last permutations are the same and we're stuck in a loop.
As input I have a special kind of permutation like this:
[2,1,
4,5,3,
7,8,9,10,6,
17,11,12,13,14,15,16,
18,19,20,
29,21,22,23,24,25,26,27,28,
40,30,31,32,33,34,35,36,37,38,39,
53,41,42,43,44,45,46,47,48,49,50,51,52]
It's made of some sub-permutations (each line is a sub-permutation: the numbers on it are a permutation of their own index positions).
My question is what's the minimum number of moves needed to get to the first permutation again.
As a practice problem in Prolog I want to calculate the number of moves for each sub-permutation and take their lcm, but I'm not sure how to implement this (how to count the number of moves for each sub-permutation).
Any help would be appreciated
A permutation p can be seen as a bijective function from the set {1,2,...,n} onto itself. What you seem to ask for is the minimal number of compositions of this permutation with itself, p o p o ... o p (where o is the composition operator with (f o g)(i) := f(g(i))), such that you get the identity permutation p0 with p0(i) = i.
You have a permutation that can be easily decomposed into cycles 1->2->1, 3->4->5->3, 6->7->8->9->10->6, ...
Each cycle needs as many compositions with itself as it has members to reach the identity. The cycle lengths here are 2, 3, 5, 7, 9, 11 and 13 (the block 18, 19, 20 consists of fixed points), so it takes lcm(2,3,5,7,9,11,13) = 2*9*5*7*11*13 = 90090 compositions until all cycles are run through at the same time for the first time.
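In Python (the same cycle-finding logic ports directly to Prolog), a sketch of the computation, assuming the permutation is given as a 1-based list as in the question:

```python
from math import gcd

def order_of_permutation(p):
    """Return the minimum k > 0 such that applying p k times gives the
    identity: the lcm of the cycle lengths of p (given as a 1-based list)."""
    n = len(p)
    seen = [False] * n
    order = 1
    for start in range(n):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = p[j] - 1              # follow the cycle (values are 1-based)
                length += 1
            order = order * length // gcd(order, length)   # lcm so far
    return order
```

For the permutation in the question this returns 90090.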
The task is to count the permutations of 1 to N that first increase and then decrease. For example, for N = 3 the valid permutations are:
[1,3,2]
[2,3,1]
Note: [1,2,3] and [3,2,1] are not valid here because [1,2,3] increases but doesn't decrease, and vice versa for [3,2,1].
I got this problem in TCS CodeVita 2017, they didn't even provide the editorial for this.
All these permutations have the number N somewhere in the middle.
All numbers less than N can be divided into two groups: left and right. The left group is in increasing order, the right group is in decreasing order.
Neither the left nor the right group may be empty.
The answer equals the number of different left groups, because each left group is immediately followed by N and then all remaining numbers in decreasing order.
The left group can contain any numbers besides N, but it can neither be empty nor contain all N-1 numbers.
Therefore, the answer is the number of subsets of {1, 2, ..., N-1} minus the two corner cases. That is 2^(N-1) - 2.
The algorithm would be as follows:
The peak element would always be N.
N cannot be at either of the 2 ends, so we can place it at N-2 positions.
Suppose N is at the i-th position; we can select i-1 numbers to be on the left side. These are placed in sorted order, and the other elements are simply placed in reverse order after N.
The number of ways to select i elements from n is nCi; here we have to select (i-1) elements from the (n-1) elements other than N.
N can be at any index from i = 2 to N-1 (assuming indexes start from 1).
The answer would be
(n-1)C1 + (n-1)C2 + (n-1)C3 + ... + (n-1)C(n-2) = 2^(n-1) - 2 // the -2 handles the cases (n-1)C0 and (n-1)C(n-1)
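Both derivations can be sanity-checked against a brute force for small N (the function name is mine):

```python
from itertools import permutations

def count_unimodal(n):
    """Count permutations of 1..n that strictly increase, then strictly
    decrease, with at least one element on each side of the peak."""
    count = 0
    for p in permutations(range(1, n + 1)):
        peak = p.index(n)
        if peak == 0 or peak == n - 1:      # peak may not sit at either end
            continue
        left, right = p[:peak], p[peak + 1:]
        if list(left) == sorted(left) and list(right) == sorted(right, reverse=True):
            count += 1
    return count
```

count_unimodal(3) gives 2, matching the listed permutations, and the values for larger N follow 2^(N-1) - 2.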
I want to solve a problem without loops. Assume we have N places, and for each place you can select a number in [p,q], but the number in place i+1 must be less than or equal to the number in place i. How can I count all possible sequences without brute force?
For example assume we have 2 places and you can select a number between [2,3] then the possible sequences can be:
3 3
3 2
2 2
As the number of places is unbounded, and so are p and q, it is impossible to solve this with simple loops.
It's choose(q-p+N, N) where choose is the binomial coefficient.
Weakly decreasing sequences that are between p and q are in bijection with sequences of 0's and 1's of length q-p+N, where the sequences have exactly N ones. It's obvious that the number of such sequences is choose(q-p+N, N) because that's the number of ways of choosing N things from q-p+N things.
The proof that the two sets are in bijection
Given a sequence xs of 0's and 1's of length q-p+N with N 1's, this pseudocode generates a weakly decreasing sequence of numbers between p and q:
c = q
for x in xs
    if x = 1 then output c
    if x = 0 then c <- c - 1
Conversely, given a weakly decreasing sequence xs of length N between p and q, this pseudocode generates a sequence of 0's and 1's of length q-p+N with exactly N 1's:
c = q
while xs is not empty
    if c = head(xs) then
        output 1
        xs <- tail(xs)
    else
        output 0
        c <- c - 1
while c > p
    output 0
    c <- c - 1
Here head(xs) denotes the first thing in the xs list, and tail(xs) the remainder of the list.
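A Python sketch of both directions of the bijection (function names are mine):

```python
def decode(bits, p, q):
    """0/1 sequence of length q-p+N with N ones -> weakly decreasing
    sequence of N numbers in [p, q]."""
    c, out = q, []
    for x in bits:
        if x == 1:
            out.append(c)       # emit the current value
        else:
            c -= 1              # step down to the next smaller value
    return out

def encode(xs, p, q):
    """Weakly decreasing sequence in [p, q] -> 0/1 sequence with one 1
    per element and one 0 per value stepped over."""
    bits, c = [], q
    for v in xs:
        while c > v:            # emit a 0 for every value we skip over
            bits.append(0)
            c -= 1
        bits.append(1)
    bits.extend([0] * (c - p))  # pad with 0's down to p
    return bits
```

For p = 2, q = 3, N = 2 the three bit strings with two 1's, 110, 101 and 011, decode to exactly the sequences 3 3, 3 2 and 2 2 from the question, in agreement with choose(q-p+N, N) = choose(3, 2) = 3.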
My input is three numbers: a number s, and the beginning b and end e of a range, with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenshtein distance between s and all numbers in the range [b, e]. It is not necessary to find the number minimizing the distance; the minimal distance is sufficient.
Obviously I have to read the numbers as strings, because standard C++ types will not handle such large numbers. Calculating the Levenshtein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which b and e have the same number of digits. Suppose e has k >= 0 more digits than b: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m+1-n if n < m+1, n-u-1 if n > u+1, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
1. Set x to the first q digits of b.
2. Append a randomly-chosen digit d between b[q+1] and e[q+1] to x.
3. If d == b[q+1], we "hug" the lower bound. For i from q+2 to m:
   If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger than b[i], and there is no digit larger than 9!]
   Otherwise, flip a coin:
   Heads: Append b[i].
   Tails: Append a randomly-chosen digit d' > b[i], then go to step 6.
   Then stop.
4. Else if d == e[q+1], we "hug" the upper bound. For i from q+2 to m:
   If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller than e[i], and there is no digit smaller than 0!]
   Otherwise, flip a coin:
   Heads: Append e[i].
   Tails: Append a randomly-chosen digit d' < e[i], then go to step 6.
   Then stop.
5. Otherwise (if d is strictly between b[q+1] and e[q+1]), drop through to step 6.
6. Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j)+abs(n-i-m+j)
else infinity
shc(i, j) = if j == q then
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
else
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
    he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], 0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
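Any implementation of the algorithm above can be checked against a brute-force reference (feasible only when the range [b, e] is small); this sketch is mine, not part of the O(N^2) solution:

```python
def lev(a, b):
    """Standard O(len(a)*len(b)) Levenshtein distance with two rows."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                     # delete from a
                         cur[j - 1] + 1,                  # insert into a
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute
        prev = cur
    return prev[n]

def min_dist_brute(s, b, e):
    """Minimum Levenshtein distance from s to any integer in [b, e],
    by trying every number in the range."""
    return min(lev(s, str(x)) for x in range(b, e + 1))
```

An O(N^2) implementation must return the same values as min_dist_brute on any small range.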
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levenshtein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's look at the dynamic-programming tables for s = 12007 and two consecutive values of n.
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit distance of s = 12007 and n = 12296, you already have almost the whole table for n = 12297; you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300, and you need to update the last 3 columns of the previous table, but this happens just once every 100 iterations.
In general, you have to
update the last column on every iteration (so, length(s) cells),
update the second-to-last column too, once every 10 iterations,
update the third-to-last column too, once every 100 iterations,
and so on.
So let L = length(s) and D = e-b. First you compute the edit distance between s and b. Then you can find the minimum Levenshtein distance over [b,e] by looping over every integer in the interval. There are D of them, so the execution time is about
D * (L + L/10 + L/100 + ...) = D * L * (1 + 1/10 + 1/100 + ...)
Now since
1 + 1/10 + 1/100 + ... = 10/9 < 2
we have an algorithm which is O(D * L).
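A Python sketch of this incremental scheme (my own code, illustrating the column-reuse idea under the assumption that numbers are handled as decimal strings):

```python
def min_lev_over_range(s, b, e):
    """Minimum Levenshtein distance between s and any integer in [b, e],
    recomputing only the DP columns whose digits changed between n and n+1."""
    best = float("inf")
    prev_digits = ""
    cols = [list(range(len(s) + 1))]     # cols[j][i] = lev(s[:i], digits[:j])
    for n in range(b, e + 1):
        digits = str(n)
        # length of the common prefix with the previous number's digits
        start = 0
        while (start < min(len(prev_digits), len(digits))
               and prev_digits[start] == digits[start]):
            start += 1
        del cols[start + 1:]             # drop the stale columns
        for j in range(start, len(digits)):
            prev, cur = cols[j], [j + 1] + [0] * len(s)
            for i in range(1, len(s) + 1):
                cur[i] = min(prev[i] + 1,                       # skip a digit
                             cur[i - 1] + 1,                    # skip s[i]
                             prev[i - 1] + (s[i - 1] != digits[j]))  # match/sub
            cols.append(cur)
        prev_digits = digits
        best = min(best, cols[-1][len(s)])
    return best
```

Most iterations only rebuild the final column, so the amortized work per number is close to length(s) cells, matching the D * L estimate above.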