Related
Consider a binary sequence b of length N. Initially, all the bits are set to 0. We define a flip operation with 2 arguments, flip(L,R), such that:
All bits with indices between L and R are "flipped", meaning a bit with value 1 becomes a bit with value 0 and vice-versa. More exactly, for all i in range [L,R]: b[i] = !b[i].
Nothing happens to bits outside the specified range.
You are asked to determine the number of possible different sequences that can be obtained using exactly K flip operations modulo an arbitrary given number, let's call it MOD.
More specifically, each test contains on the first line a number T, the number of queries to be given. Then there are T queries, each one being of the form N, K, MOD with the meaning from above.
1 ≤ N, K ≤ 300 000
T ≤ 250
2 ≤ MOD ≤ 1 000 000 007
Sum of all N-s in a test is ≤ 600 000
time limit: 2 seconds
memory limit: 65536 kbytes
Example :
Input :
1
2 1 1000
Output :
3
Explanation :
There is a single query. The initial sequence is 00. We can do the following operations :
flip(1,1) ⇒ 10
flip(2,2) ⇒ 01
flip(1,2) ⇒ 11
So there are 3 possible sequences that can be generated using exactly 1 flip.
Some quick observations that I've made, although I'm not sure they are totally correct :
If K is big enough, that is if we have a big enough number of flips at our disposal, we should be able to obtain 2^n sequences.
If K=1, then the result we're looking for is N(N+1)/2. It's also C(n,1)+C(n,2), where C is the binomial coefficient.
Currently trying a brute force approach to see if I can spot a rule of some kind. I think this is a sum of some binomial coefficients, but I'm not sure.
I've also come across a somewhat simpler variant of this problem, where the flip operation only flips a single specified bit. In that case, the result is
C(n,k)+C(n,k-2)+C(n,k-4)+...+C(n,(1 or 0)). Of course, there's the special case where k > n, but it's not a huge difference. Anyway, it's pretty easy to understand why that happens. I guess it's worth noting.
Here are a few ideas:
We may assume that no flip operation occurs twice (otherwise, we can assume that it did not happen). It does affect the number of operations, but I'll talk about it later.
We may assume that no two segments intersect. Indeed, if L1 < L2 < R1 < R2, we can just do the (L1, L2 - 1) and (R1 + 1, R2) flips instead. The case when one segment is inside the other is handled similarly.
We may also assume that no two segments touch each other. Otherwise, we can glue them together and reduce the number of operations.
These observations give the following formula for the number of different sequences one can obtain by flipping exactly k segments without "redundant" flips: C(n + 1, 2 * k) (we choose 2 * k ends of segments. They are always different. The left end is exclusive).
If we were allowed to perform no more than K flips, the answer would be the sum for k = 0..K of C(n + 1, 2 * k).
Intuitively, it seems that it's possible to transform a sequence of no more than K flips into a sequence of exactly K flips (for instance, we can flip the same segment two more times and add 2 operations. We can also split a segment of more than two elements into two segments and add one operation).
Running the brute force search suggests (I know that it's not a real proof, but it looks correct combined with the observations mentioned above) that the answer is this sum minus 1 if n or k is equal to 1, and exactly the sum otherwise.
That is, the result is C(n + 1, 0) + C(n + 1, 2) + ... + C(n + 1, 2 * K) - d, where d = 1 if n = 1 or k = 1 and 0 otherwise.
Here is the code I used to look for patterns by running a brute-force search and to verify that the formula is correct for small n and k:
reachable = set()
was = set()

def other(c):
    """
    Returns '1' if c == '0' and '0' otherwise
    """
    return '0' if c == '1' else '1'

def flipped(s, l, r):
    """
    Flips the [l, r] segment of the string s and returns the result
    """
    res = s[:l]
    for i in range(l, r + 1):
        res += other(s[i])
    res += s[r + 1:]
    return res

def go(xs, k):
    """
    Exhaustive search. was is used to speed up the search to avoid checking the
    same string with the same number of remaining operations twice.
    """
    p = (xs, k)
    if p in was:
        return
    was.add(p)
    if k == 0:
        reachable.add(xs)
        return
    for l in range(len(xs)):
        for r in range(l, len(xs)):
            go(flipped(xs, l, r), k - 1)

def calc_naive(n, k):
    """
    Counts the number of reachable sequences by running an exhaustive search
    """
    xs = '0' * n
    global reachable
    global was
    was = set()
    reachable = set()
    go(xs, k)
    return len(reachable)

def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

def cnk(n, k):
    if k > n:
        return 0
    return fact(n) // fact(k) // fact(n - k)

def solve(n, k):
    """
    Uses the formula shown above to compute the answer
    """
    res = 0
    for i in range(k + 1):
        res += cnk(n + 1, 2 * i)
    if k == 1 or n == 1:
        res -= 1
    return res

if __name__ == '__main__':
    # Checks that the formula gives the right answer for small values of n and k
    for n in range(1, 11):
        for k in range(1, 11):
            assert calc_naive(n, k) == solve(n, k)
This solution is much better than the exhaustive search. For instance, it can run in O(N * K) time per test case if we compute the coefficients using Pascal's triangle. Unfortunately, it is not fast enough. I know how to solve it more efficiently for prime MOD (using Lucas' theorem), but I do not have a solution for the general case.
Multiplicative modular inverses can't solve this problem immediately as k! or (n - k)! may not have an inverse modulo MOD.
Note: I assumed that C(n, m) is defined for all non-negative n and m and is equal to 0 if n < m.
I think I know how to solve it for an arbitrary MOD now.
Let's factorize MOD into prime factors p1^a1 * p2^a2 * ... * pn^an. Now we can solve this problem for each prime power independently and combine the results using the Chinese remainder theorem.
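For illustration, here is a small sketch of the CRT combination step (my own helper, not from the original answer; pow(x, -1, m) assumes Python 3.8+):

def crt(residues, moduli):
    # moduli are the pairwise coprime prime powers p_i^a_i whose product is MOD
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                        # product of the other moduli
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x

# e.g. crt([1, 2], [4, 3]) == 5, since 5 % 4 == 1 and 5 % 3 == 2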
Let's fix a prime p and assume that p^a | MOD (that is, we need to get the result modulo p^a). We can precompute the p-free part of each i and the power of p that divides each i, for all 0 <= i <= N, in linear time using something like this (the corresponding values for the factorials then follow, see below):
powers = [0] * (N + 1)
p_free = list(range(N + 1))
p_free[0] = 1
cur_p = p
while cur_p <= N:          # iterate over the powers of p that are <= N
    i = cur_p
    while i <= N:
        powers[i] += 1
        p_free[i] //= p
        i += cur_p
    cur_p *= p
Now the p-free part of the factorial is the product of p_free[i] for all i <= n and the power of p that divides n! is the prefix sum of the powers.
Now we can divide two factorials: the p-free part is coprime with p^a so it always has an inverse. The powers of p are just subtracted.
We're almost there. One more observation: we can precompute the inverses of the p-free parts in linear time. Let's compute the inverse of the p-free part of N! using Euclid's algorithm. Now we can iterate over all i from N down to 0: the inverse of the p-free part of i! is the inverse for (i + 1)! times p_free[i + 1] (it's easy to prove if we rewrite the inverse of the p-free part as a product, using the fact that the elements coprime with p^a form an abelian group under multiplication).
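To make this concrete, here is a minimal self-contained sketch of computing C(n, k) mod p^a along the lines described above (the function and variable names are mine; pow(x, -1, mod) assumes Python 3.8+):

def binom_mod_prime_power(n_max, p, a):
    mod = p ** a

    # p-free part of i and exponent of p in i, as in the sieve above
    powers = [0] * (n_max + 1)
    p_free = list(range(n_max + 1))
    p_free[0] = 1
    cur_p = p
    while cur_p <= n_max:
        for i in range(cur_p, n_max + 1, cur_p):
            powers[i] += 1
            p_free[i] //= p
        cur_p *= p

    # prefix products / sums give the p-free part and the p-exponent of i!
    fact_free = [1] * (n_max + 1)
    fact_pow = [0] * (n_max + 1)
    for i in range(1, n_max + 1):
        fact_free[i] = fact_free[i - 1] * (p_free[i] % mod) % mod
        fact_pow[i] = fact_pow[i - 1] + powers[i]

    # inverses of the p-free parts, computed backwards from the inverse of n_max!
    inv_free = [1] * (n_max + 1)
    inv_free[n_max] = pow(fact_free[n_max], -1, mod)   # coprime with p^a, so invertible
    for i in range(n_max, 0, -1):
        inv_free[i - 1] = inv_free[i] * (p_free[i] % mod) % mod

    def cnk(n, k):
        if k < 0 or k > n:
            return 0
        e = fact_pow[n] - fact_pow[k] - fact_pow[n - k]
        res = fact_free[n] * inv_free[k] % mod * inv_free[n - k] % mod
        return res * pow(p, e, mod) % mod

    return cnk

# e.g. binom_mod_prime_power(10, 2, 3)(10, 3) == 120 % 8 == 0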
This algorithm runs in O(N * number_of_prime_factors + the time to solve the system using the Chinese remainder theorem + sqrt(MOD)) time per test case. Now it looks good enough.
You're on a good path with binomial-coefficients already. There are several factors to consider:
Think of your number as a binary-string of length n. Now we can create another array counting the number of times a bit will be flipped:
[0, 1, 0, 0, 1] number
[a, b, c, d, e] number of flips.
But all even numbers of flips lead to the same result, and so do all odd numbers of flips. So basically the relevant part of the distribution can be represented mod 2.
Logical next question: how many different combinations of even and odd values are available? We'll take care of the ordering later on; for now just assume the flipping-array is ordered descending for simplicity. We start off with k as the only flipping-number in the array. Now we want to add a flip. Since the whole flipping-array is used mod 2, we need to remove two from the value of k to achieve this and insert them into the array separately. E.g.:
[5, 0, 0, 0] mod 2 [1, 0, 0, 0]
[3, 1, 1, 0] [1, 1, 1, 0]
[4, 1, 0, 0] [0, 1, 0, 0]
As the last example shows (remember we're operating modulo 2 in the final result), moving a single 1 doesn't change the number of flips in the final outcome. Thus we always have to flip an even number of bits in the flipping-array. If k is even, so will the number of flipped bits be, and the same applies vice versa, no matter what the value of n is.
So now the question is of course how many different ways of filling the array are available? For simplicity we'll start with mod 2 right away.
Obviously we start with 1 flipped bit if k is odd, otherwise with 0. And we always add 2 flipped bits. We can continue with this until we either have flipped all n bits (or at least as many as we can flip)
v = (k % 2 == n % 2) ? n : n - 1
or we can't spread k further over the array.
v = k
Putting this together:
noOfAvailableFlips:
    if k < n:
        return k
    else:
        return (k % 2 == n % 2) ? n : n - 1
So far so good: there are always v / 2 flipping-arrays (mod 2) that differ by the number of flipped bits. Now we come to the next part: permuting these arrays. This is just a simple permutation function (permutation with repetition, to be precise):
flipArrayNo(flippedbits):
    return factorial(n) / (factorial(flippedbits) * factorial(n - flippedbits))
Putting it all together:
solutionsByFlipping(n, k):
    res = 0
    for i in [k % 2, noOfAvailableFlips(), step=2]:
        res += flipArrayNo(i)
    return res
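For reference, a direct Python transcription of this pseudocode (my own reading of it; math.comb needs Python 3.8+):

from math import comb

def no_of_available_flips(n, k):
    if k < n:
        return k
    return n if k % 2 == n % 2 else n - 1

def solutions_by_flipping(n, k):
    res = 0
    # i runs over the possible numbers of flipped bits, all sharing the parity of k
    for i in range(k % 2, no_of_available_flips(n, k) + 1, 2):
        res += comb(n, i)
    return res

# e.g. solutions_by_flipping(3, 2) == C(3,0) + C(3,2) == 4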
This also shows that for sufficiently large numbers we can't obtain 2^n sequences, for the simple reason that we cannot arrange operations as we please. The number of flips that actually affect the outcome will always be either even or odd, depending on k. There's no way around this. The best result one can get is 2^(n-1) sequences.
For completeness, here's a dynamic program. It can deal easily with an arbitrary modulus since it is based on sums, but unfortunately I haven't found a way to speed it up beyond O(n * k).
Let a[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 1. Let b[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 0.
Then:
# we can append 1 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in 1, or to any arrangement of (k-1) non-adjacent blocks of contiguous
# 1's that ends in 0:
a[n][k] = a[n - 1][k] + b[n - 1][k - 1]
# we can append 0 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in either 0 or 1:
b[n][k] = b[n - 1][k] + a[n - 1][k]
# complete answer would be sum (a[n][i] + b[n][i]) for i = 0 to k
I wonder if the following observations might be useful: (1) a[n][k] and b[n][k] are zero when n < 2*k - 1, and (2) on the flip side, for values of k greater than ⌊(n + 1) / 2⌋ the overall answer seems to be identical.
Python code (full matrices are defined for simplicity, but I think only one row of each would actually be needed, space-wise, for a bottom-up method):
a = [[0] * 11 for i in range(0, 11)]
b = [([1] + [0] * 10) for i in range(0, 11)]

def f(n, k):
    return fa(n, k) + fb(n, k)

def fa(n, k):
    global a
    if a[n][k] or n == 0 or k == 0:
        return a[n][k]
    elif n == 2 * k - 1:
        a[n][k] = 1
        return 1
    else:
        a[n][k] = fb(n - 1, k - 1) + fa(n - 1, k)
        return a[n][k]

def fb(n, k):
    global b
    if b[n][k] or n == 0 or n == 2 * k - 1:
        return b[n][k]
    else:
        b[n][k] = fb(n - 1, k) + fa(n - 1, k)
        return b[n][k]

def g(n, k):
    return sum([f(n, i) for i in range(0, k + 1)])

# example
print(g(10, 10))

for i in range(0, 11):
    print(a[i])

print()

for i in range(0, 11):
    print(b[i])
I have n strings of different length s1, s2, …, sn that I want to display on a terminal in c columns. The terminal has a width of m characters. Each column i has a certain width wi which is equal to the width of the longest entry in that column. Between each pair of columns there is a certain amount of space s. The total width of all columns including the space between them cannot be larger than the width of the terminal (w1 + w2 + … + wc + (c - 1) · s ≤ m). Each column shall contain ⌈n / c⌉ strings, except when n is not evenly divisible by c, in which case either the last few columns shall be shorter by one entry or only the last column shall be shorter, depending on whether the strings are arranged across or down.
Is there an efficient (e.g. O(n·w) where w = max(w1, w2, …, wn)) algorithm to figure out the maximal number of columns c into which I can fit the strings, if...
the strings are arranged across
string1 string2 string3 string4
string5 string6 string7 string8
string9 string10
the strings are arranged down
string1 string4 string7 string10
string2 string5 string8
string3 string6 string9
?
Later findings
I found out that s doesn't matter. Each instance of the problem where s > 0 can be translated into an instance where s = 0 by expanding each string by s characters and also expanding the width of the terminal by s characters to compensate for the extra s characters at the end of the screen.
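For concreteness, the reduction can be written as a tiny helper (my own illustration, not part of the original question):

def drop_gap(widths, m, s):
    # sum(w_i) + (c - 1) * s <= m  is equivalent to  sum(w_i + s) <= m + s
    return [w + s for w in widths], m + s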
Unfortunately, I think the fastest algorithm you can have is O(n^2). This is because you can determine whether a configuration is possible for c columns in a single pass of the list, but you can't know by how much to change c, so basically you'll just have to try a different value for it. At most your algorithm will do this n times.
This is pseudo-code for how I would do it:

for c = n.size; c > 0; c--
    remainingDist = m - (c - 1) * s   // (c - 1) gaps of width s between c columns
    for i = 1; i <= c; i++
        columnWidth = 0
        for j = i; j <= n.size; j += c
            columnWidth = max(n[j], columnWidth)
        end
        remainingDist -= columnWidth
    end
    if remainingDist >= 0
        success
        columns = c
        break
    end
end
You could jump to midway through the loop by first computing an average size of the items and figuring an "ideal" number of columns from that.
I'm not sure exactly how to formulate this in code but with a histogram of the strings sorted by size, we may be able to set a theoretical upper bound on the number of columns, to be further refined by an exact method like wckd's. Since column size is dictated by its longest element, and we are obliged to divide columns as evenly as possible, as long as the number of the largest strings so far is a small enough portion of the total, we can continue to split the columns with shorter strings. For example,
size frequency
10 5
6 10
3 11
m = 30, s = 1
start: 30 - (10 + 1) = 19
implies 13 + 13:
10 (x5) 6 (x2)
6 (x8) 3 (x11)
but the portion of 6 and 3 is large enough to split the second column:
19 - (6 + 1) = 12
implies 9 + 9 + 8:
10 (x5) 6 (x6) 3 (x8)
6 (x4) 3 (x3)
but splitting again will still fit:
12 - (6 + 1) = 5
implies 7 + 7 + 6 + 6:
10 (x5) 6 (x7) 6 (x1) 3 (x6)
6 (x2) 3 (x5)
We end up with at most 4 columns theoretically (clearly, sorting the strings is not allowed in the actual result) which may be reduced by wckd's method.
Depending on the data, I wonder if such an optimization could sometimes be useful. Constructing the histogram ought to take O(n + k * log k) time and O(k) space, where k is the number of string sizes (which you have already limited with w < 1000, m < 10000). And the operation I am suggesting is actually independent of n, it depends only on m, s and the distribution of k; and since k is sorted, we only need one pass splitting/calculating the columns.
I looked at the first problem (horizontal fill), and assumed a gap-size (s) of zero, like you suggested in your edit.
First: I know the bounty is over, and I have no proof of an algorithm that does better than O(n²).
However, I do have some ideas to present that might still be of interest.
My proposed algorithm goes as follows:
Get some upper bound of c in O(n) time (I will get to that later)
If c is 0 or 1, or all strings fit on one row then that c is the answer. Stop.
Create an index ss[] on s[] on descending widths, using pigeon hole sort, in O(w+n) (with w = max(s[]), w <= m). One element of ss[] has two attributes: width and seqNo (the original sequence number, as it occurs in s[]).
Then, loop through the widths in decreasing order until we have a width for each column in a c-column configuration.
If the sum of these widths is still not greater than m then c is a solution. More formally:
knownColumnWidths = new Set()   // of column numbers
sumWidth = 0
for i from 0 to n-1:
    colNo = ss[i].seqNo modulo c
    if not colNo in knownColumnWidths:
        sumWidth += ss[i].width
        if sumWidth > m:
            // c is not a solution, exit for-loop
            break
        add colNo to knownColumnWidths
        if knownColumnWidths has c values:
            // solution found
            return c as solution. Stop
If c was rejected as solution, repeat previous code with c = c - 1.
This last part of the algorithm seems O(n²). But, if the for-loop has a worst case performance (i.e. n - c + 1 iterations), then the next few times (c/2) it runs, it will have close to best performance (i.e. close to c iterations). But in the end it still looks like O(n²).
For getting a good upper bound of c (cf. first step above), I propose this:
First fill as many strings on the first row of the terminal without exceeding the limit m, and take that as initial upper limit for c. More formally put:
sumWidth = 0
c = 0
while c < n and sumWidth + s[c] <= m:
    sumWidth += s[c]
    c++
This is clearly O(n).
This can be further improved as follows:
Take the sum of c widths, but starting one string further, and check if this still is not greater than m. Keep doing this shifting. When m is surpassed, decrease c until the sum of c widths is OK again, and continue the shift with the sum of c consecutive widths.
More formally put, with c starting off with the above found upper limit:
for i from c to n - 1:
    if s[i] > m:
        c = 0. Stop   // should not happen: all widths should be <= m
    sumWidth += s[i] - s[i - c]
    while sumWidth > m:
        // c is not a solution. Improve upper limit:
        c--
        sumWidth -= s[i - c]
This means that in one sweep you might have several improvements for c. Of course, in the worst case, it leads to no improvement at all.
This concludes my algorithm. I estimate it will perform well on random input, but still looks like O(n²).
But I have a few observations, which I have not used in the above algorithm:
When you have found the column widths for a certain c, but the total width is greater than m, then this result can still be put to good use for the case c' = c/2. It is then not necessary to go through all string widths. It suffices to take sum(max(s[i], s[i+c']) for i in 0..c'-1). The same principle holds for other divisors of c.
I did not use this, because if you have to go down from c all the way to c/2 without finding a solution, you already have spent O(n²) on the algorithm. But maybe it can serve a purpose in another algorithm...
When zero-length strings are not allowed, then an algorithm can be made to be O(m.n), as the number of possible solutions (values for c) is limited to m and to determine whether one of these is a solution, requires only one sweep through all the widths.
I have allowed for zero-length strings.
Initially I was looking into a binary search for c, dividing it by 2 and going for one of the remaining halves. But this method cannot be used, because even when a certain c is found to not be a solution, this does not exclude there being a c' > c that is a solution.
Hopefully there is something in this answer you find useful.
An obvious way to solve this problem is to iterate through all string lengths
in some predetermined order, update width of the column where each string
belongs, and stop when sum of column widths exceeds terminal width. Then repeat
this process for decremented number of columns (or for incremented number of rows for
"arrange down" case) until success. Three possible choices for this predetermined
order:
by row (consider all strings in the first row, then in row 2, 3, ...)
by column (consider all strings in the first column, then in column 2, 3, ...)
by value (sort strings by length, then iterate them in sorted order, starting from the longest one).
These 3 approaches are OK for "arrange across" sub-problem. But for "arrange down"
sub-problem they all have worst case time complexity O(n²). And the first two methods show quadratic complexity even for random data. The "by value" approach is pretty good for random data, but it is easy to find a worst case scenario: just assign short strings to the first half of the list and long strings to the second half.
Here I describe an algorithm for "arrange down" case that does not have these
disadvantages.
Instead of inspecting each string length separately, it determines width of each
column in O(1) time with the help of range maximum query (RMQ). Width of the first
column is just the maximum value in range (0 .. num_of_rows), width of next one is
the maximum value in range (num_of_rows .. 2*num_of_rows), etc.
And to answer each query in O(1) time, we need to prepare an array of maximum
values in ranges (0 .. 2^k), (1 .. 1 + 2^k), ..., where k is the largest integer such that 2^k is not greater than the current number of rows. Each range maximum query is computed as the maximum of two entries from this array.
When number of rows starts to be too large, we should update this query array from k to k+1 (each such update needs O(n) range queries).
Here is a C++14 implementation:
template<class PP>
uint32_t arrangeDownRMQ(Vec& input)
{
    auto rows = getMinRows(input);
    auto ranges = PP{}(input, rows);
    while (rows <= input.size())
    {
        if (rows >= ranges * 2)
        { // lazily update RMQ data
            transform(begin(input) + ranges, end(input), begin(input),
                      begin(input), maximum);
            ranges *= 2;
        }
        else
        { // use RMQ to get widths of columns and check if all columns fit
            uint32_t sum = 0;
            for (Index i = 0; sum <= kPageW && i < input.size(); i += rows)
            {
                ++g_complexity;
                auto j = i + rows - ranges;
                if (j < input.size())
                    sum += max(input[i], input[j]);
                else
                    sum += input[i];
            }
            if (sum <= kPageW)
            {
                return uint32_t(ceilDiv(input.size(), rows));
            }
            ++rows;
        }
    }
    return 0;
}
Here PP is optional; for the simple case this function object does nothing and returns 1.
To determine the worst case time complexity of this algorithm, note that outer loop starts
with rows = n * v / m (where v is average string length, m is page width) and
stops with at most rows = n * w / m (where w is largest string length).
Number of iterations in "query" loop is not greater than the number of columns or
n / rows. Adding these iterations together gives O(n * (ln(n*w/m) - ln(n*v/m)))
or O(n * log(w/v)). Which means linear time with small constant factor.
We should add here time to perform all update operations which is O(n log n) to get
complexity of whole algorithm: O(n * log n).
If we perform no update operations until some query operations are done, time needed
for update operations as well as algorithm complexity decreases to O(n * log(w/v)).
To make this possible we need some algorithm that fills the RMQ array with maximums of sub-arrays of a given length. I tried two possible approaches and it seems the algorithm with a pair of stacks is slightly faster. Here is a C++14 implementation (pieces of the input array are used to implement both stacks, to lower memory requirements and to simplify the code):
template<typename I, typename Op>
auto transform_partial_sum(I lbegin, I lend, I rbegin, I out, Op op)
{ // maximum of the element in the first interval and a prefix of the second interval
    auto sum = typename I::value_type{};
    for (; lbegin != lend; ++lbegin, ++rbegin, ++out)
    {
        sum = op(sum, *rbegin);
        *lbegin = op(*lbegin, sum);
    }
    return sum;
}

template<typename I>
void reverse_max(I b, I e)
{ // for each element: maximum of the suffix starting from this element
    partial_sum(make_reverse_iterator(e),
                make_reverse_iterator(b),
                make_reverse_iterator(e),
                maximum);
}

struct PreprocRMQ
{
    Index operator () (Vec& input, Index rows)
    {
        if (rows < 4)
        { // no preprocessing needed
            return 1;
        }
        Index ranges = 1;
        auto b = begin(input);
        while (rows >>= 1)
        {
            ranges <<= 1;
        }
        for (; b + 2 * ranges <= end(input); b += ranges)
        {
            reverse_max(b, b + ranges);
            transform_partial_sum(b, b + ranges, b + ranges, b, maximum);
        }
        // preprocess inconvenient tail part of the array
        reverse_max(b, b + ranges);
        const auto d = end(input) - b - ranges;
        const auto z = transform_partial_sum(b, b + d, b + ranges, b, maximum);
        transform(b + d, b + ranges, b + d, [&](Data x){ return max(x, z); });
        reverse_max(b + ranges, end(input));
        return ranges;
    }
};
In practice there is a much higher chance to see a short word than a long word. Shorter words outnumber longer ones in English texts, and shorter text representations of natural numbers prevail as well. So I chose a (slightly modified) geometric distribution of string lengths to evaluate the various algorithms. Here is the whole benchmarking program (in C++14 for gcc). An older version of the same program contains some obsolete tests and different implementations of some algorithms. And here are the results:
For "arrange across" case:
"by column" approach is the slowest
"by value" approach is good for n = 1000..500'000'000
"by row" approach is better for small and very large sizes
For "arrange down" case:
"by value" approach is OK, but slower than other alternatives (one possibility to make it faster is to implement divisions via multiplications), also it uses more memory and has quadratic worst-case time
simple RMQ approach is better
RMQ with preprocessing the best: it has linear time complexity and its memory bandwidth requirements are pretty low.
It may also be interesting to see the number of iterations needed for each algorithm; here all predictable parts are excluded (sorting, summing, preprocessing, and RMQ updates).
This is in reference to this problem. We are required to calculate f(n , k), which is the number of binary strings of length n that have the length of the longest substring of ones as k. I am having trouble coming up with a recursion.
The case when the ith digit is a 0, I think I can handle.
Specifically, I am unable to extend the solution to a sub-problem f(i-1, j) when I consider the ith digit to be a 1. How do I stitch the two together?
Sorry if I am a bit unclear. Any pointers would be a great help. Thanks.
I think you could build up a table using a variation of dynamic programming, if you expand the state space. Suppose that you calculate f(n,k,e) defined as the number of different binary strings of length n with the longest substring of 1s length at most k and ending with e 1s in a row. If you have calculated f(n,k,e) for all possible values of k and e associated with a given n, then, because you have the values split up by e, you can calculate f(n+1,k,e) for all possible values of k and e - what happens to an n-long string when you extend it with 0 or 1 depends on how many 1s it ends with at the moment, and you know that because of e.
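A minimal sketch of that expanded-state idea (the function names and the at-most/exactly split are mine): fix the cap k, keep a table indexed by e = number of trailing 1s, and roll it over n.

def count_at_most(n, k):
    # dp[e] = number of strings of the current length whose longest run of 1s
    # is at most k and that end in exactly e consecutive 1s
    dp = [0] * (k + 1)
    dp[0] = 1                      # the empty string
    for _ in range(n):
        ndp = [0] * (k + 1)
        ndp[0] = sum(dp)           # append a 0: the trailing run resets
        for e in range(1, k + 1):
            ndp[e] = dp[e - 1]     # append a 1: the trailing run grows by one
        dp = ndp
    return sum(dp)

def count_exact(n, k):
    # strings of length n whose longest run of 1s is exactly k
    return count_at_most(n, k) - (count_at_most(n, k - 1) if k > 0 else 0)

# e.g. count_exact(2, 1) == 2  ("01" and "10")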
Let s be the start index of the length k pattern. Then s is in: 1 to n-k.
For each s, we divide the string S into three strings:
PRE(s,k,n) = S[1:s-1]
POST(s,k,n)=S[s+k-1:n]
ONE(s,k,n) which has all 1s from S[s] to S[s+k-1]
The longest sub-string of 1s for PRE and POST should be less than k.
Let
x = s-1
y = n-(s+k)-1
Let NS(p,k) be the total number of ways you can have a longest sub-string of size greater than or equal to k.
NS(p,k) = sum{f(p,k), f(p,k+1),... f(p,p)}
Terminating condition:
NS(p,k) = 1 if p==k, 0 if k>p
f(n,k) = 1 if n==k, 0, if k > n.
For a string of length n, the number of permutations such that the longest substring of 1s is of size less than k = 2^n - NS(n,k).
f(n,k) = Sum over all s=1 to n-k
{2^x - NS(x,k)}*{2^y - NS(y,k)}
i.e. product of the number of permutations of each of the pre and post substrings where the longest sub-string is less than size k.
So we have a repeating sub-problem, and a whole bunch of reuse which can be DPed
Added Later:
Based on the comment below, I guess we really do not need to go into NS.
We can define S(p,k) as
S(p,k) = sum{f(p,1), f(p,2),... f(p,k-1)}
and
f(n,k) = Sum over all s=1 to n-k
S(x,k)*S(y,k)
I know this is quite an old question; if anyone wants, I can clarify my small answer.
Here is my code:
#include<bits/stdc++.h>
using namespace std;
long long DP[64][64];
int main()
{
ios::sync_with_stdio(0);
cin.tie(0);
int i,j,k;
DP[1][0]=1;
DP[1][1]=1;
DP[0][0]=1;
cout<<"1 1\n";
for(i=2;i<=63;i++,cout<<"\n")
{
DP[i][0]=1;
DP[i][i]=1;
cout<<"1 ";
for(j=1;j<i;j++)
{
for(k=0;k<=j;k++)
DP[i][j]+=DP[i-k-1][j]+DP[i-j-1][k];
DP[i][j]-=DP[i-j-1][j];
cout<<DP[i][j]<<" ";
}
cout<<"1 ";
}
return 0;
}
DP[i][j] represents F(i,j).
Transitions/Recurrence (hard to come up with):
Considering F(i,j):
1) I can put k 1s on the right and separate them using a 0, i.e.
String + 0 + k times '1'.
F(i-k-1, j)
Note: k = 0 signifies I am only keeping a 0 at the right!
2) Otherwise I would miss the ways in which the rightmost j+1 positions are filled with a 0 followed by j 1s and the left part does not contain any consecutive run of 1s of length j!
F(i-j-1, k) (Note: I have used k to signify both just because I have done so in my code; you can define other variables too!)
Possible Duplicate:
Square Subsequence
I have been trying to solve the "Square Subsequences" problem on interviewstreet.com:
A string is called a square string if it can be obtained by concatenating two copies of the same string. For example, "abab", "aa" are square strings, while "aaa", "abba" are not.
Given a string, how many subsequences of the string are square strings?
I tried working out a DP solution, but this constraint seems impossible to circumvent: S will have at most 200 lowercase characters (a-z).
From what I know, finding all subsequences of a list of length n is O(2^n), which stops being feasible as soon as n is larger than, say, 30.
Is it really possible to systematically check all solutions if n is 200? How do I approach it?
First, for every letter a..z you get a list of their indices in S:
`p[x] = {i : S[i] = x}`, where `x = 'a',..,'z'`.
Then we start DP:
S: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
^ ^ ^
r1 l2 r2
Let f(r1,l2,r2) be the number of square subsequences (subsequences that are square strings) of any length L such that
SS[L-1] = r1
SS[L] = l2
SS[2L-1] = r2
i.e. the first half ends exactly at r1, the second half starts exactly at l2 and ends at r2.
The algorithm is then:
Let f[r1, l2, l2] = 1 if S[r1] = S[l2], else 0.

for (l2 in 1..2L-1)
    for (r1 in 0..l2-1)
        for (r2 in l2..2L-1)
            if (f(r1, l2, r2) != 0)
                for (x in 'a'..'z')
                    for (i, j: r1 < i < l2, r2 < j, S[i] = S[j] = x)  // these i,j are found using p[x] quickly
                        f[i, l2, j] += f[r1, l2, r2]
In the end, the answer is the sum of all the values in the f[.,.,.] array.
So basically, we divide S using l2 into two parts and then count the common subsequences.
It's hard for me to provide exact time complexity estimation right now, it's surely below n^4 and n^4 is acceptable for n = 200.
There are many algorithms (e.g. the Z-algorithm) which can generate an array of prefix lengths in linear time. That is, for every position i it tells you the length of the longest prefix of the string that can be read starting from position i (of course for i = 0 the longest prefix is n).
Now notice that if you have a square string starting at the beginning, then there is a position k in this prefix length array such that the longest length is >=k. So you can count the number of those in linear time again.
Then remove the first letter of you string and do the same thing.
The total complexity of this would be O(n^2).
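A small sketch of this counting (my own code; note it counts square substrings, using a standard Z-array):

def z_array(s):
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def count_square_substrings(s):
    total = 0
    for i in range(len(s)):            # "remove the first letter and repeat"
        z = z_array(s[i:])
        # k is the half-length of a candidate square starting at position i
        total += sum(1 for k in range(1, len(z)) if z[k] >= k)
    return total

# e.g. count_square_substrings("aabab") == 2  ("aa" and "abab")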
The question is: find the number of solutions to a1·x1 + a2·x2 + .... + an·xn = k, with constraints:
1) ai > 0 and ai <= 15
2) n > 0 and n <= 15
3) xi >= 0
I was able to formulate a dynamic programming solution, but it is running too long for n > 10^10. Please guide me to a more efficient solution.
The code
int dp[] = new int[16];
dp[0] = 1;
BigInteger seen = new BigInteger("0");
while (true)
{
    for (int i = 0; i < arr[0]; i++)
    {
        if (dp[0] == 0)
            break;
        dp[arr[i + 1]] = (dp[arr[i + 1]] + dp[0]) % 1000000007;
    }
    for (int i = 1; i < 15; i++)
        dp[i - 1] = dp[i];
    seen = seen.add(new BigInteger("1"));
    if (seen.compareTo(n) == 0)
        break;
}
System.out.println(dp[0]);
arr is the array containing the coefficients, and the answer should be mod 1000000007 as the number of ways does not fit into an int.
Update for real problem:
The actual problem is much simpler. However, it's hard to be helpful without spoiling it entirely.
Stripping it down to the bare essentials, the problem is
Given k distinct positive integers L1, ... , Lk and a nonnegative integer n, how many different finite sequences (a1, ..., ar) are there such that 1. for all i (1 <= i <= r), ai is one of the Lj, and 2. a1 + ... + ar = n. (In other words, the number of compositions of n using only the given Lj.)
For convenience, you are also told that all the Lj are <= 15 (and hence k <= 15), and n <= 10^18. And, so that the entire computation can be carried out using 64-bit integers (the number of sequences grows exponentially with n, you wouldn't have enough memory to store the exact number for large n), you should only calculate the remainder of the sequence count modulo 1000000007.
To solve such a problem, start by looking at the simplest cases first. The very simplest cases are when only one L is given, then evidently there is one admissible sequence if n is a multiple of L and no admissible sequence if n mod L != 0. That doesn't help yet. So consider the next simplest cases, two L values given. Suppose those are 1 and 2.
0 has one composition, the empty sequence: N(0) = 1
1 has one composition, (1): N(1) = 1
2 has two compositions, (1,1); (2): N(2) = 2
3 has three compositions, (1,1,1);(1,2);(2,1): N(3) = 3
4 has five compositions, (1,1,1,1);(1,1,2);(1,2,1);(2,1,1);(2,2): N(4) = 5
5 has eight compositions, (1,1,1,1,1);(1,1,1,2);(1,1,2,1);(1,2,1,1);(2,1,1,1);(1,2,2);(2,1,2);(2,2,1): N(5) = 8
You may see it now, or need a few more terms, but you'll notice that you get the Fibonacci sequence (shifted by one), N(n) = F(n+1), thus the sequence N(n) satisfies the recurrence relation
N(n) = N(n-1) + N(n-2) (for n >= 2; we have not yet proved that, so far it's a hypothesis based on pattern-spotting). Now, can we see that without calculating many values? Of course, there are two types of admissible sequences, those ending with 1 and those ending with 2. Since that partitioning of the admissible sequences restricts only the last element, the number of ad. seq. summing to n and ending with 1 is N(n-1) and the number of ad. seq. summing to n and ending with 2 is N(n-2).
That reasoning immediately generalises, given L1 < L2 < ... < Lk, for all n >= Lk, we have
N(n) = N(n-L1) + N(n-L2) + ... + N(n-Lk)
with the obvious interpretation if we're only interested in N(n) % m.
Umm, that linear recurrence still leaves calculating N(n) as an O(n) task?
Yes, but researching a few of the mentioned keywords quickly leads to an algorithm needing only O(log n) steps ;)
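One standard way to get those O(log n) steps is matrix exponentiation of the recurrence above. A sketch (my own code, not part of the original answer):

MOD = 1000000007

def mat_mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            if A[i][k]:
                a, row = A[i][k], B[k]
                for j in range(p):
                    C[i][j] = (C[i][j] + a * row[j]) % MOD
    return C

def mat_pow(M, e):
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]   # identity
    while e:
        if e & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        e >>= 1
    return R

def count_compositions(L, n):
    """Number of compositions of n with parts from L, modulo MOD (sketch)."""
    Lmax = max(L)
    # N(0), ..., N(Lmax - 1) directly from the recurrence
    N = [0] * Lmax
    N[0] = 1
    for i in range(1, Lmax):
        N[i] = sum(N[i - l] for l in L if l <= i) % MOD
    if n < Lmax:
        return N[n]
    # companion matrix of N(i) = N(i - L1) + ... + N(i - Lk)
    M = [[0] * Lmax for _ in range(Lmax)]
    for l in L:
        M[0][l - 1] = 1
    for i in range(1, Lmax):
        M[i][i - 1] = 1
    P = mat_pow(M, n - (Lmax - 1))
    v = [N[Lmax - 1 - j] for j in range(Lmax)]   # (N(Lmax-1), ..., N(0))
    return sum(P[0][j] * v[j] for j in range(Lmax)) % MOD

# e.g. count_compositions([1, 2], 5) == 8, matching N(5) above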
Algorithm for misinterpreted problem, no longer relevant, but may still be interesting:
The question looks a little SPOJish, so I won't give a complete algorithm (at least, not before I've googled around a bit to check if it's a contest question). I hope no restriction has been omitted in the description, such as that permutations of such representations should only contribute one to the count, that would considerably complicate the matter. So I count 1*3 + 2*4 = 11 and 2*4 + 1*3 = 11 as two different solutions.
Some notations first. For m-tuples of numbers, let < | > denote the canonical bilinear pairing, i.e.
<a|x> = a_1*x_1 + ... + a_m*x_m. For a positive integer B, let A_B = {1, 2, ..., B} be the set of positive integers not exceeding B. Let N denote the set of natural numbers, i.e. of nonnegative integers.
For 0 <= m, k and B > 0, let C(B,m,k) = card { (a,x) \in A_B^m × N^m : <a|x> = k }.
Your problem is then to find \sum_{m=1}^{15} C(15,m,k) (modulo 1000000007).
For completeness, let us mention that C(B,0,k) = if k == 0 then 1 else 0, which can be helpful in theoretical considerations. For the case of a positive number of summands, we easily find the recursion formula
C(B,m+1,k) = \sum_{j = 0}^k C(B,1,j) * C(B,m,k-j)
By induction, C(B,m,_) is the convolution¹ of m factors C(B,1,_). Calculating the convolution of two known functions up to k is O(k^2), so if C(B,1,_) is known, that gives an O(n*k^2) algorithm to compute C(B,m,k), 1 <= m <= n. Okay for small k, but our galaxy won't live to see you calculating C(15,15,10^18) that way. So, can we do better? Well, if you're familiar with the Laplace-transformation, you'll know that an analogous transformation will convert the convolution product to a pointwise product, which is much easier to calculate. However, although the transformation is in this case easy to compute, the inverse is not. Any other idea? Why, yes, let's take a closer look at C(B,1,_).
C(B,1,k) = card { a \in A_B : (k/a) is an integer }
In other words, C(B,1,k) is the number of divisors of k not exceeding B. Let us denote that by d_B(k). It is immediately clear that 1 <= d_B(k) <= B. For B = 2, evidently d_2(k) = 1 if k is odd, 2 if k is even. d_3(k) = 3 if and only if k is divisible by 2 and by 3, hence iff k is a multiple of 6, d_3(k) = 2 if and only if one of 2, 3 divides k but not the other, that is, iff k % 6 \in {2,3,4} and finally, d_3(k) = 1 iff neither 2 nor 3 divides k, i.e. iff gcd(k,6) = 1, iff k % 6 \in {1,5}. So we've seen that d_2 is periodic with period 2, d_3 is periodic with period 6. Generally, like reasoning shows that d_B is periodic for all B, and the minimal positive period divides B!.
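A quick empirical check of that periodicity claim (my own snippet; math.lcm needs Python 3.9+):

from math import lcm

def d(B, k):
    # number of divisors of k that do not exceed B
    return sum(1 for a in range(1, B + 1) if k % a == 0)

B = 3
period = lcm(*range(1, B + 1))   # 6 here; lcm(1..B) is a period and divides B!
assert all(d(B, k) == d(B, k + period) for k in range(1, 200))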
Given any positive period P of C(B,1,_) = d_B, we can split the sum in the convolution (k = q*P+r, 0 <= r < P):
C(B,m+1, q*P+r) = \sum_{c = 0}^{q-1} (\sum_{j = 0}^{P-1} d_B(j)*C(B,m,(q-c)*P + (r-j)))
+ \sum_{j = 0}^r d_B(j)*C(B,m,r-j)
The functions C(B,m,_) are no longer periodic for m >= 2, but there are simple formulae to obtain C(B,m,q*P+r) from C(B,m,r). Thus, with C(B,1,_) = d_B and C(B,m,_) known up to P, calculating C(B,m+1,_) up to P is an O(P^2) task², getting the data necessary for calculating C(B,m+1,k) for arbitrarily large k, needs m such convolutions, hence that's O(m*P^2).
Then finding C(B,m,k) for 1 <= m <= n and arbitrarily large k is O(n^2*P^2), in time and O(n^2*P) in space.
For B = 15, we have 15! = 1.307674368 * 10^12, so using that for P isn't feasible. Fortunately, the smallest positive period of d_15 is much smaller, so you get something workable. From a rough estimate, I would still expect the calculation of C(15,15,k) to take time more appropriately measured in hours than seconds, but it's an improvement over O(k) which would take years (for k in the region of 10^18).
¹ The convolution used here is (f \ast g)(k) = \sum_{j = 0}^k f(j)*g(k-j).
² Assuming all arithmetic operations are O(1); if, as in the OP, only the residue modulo some M > 0 is desired, that holds if all intermediate calculations are done modulo M.