I'm re-attempting this problem statement, now that the contest is all over (so it's not cheating or anything, just want to learn, since the answers are not published, only the correct output for the given test case input files).
There are 10 given test case inputs, for which the associated output files are to be submitted. My original submission was an implementation of a naive nested for loop over (start, end) pairs, answering the query: what is the volatility measure of the substring starting at (0-based) index start and ending at end (inclusive)?
Clearly, for the maximum problem limits of 10^6, O(N^2) is infeasible, so I only got as far as 5/10 test cases correct (the first - and simpler - 5, of course).
As such, I'm writing here to seek the crowd's intelligence on how I could go about improving my algorithm; namely, I suspect the nested (start, end) for loops are the main bottleneck to optimize (of course!). So far, I've gone down the route of trying to formulate this as a dynamic programming (DP) on strings/substrings problem, but without much success in coming up with the state representation and transitions so that the DP can be implemented.
For easy reference, and to show that this is not homework and that I have honestly tried, my original submission is available here.
Any help is much appreciated, even links to similar problems for which I can google tutorial blog posts/sample solutions/post-contest editorial analysis.
Have you tried divide-and-conquer?
If I understand the problem correctly, given a DNA chain S of length n, we divide S into two halves, S_left and S_right, with S_left consisting of S[i] where 0 <= i < n/2, and S_right consisting of S[j] where n/2 <= j < n. The most volatile fragment either occurs entirely within S_left, entirely within S_right, or crosses the boundary of S_left and S_right.
To find the most volatile fragment of S_left and S_right we just use recursion. The tricky bit is to find the volatility measure of the fragment which crosses the boundary of S_left and S_right. There is a mathematical property of positive integer fractions here: given four positive (non-zero) integers a, b, c and d, (a + c) / (b + d) is never greater than both (a / b) and (c / d). Here a and b are the cumulative counts of purines and pyrimidines in S_left starting at the boundary, while c and d are the cumulative counts of purines and pyrimidines in S_right starting at the boundary. This mathematical property means that we don't need to examine the volatility measure of the crossing fragment beyond a = 0 or c = 0, because it is guaranteed to be less than the maximum volatility of S_left or S_right. The search over the crossing fragment can be done in O(n), giving O(n lg n) for the overall algorithm.
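For anyone who wants to convince themselves of that fraction property (it is the classic mediant inequality), here is a quick brute-force check in Python:

import random
from fractions import Fraction

# Empirical check of the claim: for positive integers a, b, c, d,
# the mediant (a + c) / (b + d) is never greater than both a/b and c/d.
for _ in range(100000):
    a, b, c, d = [random.randint(1, 1000) for _ in range(4)]
    assert Fraction(a + c, b + d) <= max(Fraction(a, b), Fraction(c, d))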
Hope this works, as I haven't coded the algorithm. Perhaps there is an O(n)-time DP algorithm for this problem, but this is all I've got for now.
I am trying to implement the Held-Karp algorithm for the Traveling Salesman Problem by following this pseudocode:
(which I found here: https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm#Example.5B4.5D )
I can do the algorithm by hand but am having trouble actually implementing it in code. It would be great if someone could provide an easy-to-follow explanation.
I also don't understand this:
I thought this part was for setting the distance from the starting city to its connected cities. If that were the case, wouldn't it be C({1}, k) := d1,k and not C({k}, k) := d1,k? Am I just completely misunderstanding this?
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp is a dynamic programming approach.
In dynamic programming, you break the task into subtasks and use "dynamic function" to solve larger subtasks using already computed results of smaller subtasks, until you finally solve your task.
To understand a DP algorithm it's imperative to understand how it defines subtask and dynamic function.
In the case of Held-Karp, the subtask is the following:
For a given set of vertices S and a vertex k (1 ∉ S, k ∈ S)
C(S,k) is the minimal length of the path that starts with vertex 1, traverses all vertices in S and ends with the vertex k.
Given this subtask definition, it's clear why initialization is:
C({k}, k) := d(1,k)
The minimal length of the path from 1 to k, traversing through {k}, is just the edge from 1 to k.
Next, the "dynamic function".
A side note: a DP algorithm can be written top-down or bottom-up. This pseudocode is bottom-up, meaning it computes smaller tasks first and uses their results for larger tasks. To be more specific, it computes tasks in the order of increasing size of the set S, starting from |S| = 1 and going up to |S| = n-1 (i.e. S containing all vertices, except 1).
Now, consider a task defined by some S, k. Remember, it corresponds to a path from 1, through S, ending in k.
We break it into:
a path from 1, through all vertices in S except k (S\k), which ends in the vertex m (m ∈ S, m ≠ k): C(S\k, m)
plus an edge from m to k
It's easy to see that if we look through all possible ways to break C(S, k) like this, and find the minimal path among them, we'll have the answer for C(S, k).
Finally, having computed all C(S, k) for |S| = n-1, we check all of them, completing the cycle with the missing edge from k back to 1: d(k,1). The minimal cycle is the final result.
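For concreteness, here is a minimal Python sketch of that bottom-up recurrence (indices are 0-based, with city 0 playing the role of vertex 1, and dist assumed to be a full n x n distance matrix):

from itertools import combinations

def held_karp(dist):
    # dist is an n x n matrix of pairwise distances; city 0 plays the
    # role of "vertex 1" in the pseudocode.
    n = len(dist)
    # C[(S, k)] = minimal length of a path that starts at city 0, visits
    # every city in frozenset S exactly once, and ends at k (0 not in S).
    C = {}
    # Initialization: C({k}, k) := d(1, k)
    for k in range(1, n):
        C[(frozenset([k]), k)] = dist[0][k]
    # Bottom-up: process subsets S in order of increasing size
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for k in S:
                C[(S, k)] = min(C[(S - {k}, m)] + dist[m][k]
                                for m in S if m != k)
    # Close the cycle with the missing edge back to city 0
    full = frozenset(range(1, n))
    return min(C[(full, k)] + dist[k][0] for k in range(1, n))

This is only meant to make the subtask/initialization/transition structure explicit; the Θ(n²·2ⁿ) cost discussed below is of course unchanged.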
Regarding:
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp has algorithmic complexity of θ(n²·2ⁿ). 40² · 2⁴⁰ ≈ 1.76 · 10¹⁵, which, I would say, is infeasible to compute on a single machine in reasonable time.
As David Eisenstat suggested, there are approaches using mixed integer programming that can solve this problem fast enough for N=40.
For example, see this blog post, and this project that builds upon it.
This was on my last comp stat qual. I gave an answer I thought was pretty good. We just get our score on the exam, not whether we got specific questions right. Hoping the community can give guidance on this one, I am not interested in the answer so much as what is being tested and where I can go read more about it and get some practice before the next exam.
At first glance it looks like a time complexity question, but when it starts talking about mapping functions and pre-sorting data, I am not sure how to handle it.
So how would you answer?
Here it is:
Given a set of items X = {x1, x2, ..., xn} drawn from some domain Z, your task is to find if a query item q in Z occurs in the set. For simplicity you may assume each item occurs exactly once in X and that it takes O(l) amount of time to compare any two items in Z.
(a) Write pseudo-code for an algorithm which checks if q in X. What is the worst case time complexity of your algorithm?
(b) If l is very large (e.g. if each element of X is a long video) then one needs efficient algorithms to check if q \in X. Suppose you are given access to k functions h_i: Z -> {1, 2, ..., m} which uniformly map an element of Z to a number between 1 and m, and let k << l and m > n.
Write pseudo-code for an algorithm which uses the function h_1...h_k to check if q \in X. Note that you are allowed to preprocess the data. What is the worst case time complexity of your algorithm?
Be explicit about the inputs, outputs, and assumptions in your pseudocode.
The first is a simple linear scan; the worst case is to compare against all elements, so the time complexity is O(n * l). Note: it cannot be sub-linear in n, since there is no information that the data is sorted.
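Something like this, in Python (the function and variable names are mine, not from the exam):

def contains(X, q):
    # Part (a): linear scan. Each comparison costs O(l), so the
    # worst case (q not present) is O(n * l).
    for x in X:
        if x == q:
            return True
    return False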
The second, (b), is actually a variation of a Bloom filter, which is a probabilistic way to represent a set. Using Bloom filters you might get false positives (saying something is in the set while it is not), but never false negatives (saying something is not in the set while it is).
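A rough sketch of the preprocessing and the query, assuming the h_1 ... h_k are handed to us as callables (the names here are made up for illustration):

def preprocess(X, hashes, m):
    # O(n * k) preprocessing: set bit h_i(x) for every item x and every hash.
    bits = [False] * m
    for x in X:
        for h in hashes:           # the given functions h_1 ... h_k
            bits[h(x) - 1] = True  # each h maps into {1, ..., m}
    return bits

def query(bits, hashes, q):
    # k hash evaluations and k bit lookups per query, no O(l) comparisons:
    # if any bit is unset, q is definitely not in X; if all are set,
    # q is only *probably* in X (possible false positive).
    return all(bits[h(q) - 1] for h in hashes)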
I want to find groups of contiguous cells in a matrix.
So, for example, let's consider the 2D matrix below.
In the given matrix there are 2 contiguous groups of cells with value 1:
Here is one way to find these groups:
Assign the 1st cell with value 1 a different value, let's say A. Then examine the cells with value 1 which are adjacent to A and set the value in those cells to A. Search this way until no more contiguous cells are found.
In the next step, increment A to B and start again from a cell having value 1. Then follow the same steps as above.
This is kind of brute force and it won’t be efficient in 3D. Does anyone know of any algorithm that I could use with a little tweaking?
Or any easy solution to this problem?
What you are trying to do goes, often, under the label connected component labelling. I won't elaborate further, the Wikipedia article explains matters better than I could or would.
But while I'm answering ...
You, and many others on SO, seem to think that simple iteration over all the elements of an array, which you characterise by the derogatory term brute-force, is something to be avoided at all costs. Modern computers are very, very fast. Accessing each element of an array in order is something that most compilers can optimise the hell out of.
You seem to have fallen into the trap of thinking that accessing every element of a 3D array has time-complexity in O(n^3), where n is the number of elements along each dimension of the array. It isn't: accessing elements of an array is in O(n) where n is the number of elements in the array.
Even if the time complexity of visiting each element in the array were in O(n^3), many sophisticated algorithms which offer better asymptotic time complexity will prove, in practice, to deliver worse performance than the simpler algorithm. Many sophisticated algorithms make it much harder for the compiler to optimise code. And bear in mind that O(n^2) is an equivalence class of algorithms which includes algorithms with true time complexities such as O(m + k*n^2), where both m and k are constants.
Here is some pseudo code for a simple flood fill algorithm:
def flood(i, j, matrix):
    # zero out the whole group of 1s connected to (i, j)
    if 0 <= i < len(matrix) and 0 <= j < len(matrix[0]):
        if matrix[i][j] == 1:
            matrix[i][j] = 0
            for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                flood(i + dx, j + dy, matrix)

count = 0
for i in range(len(matrix)):
    for j in range(len(matrix[0])):
        if matrix[i][j] == 1:   # found the start of a new group
            count += 1
            flood(i, j, matrix)
This is just the same as finding connected components in a graph, but with the whole thing extended to 3 dimensions. So you can use any of the linear-time algorithms for the 2D case and adapt the DFS to the 3rd dimension as well. This should be straightforward.
As those algorithms are linear time you can't get better in terms of running time complexity.
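For example, a 6-connected 3D version of the flood fill might look like this (iterative, with an explicit stack, so large groups don't hit the recursion limit; this is just a sketch):

def count_groups_3d(grid):
    # grid is a 3D list of 0s and 1s; cells equal to 1 belong to groups.
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    neighbours = [(-1, 0, 0), (1, 0, 0), (0, -1, 0),
                  (0, 1, 0), (0, 0, -1), (0, 0, 1)]
    count = 0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if grid[x][y][z] != 1:
                    continue
                count += 1
                stack = [(x, y, z)]
                grid[x][y][z] = 0          # mark as visited when pushed
                while stack:
                    cx, cy, cz = stack.pop()
                    for dx, dy, dz in neighbours:
                        i, j, k = cx + dx, cy + dy, cz + dz
                        if (0 <= i < nx and 0 <= j < ny and 0 <= k < nz
                                and grid[i][j][k] == 1):
                            grid[i][j][k] = 0
                            stack.append((i, j, k))
    return count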
I'm pretty sure that this is the right site for this question, but feel free to move it to some other stackexchange site if it fits there better.
Suppose you have a sum of fractions a1/d1 + a2/d2 + … + an/dn. You want to compute a common numerator and denominator, i.e., rewrite it as p/q. We have the formula
p = a1*d2*…*dn + d1*a2*d3*…*dn + … + d1*d2*…*d(n-1)*an
q = d1*d2*…*dn.
What is the most efficient way to compute these things, in particular, p? You can see that if you compute it naïvely, i.e., using the formula I gave above, you compute a lot of redundant things. For example, you will compute d1*d2 n-1 times.
My first thought was to iteratively compute d1*d2, d1*d2*d3, … and dn*d(n-1), dn*d(n-1)*d(n-2), … but even this is inefficient, because you will end up computing multiplications in the "middle" twice (e.g., if n is large enough, you will compute d3*d4 twice).
I'm sure this problem could be expressed somehow using maybe some graph theory or combinatorics, but I haven't studied enough of that stuff to have a good feel for it.
And one note: I don't care about cancellation, just the most efficient way to multiply things.
UPDATE:
I should have known that people on stackoverflow would be assuming that these were numbers, but I've been so used to my use case that I forgot to mention this.
We cannot just "divide" out an from each term. The use case here is a symbolic system. Actually, I am trying to fix a function called .as_numer_denom() in the SymPy computer algebra system which presently computes this the naïve way. See the corresponding SymPy issue.
Dividing out things has some problems, which I would like to avoid. First, there is no guarantee that things will cancel. This is because, mathematically, (a*b)**n != a**n*b**n in general (if a and b are positive it holds, but e.g., if a == b == -1 and n == 1/2, you get (a*b)**n == 1**(1/2) == 1 but (-1)**(1/2)*(-1)**(1/2) == I*I == -1). So I don't think it's a good idea to assume that dividing by an will cancel it in the expression (this may actually be unfounded; I'd need to check what the code does).
Second, I'd like to also apply this algorithm to computing the sum of rational functions. In this case, the terms would automatically be multiplied together into a single polynomial, and "dividing" out each an would involve applying the polynomial division algorithm. You can see that in this case you really do want to compute the most efficient multiplication in the first place.
UPDATE 2:
I think my fears about cancellation of symbolic terms may be unfounded. SymPy does not cancel things like x**n*x**(m - n) automatically, but I think that any exponents that would combine through multiplication would also combine through division, so powers should be cancelling.
There is an issue with constants automatically distributing across additions, like:
In [13]: 2*(x + y)*z*(S(1)/2)
Out[13]:
z⋅(2⋅x + 2⋅y)
─────────────
2
But this is first a bug and second could never be a problem (I think) because 1/2 would be split into 1 and 2 by the algorithm that gets the numerator and denominator of each term.
Nonetheless, I still want to know how to do this without "dividing out" di from each term, so that I can have an efficient algorithm for summing rational functions.
Instead of adding up n quotients in one go I would use pairwise addition of quotients.
If things cancel out in partial sums then the numbers or polynomials stay smaller, which makes computation faster.
You avoid the problem of computing the same product multiple times.
You could try to order the additions in a certain way, to make canceling more likely (maybe add quotients with small denominators first?), but I don't know if this would be worthwhile.
If you start from scratch this is simpler to implement, though I'm not sure it fits as a replacement of the problematic routine in SymPy.
Edit: To make it more explicit, I propose to compute a1/d1 + a2/d2 + … + an/dn as (…(a1/d1 + a2/d2) + … ) + an/dn.
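For plain integers, the folding could look roughly like this (for SymPy objects the gcd step would of course be replaced by the appropriate symbolic cancellation):

from math import gcd

def sum_fractions(nums, dens):
    # Fold left to right: (...((a1/d1 + a2/d2) + a3/d3) + ...) + an/dn,
    # cancelling after every step so the partial results stay small.
    p, q = 0, 1
    for a, d in zip(nums, dens):
        p, q = p * d + a * q, q * d
        g = gcd(p, q)
        p, q = p // g, q // g
    return p, q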
Compute two new arrays:
The first contains partial products to the left: l[0] = 1, l[i] = l[i-1] * d[i]
The second contains partial products to the right: r[n+1] = 1, r[i] = d[i] * r[i+1]
In both cases, 1 is the multiplicative identity of whatever ring you are working in.
Then each of your terms on the top is t[i] = l[i-1] * a[i] * r[i+1]
This assumes multiplication is associative, but it need not be commutative.
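In 0-based Python terms, the construction might look like this (a sketch; * stands for whatever ring multiplication you need, and each a[i] stays between its left and right denominator products, so commutativity is not assumed):

def numer_denom(a, d):
    n = len(d)
    # l[i] = d[0] * ... * d[i-1]  (prefix products, l[0] is the identity)
    l = [1] * (n + 1)
    for i in range(n):
        l[i + 1] = l[i] * d[i]
    # r[i] = d[i] * ... * d[n-1]  (suffix products, r[n] is the identity)
    r = [1] * (n + 1)
    for i in range(n - 1, -1, -1):
        r[i] = d[i] * r[i + 1]
    # term i of the numerator: everything to the left of d[i], then a[i],
    # then everything to the right of d[i]
    p = sum(l[i] * a[i] * r[i + 1] for i in range(n))
    q = l[n]               # product of all denominators
    return p, q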
As a first optimization, you don't actually have to create r as an array: you can do a first pass to calculate all the l values, and accumulate the r values during a second (backward) pass to calculate the summands. No need to actually store the r values since you use each one once, in order.
In your question you say that this computes d3*d4 twice, but it doesn't. It does multiply two different values by d4 (one a right-multiplication and the other a left-multiplication), but that's not exactly a repeated operation. Anyway, the total number of multiplications is about 4*n, vs. 2*n multiplications and n divisions for the other approach that doesn't work in non-commutative multiplication or non-field rings.
If you want to compute p in the above expression, one way to do this would be to multiply together all of the denominators (in O(n), where n is the number of fractions), letting this value be D. Then, iterate across all of the fractions and for each fraction with numerator ai and denominator di, compute ai * D / di. This last term is equal to the product of the numerator of the fraction and all of the denominators other than its own. Each of these terms can be computed in O(1) time (assuming you're using hardware multiplication, otherwise it might take longer), and you can sum them all up in O(n) time.
This gives an O(n)-time algorithm for computing the numerator and denominator of the new fraction.
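As a sketch for plain integers (note that this relies on exact division by each di, which is precisely what the question's update wants to avoid, so it is shown only for comparison):

def numer_denom_by_division(a, d):
    # D = d1 * d2 * ... * dn, computed once with O(n) multiplications
    D = 1
    for di in d:
        D *= di
    # each term: a_i times the product of all denominators except d_i
    p = sum(ai * (D // di) for ai, di in zip(a, d))
    return p, D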
It was also pointed out to me that you could manually sift out common denominators and combine those trivially without multiplication.
Maybe you would have an idea on how to solve the following problem.
John decided to buy his son Johnny some mathematical toys. One of his favorite toys is blocks of different colors. John has decided to buy blocks of C different colors. For each color he will buy a googol (10^100) blocks. All blocks of the same color are of the same length, but blocks of different colors may vary in length.
Johnny has decided to use these blocks to make a large 1 x n block. He wonders how many ways he can do this. Two ways are considered different if there is a position where the color differs. The example shows a red block of size 5, a blue block of size 3 and a green block of size 3; it shows there are 12 ways of making a large block of length 11.
Each test case starts with an integer 1 ≤ C ≤ 100. The next line consists of C integers; the ith integer, 1 ≤ leni ≤ 750, denotes the length of the ith color. The next line is a positive integer N ≤ 10^15.
This problem should be solved in 20 seconds for T <= 25 test cases. The answer should be calculated MOD 100000007 (prime number).
It can be reduced to a matrix exponentiation problem, which can be solved relatively efficiently in O(max(leni)^2.376 * log(N)) using the Coppersmith-Winograd algorithm and fast exponentiation. But it seems that a more efficient algorithm is required, as Coppersmith-Winograd implies a large constant factor. Do you have any other ideas? It could possibly be a number theory or divide and conquer problem.
Firstly note the number of blocks of each colour you have is a complete red herring, since 10^100 > N always. So the number of blocks of each colour is practically infinite.
Now notice that at each position p (in any valid configuration, one that leaves no spaces, etc.) there must be a block of some color c. There are len[c] ways for this block to lie so that it still covers this position p.
My idea is to try all possible colors and offsets for a block covering a fixed position (N/2, since that halves the range); for each case there are b cells before this fixed colored block and a cells after it. So define a function ways(i) that returns the number of ways to tile i cells (with ways(0) = 1). Then the number of ways to tile i cells with a fixed colored block at that position is ways(b)*ways(a), and adding up all possible configurations yields ways(i).
Now, I chose the fixed position to be N/2 since that halves the range, and you can halve a range at most ceil(log(N)) times. Since you are moving a block about N/2, you will have to calculate lengths from N/2-750 to N/2+750, where 750 is the max length a block can have. So you will have to calculate about 750*ceil(log(N)) (a bit more because of the variance) lengths to get the final answer.
So in order to get good performance you have to throw in memoisation, since this is inherently a recursive algorithm.
So, using Python (since I was lazy and didn't want to write a big number class):
T = int(raw_input())
for case in xrange(T):
    #read in the data
    C = int(raw_input())
    lengths = map(int, raw_input().split())
    minlength = min(lengths)
    n = int(raw_input())
    #setup memoisation, note all lengths less than the minimum length are
    #set to 0 as the algorithm needs this
    memoise = {}
    memoise[0] = 1
    for length in xrange(1, minlength):
        memoise[length] = 0
    def solve(n):
        global memoise
        if n in memoise:
            return memoise[n]
        ans = 0
        for i in xrange(C):
            if lengths[i] > n:
                continue
            if lengths[i] == n:
                ans += 1
                ans %= 100000007
                continue
            for j in xrange(0, lengths[i]):
                b = n/2-lengths[i]+j
                a = n-(n/2+j)
                if b < 0 or a < 0:
                    continue
                ans += solve(b)*solve(a)
                ans %= 100000007
        memoise[n] = ans
        return memoise[n]
    solve(n)
    print "Case %d: %d" % (case+1, memoise[n])
Note I haven't exhaustively tested this, but I'm quite sure it will meet the 20 second time limit, if you translated this algorithm to C++ or somesuch.
EDIT: Running a test with N = 10^15 and a block with length 750, I get that memoise contains about 60000 elements, which means the non-lookup part of solve(n) is called about the same number of times.
A word of caution: In the case c=2, len1=1, len2=2, the answer will be the N'th Fibonacci number, and the Fibonacci numbers grow (approximately) exponentially with a growth factor of the golden ratio, phi ~ 1.61803399. For the huge value N=10^15, the answer will be about phi^(10^15), an enormous number. The answer will have storage requirements on the order of (ln(phi^(10^15))/ln(2)) / (8 * 2^40) ~ 79 terabytes. Since you can't even access 79 terabytes in 20 seconds, it's unlikely you can meet the speed requirements in this special case.
Your best hope occurs when C is not too large, and leni is large for all i. In such cases, the answer will still grow exponentially with N, but the growth factor may be much smaller.
I recommend that you first construct the integer matrix M which will compute the (i+1, ..., i+k) terms in your sequence based on the (i, ..., i+k-1) terms (only row k+1 of this matrix is interesting). Compute the first k entries "by hand", then calculate M^(10^15) based on the repeated squaring trick, and apply it to the terms (0 ... k-1).
The (integer) entries of the matrix will grow exponentially, perhaps too fast to handle. If this is the case, do the very same calculation, but modulo p, for several moderate-sized prime numbers p. This will allow you to obtain your answer modulo p, for various p, without using a matrix of bigints. After using enough primes so that you know their product is larger than your answer, you can use the so-called Chinese remainder theorem to recover your answer from your mod-p answers.
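To make the matrix construction concrete, here is a rough Python sketch. It works directly modulo the 100000007 given in the problem statement (so the CRT step above isn't needed), and it uses plain cubic matrix multiplication, so with leni up to 750 it only illustrates the structure and will not meet the time limit as-is:

MOD = 100000007

def mat_mult(A, B, mod=MOD):
    # multiply two square matrices modulo mod
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) % mod
             for j in range(n)] for i in range(n)]

def mat_pow(M, e, mod=MOD):
    # repeated squaring: M**e modulo mod
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, M, mod)
        M = mat_mult(M, M, mod)
        e >>= 1
    return R

def count_tilings(lengths, N, mod=MOD):
    # recurrence: ways(x) = sum over colours c of ways(x - len[c]), ways(0) = 1
    k = max(lengths)
    coeff = [0] * k                    # coeff[j] = number of colours of length j+1
    for L in lengths:
        coeff[L - 1] += 1
    ways = [1] + [0] * (k - 1)         # the first k terms, computed directly
    for x in range(1, k):
        ways[x] = sum(coeff[j] * ways[x - 1 - j] for j in range(x)) % mod
    if N < k:
        return ways[N]
    # companion matrix: row 0 holds the recurrence coefficients,
    # the rows below just shift the state vector down by one
    M = [[0] * k for _ in range(k)]
    M[0] = coeff[:]
    for i in range(1, k):
        M[i][i - 1] = 1
    P = mat_pow(M, N - (k - 1), mod)
    state = ways[::-1]                 # [ways(k-1), ..., ways(0)]
    return sum(P[0][j] * state[j] for j in range(k)) % mod

As a quick sanity check, count_tilings([1, 2], 5) returns 8, which matches the Fibonacci-style case mentioned in the word of caution above.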
I'd like to build on the earlier @JPvdMerwe solution with some improvements. In his answer, @JPvdMerwe uses a Dynamic Programming / memoisation approach, which I agree is the way to go on this problem. Dividing the problem recursively into two smaller problems and remembering previously computed results is quite efficient.
I'd like to suggest several improvements that would speed things up even further:
Instead of going over all the ways the block in the middle can be positioned, you only need to go over the first half and multiply the solution by 2. This is because the second half of the cases are symmetrical. For odd-length blocks you would still need to take the centered position as a separate case.
In general, iterative implementations can be several orders of magnitude faster than recursive ones. This is because a recursive implementation incurs bookkeeping overhead for each function call. It can be a challenge to convert a solution to its iterative cousin, but it is usually possible. The @JPvdMerwe solution can be made iterative by using a stack to store intermediate values.
Modulo operations are expensive, as are multiplications to a lesser extent. The number of multiplications and modulos can be decreased by approximately a factor of C=100 by swapping the color loop with the position loop. This allows you to add the return values of several calls to solve() before doing a multiplication and modulo.
A good way to test the performance of a solution is with a pathological case. The following could be especially daunting: length 10^15, C=100, prime block sizes.
Hope this helps.
In the above answer
ans += 1
ans %= 100000007
could be made much faster without the general modulo:
ans += 1
if ans == 100000007: ans = 0
Please see the TopCoder thread for a solution. No one in this thread was close to finding the answer.