Parallelize dynamic programming solution for longest common subsequence (LCS)

Below is a standard dynamic programming solution for finding the length of the longest common subsequence (LCS) of three strings:
procedure THREECOUNT(A, B, C, m, n, o)
    Where A is an array of characters of size m
    Where B is an array of characters of size n
    Where C is an array of characters of size o
    Let V be a three-dimensional array of size (m + 1) * (n + 1) * (o + 1), with every entry initialized to 0.
    Let maxValue be an integer initialized to 0.
    for i = 0 to m do
        V[i][0][0] = 0
    for j = 0 to n do
        V[0][j][0] = 0
    for k = 0 to o do
        V[0][0][k] = 0
    for i = 1 to m do
        for j = 1 to n do
            for k = 1 to o do
                if A[i - 1] == B[j - 1] and B[j - 1] == C[k - 1]
                    V[i][j][k] = V[i - 1][j - 1][k - 1] + 1
                else
                    V[i][j][k] = max(V[i - 1][j][k], V[i][j - 1][k], V[i][j][k - 1])
                if V[i][j][k] > maxValue
                    maxValue = V[i][j][k]
    return maxValue, which is an integer
Since we only care about the length of the LCS, we can traverse the diagonals of the 3D DP table, but I am stuck on how to parallelize this algorithm. The solution claims that parallelization can achieve a span of O(min(m,n,o) log(max(m,n,o))) while remaining work-optimal, i.e. O(mno) work, and I have no idea how to achieve that. Could somebody point out how to parallelize this DP solution?
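For reference, here is a minimal Python sketch of the pseudocode above, together with a variant that visits the table plane by plane (all cells with the same i + j + k). The function names are hypothetical. The plane order is only an illustration of the diagonal traversal mentioned in the question: every cell depends solely on cells in earlier planes, so the cells of one plane could be computed independently. It is not claimed to be the solution with the span quoted above.

```python
def lcs3_length(a, b, c):
    """Sequential version of the THREECOUNT pseudocode."""
    m, n, o = len(a), len(b), len(c)
    v = [[[0] * (o + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            for k in range(1, o + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    v[i][j][k] = v[i - 1][j - 1][k - 1] + 1
                else:
                    v[i][j][k] = max(v[i - 1][j][k], v[i][j - 1][k], v[i][j][k - 1])
    # The table is non-decreasing in every index, so the maximum sits at v[m][n][o].
    return v[m][n][o]


def lcs3_length_wavefront(a, b, c):
    """Same recurrence, visiting cells plane by plane (d = i + j + k)."""
    m, n, o = len(a), len(b), len(c)
    v = [[[0] * (o + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    for d in range(3, m + n + o + 1):
        # All cells in this plane read only planes d - 1 and d - 3,
        # so this double loop is the part that could run in parallel.
        for i in range(max(1, d - n - o), min(m, d - 2) + 1):
            for j in range(max(1, d - i - o), min(n, d - i - 1) + 1):
                k = d - i - j
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    v[i][j][k] = v[i - 1][j - 1][k - 1] + 1
                else:
                    v[i][j][k] = max(v[i - 1][j][k], v[i][j - 1][k], v[i][j][k - 1])
    return v[m][n][o]
```

Both functions fill the same table; only the visiting order differs.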

Related

Finding the longest consecutive sequence of integers that appears in both integers

I am given two integers X and Y. The goal is to find the longest consecutive sequence of digits that appears in both X and Y. So, if X = 124534891 and Y = 324534768, then the output would be 24534, since it appears in both 124534891 and 324534768. The integers can be of different lengths.
I am trying to design a dynamic programming solution, but I'm completely lost.

This is a variation of the longest common substring problem.
Treat the numbers as strings and apply the same algorithm as for the longest common substring problem.
Here's pseudocode to get started:
function maxConsecutiveSequence(A, B):
    S[1..N] = toString(A)
    T[1..M] = toString(B)
    L = array(N, M)
    len = 0
    ans = {}
    for i = 1 to N
        for j = 1 to M
            if S[i] = T[j]
                if i = 1 or j = 1
                    L[i][j] = 1
                else
                    L[i][j] = L[i - 1][j - 1] + 1
                if L[i][j] > len
                    len = L[i][j]
                    ans = {S[i - len + 1..i]}
                else if L[i][j] = len
                    ans = ans ∪ {S[i - len + 1..i]}
            else
                L[i][j] = 0
    return ans
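A minimal Python sketch of the pseudocode above. The function name is hypothetical, and as a small departure from the pseudocode it keeps only the previous row of the table L, which is all the recurrence needs.

```python
def max_consecutive_sequence(x, y):
    """Return the set of longest digit runs common to integers x and y."""
    s, t = str(x), str(y)
    best_len = 0
    best = set()
    prev = [0] * (len(t) + 1)  # row i - 1 of the table L
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common run that ended at (i - 1, j - 1).
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best = {s[i - best_len:i]}
                elif cur[j] == best_len:
                    best.add(s[i - best_len:i])
        prev = cur
    return best
```

With the numbers from the question, `max_consecutive_sequence(124534891, 324534768)` yields `{"24534"}`.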

Formulating dp problem [Codeforces 414 B]

Here is the problem statement from an old contest on Codeforces:
A sequence of l integers b_1, b_2, ..., b_l (1 ≤ b_1 ≤ b_2 ≤ ... ≤ b_l ≤ n) is called good if each number divides (without a remainder) the next number in the sequence. More formally, b_i divides b_{i+1} for all i (1 ≤ i ≤ l - 1).
Given n and k, find the number of good sequences of length k. As the answer can be rather large, print it modulo 1000000007 (10^9 + 7).
I have formulated dp[i][j] as the number of good sequences of length i that end with the number j, and the transition as the following pseudocode:
dp[k][n] =
    for each factor of n as i do
        for j from 1 to k - 1
            dp[k][n] += dp[j][i]
        end
    end
But in the editorial it is given as
Lets define dp[i][j] as number of good sequences of length i that ends in j.
Let's denote divisors of j by x1, x2, ..., xl. Then dp[i][j] = sigma dp[i - 1][xr]
But in my understanding, we need two sigmas, one for the divisors and the other for length. Please help me correct my understanding.
My code:
MOD = 10 ** 9 + 7
N, K = map(int, input().split())
dp = [[0 for _ in range(N + 1)] for _ in range(K + 1)]
for k in range(1, K + 1):
    for n in range(1, N + 1):
        c = 1
        for i in range(1, n):
            if n % i != 0:
                continue
            for j in range(1, k):
                c += dp[j][i]
        dp[k][n] = c
c = 0
for i in range(1, N + 1):
    c = (c + dp[K][i]) % MOD
print(c)
Link to the problem: https://codeforces.com/problemset/problem/414/B

So let's define dp[i][j] as the number of good sequences of length exactly i and which ends with a value j as its last element.
Now, dp[i][j] = Sum(dp[i-1][x]) for all x such that x is a divisor of j. Note that x can be equal to j itself.
This is true because if there is some sequence of length i-1 which we have already found that ends with some value x, then we can simply add j to its end and form a new sequence which satisfies all the conditions.
I guess your confusion is with the length. The thing is that since our current length is i, we can add j to the end of a sequence only if its length is i-1, we cannot iterate over other lengths.
Hope this is clear.
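To make the single-sum recurrence concrete, here is a minimal Python sketch (the function name is hypothetical). Instead of listing the divisors of each j, it pushes dp[i-1][x] forward to every multiple of x, which computes the same sum. Only one row of the table is kept at a time.

```python
MOD = 10**9 + 7


def count_good_sequences(n, k):
    """Number of good sequences of length k with values in 1..n, modulo MOD."""
    # dp[j] = number of good sequences of the current length that end in j.
    dp = [0] + [1] * n  # length 1: one sequence per value
    for _ in range(2, k + 1):
        nxt = [0] * (n + 1)
        for x in range(1, n + 1):
            # x divides every multiple j of x, so j can follow x.
            for j in range(x, n + 1, x):
                nxt[j] = (nxt[j] + dp[x]) % MOD
        dp = nxt
    return sum(dp) % MOD
```

For n = 3 and k = 2 this counts (1,1), (1,2), (1,3), (2,2) and (3,3), i.e. 5 sequences.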

From a loop index k, obtain pairs i,j with i < j?

I need to traverse all pairs i,j with 0 <= i < n, 0 <= j < n and i < j for some positive integer n.
Problem is that I can only loop through another variable, say k. I can control the bounds of k. So the problem is to determine two arithmetic methods, f(k) and g(k) such that i=f(k) and j=g(k) traverse all admissible pairs as k traverses its consecutive values.
How can I do this in a simple way?

I think I got it (in Python):
def get_ij(n, k):
    j = k // (n - 1)  # // is integer (floor) division
    i = k - j * (n - 1)
    if i >= j:
        i = (n - 2) - i
        j = (n - 1) - j
    return i, j
for n in range(2, 6):
    print(n, sorted(get_ij(n, k) for k in range(n * (n - 1) // 2)))
It basically folds the matrix so that it's (almost) rectangular. By "almost" I mean that there could be some unused entries on the far right of the bottom row.
[Figures: folding of the triangular matrix for n=4 and n=5]
Now, iterating over the rectangle is easy, as is mapping from folded coordinates back to coordinates in the original triangular matrix.
Pros: uses simple integer math.
Cons: returns the tuples in a weird order.

I think I found another way, that gives the pairs in lexicographic order. Note that here i > j instead of i < j.
Basically the algorithm consists of the two expressions:
i = floor((1 + sqrt(1 + 8*k))/2)
j = k - i*(i - 1)/2
that give i,j as functions of k. Here k is a zero-based index.
Pros: Gives the pairs in lexicographic order.
Cons: Relies on floating-point arithmetic.
Rationale:
We want to achieve the mapping in the following table:
k -> (i,j)
0 -> (1,0)
1 -> (2,0)
2 -> (2,1)
3 -> (3,0)
4 -> (3,1)
5 -> (3,2)
....
We start by considering the inverse mapping (i,j) -> k. It isn't hard to realize that:
k = i*(i-1)/2 + j
Since j < i, it follows that the value of k corresponding to all pairs (i,j) with fixed i satisfies:
i*(i-1)/2 <= k < i*(i+1)/2
Therefore, given k, i=f(k) returns the largest integer i such that i*(i-1)/2 <= k. After some algebra:
i = f(k) = floor((1 + sqrt(1 + 8*k))/2)
After we have found the value i, j is trivially given by
j = k - i*(i-1)/2
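The floating-point dependence listed under the cons can be avoided with an integer square root. The following is a minimal sketch, not part of the original answer: it assumes Python 3.8+ for `math.isqrt`, and the function name is hypothetical. It relies on the fact that flooring the square root first does not change the result of the outer floor.

```python
from math import isqrt


def pair_from_index(k):
    """Map a zero-based index k to the pair (i, j) with i > j >= 0."""
    # Largest i such that i * (i - 1) / 2 <= k, using integer arithmetic only.
    i = (1 + isqrt(1 + 8 * k)) // 2
    j = k - i * (i - 1) // 2
    return i, j
```

The first six indices reproduce the table above: (1,0), (2,0), (2,1), (3,0), (3,1), (3,2).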

I'm not sure I understand the question exactly, but to sum up: if 0 <= i < n and 0 <= j < n, then you want to traverse 0 <= k < n*n:
for (int k = 0; k < n*n; k++) {
    int i = k / n;
    int j = k % n;
    // ...
}
[edit] I just saw that i < j is required, so this solution is not optimal, since fewer than n*n iterations are necessary.

If we think of our solution in terms of a number triangle, where k is the sequence
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
...
Then j would be our (non zero-based) row number, that is, the greatest integer such that
j * (j - 1) / 2 < k
Solving for j:
j = ceiling ((sqrt (1 + 8 * k) - 1) / 2)
And i would be k's (zero-based) position in the row
i = k - j * (j - 1) / 2 - 1
The bounds for k are:
1 <= k <= n * (n - 1) / 2
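The two formulas above can be checked with a short Python sketch (the function name is hypothetical). It uses floating-point `sqrt` exactly as written in the formulas, so it should only be trusted for moderate values of k.

```python
import math


def pair_from_triangle_index(k):
    """Map a one-based index k to the pair (i, j) with 0 <= i < j."""
    # j is the one-based row of k in the number triangle.
    j = math.ceil((math.sqrt(1 + 8 * k) - 1) / 2)
    # i is the zero-based position of k inside that row.
    i = k - j * (j - 1) // 2 - 1
    return i, j
```

For n = 4, letting k run from 1 to 6 gives (0,1), (0,2), (1,2), (0,3), (1,3), (2,3).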

Is it important that you actually have two arithmetic functions f(k) and g(k) doing this? Because you could first create a list such as
L = []
for i in range(n-1):
    for j in range(n):
        if j>i:
            L.append((i,j))
This will give you all the pairs you asked for. Your variable k can now just run along the index of the list. For example, if we take n=5,
for x in L:
    print(x)
gives us
(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
Suppose you have 2<=k<5 for example, then
for k in range(2, 5):
    print(L[k])
yields
(0,3), (0,4), (1,2)
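The same list can be built with the standard library, as a small sketch: `itertools.combinations(range(n), 2)` yields the pairs (i, j) with i < j in lexicographic order. The helper name is hypothetical.

```python
from itertools import combinations


def all_pairs(n):
    """List every pair (i, j) with 0 <= i < j < n in lexicographic order."""
    return list(combinations(range(n), 2))
```

For n = 5, `all_pairs(5)[2:5]` gives the same three pairs as the loop over k above.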

Counting number of points in lower left quadrant?

I am having trouble understanding a solution to an algorithmic problem
In particular, I don't understand how or why this part of the code
s += a[i];
total += query(s);
update(s);
allows you to compute the total number of points in the lower left quadrant of each point.
Could someone please elaborate?

As an analogue for the plane problem, consider this:
1. For a point (a, b) to lie in the lower left quadrant of (x, y), we need a < x and b < y; thus, points of the form (i, P[i]) lie in the lower left quadrant of (j, P[j]) iff i < j and P[i] < P[j].
2. When iterating in ascending order of i, all points that were considered earlier lie to the left of the current point* (i, P[i]).
3. So one only has to count all P[j] less than P[i] that have been considered until now.
*current point refers to the point under consideration in the current iteration of the for loop that you quoted, i.e. (i, P[i])
Let's define another array, C[s]:
C[s] = Number of Prefix Sums of array A[1..(i - 1)] that amount to s
So the solution to #3 becomes the sum ... + C[-2] + C[-1] + C[0] + C[1] + C[2] + ... + C[P[i] - 1], i.e. the prefix sum of C up to P[i] - 1
Use the BIT to store the prefix sum of C, thus defining query(s) as:
query(s) = Number of Prefix Sums of array A[1..(i - 1)] that amount to a value < s
Using these definitions, s in the given code gives you the prefix sum up to the current index i (P[i]). total builds the answer, and update simply adds P[i] to the BIT.
We have to repeat this method for all i, hence the for loop.
PS: It uses a data structure called a Binary Indexed Tree (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees) for operations. If you aren't acquainted with it, I'd recommend that you check the link.
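A minimal Python sketch of the query/update loop described above. The names are hypothetical, and two things are assumptions rather than facts from the quoted code: the array holds only +1 and -1 (so prefix sums lie in [-n, n] and can be shifted into valid tree positions), and the empty prefix P[0] = 0 is inserted before the loop.

```python
class Fenwick:
    """Binary indexed tree over positions 1..size that counts inserted values."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def update(self, pos):
        # Record one more value at position pos.
        while pos < len(self.tree):
            self.tree[pos] += 1
            pos += pos & -pos

    def query(self, pos):
        # Number of recorded values at positions 1..pos.
        total = 0
        while pos > 0:
            total += self.tree[pos]
            pos -= pos & -pos
        return total


def count_lower_left(a):
    """Count pairs i < j of prefix sums with P[i] < P[j]; a holds only +1 / -1."""
    n = len(a)
    offset = n + 1  # shifts prefix sums from [-n, n] to [1, 2n + 1]
    bit = Fenwick(2 * n + 1)
    s = 0
    bit.update(s + offset)  # assumption: the empty prefix counts as a point
    total = 0
    for x in a:
        s += x
        total += bit.query(s + offset - 1)  # earlier prefix sums strictly below s
        bit.update(s + offset)
    return total
```

For a = [1, -1, 1] the prefix sums are 0, 1, 0, 1, and three pairs satisfy the condition.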
EDIT:
You are given an array S and a value X. You can split S into two disjoint subsets L and H, such that L has all elements of S less than X, and H has those that are greater than or equal to X.
A: All elements of L are less than all elements of H.
Any subsequence T of S will have some elements of L and some elements of H. Let's say it has p elements of L and q of H. When T is sorted to give T', all p elements of L appear before the q elements of H because of A.
Median being the central value is the value at location m = (p + q)/2
It is intuitive to think that having q > p implies that the median lies in H; as a proof:
Values in locations [1..p] in T' belong to L. Therefore, for the median to be in H, its position m should be greater than p:
m > p
(p + q)/2 > p
p + q > 2p
q > p
B: q - p > 0
To compute q - p, I replace all elements in T' with -1 if they belong to L ( < X ) and +1 if they belong to H ( >= X)
T' looks something like {-1, -1, -1... 1, 1, 1}
It has p times -1 and q times 1. Sum of T' will now give me:
Sum = p * (-1) + q * (1)
C: Sum = q - p
I can use this information to find the value in B.
All subsequences considered are contiguous, i.e. of the form {A[i], A[i + 1], A[i + 2] ... A[j - 1]}. To compute their sums, I can use the prefix sums P[i] = A[1] + A[2] + .. A[i - 1]
The sum of the subsequence from A[i] to A[j - 1] can then be computed as P[j] - P[i] (where j is the greater of j and i)
With C and B in mind, we conclude:
Sum = P[j] - P[i] = q - p (q - p > 0)
P[j] - P[i] > 0
P[j] > P[i]
j > i and P[j] > P[i] for each solution that gives you a median >= X
In summary:
Replace all A[i] with -1 if they are less than X and +1 otherwise
Compute prefix sums of A[i]
For each pair (i, P[i]), count pairs which lie to its lower left quadrant.

How to find the total number of Increasing sub-sequences of certain length with Binary Index Tree(BIT)

How can I find the total number of Increasing sub-sequences of certain length with Binary Index Tree(BIT)?
Actually this is a problem from Spoj Online Judge
Example
Suppose I have an array 1,2,2,10
The increasing sub-sequences of length 3 are those at positions 1,2,4 and 1,3,4 (both have the values 1,2,10)
So, the answer is 2.

Let:
dp[i, j] = number of increasing subsequences of length j that end at i
An easy solution is in O(n^2 * k):
for i = 1 to n do
    dp[i, 1] = 1
for i = 1 to n do
    for j = 1 to i - 1 do
        if array[i] > array[j]
            for p = 2 to k do
                dp[i, p] += dp[j, p - 1]
The answer is dp[1, k] + dp[2, k] + ... + dp[n, k].
Now, this works, but it is inefficient for your given constraints, since n can go up to 10000. k is small enough, so we should try to find a way to get rid of an n.
Let's try another approach. We also have S - the upper bound on the values in our array. Let's try to find an algorithm in relation to this.
dp[i, j] = same as before
num[i] = how many subsequences that end with i (element, not index this time)
         have a certain length
for i = 1 to n do
    dp[i, 1] = 1
for p = 2 to k do // for each length this time
    num = {0}
    for i = 2 to n do
        // note: dp[1, p > 1] = 0
        // how many that end with the previous element
        // have length p - 1
        num[ array[i - 1] ] += dp[i - 1, p - 1]
        // append the current element to all those smaller than it
        // that end an increasing subsequence of length p - 1,
        // creating an increasing subsequence of length p
        for j = 1 to array[i] - 1 do
            dp[i, p] += num[j]
This has complexity O(n * k * S), but we can reduce it to O(n * k * log S) quite easily. All we need is a data structure that lets us efficiently sum and update elements in a range: segment trees, binary indexed trees etc.
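A minimal Python sketch of the O(n * k * log S) variant with a binary indexed tree. The function name is hypothetical. As assumptions beyond the text above, the values are compressed to ranks first (so S becomes the number of distinct values), k is taken to be at least 1, and no modulus is applied.

```python
def count_increasing_subsequences(values, k):
    """Count strictly increasing subsequences of length k (k >= 1)."""
    # Coordinate compression: map each value to a rank in 1..size.
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    size = len(ranks)
    n = len(values)
    dp = [1] * n  # dp[i] = subsequences of the current length ending at index i
    for _ in range(2, k + 1):
        tree = [0] * (size + 1)  # BIT playing the role of num[], indexed by rank
        new_dp = [0] * n
        for i, v in enumerate(values):
            r = ranks[v]
            # Prefix query: shorter subsequences ending in a smaller value.
            pos, total = r - 1, 0
            while pos > 0:
                total += tree[pos]
                pos -= pos & -pos
            new_dp[i] = total
            # Point update: make dp[i] visible to later elements.
            pos = r
            while pos <= size:
                tree[pos] += dp[i]
                pos += pos & -pos
        dp = new_dp
    return sum(dp)
```

For the array 1,2,2,10 and length 3 this returns 2, matching the example in the question.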
