Distinct Subsequences DP explanation - algorithm

From LeetCode
Given a string S and a string T, count the number of distinct
subsequences of T in S.
A subsequence of a string is a new string which is formed from the
original string by deleting some (can be none) of the characters
without disturbing the relative positions of the remaining characters.
(ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Here is an example: S = "rabbbit", T = "rabbit"
Return 3.
I found a very good DP solution, but I have a hard time understanding it. Can anybody explain how this DP works?
int numDistinct(string S, string T) {
    vector<int> f(T.size()+1);
    //set the last size to 1.
    f[T.size()]=1;
    for(int i=S.size()-1; i>=0; --i){
        for(int j=0; j<T.size(); ++j){
            f[j]+=(S[i]==T[j])*f[j+1];
            printf("%d\t", f[j] );
        }
        cout<<"\n";
    }
    return f[0];
}

First, try to solve the problem yourself to come up with a naive implementation:
Let's say that S.length = m and T.length = n. Let's write S{i} for the substring of S starting at i. For example, if S = "abcde", S{0} = "abcde", S{4} = "e", and S{5} = "". We use a similar definition for T.
Let N[i][j] be the number of distinct subsequences of S{i} that equal T{j}. We are interested in N[0][0] (because those are both full strings).
There are two easy cases: N[i][n] for any i and N[m][j] for j<n. How many subsequences are there for "" in some string S? Exactly 1. How many for some T in ""? Only 0.
Now, given some arbitrary i and j, we need to find a recursive formula. There are two cases.
If S[i] != T[j], we know that N[i][j] = N[i+1][j] (I hope you can verify this for yourself, I aim to explain the cryptic algorithm above in detail, not this naive version).
If S[i] = T[j], we have a choice. We can either 'match' these characters and go on with the next characters of both S and T, or we can ignore the match (as in the case that S[i] != T[j]). Since we have both choices, we need to add the counts there: N[i][j] = N[i+1][j] + N[i+1][j+1].
In order to find N[0][0] using dynamic programming, we need to fill the N table. We first need to set the boundary of the table:
N[m][j] = 0, for 0 <= j < n
N[i][n] = 1, for 0 <= i <= m
Because of the dependencies in the recursive relation, we can fill the rest of the table looping i backwards and j forwards:
    for (int i = m-1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            if (S[i] == T[j]) {
                N[i][j] = N[i+1][j] + N[i+1][j+1];
            } else {
                N[i][j] = N[i+1][j];
            }
        }
    }
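As a rough, self-contained sketch of the table-filling step described above (the function name numDistinctTable and the use of long long are my own assumptions, not part of the original answer), the full 2D version could look like this:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: counts the distinct subsequences of S that equal T
// using the full (m+1) x (n+1) table N from the explanation.
// N[i][j] = number of ways to form the suffix T{j} from the suffix S{i}.
long long numDistinctTable(const std::string& S, const std::string& T) {
    const int m = static_cast<int>(S.size());
    const int n = static_cast<int>(T.size());
    // Zero-initialised, so the boundary N[m][j] = 0 for j < n is already set.
    std::vector<std::vector<long long>> N(m + 1, std::vector<long long>(n + 1, 0));
    for (int i = 0; i <= m; ++i) {
        N[i][n] = 1;  // the empty string occurs exactly once as a subsequence
    }
    for (int i = m - 1; i >= 0; --i) {
        for (int j = 0; j < n; ++j) {
            N[i][j] = N[i + 1][j];            // skip S[i]
            if (S[i] == T[j]) {
                N[i][j] += N[i + 1][j + 1];   // or match S[i] with T[j]
            }
        }
    }
    return N[0][0];
}
```

For the example from the question, numDistinctTable("rabbbit", "rabbit") gives 3.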
We can now use the most important trick of the algorithm: we can use a 1-dimensional array f, with the invariant in the outer loop: f = N[i+1]; This is possible because of the way the table is filled. If we apply this to my algorithm, this gives:
f[j] = 0, for 0 <= j < n
f[n] = 1
    for (int i = m-1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            if (S[i] == T[j]) {
                f[j] = f[j] + f[j+1];
            } else {
                f[j] = f[j];
            }
        }
    }
We're almost at the algorithm you gave. First of all, we don't need to initialize f[j] = 0 explicitly, because vector<int> f(T.size()+1) already value-initializes every element to 0. Second, we don't need assignments of the type f[j] = f[j].
Since this is C++ code, we can rewrite the snippet
    if (S[i] == T[j]) {
        f[j] += f[j+1];
    }
to
    f[j] += (S[i] == T[j]) * f[j+1];
and that's all. This yields the algorithm:
    f[n] = 1
    for (int i = m-1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            f[j] += (S[i] == T[j]) * f[j+1];
        }
    }

I think the answer is wonderful, but one detail may not be correct.
I think that with the 2D array N we could iterate backwards over both i and j. Once we change from the array N to the array f, however, we must loop j forwards so that we do not overwrite results from the previous pass that are still needed.
    for (int i = m-1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            if (S[i] == T[j]) {
                N[i][j] = N[i+1][j] + N[i+1][j+1];
            } else {
                N[i][j] = N[i+1][j];
            }
        }
    }
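To illustrate why the direction of j matters once only the 1D array f is kept, here is a small sketch. The helper numDistinct1D and its forwardJ flag are hypothetical and exist only for this comparison. When j runs backwards, f[j+1] has already been updated for the current i, so the same character S[i] can be matched against two positions of T.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the 1D algorithm with a selectable loop direction for j.
// forwardJ == true  reproduces the algorithm from the question.
// forwardJ == false loops j from n-1 down to 0 (shown only to expose the bug).
int numDistinct1D(const std::string& S, const std::string& T, bool forwardJ) {
    const int m = static_cast<int>(S.size());
    const int n = static_cast<int>(T.size());
    std::vector<int> f(n + 1, 0);
    f[n] = 1;
    for (int i = m - 1; i >= 0; --i) {
        for (int k = 0; k < n; ++k) {
            const int j = forwardJ ? k : n - 1 - k;
            if (S[i] == T[j]) {
                // Correct only if f[j+1] still holds the value from row i+1.
                f[j] += f[j + 1];
            }
        }
    }
    return f[0];
}
```

With S = "a" and T = "aa" the correct answer is 0. The forward loop returns 0, while the backward loop returns 1 because the single 'a' gets matched twice.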

Related

Leetcode Target sum of dynamic programming

Given n and target, find the number of combinations of numbers from [1,2,...,n] adding up to target. A number can be picked repeatedly (1 + 1 + 2 = 4), however the combinations cannot be duplicated ({1,1,2} and {1,2,1} are regarded as one combination). e.g.
n = 3, target = 4: {1,1,1,1}, {1,1,2}, {1,3}, {2,2}, so return 4
Since we only need to return the number of combinations, we use dynamic programming as follows:
    int sum(int n, int target) {
        vector<int> dp(target + 1);
        dp[0] = 1;
        for (int i = 1; i <= target; ++i) {
            for (int j = 1; j <= n; j++) {
                if (i >= j) dp[i] += dp[i - j];
            }
        }
        return dp.back();
    }
However, this solution counts duplicated combinations: {1,1,1,1}, {1,1,2}, {1,2,1}, {2,1,1}, {1,3}, {3,1}, {2,2}, so it returns 7.
Do you know how to modify it to remove the duplicates?
A simple modification:
    for (int j = 1; j <= n; j++) {
        for (int i = j; i <= target; i++) {
            dp[i] += dp[i - j];
        }
    }
Making the value j the outer loop avoids using a small value after a larger one, so the code counts only sorted combinations.
A similar question uses specific coin denominations instead of the values 1..n.
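To make the effect of the loop order concrete, here is a minimal sketch with both variants side by side. The function names countCombinations and countOrderings are made up for this illustration. The only difference between them is which loop is the outer one.

```cpp
#include <vector>

// Hypothetical name: counts multisets of values from 1..n that sum to target.
// The value j is the outer loop, so each combination is built in sorted order
// and is therefore counted once.
long long countCombinations(int n, int target) {
    std::vector<long long> dp(target + 1, 0);
    dp[0] = 1;
    for (int j = 1; j <= n; ++j) {
        for (int i = j; i <= target; ++i) {
            dp[i] += dp[i - j];
        }
    }
    return dp[target];
}

// Hypothetical name: counts ordered sequences of values from 1..n that sum to
// target. The sum i is the outer loop, so {1,1,2} and {1,2,1} count separately.
long long countOrderings(int n, int target) {
    std::vector<long long> dp(target + 1, 0);
    dp[0] = 1;
    for (int i = 1; i <= target; ++i) {
        for (int j = 1; j <= n && j <= i; ++j) {
            dp[i] += dp[i - j];
        }
    }
    return dp[target];
}
```

For target 4 with values 1..3, countCombinations gives 4 and countOrderings gives 7, matching the two lists of sets above.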

How to convert this recursive function to a dp based solution?

This is the recursive function
    def integerPartition(m, n):
        if(n==0):
            return 0
        if(m ==0):
            return 1
        if(m<0):
            return 0
        return integerPartition(m,n-1) + integerPartition(m-n,n)
and this is what I have done in C++:
// n -> no. of persons
// m -> amount of money to be distributed
// dp table of order (n+1)*(m+1)
long long int dp[n+1][m+1] ;
//initializing values to 0
for(i = 0; i<=n ; i++)
for(j = 0; j<= m ; j++)
dp[i][j] = 0;
Print(n,m,dp);
cout << "\n";
//Case 1 - if there is no persons i.e n = 0 answer will be 0
//Case 2 - if there is no money i.e. m = 0 there is only 1 way answer will be 1
for ( i = 1; i<= n ; i++ )
dp[i][0] = 1;
dp[i][i] = 1;
Print(n,m,dp);
for ( i = 1; i<= n ; i++){
for ( j = 1; j<= m ; j++){
dp[i][j] = dp[i][j-1] ;
if(i>=j){
dp[i][j] += dp[i-j][j];
}
// else if(i==j){
// dp[i][j] += 1;
// }
}
}
But the answers I am getting do not match the recursive one, and I don't understand what I am missing. If anyone can help me correct it I will be thankful; I have just started with dynamic programming and am really not able to figure it out.
Some issues:
You seem to use non-local variables for your for loops. This is bad practice and can lead to errors that are difficult to debug. Instead, write for (int i = 1; ...etc.
dp[i][i] = 1; is not part of the for loop. You would have detected this if you had defined i only as a variable local to the for loop.
It is good practice to always use braces for the body of a for loop (also if, else, ...etc), even if you only have one statement in the body.
dp[i][i] = 1; is also a bad assignment: it just is not true that integerPartition(i, i) always returns 1. It happens to be true for i = 1, but not for larger i. For instance, integerPartition(4, 4) should return 5. Just remove this line.
In the final nested for loop you are mixing up the row/column in your dp array. Note that you had reserved the first dimension for n and the second dimension for m, so opposite to the parameter order. That is fine, but you do not stick to that decision in this for loop. Instead of dp[i][j-1] you should have written dp[i-1][j], and instead of dp[i-j][j] you should have written dp[i][j-i]. And so the if condition should be adapted accordingly.
There is no return statement in your version, but maybe you just forgot to include it in the question. It should be
return dp[n][m];
Here is the corrected code:
    long long int dp[n+1][m+1];
    for(int i = 0; i <=n; i++) {
        for(int j = 0; j <= m; j++) {
            dp[i][j] = 0;
        }
    }
    for (int i = 1; i <= n; i++) {
        dp[i][0] = 1;
    }
    for (int i = 1; i <= n; i++){
        for (int j = 1; j <= m ; j++) {
            dp[i][j] = dp[i-1][j];
            if (j >= i) {
                dp[i][j] += dp[i][j-i];
            }
        }
    }
    return dp[n][m];
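For checking the corrected loop against the recursion, a self-contained sketch might look as follows. The function names are hypothetical, std::vector replaces the variable-length array (which is not standard C++), and m and n are assumed to be non-negative.

```cpp
#include <vector>

// Reference version: a direct C++ transcription of the Python recursion
// from the question (same order of base cases).
long long integerPartitionRec(int m, int n) {
    if (n == 0) return 0;
    if (m == 0) return 1;
    if (m < 0) return 0;
    return integerPartitionRec(m, n - 1) + integerPartitionRec(m - n, n);
}

// Bottom-up version of the corrected code; dp[i][j] holds integerPartition(j, i),
// i.e. the first index is n and the second index is m.
long long integerPartitionDp(int m, int n) {
    std::vector<std::vector<long long>> dp(n + 1, std::vector<long long>(m + 1, 0));
    for (int i = 1; i <= n; ++i) {
        dp[i][0] = 1;  // no money left: exactly one way
    }
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            dp[i][j] = dp[i - 1][j];        // do not use the value i
            if (j >= i) {
                dp[i][j] += dp[i][j - i];   // use the value i at least once
            }
        }
    }
    return dp[n][m];
}
```

Both functions return 5 for m = 4, n = 4, and comparing them over a small grid of inputs is a quick way to catch swapped indices.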
Not sure that this technically is DP, but if your goal is to get the benefits of DP, memoization might be a better approach.
The idea is made up of 2 parts:
At the start of each call to integerPartition, look up in a table (your dp will do nicely) to see if that computation has already been done, and if it has, just return the value stored in the table.
Just before any point where integerPartition is to return a value, store it in the table.
Note that this means you don't need to try to "pivot" the original code: it proceeds as it did originally, so you are almost guaranteed to get the same results, but without as much unnecessary re-computation (at the cost of extra storage).
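The two parts above could be sketched like this. The function name integerPartitionMemo and the choice of std::map as the lookup table are assumptions made for the illustration; a 2D array like dp would work just as well.

```cpp
#include <map>
#include <utility>

// Hypothetical memoized version of the recursion from the question.
// memo maps (m, n) to the already computed result.
long long integerPartitionMemo(int m, int n,
                               std::map<std::pair<int, int>, long long>& memo) {
    // Base cases, in the same order as the original recursion.
    if (n == 0) return 0;
    if (m == 0) return 1;
    if (m < 0) return 0;

    // Part 1: return the stored value if this call was computed before.
    const std::pair<int, int> key(m, n);
    const auto it = memo.find(key);
    if (it != memo.end()) {
        return it->second;
    }

    // Part 2: compute as before, then store the value just before returning.
    const long long result = integerPartitionMemo(m, n - 1, memo) +
                             integerPartitionMemo(m - n, n, memo);
    memo[key] = result;
    return result;
}
```

A caller creates an empty std::map once and passes it to the first call; the structure of the recursion itself is unchanged.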
So, based on your code comments, I am going to assume you only want 1 when n > 0 and m = 0, according to your recursive code. But in the DP code you interchanged them: i goes up to n and j goes up to m.
So, updating your code, try to find the mistake:
// n -> no. of persons
// m -> amount of money to be distributed
// dp table of order (n+1)*(m+1)
long long int dp[n+1][m+1] ;
//initializing values to 0
for(i = 0; i<=n ; i++)
for(j = 0; j<= m ; j++)
dp[i][j] = 0;
Print(n,m,dp);
cout << "\n";
//Case 1 - if there is no persons i.e n = 0 answer will be 0
//Case 2 - if there is no money i.e. m = 0 there is only 1 way answer will be 1
for ( i = 1; i<= n; i++){
dp[i][0] = 0;
}
for(int j = 1; j <= m; j++){
dp[0][j] = 1;
}
Print(n,m,dp);
for ( i = 1; i<= n ; i++){
for ( j = 1; j<= m ; j++){
dp[i][j] = dp[i][j-1] ;
if(i>=j){
dp[i][j] += dp[i-j][j];
}
// else if(i==j){
// dp[i][j] += 1;
// }
}
}

select k values at time and flip them, find minimum cost to make all array values same equal to 1

Given an array containing only 0s and 1s and an integer value k.
You should choose k digits at a time and flip all of them. Find the minimum cost for making all values the same. If it is not possible, then give -1.
This is a simple greedy problem. I am assuming you can't flip fewer than k digits at any time.
Find minimum cost for making all values same.
To solve this, first we will try to make all values 1 and then we'll try to make all values 0. Whichever of the two takes fewer steps will be our answer.
Here is my pseudo-code. The pseudo-code is self-explanatory, which is why I am not adding an explanation. Note that it flips k consecutive digits at a time. I am giving the code for making all values 1; I hope you can do both.
int cnt = 0;
for(int i = 0; i < arr.length() - k + 1; i++) {
if(arr[i] == '1') {
continue;
}
for(int j = 0; j < k; j++) {
arr[i + j] = (arr[i + j] == '0') ? '1' : '0';
}
cnt++;
}
bool flag = true;
for(int i = 0; i < arr.length(); i++) {
if(arr[i] == '0') {
flag = false;
break;
}
}
if(flag) {
print(cnt);
} else {
print("-1");
}
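A compilable sketch of the same greedy pass could look as follows. The function name minFlipsToAllOnes is hypothetical, the array is assumed to be given as a string of '0' and '1' characters as in the pseudo-code, k is assumed to be at least 1, and each flip covers k consecutive digits.

```cpp
#include <string>

// Hypothetical helper: number of flips of k consecutive digits needed to turn
// every digit into '1', scanning left to right, or -1 if the greedy pass
// leaves a '0' behind. arr is taken by value so the caller's copy is untouched.
int minFlipsToAllOnes(std::string arr, int k) {
    const int n = static_cast<int>(arr.size());
    int cnt = 0;
    for (int i = 0; i + k <= n; ++i) {
        if (arr[i] == '1') {
            continue;  // already fine, nothing to do at this position
        }
        // arr[i] is '0': the window starting here is the last one that can fix it.
        for (int j = 0; j < k; ++j) {
            arr[i + j] = (arr[i + j] == '0') ? '1' : '0';
        }
        ++cnt;
    }
    for (char c : arr) {
        if (c == '0') {
            return -1;  // a zero remains in the last k-1 positions
        }
    }
    return cnt;
}
```

For the all-zeros target, one option is to invert every digit of the input and call the same function, then take the smaller of the two non-negative results.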

Find zeroes to be flipped so that number of consecutive 1’s is maximized

Find zeroes to be flipped so that number of consecutive 1’s is maximized.
Input: arr[] = {1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1}
m = 2
Output: 5 7
We are allowed to flip maximum 2 zeroes. If we flip
arr[5] and arr[7], we get 8 consecutive 1's which is
maximum possible under given constraints .
Now if we were to find just the maximum number of 1's that is possible, is it possible to solve using dynamic programming approach?
This problem can be solved in linear time O(N) and linear space O(N). It's not full-fledged dynamic programming, but it's similar in that it uses precomputation.
Data Structures Used:
1. left: an integer array of the same length as the given array. It is precomputed such that for every position i:
left[i] = Number of consecutive 1's immediately to the left of position i
2. right: an integer array of the same length as the given array. It is precomputed such that for every position i:
right[i] = Number of consecutive 1's immediately to the right of position i
Each of these can be computed in a single traversal of the array. Assuming arr is the original array, the following pseudocode does the job:
Pseudocode for populating left array
left()
{
int count = 0;
for(int i = 0;i < arr length; ++i)
{
if(i == 0)
{
left[i] = 0;
if(arr[i] == 1)
count++;
continue;
}
else
{
left[i] = count;
if(arr[i] == 1)
count++;
else count = 0;
}
}
}
Pseudocode for populating right array
right()
{
int count = 0;
for(int i = arr length - 1;i >= 0; --i)
{
if(i == arr length - 1)
{
right[i] = 0;
if(arr[i] == 1)
count++;
continue;
}
else
{
right[i] = count;
if(arr[i] == 1)
count++;
else count = 0;
}
}
}
Now the only thing we have to do is check all pairs of positions i and j (i < j) such that arr[i] = 0 and arr[j] = 0 and no position between i and j holds a 0, and keep track of the pair for which we get the maximum value of the following:
left[i] + right[j] + right[i]
You could also use left[i] + right[j] + left[j].
left[i] tells the number of consecutive 1's to the left of position i, right[j] tells the number of consecutive 1's to the right of position j, and the number of consecutive 1's between i and j can be counted by either right[i] or left[j]; therefore, we have two candidate expressions. In the pseudocode, l and r play the roles of i and j.
This can also be done in a single traversal, using the following pseudocode:
max_One()
{
max = 0;
l = -1, r = -1;
for(int i = 0;i < arr length; ++i)
{
if(arr[i] == 0)
{
if(l == -1)
l = i;
else
{
r = i;
if(left[l] + right[r] + right[l] > max)
{
max = left[l] + right[r] + right[l];
left_pos = l;
right_pos = r;
}
l = r;
}
}
}
}
You should use the sliding window concept here: use start and end variables to store the indices of the current range. Whenever you encounter a 0, increment the counter of zeros seen and include it in the current window. If the number of zeros reaches m+1, advance start until it has passed the first 0 in the window.
public static int[] zerosToFlip(int[] input, int m) {
if (m == 0) return new int[0];
int[] indices = new int[m];
int beginIndex = 0;
int endIndex = 0;
int maxBeginIndex=0;
int maxEndIndex=0;
int zerosIncluded = input[0] == 0 ? 1 : 0;
for (int i = 1; i < input.length; i++) {
if (input[i] == 0) {
if (zerosIncluded == m) {
if (endIndex - beginIndex > maxEndIndex - maxBeginIndex){
maxBeginIndex = beginIndex;
maxEndIndex = endIndex;
}
while (input[beginIndex] != 0) beginIndex++;
beginIndex++;
} else {
zerosIncluded++;
}
}
endIndex++;
}
if (endIndex - beginIndex > maxEndIndex - maxBeginIndex){
maxBeginIndex = beginIndex;
maxEndIndex = endIndex;
}
int j = 0;
for (int i = maxBeginIndex; i <= maxEndIndex; i++) {
if (input[i] == 0) {
indices[j] = i;
++j;
}
}
return indices;
}
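If only the maximum number of consecutive 1's is needed, as asked in the question, the sliding window idea can be reduced to a few lines. This is a minimal C++ sketch under that assumption; the function name maxOnesAfterFlips is hypothetical, and the array is assumed to contain only 0 and 1.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: length of the longest window of arr that contains at
// most m zeros, i.e. the longest run of 1's obtainable by flipping at most
// m zeros.
int maxOnesAfterFlips(const std::vector<int>& arr, int m) {
    const int n = static_cast<int>(arr.size());
    int best = 0;
    int zeros = 0;  // zeros inside the current window [start, end]
    int start = 0;
    for (int end = 0; end < n; ++end) {
        if (arr[end] == 0) {
            ++zeros;
        }
        // Too many zeros: move start forward until one zero has left the window.
        while (zeros > m) {
            if (arr[start] == 0) {
                --zeros;
            }
            ++start;
        }
        best = std::max(best, end - start + 1);
    }
    return best;
}
```

For the example input {1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1} with m = 2 this gives 8, which matches the output described in the question.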

Max sum in an array with constraints

I have this problem, where given an array of positive numbers I have to find the maximum sum of elements such that no two adjacent elements are picked. The maximum has to be less than a certain given K. I tried thinking along the lines of the similar problem without the K, but I have failed so far. I have the following DP-ish solution for the latter problem:
    int sum1,sum2 = 0;
    int sum = sum1 = a[0];
    for(int i=1; i<n; i++)
    {
        sum = max(sum2 + a[i], sum1);
        sum2 = sum1;
        sum1 = sum;
    }
Could someone give me tips on how to proceed with my present problem?
The best I can think of off the top of my head is an O(n*K) dp:
int sums[n][K+1] = {{0}};
int i, j;
for(j = a[0]; j <= K; ++j) {
sums[0][j] = a[0];
}
if (a[1] > a[0]) {
for(j = a[0]; j < a[1]; ++j) {
sums[1][j] = a[0];
}
for(j = a[1]; j <= K; ++j) {
sums[1][j] = a[1];
}
} else {
for(j = a[1]; j < a[0]; ++j) {
sums[1][j] = a[1];
}
for(j = a[0]; j <= K; ++j) {
sums[1][j] = a[0];
}
}
for(i = 2; i < n; ++i) {
for(j = 0; j <= K && j < a[i]; ++j) {
sums[i][j] = max(sums[i-1][j],sums[i-2][j]);
}
for(j = a[i]; j <= K; ++j) {
sums[i][j] = max(sums[i-1][j],a[i] + sums[i-2][j-a[i]]);
}
}
sums[i][j] contains the maximal sum of non-adjacent elements of a[0..i] not exceeding j. The solution is then sums[n-1][K] at the end.
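The same recurrence can be written more compactly if the table gets an extra row for the empty prefix, which removes the special handling of a[0] and a[1]. The following is a sketch under a few assumptions: the function name is hypothetical, all elements are positive, and the bound is inclusive ("not exceeding K") as in the answer above. For the strict bound from the question, one could pass K - 1.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: largest sum of pairwise non-adjacent elements of a
// that does not exceed K (0 if nothing fits).
// dp[i][j] = best such sum using only the first i elements with bound j.
int maxNonAdjacentSumAtMost(const std::vector<int>& a, int K) {
    const int n = static_cast<int>(a.size());
    if (n == 0 || K < 0) {
        return 0;
    }
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(K + 1, 0));
    for (int i = 1; i <= n; ++i) {
        const int x = a[i - 1];
        // Row to combine with when x is taken: everything before its left neighbour.
        const int prev2 = (i >= 2) ? i - 2 : 0;
        for (int j = 0; j <= K; ++j) {
            dp[i][j] = dp[i - 1][j];  // skip x
            if (x <= j) {
                dp[i][j] = std::max(dp[i][j], x + dp[prev2][j - x]);  // take x
            }
        }
    }
    return dp[n][K];
}
```

For a = {5, 1, 1, 5} this gives 10 with K = 10 and 6 with K = 9. Time and memory are both O(n*K), as in the original answer.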
0. Make a copy (A2) of the original array (A1).
1. Find the largest value in the array (A2).
2. Extract all values before its preceding neighbour and all values after its next neighbour into a new array (A3).
3. Find the largest value in the new array (A3).
4. Check whether the sum is larger than k. If the sum passes the check, you are done.
5. If not, go back to the copied array (A2), remove the second largest value (found in step 3) and start over with step 3.
6. Once there are no combinations of numbers that can be used with the largest number (i.e. the number found in step 1 plus any other number in the array is larger than k), remove it from the original array (A1) and start over with step 0.
7. If for some reason there are no valid combinations (e.g. the array is only three numbers, or no combination of numbers is lower than k), then throw an exception, or return null if that seems more appropriate.
First idea: Brute force
Iterate over all legal combinations of indexes and build the sum on the fly.
Stop with a sequence as soon as its sum gets over K.
Keep the best sequence until you find a larger one that is still smaller than K.
Second idea: maybe one can force this into a divide and conquer thing ...
Here is a solution to the problem without the "k" constraint which you set out to do as the first step: https://stackoverflow.com/a/13022021/1110808
The above solution can in my view be easily extended to have the k constraint by simply amending the if condition in the following for loop to include the constraint: possibleMax < k
// Subproblem solutions, DP
for (int i = start; i <= end; i++) {
int possibleMaxSub1 = maxSum(a, i + 2, end);
int possibleMaxSub2 = maxSum(a, start, i - 2);
int possibleMax = possibleMaxSub1 + possibleMaxSub2 + a[i];
/*
if (possibleMax > maxSum) {
maxSum = possibleMax;
}
*/
if (possibleMax > maxSum && possibleMax < k) {
maxSum = possibleMax;
}
}
As posted in the original link, this approach can be improved by adding memoization so that solutions to repeating subproblems are not recomputed. It can also be improved by using a bottom-up dynamic programming approach (the current approach is a recursive top-down approach).
You can refer to a bottom up approach here: https://stackoverflow.com/a/4487594/1110808

Resources