The question is like this:
For every string given as input, you need to tell the number of subsequences of it that are palindromes (need not necessarily be distinct). Note that the empty string is not a palindrome.
For example, the palindromic subsequences of "aab" are:
"a", "a", "b", "aa", and the method returns 4.
I had the dynamic programming solution for finding the Longest Palindromic Subsequence in mind and tried to take ideas from it, but couldn't really get to a solution. Maybe dynamic programming is not even required. Suggestions, please.
And there is one more catch. When the condition "need not necessarily be distinct" is removed, can we still count without actually generating all the palindromic subsequences?
[EDIT 19/10/2015: An anonymous reviewer pointed out a problem with the formula, which prompted me to notice another, even bigger mistake... Now fixed.]
I now see how to drop the solution time down to O(n^2). I'll leave my other answer up in case it's interesting as a stepping-stone to this one. Note: This is (also) only a solution to the first part of the problem; I see no way to efficiently count only distinct palindromic subsequences (PS).
Instead of counting the number of PS that begin and end at exactly the positions i and j, let's count how many begin at or after i and end at or before j. Call this g(i, j).
We can try to write g(i, j) = g(i, j-1) + g(i+1, j) + (x[i] == x[j])*g(i+1, j-1) for the case when j > i. But this doesn't quite work, because the first two terms will double-count any PS that begin after i and end before j.
The key insight is to notice that we can easily calculate the number of PS that begin or end at some exact position by subtracting off other values of g(), and perhaps adding yet more values of g() back on to compensate for double-counting. For example, the number of PS that begin at exactly i and end at exactly j is g(i, j) - g(i+1, j) - g(i, j-1) + g(i+1, j-1): the last term corrects for the fact that both the second and third terms count all g(i+1, j-1) PS that begin after i and end before j.
Every PS that begins at or after i and ends at or before j is in exactly 1 of 4 categories:
It begins after i, and ends before j.
It begins at i, and ends before j.
It begins after i, and ends at j.
It begins at i, and ends at j.
g(i+1, j) counts all PS in category 1 or 3, and g(i, j-1) counts all PS in category 1 or 2, so their sum g(i+1, j) + g(i, j-1) counts all PS in category 2 or 3 once each, and all PS in category 1 twice. Since g(i+1, j-1) counts all PS in category 1 only, subtracting this off to get g(i+1, j) + g(i, j-1) - g(i+1, j-1) gives the total number of PS in category 1, 2 and 3. The remaining PS are those in category 4. If x[i] != x[j] then there are no PS in this category; otherwise, there are exactly as many as there are PS that begin at or after i+1 and end at or before j-1, namely g(i+1, j-1), plus one extra for the 2-character sequence x[i]x[j]. [EDIT: Thanks to commenter Tuxdude for 2 fixes here!]
With this in hand, we can express g() in a way that replaces the quadratic-time case of f() with a constant-time one:
g(i, i) = 1 (i.e. when j = i)
g(i, i+1) = 2 + (x[i] == x[i+1]) (i.e. 3 iff adjacent chars are identical, otherwise 2)
g(i, j) = 0 when j < i (this new boundary case is needed)
g(i, j) = g(i+1, j) + g(i, j-1) - g(i+1, j-1) + (x[i] == x[j])*(g(i+1, j-1)+1) when j >= i+2
The final answer is now simply g(1, n).
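For concreteness, here is a Python transcription of this recurrence (my own sketch, not part of the original answer; it uses 0-based indices, so the answer becomes g(0, n-1), and entries below the diagonal are left at 0 to serve as the j < i boundary case):

def count_palindromic_subsequences(x):
    """Number of (not necessarily distinct) palindromic subsequences of x."""
    n = len(x)
    if n == 0:
        return 0
    # g[i][j] = number of PS beginning at or after i and ending at or before j
    g = [[0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            g[i][j] = g[i + 1][j] + g[i][j - 1] - g[i + 1][j - 1]
            if x[i] == x[j]:
                g[i][j] += g[i + 1][j - 1] + 1
    return g[0][n - 1]

print(count_palindromic_subsequences("aab"))  # 4

Note that the j = i+1 boundary case falls out of the general formula here, because g[i+1][i] stays 0.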
Here's a horrible O(n^4) solution:
Every palindromic subsequence begins at some position i and ends at some position j >= i such that x[i] = x[j], and its "interior" (all characters except the first and last) is either empty or a palindromic subsequence of x[i+1 .. j-1].
So we can define f(i, j) to be the number of palindromic subsequences beginning at i and ending at j >= i. Then
f(i, j) = 0 if x[i] != x[j]
f(i, i) = 1 (i.e. when j = i)
f(i, j) = 1 + the sum of f(i', j') over all i < i' <= j' < j otherwise
[EDIT: Fixed to count palindromic subsequences of length <= 2 too!]
Then the final answer is the sum of f(i, j) over all 1 <= i <= j <= n.
The DP for this is O(n^4) because there are n^2 table entries, and computing each one takes O(n^2) time. (It's probably possible to speed this up to at least O(n^3) by making use of the fact that x[i] != x[j] implies f(i, j) = 0.)
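For completeness, a direct (deliberately naive) Python transcription of this recurrence might look like the following sketch of mine, using 0-based indices:

def count_ps_slow(x):
    """O(n^4) count of palindromic subsequences, straight from f(i, j)."""
    n = len(x)
    f = [[0] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if x[i] != x[j]:
                continue  # f(i, j) = 0 in this case
            # 1 counts x[i] alone (when j == i) or the pair x[i]x[j];
            # the sum counts every possible "interior" of the palindrome
            f[i][j] = 1 + sum(f[i2][j2]
                              for i2 in range(i + 1, j)
                              for j2 in range(i2, j))
    return sum(f[i][j] for i in range(n) for j in range(i, n))

print(count_ps_slow("aab"))  # 4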
Intuitive O(n^3) solution using DP:
Let each state dp(i,j) represent the number of palindromic subsequences in string[i...j].
Then a simple recursive formula is:
dp(i,j) = dp(i,j-1) + 1
for k in range i to j-1:
    if A[k] == A[j]:
        dp(i,j) = dp(i,j) + dp(k+1,j-1) + 1
The idea is very simple: when adding a new character, check whether it can be the end of a palindromic subsequence. If the same character exists in the previously computed smaller subproblem, then it adds the number of subsequences contained in the range (k+1, j-1).
Just take care of the corner cases:
Add one, as the newly added character is a single-character subsequence too.
Even if there are no subsequences in the range (k+1, j-1), you still get 1 new subsequence of length 2 (like "aa").
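A runnable Python version of this O(n^3) DP, with the corner-case +1s included (my own transcription of the idea above, so treat it as a sketch):

def count_ps_cubic(A):
    """O(n^3) DP: dp[i][j] = number of palindromic subsequences of A[i..j]."""
    n = len(A)
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = dp[i][j - 1] + 1  # everything without A[j], plus A[j] alone
            for k in range(i, j):
                if A[k] == A[j]:
                    # palindromes A[k] ... A[j]: one for the bare pair, plus one
                    # for each palindromic subsequence strictly inside (k, j)
                    inside = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    dp[i][j] += inside + 1
    return dp[0][n - 1]

print(count_ps_cubic("aab"))  # 4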
Related
Given two strings s and t, determine the length of the shortest string z such that z is a subsequence of s and not a subsequence of t.
Example:
s: babab, t: babba
sol: 3 ("aab")
I'm not looking for copy-pasteable code; I'd just appreciate help with the intuition for solving this.
Thanks a lot!
You can use dynamic programming to solve this in quadratic time just like the longest common subsequence. I'll show the formula and how you would come up with it.
First, some definitions. Let m be the length of S, and let n be the length of T. Define DP(i, j) as the length of the shortest subsequence of S[:i] that is not a subsequence of T[:j], or 'INF' if none exists. Here, the expression S[:i] is slice notation meaning 'the first i characters of S', so S[:0] is empty and S[:m] is all of S. We want DP(m, n).
There are two easy base cases: since T[:0] is empty, any character in S will work, so DP(i, 0) = 1 if i > 0. Similarly, DP(0, j) = 'INF' for all j.
Now, we just have to write a general formula for DP(i, j) which only depends on the value of DP() on indices smaller than (i, j). The last character of S[:i] is just some character S[i-1]. Either our optimal subsequence for S[:i], T[:j] ends with S[i-1] or it doesn't.
If our optimal subsequence doesn't end with S[i-1], then we can delete S[i-1] from consideration, and our answer is DP(i, j) = DP(i-1, j).
If our optimal subsequence does end with S[i-1], then we need to know the rightmost occurrence of S[i-1] in T[:j].
If S[i-1] does not occur in T[:j] at all, then S[i-1] by itself is a shortest subsequence, so DP(i, j) = 1.
Otherwise, let Rightmost(c, j) be the rightmost index of T[:j] equal to some character c. Since we are using S[i-1] to end our optimal subsequence, we can ignore all the characters in T[:j] after the rightmost occurrence of S[i-1]: they can no longer affect whether a string is a subsequence of T[:j]. So then DP(i, j) = DP(i-1, Rightmost(S[i-1], j)) + 1, where the +1 comes from the fact that we did choose to use S[i-1].
Putting those together, the general formula for i, j > 0 becomes:
DP(i, j) = 1                                        if S[i-1] not in T[:j]
DP(i, j) = min(DP(i-1, j),
               DP(i-1, Rightmost(S[i-1], j)) + 1)   otherwise
Since Rightmost(c, j) is always less than j by definition, we've achieved a formula using only indices smaller (lexicographically) than (i, j), and we can use that formula directly for a recursive algorithm.
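Here is one possible Python sketch of this DP (my own code, not from the answer above; Rightmost is precomputed per character so the table fill stays quadratic for a fixed alphabet):

def shortest_uncommon_subsequence(S, T):
    """Length of the shortest subsequence of S that is not a subsequence of T,
    or None if no such subsequence exists."""
    m, n = len(S), len(T)
    INF = float('inf')
    # rightmost[c][j] = largest index k < j with T[k] == c, or -1 if none
    rightmost = {c: [-1] * (n + 1) for c in set(S)}
    for j in range(1, n + 1):
        for c in rightmost:
            rightmost[c][j] = j - 1 if T[j - 1] == c else rightmost[c][j - 1]
    # dp[i][j] = DP(i, j) as defined above
    dp = [[INF] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = 1
    for i in range(1, m + 1):
        c = S[i - 1]
        for j in range(1, n + 1):
            r = rightmost[c][j]
            if r == -1:
                dp[i][j] = 1                       # S[i-1] alone already works
            else:
                dp[i][j] = min(dp[i - 1][j],       # don't use S[i-1]
                               dp[i - 1][r] + 1)   # use S[i-1]
    return dp[m][n] if dp[m][n] != INF else None

print(shortest_uncommon_subsequence("babab", "babba"))  # 3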
Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem, but none of them run in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
Good preparation is everything.
You could create an array of integrals:
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so each sum is obtained in O(1).
Looping over all combinations of i and j with 1 <= i < j <= n is O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2).
Edit:
Your array A starts at index 1; I adapted to this, which also resolves the little quirk with i-1.
So the integral array I starts at index 0 and is one element larger than A.
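A quick Python sketch of this (my own illustration; Python lists are 0-based, so the indices are shifted accordingly):

def all_range_sums(A):
    """Output A[i] + ... + A[j] for all pairs i < j, using prefix sums."""
    n = len(A)
    I = [0] * (n + 1)               # I[k] = A[0] + ... + A[k-1]
    for k in range(1, n + 1):       # O(n) preparation
        I[k] = I[k - 1] + A[k - 1]
    for i in range(n):              # O(n^2) pairs ...
        for j in range(i + 1, n):
            print(I[j + 1] - I[i])  # ... each answered with one subtraction

all_range_sums([1, 2, 3, 4])  # prints 3, 6, 10, 5, 9, 7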
First you'll maybe have thought about the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
    sum = 0
    for k = i to j:
        sum = sum + A[k]
    return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n:
    for j = i+1 to n:
        output sumRange(A, i, j)
This is not O(n²), because already the two loops on i and j represent O(n²) iterations, and then the function will perform yet another loop, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum that was calculated for given values of i and j can be reused to calculate the sum for when j has increased by 1, without starting from scratch and summing the values between i and (now) j-1 again, only to add that one more value to it.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n:
    sum = A[i]
    for j = i+1 to n:
        sum = sum + A[j]
        output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and also in your example you only showed sums of at least two array values. The above can be easily adapted to also include single value "sums" if ever that were needed.
I know the LCS problem needs time O(mn), where m and n are the lengths of the two sequences X and Y respectively. But my problem is a little easier, so I expect a faster algorithm than O(mn).
Here is my problem:
Input:
a positive integer Q, and two sequences X = x1, x2, x3, ..., xn and Y = y1, y2, y3, ..., yn, both of length n.
Output:
True, if the length of the LCS of X and Y is at least n - Q;
False, otherwise.
The well-known algorithm costs O(n^2) here, but actually we can do better than that, because whenever we have eliminated as many as Q elements from either sequence without finding a common element, the result must be False. Someone said there should be an algorithm as good as O(Q*n), but I cannot figure it out.
UPDATE:
Already found an answer!
I was told I can just calculate a diagonal band of the table c[i,j], because if |i-j| > Q, it means there are already more than Q unmatched elements in both sequences. So we only need to calculate c[i,j] when |i-j| <= Q.
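For illustration, here is a Python sketch of that banded computation (my own code, not from the original discussion; cells outside the band are treated as minus infinity so they never win a max):

def lcs_at_least(X, Y, Q):
    """True iff LCS(X, Y) >= n - Q, for two strings of equal length n.
    Only the diagonal band |i - j| <= Q of the LCS table is filled,
    so this runs in O(n * Q) instead of O(n^2)."""
    n = len(X)
    NEG = float('-inf')  # stands in for cells outside the band
    c = [[NEG] * (n + 1) for _ in range(n + 1)]
    c[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - Q), min(n, i + Q) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                best = c[i - 1][j - 1] + (1 if X[i - 1] == Y[j - 1] else 0)
            if i > 0:
                best = max(best, c[i - 1][j])
            if j > 0:
                best = max(best, c[i][j - 1])
            c[i][j] = best
    return c[n][n] >= n - Q

print(lcs_at_least("babab", "babba", 1))  # True: the LCS "baba" has length 4 >= 5 - 1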
Here is one possible way to do it:
1. Let's assume that f(prefix_len, deleted_cnt) is the leftmost position in Y such that prefix_len elements of X were already processed and exactly deleted_cnt of them were deleted. Obviously, there are only O(N * Q) states, because deleted_cnt cannot exceed Q.
2. The base case is f(0, 0) = 0 (nothing was processed, thus nothing was deleted).
3. Transitions:
a) Remove the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element with the leftmost possible element from Y that is equal to it and located after f(i, j) (let's assume that it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. So the only remaining question is how to get the leftmost matching element to the right of a given position. Let's precompute the following pairs: (position in Y, element of X) -> the leftmost occurrence in Y, to the right of this position, of an element equal to this element of X, and put them into a hash table. It looks like O(n^2), but it is not: for a fixed position in Y, we never need to go further to the right than Q + 1 positions. Why? If we go further, we skip more than Q elements! So we can use this fact to examine only O(N * Q) pairs and get the desired time complexity. When we have this hash table, finding pos during step 3 is just one hash table lookup. Here is pseudocode for this step:
map = EmptyHashMap()
for i = 0 ... n - 1:
    for j = i + 1 ... min(n - 1, i + q + 1):
        map[(i, Y[j])] = min(map[(i, Y[j])], j)
Unfortunately, this solution uses hash tables so it has O(N * Q) time complexity on average, not in the worst case, but it should be feasible.
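Here is a Python sketch of the whole approach (my own transcription; the function name and the nxt dictionary are mine, not from the original description):

def lcs_within_q(X, Y, Q):
    """True iff LCS(X, Y) >= n - Q, for |X| == |Y| == n, in O(n * Q) time."""
    n = len(X)
    INF = float('inf')
    # nxt[(pos, c)] = leftmost index >= pos in Y holding character c, looking
    # at most Q positions ahead (a longer skip already leaves more than Q
    # elements of Y unmatched, which would make the answer False anyway)
    nxt = {}
    for pos in range(n):
        for j in range(pos, min(n, pos + Q + 1)):
            nxt.setdefault((pos, Y[j]), j)
    # f[i][d] = leftmost number of consumed characters of Y after processing
    # X[:i] with exactly d deletions from X
    f = [[INF] * (Q + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(n):
        for d in range(Q + 1):
            if f[i][d] == INF:
                continue
            if d + 1 <= Q:  # (a) delete X[i]
                f[i + 1][d + 1] = min(f[i + 1][d + 1], f[i][d])
            # (b) match X[i] with the leftmost equal element of Y at or
            #     after position f[i][d]
            pos = nxt.get((f[i][d], X[i]))
            if pos is not None:
                f[i + 1][d] = min(f[i + 1][d], pos + 1)
    # matching n - d elements leaves exactly d unmatched in each string
    return any(f[n][d] != INF for d in range(Q + 1))

print(lcs_within_q("babab", "babba", 1))  # True: LCS length 4 >= 5 - 1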
You can also say that the cost of making the strings equal must not be greater than Q; if it is greater than Q, the answer must be False (this is the edit distance problem).
Suppose the size of string x is m and the size of string y is n; then we create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and the j-length prefix of y.
The computation of array d is done using dynamic programming, which uses the following recurrence:
d[i][0] = i , for i <= m
d[0][j] = j , for j <= n
d[i][j] = d[i - 1][j - 1], if x[i] == y[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise.
The answer in terms of the LCS, if m > n, is m - d[m][m-n].
I'm given a sequence of numbers a_1, a_2, ..., a_n. Its sum is S = a_1 + a_2 + ... + a_n, and I need to find a contiguous subsequence a_i, ..., a_j such that min(S - (a_i + ... + a_j), a_i + ... + a_j) is the largest possible (both sums must be non-empty).
Example:
For 1,2,3,4,5 the subsequence is 3,4, because then min(S - (a_i + ... + a_j), a_i + ... + a_j) = min(8, 7) = 7 (and that is the largest possible, which can be checked against the other subsequences).
I tried to do this the hard way.
I load all values into the array tab[n].
Then, n-1 times, I do tab[i] += tab[i-1], so that tab[j] is the sum from the beginning up to j.
I check all possible sums a_i + ... + a_j = tab[j] - tab[i-1], subtract each from the total sum S, take the minimum of the two, and see if it's larger than the best so far.
It takes O(n^2). This makes me very sad and miserable. Is there a better way?
Seems like this can be done in O(n) time.
Compute the sum S. The ideal subsequence sum is the one that gets closest to S/2.
Start with i = j = 0 and increase j until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and save the values of i_best, j_best, sum_best.
Increment i and then increase j again until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note whichever is closer and replace the values of i_best, j_best, sum_best if they are better. Repeat this step until done.
Note that both i and j are never decremented, so they are changed a total of at most O(n) times. Since all other operations take only constant time, this results in an O(n) runtime for the entire algorithm.
Let's first do some clarifications.
A subsequence of a sequence is actually a subset of the indices of the sequence. Having said that, and specifically in the case where your sequence has distinct elements, your problem reduces to the famous Partition problem, which is known to be NP-complete. In that case, you can manage to solve the problem in O(Sn), where "n" is the number of elements and "S" is the total sum. This is not polynomial time, as "S" can be arbitrarily large.
So let's consider the case with a contiguous subsequence. You need to pass over the array elements twice. The first pass sums them up into some "S". In the second pass you carefully adjust the subarray boundaries. Let's assume you know that a[i] + a[i + 1] + ... + a[j] > S / 2. Then you let i = i + 1 to reduce the sum. Conversely, if it was smaller, you would increase j.
This code runs in O(n).
Python code:
from math import fabs

a = [1, 2, 3, 4, 5]
i = 0
j = 0
S = sum(a)
s = 0

# grow the initial window [0..j] until its sum is as close to S/2 as possible
while s + a[j] <= S / 2:
    s = s + a[j]
    j = j + 1
s = s + a[j]

best_case = (i, j)
best_difference = fabs(S / 2 - s)

while True:
    if fabs(S / 2 - s) < best_difference:
        best_case = (i, j)
        best_difference = fabs(S / 2 - s)
    if s > S / 2:
        s -= a[i]   # window sum too large: shrink from the left
        i += 1
    else:
        j += 1      # window sum too small: extend to the right
        if j == len(a):
            break
        s += a[j]

print(best_case)
i = best_case[0]
j = best_case[1]
print("Best subarray =", a[i:j + 1])
print("Best sum =", sum(a[i:j + 1]))
Let A[1 .. n] be an array of n distinct numbers. If i < j and A[i] > A[j], then the pair (i, j) is called an inversion of A. (See Problem 2-4 for more on inversions.) Suppose that each element of A is chosen randomly, independently, and uniformly from the range 1 through n. Use indicator random variables to compute the expected number of inversions.
The problem is from exercise 5.2-5 in Introduction to Algorithms by Cormen. Here is my recursive solution:
Suppose x(i) is the number of inversions in a[1..i], and E(i) is the expected value of x(i); then E(i+1) can be computed as follows:
Imagine we have i+1 positions in which to place all the numbers. If we place i+1 on the first position, then x(i+1) = i + x(i); if we place i+1 on the second position, then x(i+1) = i-1 + x(i); ...; so E(i+1) = (1/(i+1)) * Σ(k=0 to i) k + E(i) = i/2 + E(i).
Because we know that E(2) = 0.5, recursively we get: E(n) = ((n-1) + (n-2) + ... + 2)/2 + 0.5 = n(n-1)/4.
Although the deduction above seems to be right, I am still not very sure of it. So I am sharing it here.
If there is something wrong, please correct me.
All the solutions seem to be correct, but the problem says that we should use indicator random variables. So here is my solution using the same:
Let Eij be the event that i < j and A[i] > A[j].
Let Xij = I{Eij} = 1 if (i, j) is an inversion of A, and 0 if it is not.
Let X = Σ(i=1 to n) Σ(j=1 to n) Xij = number of inversions of A. Then:
E[X] = E[Σ(i=1 to n) Σ(j=1 to n) Xij]
     = Σ(i=1 to n) Σ(j=1 to n) E[Xij]
     = Σ(i=1 to n) Σ(j=1 to n) P(Eij)
     = Σ(i=1 to n) Σ(j=i+1 to n) P(Eij)   (as we must have i < j)
     = Σ(i=1 to n) Σ(j=i+1 to n) 1/2      (of the n(n-1) ordered pairs of positions, the
                                           C(n, 2) descending ones are inversions, so
                                           P(Eij) = C(n, 2) / (n(n-1)) = 1/2)
     = Σ(i=1 to n) (n - i)/2
     = n(n - 1)/4
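As a quick sanity check (my own addition, not part of the answer), you can average the inversion count over all permutations of small n, which is the distinct-values reading of the exercise, and compare against n(n-1)/4:

from itertools import permutations
from math import factorial

def inversions(A):
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n) if A[i] > A[j])

for n in range(2, 8):
    avg = sum(inversions(p) for p in permutations(range(n))) / factorial(n)
    print(n, avg, n * (n - 1) / 4)  # the last two columns agree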
Another solution is even simpler, IMO, although it does not use "indicator random variables".
Since all of the numbers are distinct, every pair of elements is either an inversion (i < j with A[i] > A[j]) or a non-inversion (i < j with A[i] < A[j]). Put another way, every pair of numbers is either in order or out of order.
So for any given permutation, the total number of inversions plus non-inversions is just the total number of pairs, or n*(n-1)/2.
By symmetry of "less than" and "greater than", the expected number of inversions equals the expected number of non-inversions.
Since the expectation of their sum is n*(n-1)/2 (constant for all permutations), and they are equal, they are each half of that or n*(n-1)/4.
[Update 1]
Apparently my "symmetry of 'less than' and 'greater than'" statement requires some elaboration.
For any array of numbers A in the range 1 through n, define ~A as the array you get when you subtract each number from n+1. For example, if A is [2,3,1], then ~A is [2,1,3].
Now, observe that for any pair of numbers in A that are in order, the corresponding elements of ~A are out of order. (Easy to show because negating two numbers exchanges their ordering.) This mapping explicitly shows the symmetry (duality) between less-than and greater-than in this context.
So, for any A, the number of inversions equals the number of non-inversions in ~A. But for every possible A, there corresponds exactly one ~A; when the numbers are chosen uniformly, both A and ~A are equally likely. Therefore the expected number of inversions in A equals the expected number of inversions in ~A, because these expectations are being calculated over the exact same space.
Therefore the expected number of inversions in A equals the expected number of non-inversions. The sum of these expectations is the expectation of the sum, which is the constant n*(n-1)/2, or the total number of pairs.
[Update 2]
A simpler symmetry: For any array A of n elements, define ~A as the same elements but in reverse order. Associate the element at position i in A with the element at position n+1-i in ~A. (That is, associate each element with itself in the reversed array.)
Now any inversion in A is associated with a non-inversion in ~A, just as with the construction in Update 1 above. So the same argument applies: The number of inversions in A equals the number of inversions in ~A; both A and ~A are equally likely sequences; etc.
The point of the intuition here is that the "less than" and "greater than" operators are just mirror images of each other, which you can see either by negating the arguments (as in Update 1) or by swapping them (as in Update 2). So the expected number of inversions and non-inversions is the same, since you cannot tell whether you are looking at any particular array through a mirror or not.
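A tiny Python check of this mirror correspondence (my own illustration): for an array of distinct values, the inversions of ~A, under either construction, are exactly the non-inversions of A:

import random

def inversions(A):
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n) if A[i] > A[j])

n = 10
A = random.sample(range(1, n + 1), n)  # a random permutation of 1..n
mirrored = [n + 1 - v for v in A]      # ~A as in Update 1
reversed_A = A[::-1]                   # ~A as in Update 2
pairs = n * (n - 1) // 2               # total number of pairs
assert inversions(mirrored) == pairs - inversions(A)
assert inversions(reversed_A) == pairs - inversions(A)
print(inversions(A), "inversions,", pairs - inversions(A), "non-inversions")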
Even simpler (similar to Aman's answer above, but perhaps clearer) ...
Let Xij be a random variable with Xij=1 if A[i] > A[j] and Xij=0 otherwise.
Let X=sum(Xij) over i, j where i < j
Number of pairs (i, j)*: n(n-1)/2
Probability that Xij = 1, i.e. Pr(Xij = 1): 1/2
By linearity of expectation**:
E(X) = E(sum(Xij))
     = sum(E(Xij))
     = sum(Pr(Xij = 1))
     = n(n-1)/2 * 1/2
     = n(n-1)/4
* I think of this as the size of the upper triangle of a square matrix.
** All sums here are over i, j, where i < j.
I think it's right, but I think the proper way to prove it is to use conditional expectations:
For all X and Y we have: E[X] = E[E[X|Y]].
Then in your case:
E(i+1) = E[x(i+1)] = E[E[x(i+1) | x(i)]] = E[(Σ(k=0 to i) k)/(i+1) + x(i)] = i/2 + E[x(i)] = i/2 + E(i)
About the second statement: if
E(n) = n(n-1)/4,
then E(n+1) = (n+1)n/4 = (n-1)n/4 + 2n/4 = (n-1)n/4 + n/2 = E(n) + n/2.
So n(n-1)/4 satisfies the recursion relation for all n >= 2, and it holds for the base case n = 2.
So E(n) = n(n-1)/4.
Hope I understood your problem, and that this helps.
Using indicator random variables:
Let X = random variable which is equal to the number of inversions.
Let Xij = 1 if A[i] and A[j] form an inversion pair, and Xij = 0 otherwise.
Number of inversion pairs = Sum over 1 <= i < j <= n of (Xij)
Now P[Xij = 1] = P[A[i] > A[j]] = 1/2, since of the 2! equally likely orderings of any pair of distinct values, exactly one is an inversion.
E[X] = E[sum over all (i, j) pairs such that i < j of Xij]
     = sum over all (i, j) pairs such that i < j of E[Xij]
     = n(n-1)/2 * 1/2
     = n(n - 1)/4