Longest Common Subsequence among 3 Strings - algorithm

I've implemented the dynamic programming solution to find the longest common subsequence among 2 strings. There is apparently a way to generalize this algorithm to find the LCS among 3 strings, but in my research I have not found any information on how to go about this. Any help would be appreciated.

To find the Longest Common Subsequence (LCS) of 2 strings A and B, you can traverse a 2-dimensional array diagonally like shown in the Link you posted. Every element in the array corresponds to the problem of finding the LCS of the substrings A' and B' (A cut by its row number, B cut by its column number). This problem can be solved by calculating the value of all elements in the array. You must be certain that when you calculate the value of an array element, all sub-problems required to calculate that given value has already been solved. That is why you traverse the 2-dimensional array diagonally.
This solution can be scaled to finding the longest common subsequence between N strings, but this requires a general way to iterate an array of N dimensions such that any element is reached only when all sub-problems the element requires a solution to has been solved.
Instead of iterating the N-dimensional array in a special order, you can also solve the problem recursively. With recursion it is important to save the intermediate solutions, since many branches will require the same intermediate solutions. I have written a small example in C# that does this:
string lcs(string[] strings)
{
if (strings.Length == 0)
return "";
if (strings.Length == 1)
return strings[0];
int max = -1;
int cacheSize = 1;
for (int i = 0; i < strings.Length; i++)
{
cacheSize *= strings[i].Length;
if (strings[i].Length > max)
max = strings[i].Length;
}
string[] cache = new string[cacheSize];
int[] indexes = new int[strings.Length];
for (int i = 0; i < indexes.Length; i++)
indexes[i] = strings[i].Length - 1;
return lcsBack(strings, indexes, cache);
}
string lcsBack(string[] strings, int[] indexes, string[] cache)
{
for (int i = 0; i < indexes.Length; i++ )
if (indexes[i] == -1)
return "";
bool match = true;
for (int i = 1; i < indexes.Length; i++)
{
if (strings[0][indexes[0]] != strings[i][indexes[i]])
{
match = false;
break;
}
}
if (match)
{
int[] newIndexes = new int[indexes.Length];
for (int i = 0; i < indexes.Length; i++)
newIndexes[i] = indexes[i] - 1;
string result = lcsBack(strings, newIndexes, cache) + strings[0][indexes[0]];
cache[calcCachePos(indexes, strings)] = result;
return result;
}
else
{
string[] subStrings = new string[strings.Length];
for (int i = 0; i < strings.Length; i++)
{
if (indexes[i] <= 0)
subStrings[i] = "";
else
{
int[] newIndexes = new int[indexes.Length];
for (int j = 0; j < indexes.Length; j++)
newIndexes[j] = indexes[j];
newIndexes[i]--;
int cachePos = calcCachePos(newIndexes, strings);
if (cache[cachePos] == null)
subStrings[i] = lcsBack(strings, newIndexes, cache);
else
subStrings[i] = cache[cachePos];
}
}
string longestString = "";
int longestLength = 0;
for (int i = 0; i < subStrings.Length; i++)
{
if (subStrings[i].Length > longestLength)
{
longestString = subStrings[i];
longestLength = longestString.Length;
}
}
cache[calcCachePos(indexes, strings)] = longestString;
return longestString;
}
}
int calcCachePos(int[] indexes, string[] strings)
{
int factor = 1;
int pos = 0;
for (int i = 0; i < indexes.Length; i++)
{
pos += indexes[i] * factor;
factor *= strings[i].Length;
}
return pos;
}
My code example can be optimized further. Many of the strings being cached are duplicates, and some are duplicates with just one additional character added. This uses more space than necessary when the input strings become large.
On input: "666222054263314443712", "5432127413542377777", "6664664565464057425"
The LCS returned is "54442"

Related

I think I've found a good sorting algorithm. Seems to be faster than quickSort?

This works without comparing.
First, it finds the largest number in the array and saves it in a variable called "max". Then it creates a temporary array with the length of max + 1. After that, each "tempArray[i]" counts how often a number equal to "i" has been counted in the input array. In the end, it converts "tempArray" and writes it into the input array. See for yourself.
static int[] nSort(int[] array) {
int max = array[0];
for(int i = 1; i < array.length; i++) {
max = Math.max(max, array[i]);
}
Integer[] tempArray = new Integer[max+1];
for(int i = 0; i < array.length; i++) {
if(tempArray[array[i]] == null) {
tempArray[array[i]] = 0;
}
tempArray[array[i]]++;
}
for(int[] i = new int[2]; i[0] < max + 1; i[0]++) {
if(tempArray[i[0]] != null) {
while(tempArray[i[0]] > 0) {
array[i[1]] = i[0];
i[1]++;
tempArray[i[0]]--;
}
}
}
return array;
}
I've charted the measured runtime in a graph below. Green being my algorithm red and red being quicksort.
I've used this quicksort GitHub implementation and measured runtime in the same way as implemented there.
Runtime graph:

The fastest algorithm for returning max length of consecutive same value fields of matrix?

Here is the given example:
We have the function which takes one matrix and it's number of columns and it's number of rows and returns int (this is gonna be length). For example:
int function (int** matrix, int n, int m)
The question is what's the fastest algorithm for implementing this function so it returns the maximum length of consecutive fields with the same value (doesn't matter if those same values are in one column or in one row, in this example on picture it's the 5 fields of one column with value 8)?
Values can be from 0-255 (grayscale for example).
So in the given example function should return 5.
If this is a bottleneck and the matrix is large, the first optimization to try is to make one pass over the matrix in sequential memory order (row-by-row in C or C++) rather than two. This is because it's very expensive to traverse a 2d array in the other direction. Cache and paging behavior are the worst possible.
For this you will need a row-sized array to track the number of consecutive values in the current run within each column.
int function (int a[][], int m, int n) {
if (n <= 0 || m <= 0) return 0;
int longest_run_len = 1; // Accumulator for the return value.
int current_col_run_len[n]; // Accumulators for each column
int current_row_run_len = 1; // Accumulator for the current row.
// Initialize the column accumulators and check the first row.
current_col_run_len[0] = 1;
for (int j = 1; j < n; j++) {
current_col_run_len[j] = 1;
if (a[0][j] == a[0][j-1]) {
if (++current_row_run_len > longest_run_len)
longest_run_len = current_row_run_len;
} else current_row_run_len = 1;
}
// Now the rest of the rows...
for (int i = 1; i < m; i++) {
// First column:
if (a[i][0] == a[i-1][0]) {
if (++current_col_run_len[0] > longest_run_len)
longest_run_len = current_col_run_len[0];
} else current_col_run_len[0] = 1;
// Other columns.
current_row_run_len = 1;
for (int j = 1; j < n; j++) {
if (a[i][j] == a[i][j-1]) {
if (++current_row_run_len > longest_run_len)
longest_run_len = current_row_run_len;
} else current_row_run_len = 1;
if (a[i][j] == a[i-1][j]) {
if (++current_col_run_len[j] > longest_run_len)
longest_run_len = current_col_run_len[j];
} else current_col_run_len[j] = 1;
}
}
return longest_run_len;
}
You need to pass over each entry of the matrix at least once, so you can't possible do better than O(m*n).
The most straightforward way is to pass over each row and each column once. This will be two passes over the matrix, but the algorithm is still O(m*n).
Any attempt to do it in one pass will probably be a lot more complex.
int function (int** matrix, int n, int m) {
int best=1;
for (int i=0; i<m; ++i) {
int k=1;
int last=-1;
for (int j=0; j<n; ++j) {
if (matrix[i][j] == last) {
k++;
if (k > best) {
best=k;
}
}
else {
k=1;
}
last = matrix[i][j];
}
}
for (int j=0; j<n; ++j) {
int k=1;
int last=-1;
for (int i=0; i<m; ++i) {
if (matrix[i][j] == last) {
k++;
if (k > best) {
best=k;
}
}
else {
k=1;
}
last = matrix[i][j];
}
}
return best;
}

Distinct Subsequences DP explanation

From LeetCode
Given a string S and a string T, count the number of distinct
subsequences of T in S.
A subsequence of a string is a new string which is formed from the
original string by deleting some (can be none) of the characters
without disturbing the relative positions of the remaining characters.
(ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Here is an example: S = "rabbbit", T = "rabbit"
Return 3.
I see a very good DP solution, however, I have hard time to understand it, anybody can explain how this dp works?
int numDistinct(string S, string T) {
vector<int> f(T.size()+1);
//set the last size to 1.
f[T.size()]=1;
for(int i=S.size()-1; i>=0; --i){
for(int j=0; j<T.size(); ++j){
f[j]+=(S[i]==T[j])*f[j+1];
printf("%d\t", f[j] );
}
cout<<"\n";
}
return f[0];
}
First, try to solve the problem yourself to come up with a naive implementation:
Let's say that S.length = m and T.length = n. Let's write S{i} for the substring of S starting at i. For example, if S = "abcde", S{0} = "abcde", S{4} = "e", and S{5} = "". We use a similar definition for T.
Let N[i][j] be the distinct subsequences for S{i} and T{j}. We are interested in N[0][0] (because those are both full strings).
There are two easy cases: N[i][n] for any i and N[m][j] for j<n. How many subsequences are there for "" in some string S? Exactly 1. How many for some T in ""? Only 0.
Now, given some arbitrary i and j, we need to find a recursive formula. There are two cases.
If S[i] != T[j], we know that N[i][j] = N[i+1][j] (I hope you can verify this for yourself, I aim to explain the cryptic algorithm above in detail, not this naive version).
If S[i] = T[j], we have a choice. We can either 'match' these characters and go on with the next characters of both S and T, or we can ignore the match (as in the case that S[i] != T[j]). Since we have both choices, we need to add the counts there: N[i][j] = N[i+1][j] + N[i+1][j+1].
In order to find N[0][0] using dynamic programming, we need to fill the N table. We first need to set the boundary of the table:
N[m][j] = 0, for 0 <= j < n
N[i][n] = 1, for 0 <= i <= m
Because of the dependencies in the recursive relation, we can fill the rest of the table looping i backwards and j forwards:
for (int i = m-1; i >= 0; i--) {
for (int j = 0; j < n; j++) {
if (S[i] == T[j]) {
N[i][j] = N[i+1][j] + N[i+1][j+1];
} else {
N[i][j] = N[i+1][j];
}
}
}
We can now use the most important trick of the algorithm: we can use a 1-dimensional array f, with the invariant in the outer loop: f = N[i+1]; This is possible because of the way the table is filled. If we apply this to my algorithm, this gives:
f[j] = 0, for 0 <= j < n
f[n] = 1
for (int i = m-1; i >= 0; i--) {
for (int j = 0; j < n; j++) {
if (S[i] == T[j]) {
f[j] = f[j] + f[j+1];
} else {
f[j] = f[j];
}
}
}
We're almost at the algorithm you gave. First of all, we don't need to initialize f[j] = 0. Second, we don't need assignments of the type f[j] = f[j].
Since this is C++ code, we can rewrite the snippet
if (S[i] == T[j]) {
f[j] += f[j+1];
}
to
f[j] += (S[i] == T[j]) * f[j+1];
and that's all. This yields the algorithm:
f[n] = 1
for (int i = m-1; i >= 0; i--) {
for (int j = 0; j < n; j++) {
f[j] += (S[i] == T[j]) * f[j+1];
}
}
I think the answer is wonderful, but something may be not correct.
I think we should iterate backwards over i and j. Then we change to array N to array f, we looping j forwards for not overlapping the result last got.
for (int i = m-1; i >= 0; i--) {
for (int j = 0; j < n; j++) {
if (S[i] == T[j]) {
N[i][j] = N[i+1][j] + N[i+1][j+1];
} else {
N[i][j] = N[i+1][j];
}
}
}

interviewstreet Triplet challenge

There is an integer array d which does not contain more than two elements of the same value. How many distinct ascending triples (d[i] < d[j] < d[k], i < j < k) are present?
Input format:
The first line contains an integer N denoting the number of elements in the array. This is followed by a single line containing N integers separated by a single space with no leading/trailing spaces
Output format:
A single integer that denotes the number of distinct ascending triples present in the array
Constraints:
N <= 10^5
Every value in the array is present at most twice
Every value in the array is a 32-bit positive integer
Sample input:
6
1 1 2 2 3 4
Sample output:
4
Explanation:
The distinct triplets are
(1,2,3)
(1,2,4)
(1,3,4)
(2,3,4)
Another test case:
Input:
10
1 1 5 4 3 6 6 5 9 10
Output:
28
I tried to solve using DP. But out of 15 test cases only 7 test cases passed.
Please help solve this problem.
You should note that you only need to know the number of elements that are smaller/larger than a particular element to know how many triples it serves as the middle point for. Using this you can calculate the number of triples quite easily, the only remaining problem is to get rid of duplicates, but given that you are limited to at most 2 of the same element, this is trivial.
I solved using a Binary Index Tree http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees.
I also did a small write up, http://www.kesannmcclean.com/?p=223.
package com.jai;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.HashMap;
public class Triplets {
int[] lSmaller, rLarger, treeArray, dscArray, lFlags, rFlags;
int size, count = 0;
Triplets(int aSize, int[] inputArray) {
size = aSize;
lSmaller = new int[size];
rLarger = new int[size];
dscArray = new int[size];
int[] tmpArray = Arrays.copyOf(inputArray, inputArray.length);
Arrays.sort(tmpArray);
HashMap<Integer, Integer> tmpMap = new HashMap<Integer, Integer>(size);
for (int i = 0; i < size; i++) {
if (!tmpMap.containsKey(tmpArray[i])) {
count++;
tmpMap.put(tmpArray[i], count);
}
}
count++;
treeArray = new int[count];
lFlags = new int[count];
rFlags = new int[count];
for (int i = 0; i < size; i++) {
dscArray[i] = tmpMap.get(inputArray[i]);
}
}
void update(int idx) {
while (idx < count) {
treeArray[idx]++;
idx += (idx & -idx);
}
}
int read(int index) {
int sum = 0;
while (index > 0) {
sum += treeArray[index];
index -= (index & -index);
}
return sum;
}
void countLeftSmaller() {
Arrays.fill(treeArray, 0);
Arrays.fill(lSmaller, 0);
Arrays.fill(lFlags, 0);
for (int i = 0; i < size; i++) {
int val = dscArray[i];
lSmaller[i] = read(val - 1);
if (lFlags[val] == 0) {
update(val);
lFlags[val] = i + 1;
} else {
lSmaller[i] -= lSmaller[lFlags[val] - 1];
}
}
}
void countRightLarger() {
Arrays.fill(treeArray, 0);
Arrays.fill(rLarger, 0);
Arrays.fill(rFlags, 0);
for (int i = size - 1; i >= 0; i--) {
int val = dscArray[i];
rLarger[i] = read(count - 1) - read(val);
if (rFlags[val] == 0) {
update(val);
rFlags[val] = i + 1;
}
}
}
long countTriplets() {
long sum = 0;
for (int i = 0; i < size; i++) {
sum += lSmaller[i] * rLarger[i];
}
return sum;
}
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int N = Integer.parseInt(br.readLine());
int[] a = new int[N];
String[] strs = br.readLine().split(" ");
for (int i = 0; i < N; i++)
a[i] = Integer.parseInt(strs[i]);
Triplets sol = new Triplets(N, a);
sol.countLeftSmaller();
sol.countRightLarger();
System.out.println(sol.countTriplets());
}
}
For tentative algorithm that I came up with, it should be:
(K-1)!^2
where K is number of unique elements.
EDIT
After more thinking about this:
SUM[i=1,K-2] SUM[j=i+1,K-1] SUM[m=j+1,K] 1
=> SUM[i=1,K-2] (SUM[j=i+1,K-1] (K-j))
if the input is not sorted (the question is not clear about this): sort it
remove the duplicated items (this step could be conbined with the first step)
now pick 3 items. Since the items are already sorted, the three chosen items are ordered as well
IIRC there are (n!) / ((n-3)! * 3!) ways to pick the three items; with n := the number of unique items
#hadron: exactly, I couldn get my head around on why it should be 28 and not 35 for a set of 7 distinct numbers *
[Since the ques is about ascending triplets, repeated numbers can be discarded].
btw, here's a very bad Java solution(N^3):
I have also printed out the possible triplets:
I'm also thinking about some function that dictates the no: of triplets possible for input 'N'
4 4
5 10
6 20
7 35
8 56
9 84
package org.HackerRank.AlgoChallenges;
import java.util.Iterator;
import java.util.Scanner;
import java.util.TreeSet;
public class Triplets {
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
int result = 0;
int n = scanner.nextInt();
Object[] array = new Object[n];
TreeSet<Integer> treeSet = new TreeSet<Integer>();
/*
* for (int i = 0; i < n; i++) { array[i] = scanner.nextInt(); }
*/
while (n>0) {
treeSet.add(scanner.nextInt());
n--;
}
scanner.close();
Iterator<Integer> iterator = treeSet.iterator();
int i =0;
while (iterator.hasNext()) {
//System.out.println("TreeSet["+i+"] : "+iterator.next());
array[i] = iterator.next();
//System.out.println("Array["+i+"] : "+array[i]);
i++;
}
for (int j = 0; j < (array.length-2); j++) {
for (int j2 = (j+1); j2 < array.length-1; j2++) {
for (int k = (j2+1); k < array.length; k++) {
if(array[j]!=null && array[j2]!=null && array[k]!=null){
System.out.println("{ "+array[j]+", "+array[j2]+", "+array[k]+" }");
result++;
}
}
}
}
System.out.println(result);
}
One of solution in python:
from itertools import combinations as comb
def triplet(lis):
done = dict()
result = set()
for ind, num in enumerate(lis):
if num not in done:
index = ind+1
for elm in comb(lis[index:], 2):
s,t = elm[0], elm[1]
if (num < s < t):
done.setdefault(num, None)
fin = (num,s,t)
if fin not in result:
result.add(fin)
return len(result)
test = int(raw_input())
lis = [int(_) for _ in raw_input().split()]
print triplet(lis)
Do you care about complexity?
Is the input array sorted?
if you don't mind about complexity you can solve it in complexity of N^3.
The solution with complexity N^3:
If it not sorted, then sorted the array.
Use 3 for loops one inside the other and go threw the array 3 times for each number.
Use hash map to count all the triples. The key will be the triple it self and the value will be the number of occurences.
It should be something like this:
for (i1=0; i1<N; i1++) {
for (i2=i1; i2<N; i2++) {
for (i3=i2; i3<N; i3++) {
if (N[i1] < N[i2] < N[i3]) {
/* if the triple exists in the hash then
add 1 to its value
else
put new triple to the hash with
value 1
*/
}
}
}
}
Result = number of triples in the hash;
I didn't try it but I think it should work.

find the largest sub- matrix full of ones in linear time

Given an n by n matrix with zeros and ones, find the largest sub-
matrix full of ones in linear time. I was told that a solution with
O(n) time complexity exists. If there are n^2 elements in a n X n
matrix how does a linear solution exist?
Unless you have a non-standard definition of submatrix this problem is NP-hard by reduction from maximum clique.
You can't search a n x n matrix in n time. Counterexample: a matrix of zeros with a single element set to one. You have to check every element to find where that one is, so time must be at least O(n^2).
Now if you say that the matrix has N = n^2 entries, and you only consider submatrices that form a contiguous block, then you should be able to find the largest submatrix by walking diagonally across the matrix, keeping track of every rectangle of ones as you go. You could in general have up to O(sqrt(N)) rectangles active simultaneously, and you would need to search in them to figure out which rectangle was the largest, so you ought to be able to do this in O(N^(3/2) * log(N)) time.
If you can pick arbitrary rows and columns to form your submatrix, then I don't see any obvious polynomial time algorithm.
The solution is linear in the number of entries, not in the number of rows or columns.
public static int biggestSubMatrix(int[][] matrix) {
int[][] newMatrix = new int[matrix.length][matrix[0].length];
for (int i = 0; i < matrix.length; i++) {
int sum = 0;
for (int j = 0; j < matrix[0].length; j++) {
if (matrix[i][j] == 1) {
sum++;
newMatrix[i][j] = sum;
} else {
sum = 0;
newMatrix[i][j] = 0;
}
}
}
int maxDimention = 0;
int maxSubMatrix = 0;
for (int i = 0; i < newMatrix[0].length; i++) {
//find dimention for each column
maxDimention = calcHighestDimentionBySmallestItem(newMatrix, i);
if(maxSubMatrix < maxDimention ){
maxSubMatrix = maxDimention ;
}
}
return maxSubMatrix;
}
private static int calcHighestDimentionBySmallestItem(int[][] matrix, int col) {
int totalMaxDimention =0;
for (int j = 0; j < matrix.length; j++) {
int maxDimention = matrix[j][col];
int numItems = 0;
int min = matrix[j][col];
int dimention = 0;
for (int i = j; i < matrix.length; i++) {
int val = matrix[i][col];
if (val != 0) {
if (val < min) {
min = val;
}
numItems++;
dimention = numItems*min;
if(dimention>maxDimention){
maxDimention = dimention;
}
} else { //case val == 0
numItems = 0;
min = 0;
}
}
if(totalMaxDimention < maxDimention){
totalMaxDimention = maxDimention;
}
}
return totalMaxDimention;
}

Resources