I want to find all possible Longest Increasing Subsequences in a given string.
For example, given the string qapbso, the length of the longest increasing subsequence is 3.
I want to find all possible longest subsequences of length 3, i.e. "abs", "aps" and "abo".
I am using dynamic programming, but I only get one LIS. I want to list all of them.
The typical DP solution to this produces an array dp of length n in which dp[i] stores the length of the longest non-decreasing subsequence that starts with the element at index i.
Printing all the LNDSSs (longest non-decreasing subsequences) is easiest with recursion. First find the actual length of the LNDSS by selecting the greatest value in dp. Then start the recursion from each index whose dp value equals that maximum (there may be more than one). The recursion from a given index should print all the sequences starting at that index that have length dp[index]:
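// Globals assumed from the text: n, a[] (the input) and dp[] (filled by the DP).
int st[1000];      // stack holding the sequence built so far
int st_index = 0;  // number of elements currently on the stack

// Prints every non-decreasing subsequence of length dp[index] starting at index.
// The caller must push a[index] onto st before the first call.
void rec(int index) {
    if (dp[index] == 1) {
        cout << "Another sequence:\n";
        for (int i = 0; i < st_index; ++i) {
            cout << st[i] << endl;
        }
        return;
    }
    for (int j = index + 1; j < n; ++j) {
        // use a[j] > a[index] instead for strictly increasing subsequences
        if (dp[j] == dp[index] - 1 && a[j] >= a[index]) {
            st[st_index++] = a[j];
            rec(j);
            st_index--;
        }
    }
}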
This is example code in C++ (hopefully it is understandable in other languages as well). I use a global stack called st that keeps the elements added so far; before calling rec(i) for a starting index i, push a[i] onto it so the first element is printed too. It has size 1000, assuming you will never have more than 1000 elements in the LNDSS, but this can be increased. The solution does not have the best possible design; I focused on making it simple rather than well designed.
Hope this helps.
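Consider the graph in which each letter of your input string is a node connected to every node that has a higher value and appears later in the string, plus a START node connected to every letter and an END node to which every letter is connected.
You want to find the longest path from START to END. This is equivalent to finding the shortest path if every edge has length -1. Because all edges point forward in the string, the graph is acyclic, so this can be solved by processing the nodes in topological order.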
Reference on longest path problem: http://en.wikipedia.org/wiki/Longest_path_problem
In 2000 Sergei Bespamyatnikh and Michael Segal proposed an algorithm for finding all longest increasing subsequences of a given permutation. The algorithm uses a van Emde Boas tree and has a time complexity of O(n + K·l(p)) and space complexity of O(n), where n is the length of a permutation p, l(p) is the length of its longest increasing subsequence and K is the number of such subsequences.
See the paper in the ACM Digital Library or search the web for "Enumerating longest increasing subsequences and patience sorting" to get a free PDF.
Maximum Subarray Problem
Given a sequence of n real numbers A(1) … A(n), determine a contiguous subsequence A(i) … A(j) for which the sum of elements in the subsequence is maximized.
Algorithm:
#include <algorithm>
#include <iostream>
using namespace std;

int kadane(int a[], int n)
{
    int overall_sum = 0; // overall maximum subarray sum
    int new_sum = 0;     // sum obtained by including the current element
    for (int i = 0; i < n; i++)
    {
        // new_sum is the maximum of the current element alone and
        // the current element added to the previous new_sum
        new_sum = max(a[i], new_sum + a[i]);
        cout << new_sum << " : ";
        // if the new value of new_sum is greater than the overall sum,
        // it replaces the overall sum
        overall_sum = max(overall_sum, new_sum);
        cout << overall_sum << endl;
    }
    return overall_sum;
}
I understand that we are trying to break the problem down into smaller sub-problems: the idea is to use the largest partial sum of the first n-1 elements to find the largest partial sum of all n elements. The code is clear to me in the sense that I can work it out on paper, but the idea seems like magic. Can someone provide a better explanation of this algorithm, or a proof of why it works?
To be 100% precise, what the algorithm actually calculates is max(0, maximum sum of a non-empty contiguous subsequence), and 0 for an empty array. Because overall_sum starts at 0, the empty subsequence is implicitly treated as valid. This makes a difference for arrays where all numbers are negative: the function returns 0 rather than the largest (least negative) element. If only non-empty subsequences should count, initialize overall_sum to a[0] (for n > 0).
Proof:
At the beginning of each loop iteration, new_sum is the maximum sum of a non-empty contiguous sequence that ends just before a[i], i.e. at a[i-1] (for i > 0), and 0 for i == 0. Proof by induction on the loop iterations: this is obviously true for i = 0 (new_sum == 0, the sum of an empty sequence), and it becomes true for i+1 after the assignment, because the maximum-sum non-empty sequence ending at a[i] (the last element before a[i+1]) must include a[i], so its sum is the larger of a[i] alone and a[i] plus the best sum ending at a[i-1].
overall_sum is the maximum of its initial value 0 and all the values new_sum takes, and therefore represents the maximum over all contiguous subsequences (every non-empty subsequence ends at some a[i], so taking the maximum over all i covers them all).
You've already included the explanation of why it works in the code comments:
new_sum is the maximum value out of current element
or the sum of current element and the previous sum
Rather than thinking of new_sum as the best sum anywhere up to element i, think of it as the best sum of a segment that ends at (and therefore includes) element i.
Notice that the algorithm never lets new_sum exclude the current element. If A[i] alone is greater than A[i] added to the best sum ending at A[i-1] (that is, if that previous sum is negative), it makes no sense to keep the previous section, so we start counting from scratch at A[i]. This guarantees that new_sum is always the greatest it can be for a segment ending at the current element. We may see it decrease later, but by then overall_sum has already been updated if need be.
Let's assume I have a string like 100110001010001. I'd like to find substrings that:
are as long as possible
have a total sum > 0 (counting each 1 as +1 and each 0 as -1)
In other words, the longest substrings that have more 1s than 0s.
For example, for the string above, 100110001010001, it would be: [10011]000[101]000[1]
Actually, it would be enough to find the total length of those, in this case: 9.
Unfortunately I have no idea how to do this without brute force. Any ideas, please?
As posted now, your question seems a bit unclear. The total length of valid substrings that are "as long as possible" could mean different things: for example, among other options, it could be (1) a list of the longest valid extension to the left of each index (which would allow overlaps in the list), (2) the longest combination of non-overlapping such longest left-extensions, (3) the longest combination of non-overlapping, valid substrings (where each substring is not necessarily the longest possible).
I will outline a method for (3) since it easily transforms to (1) or (2). Finding the longest left-extension from each index with more ones than zeros can be done in O(n log n) time and O(n) additional space (for just the longest valid substring in O(n) time, see here: Finding the longest non-negative sub array). With that preprocessing, finding the longest combination of valid, non-overlapping substrings can be done with dynamic programming in somewhat optimized O(n^2) time and O(n) additional space.
We start by traversing the string, storing the prefix sum up to and including s[i], counting zeros as -1. We insert each prefix sum in a binary tree where each node also stores an array of indexes where the value occurs, and the leftmost index of a value less than the node's value. (A substring s[a+1..b] has more ones than zeros exactly when the prefix sum up to b is greater than the prefix sum up to a.) If a value is already in the tree, we add the index to the node's index array.
Since we are traversing from left to right, only when a new lowest value is inserted into the tree is the leftmost-index-of-lower-value updated — and it's updated only for the node with the previous lowest value. This is because any nodes with a lower value would not need updating; and if any nodes with lower values were already in the tree, any nodes with higher values would already have stored the index of the earliest one inserted.
The longest valid substring to the left of each index extends to the leftmost index with a lower prefix sum, which can be easily looked up in the tree.
To get the longest combination, let f(i) represent the longest combination up to index i. Then f(i) = max(f(i-1), max over all j ≤ i such that s[j..i] is valid of (i - j + 1) + f(j-1)), with f(-1) = 0; the first term covers the case where s[i] is not part of any chosen substring.
Dynamic programming.
We have a string. If its sum is positive, that's our answer. Otherwise we need to trim characters from the two ends (call them a and b) until the sum goes positive, and try each pattern of trims. So for each remaining length N-1, N-2, N-3, ..., there are N - length + 1 ways to split the trim between a and b, each of which gives us a state. When the state goes positive, we've found our substring.
So two lists of integers, representing what happens if we trim entirely from a or entirely from b. Then backtrack. If we trim 1 from a, we must trim all the rest from b, if we trim two from a, we must trim one fewer from b. Is there an answer that allows us to go positive?
We can eliminate candidates quickly, because the answer must be at a maximum: either the maximum when trimming from a or the maximum when trimming from b. If the other trim then allows us to go positive, that's the result.
In C++:
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Returns the longest substring with more '1's than '0's (empty if there is none).
string longestPositive(const string& s) {
    int N = s.size();
    int Nones = count(s.begin(), s.end(), '1');
    int Nzeros = N - Nones;
    int total = Nones - Nzeros;
    if (total > 0)
        return s;  // the whole string is already positive
    // cuta[x]: sum of what is left after trimming x characters from end a (the front).
    // cutb[y]: sum of what is left after trimming y characters from end b (the back).
    vector<int> cuta(N + 1), cutb(N + 1);
    cuta[0] = cutb[0] = total;
    for (int i = 0; i < N; i++) {
        cuta[i + 1] = cuta[i] + (s[i] == '1' ? -1 : 1);
        cutb[i + 1] = cutb[i] + (s[N - i - 1] == '1' ? -1 : 1);
    }
    // Trim k characters in total (shortest trims first): x from a, k - x from b.
    for (int k = 1; k < N; k++) {
        for (int x = 0; x <= k; x++) {
            // Both cuts are measured from total, so subtract it once.
            if (cuta[x] + cutb[k - x] - total > 0)
                return s.substr(x, N - k);
        }
    }
    // If we get here, there is no positive substring (e.g. the string is all zeros).
    return "";
}
The checks based on the peaks are fiddly to get right, so this version simply tries every split of each trim. That keeps it correct at the cost of O(N^2) time in the worst case.
Context
This problem arises from trying to minimize the number of expensive function calls.
Problem Definition
Please note that extract_and_insert != swap. In particular, we take the element from position "from", insert it at position "to", and SHIFT all intermediate elements.
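#include <vector>

std::vector<int> A;  // n = A.size(); all elements are distinct integers

// Takes the element at position from, inserts it at position to,
// and shifts all intermediate elements by one place.
void extract_and_insert(int from, int to) {
    int old_value = A[from];
    if (from < to) {
        for (int i = from; i < to; ++i)
            A[i] = A[i + 1];  // shift left
    } else {
        for (int i = from; i > to; --i)
            A[i] = A[i - 1];  // shift right
    }
    A[to] = old_value;
}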
Question
We know there are O(n log n) algorithms for sorting a list of numbers.
Now: is there an O(n log n) function, which returns the minimum number of calls to extract_and_insert required to sort the list?
The answer is Yes.
This problem is essentially equivalent to finding the longest increasing subsequence (LIS) in an array, and you can use algorithms to solve that.
Why is this question equivalent to longest increasing subsequence?
Because each extract_and_insert call moves only one element, it can increase the length of the longest increasing subsequence by at most 1. Conversely, the elements of one LIS can stay where they are, and every other element can be moved into its correct place with one call each. So the minimum number of required calls is:
length_of_array - length_of_LIS
and therefore, by finding the length of the LIS, we can find the minimum number of operations required.
Do read the linked Wikipedia page to see how to implement the algorithm.
I found this algorithm in The Hitchhiker’s Guide to the Programming Contests (note: this implementation assumes there are no duplicates in the list):
// array[0..n-1] holds the input
set<int> st;
for (int i = 0; i < n; i++) {
    st.insert(array[i]);
    set<int>::iterator it = st.find(array[i]);
    it++;                              // next larger element, if any
    if (it != st.end()) st.erase(it);  // array[i] replaces it
}
cout << st.size() << endl;             // length of the LIS
It's an algorithm for finding the length of the longest increasing subsequence in O(N log N). When I try it on a few test cases, it seems to work, but I still can't figure out why it is correct, and it doesn't look intuitive to me.
Can anyone help me gain insight as to why this algorithm works correctly?
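Statement: after processing the first i elements, the size of the set equals the length of the longest increasing subsequence of those elements.
Proof: Let's use induction.
Base case: trivially true.
Induction hypothesis: suppose we have processed i-1 elements and the size of the set is LIS[i-1], i.e. the length of the LIS of the first i-1 elements.
Induction step: inserting array[i] (written A[i] below) into the set leads to one of two cases. The argument uses the invariant that the k-th smallest element of the set is the smallest value that can end an increasing subsequence of length k among the elements seen so far.
A[i] > set.last(): A[i] extends a longest subsequence and becomes the last element of the set, hence LIS[i] = LIS[i-1] + 1.
A[i] < set.last(): we insert A[i] and knock out the element just greater than A[i] in sorted order, so LIS[i] = LIS[i-1] + 1 (adding A[i]) - 1 (removing one element > A[i]). This is correct because A[i] cannot extend any increasing subsequence of length LIS[i-1] (each of them ends at a value ≥ set.last() > A[i]); it only becomes a smaller ending value for a shorter length. Hence proved.
To explain the big picture: inserting A[i] into the set either extends LIS[i-1] or records A[i] as the new smallest ending value for a subsequence whose length is A[i]'s position in the set (counting from 1). Note that the set as a whole is generally not an actual increasing subsequence of the array; only its size is meaningful.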
How to determine the longest increasing subsequence using dynamic programming?
Please read my explanation there first. If it is still not clear, read the following:
The algorithm keeps the lowest possible ending number for LIS of every length. By keeping the lowest numbers, you can extend the LIS in a maximal way. I know this is not a proof, but maybe it will be intuitive for you.
Given a list {x_i}, I want to find the longest increasing subsequence starting from each element such that the starting element is included in the subsequence.
The obvious way to do this would be to run the usual longest increasing subsequence algorithm from each element, giving O(n^2 log n). Can this be beaten?
You can use DP and bring it down to O(n^2).
Let the input be x1, x2, ..., xn.
Let f1, f2, ..., fn be the lengths of the longest increasing subsequences starting at the 1st, 2nd, ..., nth element. Initialize all of them to 1.
Now,
for i = n-1, n-2, ..., 1:
    for j = i+1, i+2, ..., n:
        if x[i] < x[j]:
            f[i] = max(f[i], f[j] + 1)
If you want the actual sequence in addition to its length, keep track of another variable g1, g2, ..., gn, where gi is the index of the next element in the sequence. Initialize every gi to NULL.
for i = n-1, n-2, ..., 1:
    for j = i+1, i+2, ..., n:
        if x[i] < x[j]:
            if f[i] < f[j] + 1:
                f[i] = f[j] + 1
                g[i] = j
Once you have the g values, I'll leave it to you to figure out how to enumerate the sequence starting at a particular position.
A more efficient algorithm would rely on sharing the same data structure for each iteration instead of restarting the algorithm each time. One way to do this would be to find the longest decreasing subsequence of the reverse of your input list. This should give you a data structure that gives you constant-time access to the predecessor for each element, and the length of the subsequence starting at that element.
For each starting element: if it's in the longest decreasing subsequence, follow its predecessors to the end. If it's not, find the element that is larger and to the right of it, and has the most predecessors, and follow that element's predecessors.
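This would give a worst-case time complexity of O(N^2), but that much is needed just to output the results anyway, since the N subsequences can have a total length on the order of N^2 (for example, when the input is already sorted).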
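#include <iostream>
#include <vector>
using namespace std;

int main() {
    int arr[] = {1, 10, 5, 12, 17, 18, 19};
    const int n = 7;
    int t[n];          // t[i] = length of the longest increasing subsequence starting at arr[i]
    vector<int> v[n];  // v[i] = one such subsequence
    for (int i = n - 1; i >= 0; i--) {
        t[i] = 1;
        int best = -1;  // index of the successor giving the longest extension
        for (int j = i + 1; j < n; j++) {
            if (arr[i] < arr[j] && t[j] + 1 > t[i]) {
                t[i] = t[j] + 1;
                best = j;
            }
        }
        v[i].push_back(arr[i]);
        if (best != -1)  // append the successor's subsequence once, after the scan
            v[i].insert(v[i].end(), v[best].begin(), v[best].end());
    }
    for (int i = 0; i < n; i++) {
        for (size_t j = 0; j < v[i].size(); j++)
            cout << v[i][j] << " ";
        cout << endl;
    }
    return 0;
}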
This is C++ code with O(N^2) time complexity. I will come up with a more elegant solution (using a map with pairs) that will be O(N log N). I didn't write that code here because its performance depends on the data density: if the data are dense I will write that approach; otherwise this one will always work fine.