Dynamic programming algorithm (Kadane)

Description of the algorithm:
Maximum Subarray Problem
Given a sequence of n real numbers A(1) … A(n), determine a contiguous subsequence A(i) … A(j) for which the sum of elements in the subsequence is maximized.
Algorithm:
#include <algorithm>
#include <iostream>
using namespace std;

int kadane(int a[], int n)
{
    int overall_sum = 0; // overall maximum subarray sum
    int new_sum = 0;     // sum obtained by including the current element
    for (int i = 0; i < n; i++)
    {
        // new_sum is the maximum value out of the current element alone, or the sum
        // of the current element and the previous sum
        new_sum = max(a[i], new_sum + a[i]);
        cout << new_sum << " : ";
        // if the calculated value of new_sum is greater than the overall sum,
        // it replaces the overall sum value
        overall_sum = max(overall_sum, new_sum);
        cout << overall_sum << endl;
    }
    return overall_sum;
}
I understand that we are trying to break the problem down into small sub-problems: the idea is to use the largest partial sum of the first n-1 elements to find the largest partial sum of all n elements. The code is clear to me in the sense that I can work it out on paper, but the idea seems like magic. Can someone provide a better explanation of this algorithm, or a proof of why it works?

To be 100% precise, what this code actually calculates is the maximum sum over all contiguous subsequences, counting the empty one: overall_sum starts at 0, so the result is never negative, and an empty array also gives 0. It makes a difference only for arrays where all numbers are negative. If the empty sequence should not count, overall_sum has to start from the first new_sum instead of 0 (for a non-empty array, initialize it with a[0]); the algorithm then returns the largest negative number rather than 0 for such arrays.
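A minimal Python sketch showing both conventions side by side; the function name max_subarray_sum and the allow_empty flag are illustrative and not part of the question's code. The scan itself is unchanged; only the starting value of the overall maximum differs.

```python
def max_subarray_sum(a, allow_empty=False):
    """Kadane's scan; allow_empty decides whether the empty subarray (sum 0) counts.
    Returns None for an empty input when allow_empty is False."""
    ending_here = 0  # best sum of a non-empty run ending at the previous element (0 before the first)
    best = 0 if allow_empty else None  # None: no non-empty run seen yet
    for x in a:
        ending_here = max(x, ending_here + x)
        if best is None or ending_here > best:
            best = ending_here
    return best
```

For [-3, -1, -2] this gives -1 with the default and 0 with allow_empty=True, which is exactly the difference described above.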
Proof:
At the beginning of each loop iteration, new_sum is the maximum sum of a non-empty contiguous sequence that ends just before a[i], i.e. at a[i-1] (for i == 0 it is 0, the sum of the empty sequence). Proof by induction over loop iterations. This is obviously true for i == 0. It becomes true for i+1 after the assignment, because the maximum-sum non-empty sequence ending at a[i] (the last element before a[i+1]) must include a[i], so it is either a[i] alone or a[i] appended to the best sequence ending at a[i-1]; that is exactly max(a[i], new_sum + a[i]).
overall_sum is the maximum of its initial value 0 and all the values new_sum takes. Every non-empty contiguous subsequence ends at some a[i], so maximizing over all i covers every candidate, and overall_sum is the global maximum.
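To see the invariant from the proof in action, here is a small Python sketch (the function name is hypothetical) that runs the same loop as the question's code and, after every assignment, asserts that new_sum equals a brute-force maximum over all runs ending at a[i]. The check is O(n^2) and only meant for experimenting on small inputs.

```python
def kadane_with_invariant_check(a):
    """Same loop as the question's kadane(), with the proof's invariant asserted."""
    new_sum = 0
    overall_sum = 0  # starts at 0, like the question's code
    for i, x in enumerate(a):
        new_sum = max(x, new_sum + x)
        # Invariant: new_sum is the best sum of a non-empty run a[k..i], 0 <= k <= i.
        assert new_sum == max(sum(a[k:i + 1]) for k in range(i + 1))
        overall_sum = max(overall_sum, new_sum)
    return overall_sum
```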

You've already included the explanation of why it works in the code comments:
new_sum is the maximum value out of current element
or the sum of current element and the previous sum
Rather than thinking of new_sum as the best sum of anything up to element i, think of it as the best sum of a run that ends exactly at element i.
Notice that the algorithm never lets new_sum leave out the current element. If A[i] alone is greater than A[i] added to the sum up to A[i-1] (that is, if that previous sum is negative), it makes no sense for A[i] to carry the previous section along, so we start counting from scratch at A[i]. This guarantees that the run ending at A[i] has the greatest sum it can have. The running sum may decrease later, but by then overall_sum has already recorded the best value if need be.
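The "start counting from scratch" step is also what lets you recover the subsequence itself, which the problem statement asks for. A minimal Python sketch, assuming a non-empty input and counting only non-empty runs; the name max_subarray and the (sum, i, j) return shape are illustrative.

```python
def max_subarray(a):
    """Return (best_sum, i, j) such that a[i..j] (inclusive) has the maximum sum.
    Assumes a is non-empty; only non-empty runs are considered."""
    best, best_i, best_j = a[0], 0, 0
    run_sum, run_start = 0, 0
    for j, x in enumerate(a):
        if x > run_sum + x:        # the previous run would only drag a[j] down: restart at j
            run_sum, run_start = x, j
        else:                      # extend the current run with a[j]
            run_sum += x
        if run_sum > best:         # record a new overall best before the run can decrease
            best, best_i, best_j = run_sum, run_start, j
    return best, best_i, best_j
```

For [-2, 1, -3, 4, -1, 2, 1, -5, 4] the restarts happen at indices 1 and 3, and the result is (6, 3, 6), i.e. the run 4, -1, 2, 1.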

Related

How does algorithm for Longest increasing subsequence [O(nlogn)] work?

I found this algorithm in The Hitchhiker’s Guide to the Programming Contests (note: this implementation assumes there are no duplicates in the list):
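#include <set>
using namespace std;

// array[0..n-1] is the input; assumes no duplicates
int lis_length(const int array[], int n) {
    set<int> st;
    for (int i = 0; i < n; i++) {
        set<int>::iterator it = st.insert(array[i]).first; // where array[i] now sits
        ++it;                                              // the next larger element, if any
        if (it != st.end()) st.erase(it);                  // knock it off
    }
    return (int)st.size();
}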
It's an algorithm to find the longest increasing subsequence in O(N log N). When I try it on a few test cases it seems to work, but I still can't figure out why it is correct. It also doesn't look intuitive to me.
Can anyone help me gain insight as to why this algorithm works correctly?
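Statement: After processing the first i elements, the size of the set equals the length of the longest increasing subsequence of those i elements.
Proof: Let's use induction.
Base case: Trivially true.
Induction hypothesis: Suppose we have processed i-1 elements and the size of the set is LIS[i-1], i.e. the length of the LIS possible with the first i-1 elements.
Induction step: Inserting the element A[i] = array[i] into the set results in one of two cases.
A[i] > set.last(): A[i] becomes the last element in the set, and LIS[i] = LIS[i-1] + 1.
A[i] < set.last(): We insert A[i] into the set and knock off the element just greater than A[i] in sorted order, so the size stays the same: LIS[i] = LIS[i-1] + 1 (adding A[i]) - 1 (removing one element > A[i]).
For both cases to be justified, the hypothesis has to be strengthened: the k-th smallest element of the set is the smallest possible last element of an increasing subsequence of length k. Replacing the successor of A[i] by A[i] preserves this. With it, A[i] > set.last() means A[i] extends a longest subsequence, and A[i] < set.last() means no subsequence of length LIS[i-1] ends below A[i], so A[i] cannot make a longer one. Hence proved.
To see the big picture: inserting A[i] either extends the current LIS or starts a candidate subsequence of its own, whose length is A[i]'s position in the set (counting from 1). The set elements before A[i] need not form an actual subsequence; only their count matters.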
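A Python sketch for checking the argument on small inputs: it runs the same set trick on a sorted list (bisect standing in for std::set, distinct values assumed as in the original) and asserts after every step that the k-th smallest element of the set is the smallest possible last element of an increasing subsequence of length k, computed by a brute-force O(n^2) DP. The function name is hypothetical.

```python
import bisect

def lis_length_with_check(array):
    """Set trick on a sorted list, with the smallest-tails invariant asserted."""
    n = len(array)
    # ends[j]: length of the longest increasing subsequence ending at array[j]
    ends = [1] * n
    for j in range(n):
        for k in range(j):
            if array[k] < array[j]:
                ends[j] = max(ends[j], ends[k] + 1)
    st = []  # sorted list standing in for std::set<int>
    for i, x in enumerate(array):
        pos = bisect.bisect_left(st, x)
        st.insert(pos, x)
        if pos + 1 < len(st):
            del st[pos + 1]  # knock off the element just greater than x
        # st[k] must be the smallest last element of a length-(k+1) increasing
        # subsequence within array[0..i]
        longest = max(ends[:i + 1])
        smallest_tails = [min(array[j] for j in range(i + 1) if ends[j] >= length)
                          for length in range(1, longest + 1)]
        assert st == smallest_tails
    return len(st)
```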
My answer to "How to determine the longest increasing subsequence using dynamic programming?" explains this.
Please read that explanation first. If it is still not clear, read the following:
The algorithm keeps the lowest possible ending number for an increasing subsequence of every length. By keeping the lowest numbers, you can extend the LIS as much as possible. I know this is not a proof, but maybe it will be intuitive for you.
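The "lowest possible ending number for every length" idea can be written directly with a sorted list and binary search. This Python sketch (the name lis_length and the strict flag are illustrative) replaces the first stored ending that is >= x, or > x when non-decreasing subsequences are wanted. For distinct values this is the same update as the set trick, and it also copes with duplicates.

```python
import bisect

def lis_length(seq, strict=True):
    """Length of the longest increasing subsequence in O(n log n) via smallest tails."""
    tails = []  # tails[k]: smallest possible last element of a subsequence of length k+1
    for x in seq:
        # strict: replace the first tail >= x; non-decreasing: the first tail > x
        k = bisect.bisect_left(tails, x) if strict else bisect.bisect_right(tails, x)
        if k == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[k] = x     # x is a lower ending for subsequences of length k+1
    return len(tails)
```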

Find triplets in better than linear time such that A[n-1] >= A[n] <= A[n+1]

A sequence of numbers was given in an interview such that A[0] >= A[1] and A[N-1] >= A[N-2]. I was asked to find at least one triplet such that A[n-1] >= A[n] <= A[n+1].
I solved it by iterating, but the interviewer expected a better than linear time solution. How should I approach this question?
Example: 9 8 5 4 3 2 6 7
Answer: 3 2 6
We can solve this in O(log n) time using divide and conquer, a.k.a. binary search, which is better than linear time. We need to find a triplet such that A[n-1] >= A[n] <= A[n+1].
First find the middle element of the given array. If it is no larger than both its left and right neighbours, return it: that's your answer. This is one base case of the recursion. Another base case: if len(arr) < 3, return as well, since there is no triplet.
Now the recursive cases. If mid is greater than the element on its left, the sequence is rising at mid, so take the part from start to mid as a subproblem and recurse on it. In concrete terms, at this point we would have ...2 6... with mid pointing at 6, so we move left to see whether the element to the left of 2 completes the triplet.
Otherwise, mid is greater than the element on its right, so the sequence is falling at mid: take the part from mid to the end as a subproblem and recurse. Keeping mid in the subarray preserves the property that it starts with a decreasing pair.
More theory: the above should be sufficient to understand the solution, but read on. The problem boils down to finding a local minimum in a given array. An element of the array is called a local minimum if it is no larger than both its left and right neighbours, which is exactly A[n-1] >= A[n] <= A[n+1].
An array whose first two elements are decreasing and whose last two elements are increasing HAS to have a local minimum. Why? Let's prove it by contradiction. Suppose the first two numbers are decreasing and there is no local minimum. Then the 3rd number must be less than the 2nd; otherwise the 2nd number would be a local minimum. By the same logic, the 4th number must be less than the 3rd, and so on. So the whole array would be strictly decreasing, which violates the constraint that the last two numbers are increasing. Hence there must be a local minimum.
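The contradiction argument is itself a linear algorithm: walk right while the sequence strictly decreases, and the first place where it stops is a valid n. A minimal Python sketch, assuming len(arr) >= 3, arr[0] >= arr[1] and arr[-2] <= arr[-1]; the function name is hypothetical.

```python
def first_local_min(arr):
    """O(n) walk following the proof: returns the first n with arr[n-1] >= arr[n] <= arr[n+1]."""
    n = 1  # arr[0] >= arr[1] holds by assumption
    while arr[n] > arr[n + 1]:  # still decreasing, so arr[n] is not the answer yet
        n += 1                  # and now arr[n-1] > arr[n]
    # the loop must stop by n = len(arr) - 2 because arr[-2] <= arr[-1]
    return n
```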
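This argument suggests an O(n) linear approach, but we can definitely do better. It does give a different perspective on the problem, though: any subarray whose first two elements decrease and whose last two increase contains a local minimum, and that is exactly the property the recursion above preserves when it halves the array.
Code: here's Python code (FYI, it was typed freehand in the Stack Overflow editor, so it might misbehave).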
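def local_minima(arr, start=0, end=None):
    # Returns an index n with arr[n-1] >= arr[n] <= arr[n+1], or -1.
    # Invariant: arr[start] >= arr[start+1] and arr[end-1] <= arr[end].
    if end is None:
        end = len(arr) - 1
    if end - start < 2:  # fewer than 3 elements: no triplet
        return -1
    mid = (start + end) // 2  # integer division; start < mid < end
    if arr[mid-1] >= arr[mid] <= arr[mid+1]:  # found it!
        return mid
    if arr[mid-1] < arr[mid]:  # rising at mid: a local minimum lies in [start, mid]
        return local_minima(arr, start, mid)
    else:  # falling at mid (arr[mid] > arr[mid+1]): look in [mid, end]
        return local_minima(arr, mid, end)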
Note that I just return the index n. To print the triplet, use the elements at the returned index - 1, the index itself and the index + 1. (source)
It sounds like what you're asking is this:
You have a sequence of numbers. It starts decreasing and continues to decrease until element n, then it starts increasing until the end of the sequence. Find n.
This is a (non-optimal) solution in linear time:
for (i = 1; i < length(A) - 1; i++)
{
    if ((A[i-1] >= A[i]) && (A[i] <= A[i+1]))
        return i;
}
To do better than linear time, you need to use the information that you get from the fact that the series decreases then increases.
Consider the difference between A[i] and A[i+1]. If A[i] > A[i+1], then n > i, since the values are still decreasing. If A[i] <= A[i+1], then n <= i, since the values are now increasing. In this case you need to check the difference between A[i-1] and A[i].
This is a solution in log time:
int boundUpper = length(A) - 1;
int boundLower = 1;
int i = (boundUpper + boundLower) / 2; //initial estimate
while (true)
{
    if (A[i] > A[i+1])
        boundLower = i + 1;
    else if (A[i-1] >= A[i])
        return i;
    else
        boundUpper = i;
    i = (boundLower + boundUpper) / 2;
}
I'll leave it to you to add in the necessary error check in the case that A does not have an element satisfying the criteria.
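A minimal Python sketch of the log-time search with the error check added, assuming the preconditions from the question (at least three elements, A[0] >= A[1], A[N-2] <= A[N-1]); the name local_min_index is hypothetical. It keeps the same A[i] vs A[i+1] comparison as the loop above, but narrows [lo, hi] until a single index is left instead of returning early.

```python
def local_min_index(A):
    """Binary search for n with A[n-1] >= A[n] <= A[n+1]; returns -1 if the
    preconditions (len >= 3, A[0] >= A[1], A[-2] <= A[-1]) do not hold."""
    if len(A) < 3 or A[0] < A[1] or A[-2] > A[-1]:
        return -1
    # Invariant: A[lo-1] >= A[lo] and A[hi] <= A[hi+1], with lo <= hi.
    lo, hi = 1, len(A) - 2
    while lo < hi:
        mid = (lo + hi) // 2      # lo <= mid < hi
        if A[mid] > A[mid + 1]:
            lo = mid + 1          # still falling after mid: now A[lo-1] > A[lo]
        else:
            hi = mid              # not falling after mid: now A[hi] <= A[hi+1]
    return lo                     # lo == hi, so both halves of the invariant hold at lo
```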
A linear solution just iterates through the sequence, comparing each element with its neighbours.
You could also check the slope of the first two elements, then do a kind of binary chop/in-order traversal comparing pairs until you find one with the opposite slope. I think that would be better than n time on average, though it's not guaranteed.
Edit: I just realised what your ordering meant. The binary chop method is guaranteed to do this in less than n time, as there is guaranteed to be a point of change (assuming that your N-1, N-2 are the last two elements of the list).
This means you just need to find it (or one of them), in which case a binary chop will do it in O(log n).

Reduce a sequence in most optimal way

We are given a sequence a of n numbers. The reduction of sequence a is defined as replacing the elements a[i] and a[i+1] with max(a[i],a[i+1]).
Each reduction operation has a cost defined as max(a[i],a[i+1]). After n-1 reductions a sequence of length 1 is obtained.
Our goal is to print the cost of the optimal reduction of the given sequence a, i.e. the minimum total cost of the n-1 reductions that leave a sequence of length 1.
For example:
Input: 1 2 3
Output: 5
An O(N^2) solution is trivial. Any ideas?
EDIT1:
People are asking about my idea, so my idea was to traverse through the sequence pairwise and for each pair check cost and in the end reduce the pair with least cost.
1 2 3
2 3 <=== Cost is 2
So reduce above sequence to
2 3
Now traverse the sequence again; we get a cost of 3.
2 3
3 <=== Cost is 3
So total cost is 2+3=5
The above algorithm is O(N^2), which is why I was asking for a more optimized idea.
O(n) solution:
High-level:
The basic idea is to repeatedly merge any element e that is smaller than both its neighbours ns and nl with its smaller neighbour ns. This produces the minimal cost because both the cost and the result of a merge are max(a[i],a[i+1]): no merge can make an element smaller than it currently is, so the cheapest possible merge for e is with ns, and that merge can't increase the cost of any other possible merge.
This can be done with a one pass algorithm by keeping a stack of elements from our array in decreasing order. We compare the current element to both its neighbours (one being the top of the stack) and perform appropriate merges until we're done.
Pseudo-code:
stack = empty
for pos = 0 to length
    // stack.top > arr[pos] is implicitly true because of the previous iteration of the loop
    if stack.top > arr[pos] > arr[pos+1]
        stack.push(arr[pos])
    else if stack.top > arr[pos+1] > arr[pos]
        merge(arr[pos], arr[pos+1])
    else while arr[pos+1] > stack.top > arr[pos]
        merge(arr[pos], stack.pop)
Java code:
import java.util.ArrayDeque;
import java.util.Deque;

public class Main {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>(); // earlier elements, decreasing from bottom to top
        long cost = 0;
        int[] arr = {10, 1, 2, 3, 4, 5};
        for (int pos = 0; pos < arr.length; pos++) {
            if (pos < arr.length - 1 && (stack.isEmpty() || stack.peek() >= arr[pos + 1])) {
                if (arr[pos] > arr[pos + 1]) {
                    stack.push(arr[pos]);  // not a local minimum (yet): keep it for later
                } else {
                    cost += arr[pos + 1];  // merge pos and pos+1
                }
            } else {
                // arr[pos] is smaller than both neighbours: merge it into the stack top,
                // then keep merging the resulting block into smaller stack neighbours
                while (!stack.isEmpty() && (pos == arr.length - 1 || stack.peek() < arr[pos + 1])) {
                    cost += stack.pop();
                }
                // in the middle of the array, the last popped block is now smaller than both
                // neighbours; the stack top (if any) is >= arr[pos+1], so it merges with arr[pos+1].
                // At the last position the stack is empty here and nothing is left to merge.
                if (pos < arr.length - 1) {
                    cost += arr[pos + 1];
                }
            }
        }
        System.out.println(cost); // prints 24 for this input
    }
}
I am not sure whether by the "cost" of a reduction you mean computational cost, i.e. an operation taking time max(a[i],a[i+1]), or simply something you want to calculate. If it is the latter, then the following algorithm is better than O(n^2):
Sort the list, or more precisely, define b[i] such that a[b[i]] is the sorted list: O(n) if you can use radix sort, O(n log n) otherwise.
starting from the second-lowest item i in the sorted list: if left/right is lower than i, then perform reduction: O(1) for each item, update list from 2, O(n) in total.
I have no idea if that is the optimal solution, but it's O(n) for integers and O(n log n), otherwise.
edit: Realized that removing a precomputing step made it much simpler
If you don't consider it cheating to sort the list, then do it in n log n time and then merge the first two entries recursively. The total cost in this case will be the sum of the entries minus the smallest entry. This is optimal since
the cost will be the sum of n-1 entries (with repeats allowed)
the ith smallest entry can appear at most i-1 times in the cost function
The same fundamental idea works even if the list isn't sorted. An optimal solution is to merge the smallest element with its smallest neighbor. To see that this is optimal, note that
the cost will be the sum of n-1 entries (with repeats allowed)
entry a_i can appear at most j-1 times in the cost function, where j is the length of the longest consecutive subsequence containing a_i such that a_i is the maximum element of the subsequence
In the worst case, the sequence is decreasing and the time is O(n^2).
Greedy approach indeed works.
You can always reduce the smallest number with its smaller neighbour.
Proof: the smallest number a[i] has to be reduced at some point. Reducing one of its neighbours can only keep that neighbour's value the same or make it bigger, so the operation that reduces the minimal element a[i] will always have cost c >= min(a[i-1], a[i+1]), and reducing it right away with its smaller neighbour achieves exactly that cost.
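A direct Python sketch of this greedy, in its simplest O(n^2) form (the function name is hypothetical): find a smallest element, merge it into its smaller neighbour, repeat. Because the merged pair keeps the neighbour's value, "merging" is just deleting the smaller element.

```python
def min_reduction_cost(a):
    """Greedy: repeatedly reduce a smallest element with its smaller neighbour.
    O(n^2) sketch; returns 0 for fewer than two elements."""
    a = list(a)
    cost = 0
    while len(a) > 1:
        i = a.index(min(a))  # position of a smallest element
        # pick the smaller existing neighbour
        if i == 0:
            j = 1
        elif i == len(a) - 1:
            j = i - 1
        else:
            j = i - 1 if a[i - 1] <= a[i + 1] else i + 1
        cost += max(a[i], a[j])  # equals a[j], since a[i] is a minimum
        del a[i]  # the merged pair keeps the value a[j]
    return cost
```

On the question's example [1, 2, 3] this reproduces the expected cost 2 + 3 = 5.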
Now we need to:
quickly find and remove the smallest number;
find its neighbours.
I'd use two RMQs for that, doing operation 2 as a binary search, which gives O(N log^2 N).
EDIT: The first RMQ holds the values; when you remove an element, put some big value in its place.
The second RMQ holds "presence": 1 if the element is still there, 0 if not. To find, for example, the left neighbour of a[i], find the greatest l such that sum[l, i-1] = 1.
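The answer above uses two RMQs; the sketch below swaps in a different, simpler structure for the same greedy: a binary heap (Python's heapq) to find the smallest remaining value and a doubly linked list of indices to find its live neighbours, giving O(N log N). When an element is merged away the survivor keeps its own value, so heap entries never go stale. The function name is hypothetical.

```python
import heapq

def min_reduction_cost_fast(a):
    """Greedy 'merge the smallest with its smaller neighbour' in O(n log n)
    using a heap plus a doubly linked list (not the RMQ design above)."""
    n = len(a)
    prev = list(range(-1, n - 1))  # prev[i]: live left neighbour of i, or -1
    nxt = list(range(1, n + 1))    # nxt[i]: live right neighbour of i, or n
    heap = [(x, i) for i, x in enumerate(a)]
    heapq.heapify(heap)
    cost = 0
    for _ in range(n - 1):                 # exactly n-1 reductions
        _, i = heapq.heappop(heap)         # smallest remaining element
        left, right = prev[i], nxt[i]
        cost += min(a[k] for k in (left, right) if 0 <= k < n)  # smaller live neighbour
        # unlink i; the neighbour it merged into keeps its own value
        if left >= 0:
            nxt[left] = right
        if right < n:
            prev[right] = left
    return cost
```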

Finding All possible Longest Increasing subsequence

I want to find all possible Longest Increasing Subsequences in a given string.
For example, given the string qapbso:
Here the length of longest increasing subsequence is 3.
I want to find all possible longest subsequence of length 3. i.e "abs", "aps", "abo".
I am using Dynamic Programming but I am only getting one LIS. I want to list down all of them.
The typical DP solution produces an array that tells you the longest possible subsequence starting at a given position. Say you have an array dp of length n, where dp[i] stores the length of the longest non-decreasing subsequence that starts with the element at index i.
Printing all the LNDSS (longest non-decreasing subsequences) is easiest to do with recursion. First find the actual length of the LNDSS by selecting the greatest value in dp. Then start the recursion from each index whose dp value equals that maximum (there may be more than one). The recursion from a given index should print all the sequences starting at that index whose length equals dp[index]:
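#include <iostream>
using namespace std;

// Assumed to be filled in beforehand: a[0..n-1] is the input and dp[i] is the
// length of the longest non-decreasing subsequence starting at index i.
int a[1000], dp[1000], n;
int st[1000];     // the sequence built so far
int st_index = 0; // number of elements in st

// Prints every non-decreasing sequence of length dp[index] that starts at a[index];
// the caller must already have put a[index] on st.
void rec(int index) {
    if (dp[index] == 1) {
        cout << "Another sequence:\n";
        for (int i = 0; i < st_index; ++i) {
            cout << st[i] << endl;
        }
        return;
    }
    for (int j = index + 1; j < n; ++j) {
        if (dp[j] == dp[index] - 1 && a[j] >= a[index]) {
            st[st_index++] = a[j];
            rec(j);
            st_index--;
        }
    }
}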
This is example code in C++ (I hope it is understandable in other languages as well). A global stack st keeps the elements we have already added; before calling rec(i) for a starting index i, set st[0] = a[i] and st_index = 1. It has size 1000, assuming you will never have more than 1000 elements in the LNDSS, but this can be increased. The solution does not have the best possible design; I have focused on making it simple rather than well designed.
Hope this helps.
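For the strictly increasing case in the question, the same dp-plus-recursion approach fits in a short Python sketch (the function name is hypothetical). It returns one tuple per choice of indices, so repeated letters in the input can produce equal tuples; deduplicate with set() if that matters.

```python
def all_longest_increasing(seq):
    """Enumerate every longest strictly increasing subsequence, using
    dp[i] = length of the longest one starting at index i."""
    n = len(seq)
    dp = [1] * n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if seq[j] > seq[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    best = max(dp, default=0)
    results = []

    def rec(index, chosen):
        if dp[index] == 1:  # chosen now has length best
            results.append(tuple(chosen))
            return
        for j in range(index + 1, n):
            if dp[j] == dp[index] - 1 and seq[j] > seq[index]:
                chosen.append(seq[j])
                rec(j, chosen)
                chosen.pop()

    for i in range(n):
        if dp[i] == best:
            rec(i, [seq[i]])
    return results
```

For "qapbso" this yields the three sequences from the question: aps, abs and abo.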
Consider the graph in which every letter of the input string is a node connected to all nodes that have a higher value and appear later in the string, plus a START node before all letters and an END node after them.
You want to find the longest path from START to END. Since the graph is acyclic, this is equivalent to finding the shortest path when every edge has length -1.
Reference on longest path problem: http://en.wikipedia.org/wiki/Longest_path_problem
In 2000 Sergei Bespamyatnikh and Michael Segal proposed an algorithm for finding all longest increasing subsequences of a given permutation. The algorithm uses a Van Emde Boas tree and has a time complexity of O(n + Kl(p)) and space complexity of O(n), where n is the length of a permutation p, l(p) is the length of its longest increasing subsequence and K is the number of such subsequences.
See the paper in the ACM Digital Library or search the web for "Enumerating longest increasing subsequences and patience sorting" to get a free PDF.

Find the farthest sum of two elements from zero in an array

Given an array, what is the most time- and space-efficient algorithm to find the sum of two elements farthest from zero in that array?
Edit
For example, [1, -1, 3, 6, -10] has the farthest sum equal to -11 which is equal to (-1)+(-10).
Using a tournament comparison method to find the largest and second largest elements uses the fewest comparisons: n + ⌈log2 n⌉ - 2 in total. Do this twice, once to find the largest and second largest elements, say Z and Y, and again to find the smallest and second smallest elements, say A and B. The answer is then either Z+Y or A+B, and comparing Z+Y with -A-B (one more comparison) tells which is farther from zero. Overall, this takes 2n + 2⌈log2 n⌉ - 3 comparisons. This is still O(n), but in practice it is faster than scanning the entire list 4 times to find A, B, Y, Z (4n-5 comparisons in total).
The tournament method is nicely explained with pictures and sample code in these two tutorials: one and two
If you mean the sum whose absolute value is maximum, it is either the largest sum or the smallest sum. The largest sum is the sum of the two maximal elements. The smallest sum is the sum of the two minimal elements.
So you need to find the four values: Maximal, second maximal, minimal, second minimal. You can do it in a single pass in O(n) time and O(1) memory. I suspect that this question might be about minimizing the constant in O(n) - you can do it by taking elements in fives, sorting each five (it can be done in 7 comparisons) and comparing the two top elements with current-max elements (3 comparisons at worst) and the two bottom elements with current-min elements (ditto.) This gives 2.6 comparisons per element which is a small improvement over the 3 comparisons per element of the obvious algorithm.
Then just sum the two max elements, sum the two min elements and take whichever value has the larger abs().
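A minimal single-pass Python sketch of this approach, assuming the array has at least two elements; the function name is hypothetical. It uses the straightforward comparisons rather than the groups-of-five trick.

```python
def farthest_pair_sum(arr):
    """Sum of two elements farthest from zero. Assumes len(arr) >= 2."""
    inf = float("inf")
    max1 = max2 = -inf  # max1 >= max2: two largest seen so far
    min1 = min2 = inf   # min1 <= min2: two smallest seen so far
    for x in arr:
        if x > max1:
            max1, max2 = x, max1
        elif x > max2:
            max2 = x
        if x < min1:
            min1, min2 = x, min1
        elif x < min2:
            min2 = x
    high, low = max1 + max2, min1 + min2
    return high if abs(high) >= abs(low) else low
```

On the question's example [1, -1, 3, 6, -10] the candidates are 6 + 3 = 9 and -10 + (-1) = -11, so it returns -11.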
Let's look at the problem from a general perspective:
Find the largest sum of k integers in your array.
Begin by tracking the FIRST k integers - keep them sorted as you go.
Iterate over the array, testing each integer against the min value of the saved integers thus far.
If it is larger than the minimum of the saved integers, replace that minimum with it and bubble it up to its proper sorted position.
When you've finished the array, you have your largest k integers.
Now you can easily apply this to k=2.
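The steps above as a short Python sketch, assuming 1 <= k <= len(arr); the name largest_k is hypothetical, and bisect.insort does the "bubble into sorted position" step. The smallest sum of k elements follows by negating the input.

```python
import bisect

def largest_k(arr, k):
    """Keep the first k values sorted; each later value larger than the current
    minimum replaces that minimum and is inserted in sorted position."""
    kept = sorted(arr[:k])
    for x in arr[k:]:
        if x > kept[0]:
            kept.pop(0)             # discard the smallest saved value
            bisect.insort(kept, x)  # insert x at its sorted position
    return kept
```

For k = 2 and [1, -1, 3, 6, -10], sum(largest_k(arr, 2)) is 9 and -sum(largest_k([-x for x in arr], 2)) is -11; the answer is whichever has the larger absolute value.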
Just iterate over the array keeping track of the smallest and the largest elements encountered so far. This is time O(n), space O(1) and obviously you can't do better than that.
int GetAnswer(int[] arr)
{
    int min = arr[0];
    int max = arr[0];
    int maxDistSum = 0;
    for (int i = 1; i < arr.Length; ++i)
    {
        int x = arr[i];
        if (Math.Abs(maxDistSum) < Math.Abs(max + x)) maxDistSum = max + x;
        if (Math.Abs(maxDistSum) < Math.Abs(min + x)) maxDistSum = min + x;
        if (x < min) min = x;
        if (x > max) max = x;
    }
    return maxDistSum;
}
The key observation is that the furthest distance is either the sum of the two smallest elements or the sum of the two largest.

Resources