Dynamic Programming: Longest Common Subsequence - algorithm

I'm going over notes that discuss dynamic programming in the context of finding the longest common subsequence of two equal-length strings. The algorithm in question outputs the length (not the subsequence itself).
So I have two strings, say:
S = ABAZDC, T = BACBAD
The longest common subsequence is ABAD (the letters of a subsequence don't have to be adjacent).
Algorithm is as follows, where LCS[i, j] denotes longest common subsequence of S[1..i] and T[1..j]:
if S[i] != T[j] then
    LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
else
    LCS[i, j] = 1 + LCS[i - 1, j - 1]
My notes claim you can fill out a table where each string is written along an axis. Something like:
  B A C B A D
A 0 1 1 1 1 1
B 1 1 1 2 2 2
A ...
Z
D
C
Two questions:
1) How do we actually start filling this table out? The algorithm is recursive but doesn't seem to provide a base case (otherwise I'd just call LCS[6, 6]). The notes claim you can do two simple loops with i and j and fill out each spot in constant time...
2) How could we modify the algorithm so the longest common subsequence would be of adjacent letters? My thought is that I'd have to reset the length of the current subsequence to 0 once I find that the next letter in S doesn't match the next letter in T. But it's tricky because I want to keep track of the longest seen thus far (It's possible the first subsequence I see is the longest one). So maybe I'd have an extra argument, longestThusFar, that is 0 when we call our algorithm initially and changes in subsequent calls.
Can someone make this a bit more rigorous?
Thanks!

Firstly, the algorithm is defined recursively, but the implementation is usually iterative. In other words, the function does not explicitly call itself (recursion).
Instead, the table entries that are already filled in stand in for the recursive calls.
Say you have two strings of length M.
Then a table of dimensions (M+1) x (M+1) is defined, and row 0 and column 0 are set to zero:
for(i = 0 to M)
{
    LCS[0][i]=0;
}
for(i = 1 to M)
{
    LCS[i][0]=0;
}
And you get a table like
     B A C B A D
   0 0 0 0 0 0 0
A  0
B  0
A  0
Z  0
D  0
C  0
Each zero in the 0th column means that if no character of string BACBAD is considered, then the length of the LCS = 0.
Each zero in the 0th row means that if no character of string ABAZDC is considered, then the length of the LCS = 0.
The rest of the entries are filled using the rules you mentioned.
for(i = 1 to M)
{
    for(j = 1 to M)
    {
        if S[i-1] != T[j-1] then
            LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
        else
            LCS[i, j] = 1 + LCS[i - 1, j - 1]
    }
}
Notice that the test is S[i-1] != T[j-1] and not S[i] != T[j]: the strings are indexed from 0 here while the table is indexed from 1, so filling LCS[i, j] compares the i-th character of S, stored at S[i-1], with the j-th character of T, stored at T[j-1].
The length of the LCS is given by LCS[M, M].
The best way to understand this is to try it by hand.
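The table-filling loops above can be sketched in Python as follows. This is a minimal sketch, not the notes' own code: the function name `lcs_length` and the variable `table` are made up for illustration, and the strings are allowed to have different lengths.

```python
def lcs_length(s: str, t: str) -> int:
    """Length of the longest common subsequence of s and t."""
    rows, cols = len(s) + 1, len(t) + 1
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if s[i - 1] != t[j - 1]:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
            else:
                table[i][j] = 1 + table[i - 1][j - 1]
    return table[len(s)][len(t)]


print(lcs_length("ABAZDC", "BACBAD"))  # 4, e.g. ABAD
```

Each cell only reads cells above it or to its left, which is why two plain nested loops visit the table in a valid order.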
In answer to your second question, you do not need to modify the algorithm much.
The solution lies in the table that is normally used to retrieve the LCS itself.
To retrieve the LCS, we keep an extra M x M table of characters (called dir here, so that it is not confused with the string T) and modify the algorithm as follows.
for(i = 1 to M)
{
    for(j = 1 to M)
    {
        if S[i-1] != T[j-1] then
        {
            LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
            if(LCS[i - 1, j] >= LCS[i, j - 1])
                dir[i-1][j-1] = 'u' // meaning up
            else
                dir[i-1][j-1] = 'l' // meaning left
        }
        else
        {
            LCS[i, j] = 1 + LCS[i - 1, j - 1]
            dir[i-1][j-1] = 'd' // meaning diagonally up
        }
    }
}
Now, to find the longest substring (OF ADJACENT LETTERS) common to both, traverse dir diagonally. An entry is 'd' exactly when S[i-1] == T[j-1], so a run of consecutive d's along a diagonal is a run of adjacent matching letters.
The length = largest number of consecutive d's found in any diagonal.
The diagonal traversal of any square N x N matrix mat is done as follows.
Lower triangle, including the main diagonal:
j = N-1
while(j >= 0)
{
    i = j; k = 0;
    while(i <= N-1)
    {
        entry mat[i][k];
        ++i; ++k;
    }
    --j;
}
Upper triangle:
j = 1;
while(j <= N-1)
{
    i = j; k = 0;
    while(i <= N-1)
    {
        entry mat[k][i];
        ++k; ++i;
    }
    ++j;
}
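The diagonal idea can also be sketched directly in Python, without storing the character table at all: walk every diagonal of the comparison grid and track the longest run of matching letters. The function name and the offset variable `d` are assumptions made for this sketch.

```python
def longest_common_substring_length(s: str, t: str) -> int:
    """Longest run of adjacent letters common to s and t."""
    best = 0
    # A diagonal is identified by its offset d = j - i.
    for d in range(-(len(s) - 1), len(t)):
        i = max(0, -d)  # first row that lies on this diagonal
        j = i + d       # matching column
        run = 0
        while i < len(s) and j < len(t):
            if s[i] == t[j]:
                run += 1            # one more consecutive 'd' entry
                best = max(best, run)
            else:
                run = 0             # the run of matches is broken
            i += 1
            j += 1
    return best


print(longest_common_substring_length("ABAZDC", "BACBAD"))  # 2, e.g. BA
```

This visits each cell of the grid once, so it takes the same O(M^2) time as filling the table.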

Related

Counting valid sequences with dynamic programming

I am pretty new to Dynamic Programming, but I am trying to get better. I have an exercise from a book, which asks me the following question (slightly abridged):
You want to construct sequence of length N from numbers from the set {1, 2, 3, 4, 5, 6}. However, you cannot place the number i (i = 1, 2, 3, 4, 5, 6) more than A[i] times consecutively, where A is a given array. Given the sequence length N (1 <= N <= 10^5) and the constraint array A (1 <= A[i] <= 50), how many sequences are possible?
For instance if A = {1, 2, 1, 2, 1, 2} and N = 2, this would mean you can only have one consecutive 1, two consecutive 2's, one consecutive 3, etc. Here, something like "11" is invalid since it has two consecutive 1's, whereas something like "12" or "22" are both valid. It turns out that the actual answer for this case is 33 (there are 36 total two-digit sequences, but "11", "33", and "55" are all invalid, which gives 33).
Somebody told me that one way to solve this problem is to use dynamic programming with three states. More specifically, they say to keep a 3d array dp(i, j, k) with i representing the current position we are at in the sequence, j representing the element put in position i - 1, and k representing the number of times that this element has been repeated in the block. They also told me that for the transitions, we can put in position i every element different from j, and we can only put j in if A[j] > k.
It all makes sense to me in theory, but I've been struggling with implementing this. I have no clue how to begin with the actual implementation other than initializing the matrix dp. Typically, most of the other exercises had some sort of "base case" that were manually set in the matrix, and then a loop was used to fill in the other entries.
I guess I am particularly confused because this is a 3D array.
For a moment let's just not care about the array. Let's implement this recursively. Let dp(i, j, k) be the number of sequences with length i, last element j, and exactly k consecutive occurrences of j at the end of the sequence.
The question now becomes how to write dp(i, j, k) recursively.
If k > 1, we are adding a j for the kth time in a row, so we have to take each sequence of length i - 1 that ends with j occurring k - 1 times and add another j to it. Notice that this is simply dp(i - 1, j, k - 1).
But what if k == 1? In that case we can add one occurrence of j to every sequence of length i - 1 that doesn't end with j. Essentially we need the sum of all dp(i - 1, x, c) such that x != j and 1 <= c <= A[x].
This gives our recurrence relation:
def dp(i, j, k):
    # this is the base case, the number of sequences of length 1
    # one if k is valid, otherwise zero
    if i == 1: return int(k == 1)
    if k > 1:
        # get all the valid sequences [0...i-1] and add j to them
        return dp(i - 1, j, k - 1)
    if k == 1:
        # get all valid sequences that don't end with j
        res = 0
        for last in range(len(A)):
            if last == j: continue
            for n_consec in range(1, A[last] + 1):
                res += dp(i - 1, last, n_consec)
        return res
We know that our answer will be the number of all valid sequences of length N, so our final answer is sum(dp(N, j, k) for j in range(len(A)) for k in range(1, A[j] + 1))
Believe it or not, this is the basis of dynamic programming. We just broke our main problem down into a set of subproblems. Of course, right now our running time is exponential because of the recursion. We have two ways to lower this:
Caching: we can simply keep track of the result of each (i, j, k) and then return what we originally computed when it's called again.
Use an array: we can reimplement this idea with bottom-up dp and have an array dp[i][j][k]. All of our function calls just become array accesses in a for loop. Note that using this method forces us to iterate over the array in topological order, which may be tricky.
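The caching option can be sketched with `functools.lru_cache`. This is a minimal sketch of the recurrence above, with a made-up wrapper name `count_sequences` and a parameter `limits` standing in for A (0-indexed). It still recurses to depth N, so with Python's default recursion limit it is only suitable for small N.

```python
from functools import lru_cache
from typing import Sequence


def count_sequences(n: int, limits: Sequence[int]) -> int:
    """Count length-n sequences where value j never repeats more than limits[j] times in a row."""

    @lru_cache(maxsize=None)
    def dp(i: int, j: int, k: int) -> int:
        # Sequences of length i that end with exactly k consecutive copies of j.
        if i == 1:
            return int(k == 1)
        if k > 1:
            return dp(i - 1, j, k - 1)
        # k == 1: the previous element must differ from j.
        return sum(
            dp(i - 1, last, c)
            for last in range(len(limits))
            if last != j
            for c in range(1, limits[last] + 1)
        )

    return sum(
        dp(n, j, k)
        for j in range(len(limits))
        for k in range(1, limits[j] + 1)
    )


print(count_sequences(2, [1, 2, 1, 2, 1, 2]))  # 33, matching the example in the question
```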
There are 2 kinds of dp approaches: top-down and bottom-up.
In bottom-up, you fill the terminal cases in the dp table and then use for loops to build up from that. Let's consider a bottom-up algorithm to generate the Fibonacci sequence. We set dp[0] = 1 and dp[1] = 1 and run a for loop from i = 2 to n, setting dp[i] = dp[i - 1] + dp[i - 2].
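As a minimal sketch of that bottom-up loop (the function name `fib_bottom_up` is made up here, and the convention dp[0] = dp[1] = 1 is taken from the paragraph above):

```python
def fib_bottom_up(n: int) -> int:
    """n-th Fibonacci number with dp[0] = dp[1] = 1, filled from the bottom up."""
    dp = [1] * (max(n, 1) + 1)  # terminal cases dp[0] and dp[1] are already 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


print([fib_bottom_up(i) for i in range(6)])  # [1, 1, 2, 3, 5, 8]
```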
In top down approach, we start from the "top" view of the problem and go down from there. Consider the recursive function to get n-th Fibonacci number:
def fib(n):
    if n <= 1:
        return 1
    if dp[n] != -1:
        return dp[n]
    dp[n] = fib(n - 1) + fib(n - 2)
    return dp[n]
Here we don't fill the complete table, but only the cases we encounter.
I am talking about these 2 types because when you start learning dp, it is often difficult to come up with bottom-up approaches (as you are trying to do). When this happens, first come up with a top-down approach, and then try to derive a bottom-up solution from it.
So let's create a recursive dp function first:
# let m be the size of A; A is treated as 1-indexed here (A[1..m])
# initialize the dp table with all values -1
def solve(i, j, k, n, m):
    # first write the terminal cases
    if k > A[j]:
        # j has been repeated too many times, so the sequence is invalid: return 0
        return 0
    if i >= n:
        # all n positions are filled, so this is one valid sequence
        return 1
    if dp[i][j][k] != -1:
        return dp[i][j][k]
    result = 0
    for num in range(1, m + 1):
        if num == j:
            result += solve(i + 1, num, k + 1, n, m)
        else:
            result += solve(i + 1, num, 1, n, m)
    dp[i][j][k] = result
    return dp[i][j][k]
So we know what the terminal cases are. For the bottom-up version, we create a dp table of size dp[n + 1][m + 1][max(A) + 2]: indices start at 1, and the extra slot in the last dimension keeps dp[i + 1][num][occur + 1] inside the table. Initialize it with all values 0, not -1.
So we can do bottom-up as:
# A is treated as 1-indexed here (A[1..m]).
# initially all values in the table are zero. With the loop below, we set the valid endings to 1.
# So any state that can reach a valid terminal state picks up that 1, while invalid states
# keep the value 0
for num in range(1, m + 1):
    for occur in range(1, A[num] + 1):
        dp[n][num][occur] = 1
# now to build up from the bottom, we start by filling the (n-1)-th position
for i in range(n - 1, 0, -1):
    for num in range(1, m + 1):
        for occur in range(1, A[num] + 1):
            for next_num in range(1, m + 1):
                if next_num != num:
                    dp[i][num][occur] += dp[i + 1][next_num][1]
                else:
                    dp[i][num][occur] += dp[i + 1][num][occur + 1]
The answer will be:
total = 0
for num in range(1, m + 1):
    total += dp[1][num][1]
I am sure there must be some more elegant dp solution, but I believe this answers your question. Note that I assumed that k is the number of times the j-th number has been repeated consecutively; correct me if I am wrong about this.
Edit:
With the given constraints the size of the table will be, in the worst case, 10^5 * 6 * 50 = 3e7 entries. Assuming 4-byte entries, this would be > 100MB. It is workable, but can be considered too much space (I think some systems don't allow that much memory to a process by default). One way to reduce it would be to use a hash map instead of an array with the top-down approach, since top-down doesn't visit all the states. That helps in this case: for example, if A[1] is 2, then the states where 1 has occurred more than twice need not be stored. Of course, this would not save much space if A[i] has large values, say [50, 50, 50, 50, 50, 50]. Another approach would be to modify our approach a bit. We don't actually need to store the dimension k, i.e. the number of times j has appeared consecutively:
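dp[i][j] = number of ways to fill positions i..n if the (i - 1)th position doesn't hold j and the i-th position does, i.e. a run of j starts at position i.
Then, we would need to modify our algorithm to be like:
def solve(i, j):
    # a run of j starts at position i (positions are 1..n, A is 1-indexed)
    if i > n:
        return 0
    if dp[i][j] != -1:
        return dp[i][j]
    result = 0
    # we will first try 1 consecutive j, then 2 consecutive j's, then 3 and so on
    for count in range(1, A[j] + 1):
        end = i + count - 1  # last position covered by this run
        if end > n:
            break
        if end == n:
            result += 1  # the run finishes the sequence exactly
            continue
        for num in range(1, m + 1):
            if num != j:
                result += solve(end + 1, num)
    dp[i][j] = result
    return dp[i][j]
The answer is then the sum of solve(1, j) over all j.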
This approach reduces the table to n * 6 entries (under 10^6, a few MB), while the time complexity stays of the same order: O(N * 6 * 50) combinations of state and run length, times the inner loop over num.
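An iterative version of the run-based idea can be sketched as follows. This is a sketch under my own conventions, not the answer's code: positions and values are 0-indexed, `ways[i][j]` counts the completions when a run of value j starts at position i, and a running `total[i] = sum(ways[i])` replaces the inner loop over num. The names `count_sequences_runs`, `ways` and `total` are made up.

```python
from typing import Sequence


def count_sequences_runs(n: int, limits: Sequence[int]) -> int:
    """Count valid length-n sequences by choosing the length of each run."""
    m = len(limits)
    ways = [[0] * m for _ in range(n + 1)]
    total = [0] * (n + 1)  # total[i] == sum(ways[i])
    for i in range(n - 1, -1, -1):
        for j in range(m):
            acc = 0
            for count in range(1, limits[j] + 1):
                end = i + count  # first position after this run
                if end > n:
                    break
                if end == n:
                    acc += 1  # the run finishes the sequence exactly
                else:
                    # the next run may start with any value except j
                    acc += total[end] - ways[end][j]
            ways[i][j] = acc
        total[i] = sum(ways[i])
    return total[0]


print(count_sequences_runs(2, [1, 2, 1, 2, 1, 2]))  # 33
```

Because it runs from the last position backwards, it avoids deep recursion, which matters for N up to 10^5.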

dynamic programming reduction of brute force

An emoticon consists of an arbitrary positive number of underscores between two semicolons. Hence, the shortest possible emoticon is ;_;. The strings ;__; and ;_____________; are also valid emoticons.
You are given a string containing only the characters ; and _. The problem is to divide the string into one or more emoticons and count how many divisions are possible. Each emoticon must be a subsequence of the message, and each character of the message must belong to exactly one emoticon. Note that the subsequences are not required to be contiguous.
The approach I thought of is to write a recursive method as follows:
countDivision(string s){
    //base cases
    if(s.empty()) return 1;
    if(s.length()<=3){
        if(s.length()!=3) return 0;
        return s[0]==';' && s[1]=='_' && s[2]==';';
    }
    result=0;
    //subproblems
    generate every valid emoticon and remove it from s; let the rest be w
    result+=countDivision(w);
    return result;
}
The solution above will easily time out when n is large, such as 100. What kind of approach should I use to convert this brute force solution to a dynamic programming solution?
A few examples:
1. ";_;;_____;" ans is 2
2. ";;;___;;;" ans is 36
Example 1.
";_;;_____;" Returns: 2
There are two ways to divide this string into two emoticons.
One looks as follows: ;_;|;_____; and the other looks like
this (remember that we can pick a subsequence; it need not be contiguous): ;_ ;|; _____;
I'll describe an O(n^4)-time and -space dynamic programming solution (that can easily be improved to use just O(n^3) space) that should work for up to n=100 or so.
Call a subsequence "fresh" if it consists of a single ;.
Call a subsequence "finished" if it corresponds to an emoticon.
Call a subsequence "partial" if it is a ; followed by one or more underscores, i.e. a proper prefix of an emoticon that is longer than a fresh one. (So for example, ;_ and ;___ are partial subsequences, while the empty string, ;, _, ;; and ;___;; are not.)
Finally, call a subsequence "admissible" if it is fresh, finished or partial.
Let f(i, j, k, m) be the number of ways of partitioning the first i characters of the string into exactly j+k+m admissible subsequences, of which exactly j are fresh, k are finished and m are partial. Notice that any prefix of a valid partition into emoticons determines i, j, k and m uniquely. This means that no prefix of a valid partition will be counted by more than one tuple (i, j, k, m), so if we can guarantee that, for each tuple (i, j, k, m), the partition prefixes within that tuple are all counted once and only once, then we can add together the counts for tuples to get a valid total. Specifically, the answer to the question will then be the sum over all 1 <= k <= n of f(n, 0, k, 0).
If s[i] = "_":
    f(i, j, k, m) =
        (j+1) * f(i-1, j+1, k, m-1)   // Convert any of the j+1 fresh subsequences to partial
        + m * f(i-1, j, k, m)         // Add _ to any of the m partial subsequences
Else if s[i] = ";":
    f(i, j, k, m) =
        f(i-1, j-1, k, m)             // Start a fresh subsequence
        + (m+1) * f(i-1, j, k-1, m+1) // Finish any of the m+1 partial subsequences
We also need the base cases
    f(0, 0, 0, 0) = 1
    f(0, _, _, _) = 0
    f(i, j, k, m) = 0 if any of i, j, k or m are negative
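The same transitions can be sketched in Python as a forward pass over the string. This is a simplified sketch, not the C++ implementation mentioned in the answer: it tracks only the number of fresh and partial subsequences, on the assumption that the finished count never influences a transition, and it relies on Python's arbitrary-precision integers. The name `count_divisions` is made up.

```python
from collections import defaultdict


def count_divisions(message: str) -> int:
    """Number of ways to split message into emoticon subsequences."""
    # states maps (fresh, partial) -> number of ways to reach that state
    states = {(0, 0): 1}
    for ch in message:
        nxt = defaultdict(int)
        for (fresh, partial), ways in states.items():
            if ch == "_":
                if fresh:
                    # convert any of the fresh subsequences to partial
                    nxt[(fresh - 1, partial + 1)] += fresh * ways
                if partial:
                    # add _ to any of the partial subsequences
                    nxt[(fresh, partial)] += partial * ways
            elif ch == ";":
                # start a fresh subsequence
                nxt[(fresh + 1, partial)] += ways
                if partial:
                    # finish any of the partial subsequences
                    nxt[(fresh, partial - 1)] += partial * ways
        states = nxt
    # a complete division leaves nothing fresh or partial
    return states.get((0, 0), 0)


print(count_divisions(";;;___;;;"))  # 36
```

There are O(n^2) states per character, so this runs in O(n^3) time. Note that the sketch returns 1 for the empty string, like the base case of countDivision in the question.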
My own C++ implementation gives the correct answer of 36 for ;;;___;;; in a few milliseconds, and e.g. for ;;;___;;;_;_; it gives an answer of 540 (also in a few milliseconds). For a string consisting of 66 ;s followed by 66 _s followed by 66 ;s, it takes just under 2s and reports an answer of 0 (probably due to overflow of the long long).
Here's a fairly straightforward memoized recursion that returns an answer immediately for a string of 66 ;s followed by 66 _s followed by 66 ;s. The function has three parameters: i = index in the string, j = number of accumulating emoticons with only a left semi-colon, and k = number of accumulating emoticons with a left semi-colon and one or more underscores.
An array is also constructed for how many underscores and semi-colons are available to the right of each index, to help decide on the next possibilities.
Complexity is O(n^3) and the problem constrains the search space, where j is at most n/2 and k at most n/4.
Commented JavaScript code:
var s = ';_;;__;_;;';
// record the number of semi-colons and
// underscores at or to the right of each index
var cs = new Array(s.length);
cs.push(0);
var us = new Array(s.length);
us.push(0);
for (var i=s.length-1; i>=0; i--){
    if (s[i] == ';'){
        cs[i] = cs[i+1] + 1;
        us[i] = us[i+1];
    } else {
        us[i] = us[i+1] + 1;
        cs[i] = cs[i+1];
    }
}
// memoize
var h = {};
function f(i,j,k){
    // memoization
    var key = [i,j,k].join(',');
    if (h[key] !== undefined){
        return h[key];
    }
    // base case: the division is valid only if no emoticon is left open
    if (i == s.length){
        return (j == 0 && k == 0) ? 1 : 0;
    }
    var a = 0,
        b = 0;
    if (s[i] == ';'){
        // if there are still enough semi-colons to start an emoticon
        if (cs[i] > j + k){
            // start a new emoticon
            a = f(i+1,j+1,k);
        }
        // close any of k partial emoticons
        if (k > 0){
            b = k * f(i+1,j,k-1);
        }
    }
    if (s[i] == '_'){
        // if there are still extra underscores
        if (j < us[i] && k > 0){
            // apply them to partial emoticons
            a = k * f(i+1,j,k);
        }
        // convert started emoticons to partial
        if (j > 0){
            b = j * f(i+1,j-1,k+1);
        }
    }
    return h[key] = a + b;
}
console.log(f(0,0,0)); // 52

Longest common sub-sequence with a certain property?

We say that a sequence of numbers x(1),x(2),...,x(k) is zigzag if no three of its consecutive elements create a nonincreasing or nondecreasing sequence. More precisely, for all i=2,3,...,k-1 either
x(i) > max( x(i-1), x(i+1) )
or
x(i) < min( x(i-1), x(i+1) )
I have two sequences of numbers a(1),a(2),...,a(n) and b(1),b(2),...,b(m). The problem is to compute the length of their longest common zigzag subsequence. In other words, you're going to delete elements from the two sequences so that they are equal, and so that they're a zigzag sequence. If the minimum total number of elements you have to delete to do this is k, then the common subsequence that remains has length (m+n-k)/2.
Note: sequences of length one and two are trivially zigzag.
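The definition can be checked mechanically with a small Python helper. This is only a sketch for testing candidate subsequences; the name `is_zigzag` is made up and is not part of the question's code.

```python
from typing import Sequence


def is_zigzag(x: Sequence[int]) -> bool:
    """True if every interior element is a strict local maximum or minimum."""
    return all(
        x[i] > max(x[i - 1], x[i + 1]) or x[i] < min(x[i - 1], x[i + 1])
        for i in range(1, len(x) - 1)
    )


print(is_zigzag([1, 3, 2, 4]))  # True
print(is_zigzag([1, 2, 3]))     # False
```

For sequences of length one or two the range is empty, so the helper returns True, in line with the note above.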
Now I tried writing a memoized recursive solution for this using the state variables below:
i = current position in sequence 1.
j = current position in sequence 2.
last = last number taken in the zigzag sequence currently being considered.
direction = current requirement on the next number, i.e. whether it should be greater than the previous one, less, or the same.
I call the function below with
magic(0,0,Integer.MIN_VALUE,0);
Here Integer.MIN_VALUE is used as a sentinel value denoting that no numbers have been taken yet.
The function is given below:
static int magic(int i, int j, int last, int direction) {
    if (hm.containsKey(i + " " + j + " " + last + " " + direction))
        return hm.get(i + " " + j + " " + last + " " + direction);
    if (i == seq1.length || j == seq2.length) {
        return 0;
    }
    int take_both = 0, leave_both = 0, leave1 = 0, leave2 = 0;
    if (seq1[i] == seq2[j] && last == Integer.MIN_VALUE)
        take_both = 1 + magic(i + 1, j + 1, seq1[i], direction); // this is the first digit hence direction is 0.
    else if (seq1[i] == seq2[j] && (direction == 0 || direction == 1 && seq1[i] > last || direction == -1 && seq1[i] < last))
        take_both = 1 + magic(i + 1, j + 1, seq1[i], last != seq1[i] ? (last > seq1[i] ? 1 : -1) : 2);
    leave_both = magic(i + 1, j + 1, last, direction);
    leave1 = magic(i + 1, j, last, direction);
    leave2 = magic(i, j + 1, last, direction);
    int ans;
    ans = Math.max(Math.max(Math.max(take_both, leave_both), leave1), leave2);
    hm.put(i + " " + j + " " + last + " " + direction, ans);
    return ans;
}
The above code works for all the test cases I could come up with, but its complexity is high.
How do I reduce the time complexity? Can I eliminate some state variables here? Is there an efficient way to do this?
First let's reduce the number of states: Let f(i, j, d) be the length of the longest common zig-zag sequence starting at position i in the first sequence and position j in the second sequence and starting with direction d (up or down).
We have the recurrence
f(i, j, up) >= MAX(i' > i, j' > j : f(i', j', up))
if s1[i] = s2[j]:
    f(i, j, up) >= 1 + MAX(i' > i, j' > j, s1[i'] > s1[i] : f(i', j', down))
and similarly for the down direction. Solving this in a straightforward way will lead to a runtime of something like O(n^4 · W), where W is the range of integers in the array. W is not polynomially bounded, so we definitely want to get rid of this factor, and ideally a couple of n factors along the way.
To solve the first part, you have to find the maximum f(i', j', up) with i' > i and j' > j. This is a standard 2-d orthogonal range maximum query.
For the second case, you need to find the maximum f(i', j', down) with i' > i, j' > j and s1[i'] > s1[i]. That is a range maximum query in the box (i, ∞) x (j, ∞) x (s1[i], ∞).
Now having 3 dimensions here looks scary. However, if we process the states in, say, decreasing order of i, then we can get rid of one dimension.
We thus reduce the problem to a range query in the rectangle (j, ∞) x (s1[i], ∞). Coordinate compression gets the dimension of values down to O(n).
You can use a 2-d data structure such as a range tree or binary indexed tree to solve both kinds of range queries in O(log^2 n). The total runtime will be O(n^2 · log^2 n).
You can get rid of one log factor using fractional cascading, but that is associated with a high constant factor. The runtime is then only one log factor away from that of finding the longest common subsequence, which seems like a lower bound for our problem.
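The coordinate compression step mentioned above can be sketched in a few lines of Python. It replaces each value by its rank among the distinct values, so comparisons such as s1[i'] > s1[i] are preserved while the value range shrinks to at most n. The function name `compress` is made up for this sketch.

```python
from typing import List, Sequence


def compress(values: Sequence[int]) -> List[int]:
    """Map each value to its 0-based rank among the distinct values."""
    rank = {v: r for r, v in enumerate(sorted(set(values)))}
    return [rank[v] for v in values]


print(compress([100, -5, 100, 7]))  # [2, 0, 2, 1]
```

To keep ranks comparable across both sequences, one would compress the concatenation of the two sequences rather than each one separately.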

Counting number of points in lower left quadrant?

I am having trouble understanding a solution to an algorithmic problem
In particular, I don't understand how or why this part of the code
s += a[i];
total += query(s);
update(s);
allows you to compute the total number of points in the lower left quadrant of each point.
Could someone please elaborate?
As an analogue for the plane problem, consider this:
1. For a point (a, b) to lie in the lower left quadrant of (x, y), we need a < x and b < y; thus, points of the form (i, P[i]) lie in the lower left quadrant of (j, P[j]) iff i < j and P[i] < P[j].
2. When iterating in ascending order of index, all points that were considered earlier lie to the left of the current point* (i, P[i]).
3. So one only has to count the P[j]s less than P[i] that have been considered until now.
*The current point is the point under consideration in the current iteration of the for loop that you quoted, i.e. (i, P[i]).
Let's define another array, C[s]:
C[s] = Number of prefix sums of array A[1..(i - 1)] that are equal to s
So the solution to #3 becomes the sum ... C[-2] + C[-1] + C[0] + C[1] + C[2] ... C[P[i] - 1], i.e. the prefix sum of C up to index P[i] - 1.
Use the BIT to store the prefix sums of C, thus defining query(s) as:
query(s) = Number of prefix sums of array A[1..(i - 1)] that amount to a value < s
Using these definitions, s in the given code is the prefix sum up to the current index i (P[i]), total accumulates the answer, and update(s) records P[i] in the BIT by incrementing C[P[i]].
We have to repeat this for all i, hence the for loop.
PS: It uses a data structure called a Binary Indexed Tree (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees) for operations. If you aren't acquainted with it, I'd recommend that you check the link.
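The query/update loop can be sketched in Python as follows. This is a sketch under assumptions, not the solution the question refers to: the class `FenwickTree` and the function `count_lower_left` are made-up names, and the values are first replaced by 1-based ranks so that negative prefix sums can be used as tree indices.

```python
from typing import List, Sequence


class FenwickTree:
    """Binary indexed tree that counts inserted ranks (1-based)."""

    def __init__(self, size: int) -> None:
        self.tree: List[int] = [0] * (size + 1)

    def update(self, pos: int) -> None:
        # record one more value with rank pos
        while pos < len(self.tree):
            self.tree[pos] += 1
            pos += pos & -pos

    def query(self, pos: int) -> int:
        # number of recorded values with rank <= pos
        total = 0
        while pos > 0:
            total += self.tree[pos]
            pos -= pos & -pos
        return total


def count_lower_left(p: Sequence[int]) -> int:
    """Count pairs i < j with p[i] < p[j]."""
    ranks = {v: r + 1 for r, v in enumerate(sorted(set(p)))}
    bit = FenwickTree(len(ranks))
    total = 0
    for value in p:
        r = ranks[value]
        total += bit.query(r - 1)  # earlier values strictly smaller than this one
        bit.update(r)
    return total


print(count_lower_left([0, -1, 0, 1]))  # 4
```

Each element costs one query and one update, so the whole count takes O(n log n).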
EDIT:
You are given an array S and a value X. You can split S into two disjoint groups: L, which has all elements of S less than X, and H, which has those that are greater than or equal to X.
A: All elements of L are less than all elements of H.
Any subsequence T of S will have some elements of L and some elements of H. Let's say it has p elements of L and q of H. When T is sorted to give T', all p elements of L appear before the q elements of H because of A.
The median, being the central value, is the value at location m = (p + q)/2.
It is intuitive to think that having q >= p implies that the median lies in H. As a proof:
Values in locations [1..p] in T' belong to L. Therefore, for the median to be in H, its position m should be greater than p:
m > p
(p + q)/2 > p
p + q > 2p
q > p
B: q - p > 0
To compute q - p, I replace all elements in T' with -1 if they belong to L ( < X ) and +1 if they belong to H ( >= X).
T' now looks something like {-1, -1, -1... 1, 1, 1}
It has p times -1 and q times 1. The sum of T' will now give me:
Sum = p * (-1) + q * (1)
C: Sum = q - p
I can use this information to find the value in B.
All subsequences considered are contiguous, so they are of the form {A[i], A[i + 1], A[i + 2] ... A[j - 1]}. To compute the sum of A[i] to A[j - 1], I can use the prefix sums P[i] = A[1] + A[2] + .. A[i - 1].
The sum of the subsequence from A[i] to A[j - 1] can then be computed as P[j] - P[i] (j is the greater of j and i).
With C and B in mind, we conclude:
Sum = P[j] - P[i] = q - p (q - p > 0)
P[j] - P[i] > 0
P[j] > P[i]
j > i and P[j] > P[i] for each solution that gives you a median >= X
In summary:
Replace all A[i] with -1 if they are less than X and +1 otherwise
Compute the prefix sums P[i] of A
For each pair (i, P[i]), count the pairs which lie in its lower left quadrant.
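The three summary steps can be sketched in Python as below. The sketch counts contiguous subarrays that satisfy condition B (q > p); whether that coincides exactly with "median >= X" for even lengths depends on the median convention, which the text above leaves open. The pair count uses a plain double loop to keep the sketch short, where a BIT would replace the inner loop. The function name is made up.

```python
from typing import Sequence


def count_subarrays_majority_ge(a: Sequence[int], x: int) -> int:
    """Count contiguous subarrays with more elements >= x than elements < x."""
    # Step 1: replace each element with -1 (below x) or +1 (at least x).
    signs = [1 if v >= x else -1 for v in a]
    # Step 2: prefix sums, with prefix[0] = 0 for the empty prefix.
    prefix = [0]
    for s in signs:
        prefix.append(prefix[-1] + s)
    # Step 3: count pairs i < j with prefix[i] < prefix[j].
    count = 0
    for j in range(1, len(prefix)):
        for i in range(j):
            if prefix[i] < prefix[j]:
                count += 1
    return count


print(count_subarrays_majority_ge([1, 2, 3], 2))  # 4: [2], [3], [2, 3], [1, 2, 3]
```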

Max suffix of a list

This problem is trying to find the lexicographical max suffix of a given list.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
Then our goal is to find the lexicographical max one among the above 5 lists.
for example, all suffixes of [1;2;3;1;0] are
[1;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0].
The lexicographical max suffix is [3;1;0] from above example.
The straightforward algorithm is just to compare all suffixes one by one and always record the max. The time complexity is O(n^2) as comparing two lists need O(n).
However, the desired time complexity is O(n) and no suffix tree (no suffix array either) should be used.
please note that elements in the list may not be distinct
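#include <utility>
#include <vector>
using namespace std;

int max_suffix(const vector<int> &a)
{
    int n = a.size(),
        i = 0,
        j = 1,
        k;
    while (j < n)
    {
        // k = length of the common prefix of a[i..] and a[j..]
        for (k = 0; j + k < n && a[i + k] == a[j + k]; ++k);
        if (j + k == n) break;
        // advance whichever candidate lost the comparison
        (a[i + k] < a[j + k] ? i : j) += k + 1;
        if (i == j)
            ++j;
        else if (i > j)
            swap(i, j);
    }
    return i;
}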
My solution is a small modification of the solution to the problem Minimum Rotations.
In the above code, each time it enters the loop, the invariant is that i < j and that none of the suffixes a[p...n] (0<=p<j && p!=i) is the max suffix. Then, to decide which of a[i...n] and a[j...n] is lexicographically smaller, the for-loop finds the least k such that a[i+k]!=a[j+k], and i and j are updated according to k.
We can skip k + 1 positions for i or j and still keep it true that none of a[p...n] (0<=p<j && p!=i) is the max suffix. For example, if a[i+k]<a[j+k], then a[i+p...n] (0<=p<=k) is not the max suffix, since a[j+p...n] is lexicographically greater than it.
Imagine a two-player game in which two opponents, A and B, compete to find the max suffix of a given string s. Whoever first finds the max suffix wins the game. In the first round, A picks suffix s[i..], and B picks suffix s[j..].
i: _____X
j: _____Y
Matched length = k
A judge compares the two suffixes and finds a mismatch after k matching characters, as shown in the figure above.
Without loss of generality, we assume X > Y, so B loses this round. He has to pick a different suffix in order to (possibly) beat A in the next round. If B is smart, he will not pick any suffix starting at position j, j + 1, ..., j + k, because s[j..] is already beaten by s[i..], and he knows s[j+1..] will be beaten by s[i+1..], s[j+2..] will be beaten by s[i+2..], and so on. So B should pick suffix s[j + k + 1..] for the next round. One extra observation is that B should not pick the same suffix as A either, because the first person who finds the max suffix wins the game. If j + k + 1 happens to be equal to i, B should skip to the next position.
Finally, after many rounds, either A or B will run out of choices and lose the game, because the number of choices is limited for both A and B, and some choices are eliminated after each round.
When this happens, the suffix that the winner currently holds is the max suffix. (Remember that the loser has run out of choices. A choice is given up either because it cannot possibly be the max suffix, or because it is currently held by the other person. So the only reason the loser would give up the actual max suffix in some round is that his opponent is holding it. Once a player holds the max suffix, he will never lose a round and give it up.)
The C++ program below is an almost literal translation of this game.
#include <cstddef>
#include <string>

int maxSuffix(const std::string& s) {
    std::size_t i = 0, j = 1, k;
    while (i < s.size() && j < s.size()) {
        for (k = 0; i + k < s.size() && j + k < s.size() && s[i + k] == s[j + k]; ++k) { } // judge
        if (j + k >= s.size()) return i; // B has finally lost
        if (i + k >= s.size()) return j; // A has finally lost
        if (s[i + k] > s[j + k]) { // B lost this round, so he needs a new choice
            j = j + k + 1;
            if (j == i) ++j;
        } else { // A lost this round, so he needs a new choice
            i = i + k + 1;
            if (i == j) ++i;
        }
    }
    return j >= s.size() ? i : j;
}
Running time analysis: Initially each player has n choices. In each round, the judge makes about k comparisons, and at least k possible choices are eliminated for either A or B. So the total number of comparisons is bounded by about 2n when the game is over.
The discussion above is in the context of strings, but it should work with minor modifications on any container that supports sequential access only.
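As an illustration of that last point, here is a Python port of the game that works on any indexable sequence with comparable elements (lists, tuples, strings). It is a sketch that mirrors maxSuffix above; the name `max_suffix_index` is made up.

```python
from typing import Sequence


def max_suffix_index(s: Sequence) -> int:
    """Start index of the lexicographically largest suffix of s."""
    n = len(s)
    i, j = 0, 1  # current choices of players A and B
    while i < n and j < n:
        # judge: k = number of matching elements before the first mismatch
        k = 0
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        if j + k >= n:
            return i  # B's suffix is a proper prefix of A's, so A wins
        if i + k >= n:
            return j  # A's suffix is a proper prefix of B's, so B wins
        if s[i + k] > s[j + k]:
            j = j + k + 1  # B lost this round and skips the beaten positions
            if j == i:
                j += 1
        else:
            i = i + k + 1  # A lost this round and skips the beaten positions
            if i == j:
                i += 1
    return i if j >= n else j


print(max_suffix_index([1, 2, 3, 1, 0]))  # 2, i.e. the suffix [3, 1, 0]
```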

Resources