Max suffix of a list - algorithm

The problem is to find the lexicographically maximal suffix of a given list.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
Our goal is to find the lexicographically largest of the above 5 lists.
For example, all suffixes of [1;2;3;1;0] are
[1;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0].
The lexicographically maximal suffix in this example is [3;1;0].
The straightforward algorithm compares all suffixes one by one and always records the max. Its time complexity is O(n^2), since comparing two lists takes O(n).
However, the desired time complexity is O(n), and neither a suffix tree nor a suffix array should be used.
Please note that the elements in the list may not be distinct.
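A minimal Python sketch of the straightforward O(n^2) approach described above, which can serve as a reference when checking a faster solution. The name max_suffix_naive is made up for illustration, and the sketch relies on Python's built-in lexicographic comparison of lists.

```python
def max_suffix_naive(a):
    """Return the start index of the lexicographically largest suffix of a."""
    best = 0
    for i in range(1, len(a)):
        # Slicing and comparing two suffixes costs O(n), hence O(n^2) overall.
        if a[i:] > a[best:]:
            best = i
    return best


print(max_suffix_naive([1, 2, 3, 1, 0]))  # start index of [3, 1, 0]
```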

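#include <utility>
#include <vector>
using namespace std;

// Returns the start index of the lexicographically maximal suffix of a.
int max_suffix(const vector<int> &a)
{
    int n = a.size(),
        i = 0,  // current best candidate
        j = 1,  // challenger, always kept greater than i
        k;
    while (j < n)
    {
        // Find the first offset k at which the two suffixes differ.
        for (k = 0; j + k < n && a[i + k] == a[j + k]; ++k);
        if (j + k == n) break;  // a[j..] is a prefix of a[i..], so i wins
        (a[i + k] < a[j + k] ? i : j) += k + 1;  // the loser skips k + 1 positions
        if (i == j)
            ++j;
        else if (i > j)
            swap(i, j);
    }
    return i;
}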
My solution is a slight modification of the solution to the Minimum Rotations problem.
In the above code, each time execution enters the loop, it holds that i < j and that none of the suffixes a[p...n] (0<=p<j && p!=i) is the max suffix. To decide which of a[i...n] and a[j...n] is lexicographically smaller, the for-loop finds the least k such that a[i+k]!=a[j+k], and then i or j is updated according to k.
We can skip k+1 positions for i or j and still keep it true that none of a[p...n] (0<=p<j && p!=i) is the max suffix. For example, if a[i+k]<a[j+k], then a[i+p...n] (0<=p<=k) is not the max suffix, since a[j+p...n] is lexicographically greater than it.
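For readers who prefer Python, here is a sketch that transcribes the loop above line by line. It is only an illustration of the same invariant (i < j, and no start position below j other than i can be the max suffix), not a separately tuned implementation.

```python
def max_suffix(a):
    """Return the start index of the lexicographically largest suffix of a."""
    n = len(a)
    i, j = 0, 1
    while j < n:
        # Find the first offset k where the suffixes starting at i and j differ.
        k = 0
        while j + k < n and a[i + k] == a[j + k]:
            k += 1
        if j + k == n:
            break  # a[j:] is a prefix of a[i:], so a[i:] is larger
        # The smaller suffix skips k + 1 start positions.
        if a[i + k] < a[j + k]:
            i += k + 1
        else:
            j += k + 1
        # Restore i < j.
        if i == j:
            j += 1
        elif i > j:
            i, j = j, i
    return i


print(max_suffix([1, 2, 3, 1, 0]))  # start index of [3, 1, 0]
```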

Imagine a two-player game in which two opponents, A and B, compete to find the max suffix of a given string s. Whoever finds the max suffix first wins the game. In the first round, A picks suffix s[i..], and B picks suffix s[j..].
i: _____X
j: _____Y
Matched length = k
A judge compares the two suffixes and finds a mismatch after k matching characters, as shown in the figure above.
Without loss of generality, assume X > Y, so B loses this round. He has to pick a different suffix in order to (possibly) beat A in the next round. If B is smart, he will not pick any suffix starting at position j, j + 1, ..., j + k, because s[j..] is already beaten by s[i..], and he knows s[j+1..] will be beaten by s[i+1..], s[j+2..] will be beaten by s[i+2..], and so on. So B should pick suffix s[j + k + 1..] for the next round. One extra observation is that B should not pick the same suffix as A either, because the first person who finds the max suffix wins the game. If j + k + 1 happens to be equal to i, B should skip to the next position.
Finally, after many rounds, either A or B will run out of choices and lose the game, because the number of choices is limited for both A and B, and some choices are eliminated after each round.
When this happens, the suffix that the winner currently holds is the max suffix. (Remember that the loser has run out of choices. A choice is given up either because it cannot possibly be the max suffix or because it is currently held by the other player. So the only reason the loser gives up the actual max suffix in some round is that his opponent is holding it. Once a player holds the max suffix, he will never lose a round and give it up.)
The C++ program below is an almost literal translation of this game.
#include <cstddef>
#include <string>

int maxSuffix(const std::string& s) {
    std::size_t i = 0, j = 1, k;
    while (i < s.size() && j < s.size()) {
        for (k = 0; i + k < s.size() && j + k < s.size() && s[i + k] == s[j + k]; ++k) { } // judge
        if (j + k >= s.size()) return i; // B has finally lost
        if (i + k >= s.size()) return j; // A has finally lost
        if (s[i + k] > s[j + k]) { // B loses this round, so he needs a new choice
            j = j + k + 1;
            if (j == i) ++j;
        } else { // A loses this round, so he needs a new choice
            i = i + k + 1;
            if (i == j) ++i;
        }
    }
    return j >= s.size() ? i : j;
}
Running time analysis: Initially each player has n choices. In each round, the judge makes k comparisons, and at least k possible choices are eliminated for either A or B. So the total number of comparisons is bounded by 2n when the game is over.
The discussion above is in the context of strings, but it should work with minor modifications on any container that supports only sequential access.

Related

O(n) solution to counting sub-arrays with sum constraints

I'm trying to improve my intuition around the following two sub-array problems.
Problem one
Return the length of the shortest, non-empty, contiguous sub-array of A with sum at least
K. If there is no non-empty sub-array with sum at least K, return -1
I've come across an O(N) solution online.
import java.util.Deque;
import java.util.LinkedList;

public int shortestSubarray(int[] A, int K) {
    int N = A.length;
    long[] P = new long[N+1]; // prefix sums: P[i] = A[0] + ... + A[i-1]
    for (int i = 0; i < N; ++i)
        P[i+1] = P[i] + (long) A[i];
    // Want smallest y-x with P[y] - P[x] >= K
    int ans = N+1; // N+1 is impossible
    Deque<Integer> monoq = new LinkedList<>(); //opt(y) candidates, as indices of P
    for (int y = 0; y < P.length; ++y) {
        // Want opt(y) = largest x with P[x] <= P[y] - K;
        while (!monoq.isEmpty() && P[y] <= P[monoq.getLast()])
            monoq.removeLast();
        while (!monoq.isEmpty() && P[y] >= P[monoq.getFirst()] + K)
            ans = Math.min(ans, y - monoq.removeFirst());
        monoq.addLast(y);
    }
    return ans < N+1 ? ans : -1;
}
It seems to be maintaining a sliding window with a deque. It looks like a variant of Kadane's algorithm.
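To make the role of the deque easier to see, here is a Python sketch of the same idea. The function name shortest_subarray is illustrative. The deque holds indices into the prefix-sum array whose prefix sums are strictly increasing from front to back, which is what lets each index be added and removed at most once.

```python
from collections import deque


def shortest_subarray(A, K):
    """Length of the shortest non-empty contiguous sub-array with sum >= K, or -1."""
    n = len(A)
    # P[i] is the sum of the first i elements.
    P = [0] * (n + 1)
    for i, x in enumerate(A):
        P[i + 1] = P[i] + x
    ans = n + 1  # n + 1 is impossible, so it doubles as "not found"
    monoq = deque()  # candidate left ends, prefix sums increasing front to back
    for y in range(n + 1):
        # A later index with a smaller or equal prefix sum dominates earlier ones.
        while monoq and P[y] <= P[monoq[-1]]:
            monoq.pop()
        # Every front index that already satisfies the constraint gives its
        # shortest window now, so it can be discarded after being used.
        while monoq and P[y] >= P[monoq[0]] + K:
            ans = min(ans, y - monoq.popleft())
        monoq.append(y)
    return ans if ans <= n else -1


print(shortest_subarray([2, -1, 2], 3))
```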
Problem two
Given an array of N integers (positive and negative), find the number of
contiguous sub-arrays whose sum is greater than or equal to K (which may also be positive or
negative).
The best solution I've seen to this problem is O(nlogn) as described in the following answer.
tree = an empty search tree
result = 0
// This sum corresponds to an empty prefix.
prefixSum = 0
tree.add(prefixSum)
// Iterate over the input array from left to right.
for elem <- array:
    prefixSum += elem
    // Add the number of subarrays that have this element as the last one
    // and their sum is not less than K.
    result += tree.getNumberOfLessOrEqual(prefixSum - K)
    // Add the current prefix sum to the tree.
    tree.add(prefixSum)
print result
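One possible way to realise the "search tree" in this pseudocode is sketched below in Python. It is an assumption of this sketch, not something the quoted answer prescribes, that the tree is replaced by a Fenwick (binary indexed) tree over the sorted distinct prefix sums. The name count_subarrays_at_least and the helpers add and count_le are made up for illustration.

```python
from bisect import bisect_right


def count_subarrays_at_least(array, K):
    """Count contiguous sub-arrays whose sum is >= K in O(n log n)."""
    prefix = [0]
    for x in array:
        prefix.append(prefix[-1] + x)
    values = sorted(set(prefix))      # coordinate compression
    tree = [0] * (len(values) + 1)    # Fenwick tree, 1-based ranks

    def add(rank):
        while rank < len(tree):
            tree[rank] += 1
            rank += rank & -rank

    def count_le(rank):
        # Number of inserted prefix sums whose rank is <= rank.
        total = 0
        while rank > 0:
            total += tree[rank]
            rank -= rank & -rank
        return total

    result = 0
    add(bisect_right(values, prefix[0]))  # the empty prefix
    for p in prefix[1:]:
        # Earlier prefix sums q with q <= p - K give sub-arrays with sum >= K.
        result += count_le(bisect_right(values, p - K))
        add(bisect_right(values, p))
    return result


print(count_subarrays_at_least([1, 2, 3], 3))
```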
My questions
Is my intuition that algorithm one is a variant of Kadane's algorithm correct?
If so, is there a variant of this algorithm (or another O(n) solution) that can be used to solve problem two?
Why can problem two only be solved in O(nlogn) time when they look so similar?

Have O(n^2) algorithm for "two-sum", convert to O(n) linear solution [duplicate]

I have an O(n^2) solution to the classic two-sum problem, where A[1...n] is a sorted array of positive integers and t is some positive integer.
I need to determine whether A contains two distinct elements a and b s.t. a + b = t.
Here is my solution so far:
t = a number;
for (i=0; i<A.length; i++)
    for each A[j]
        if A[i] + A[j] == t
            return true
return false
How do I make this a linear, O(n) solution? I'm scratching my head trying to figure it out.
Here's an approach I have in mind so far. i will start at the beginning of A, j will start at the end of A. i will increment, j will decrement. So I'll have two counter variables in the for loop, i & j.
There are a couple of ways to improve upon that.
You could extend your algorithm, but instead of doing a simple search for every term, you could do a binary search:
t = a number
for (i = 0; i < A.length; i++)
    // search only to the right of i, so an element is never paired with itself
    j = binarySearch(A, t - A[i], i + 1, A.length - 1)
    if (j != null)
        return true
return false
Binary search takes O(log N) steps; since you perform one binary search for every element in the array, the complexity of the whole algorithm is O(N*log N).
This is already a tremendous improvement over O(N^2), but you can do better.
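As a concrete illustration of the binary-search variant, here is a small Python sketch using the standard bisect module. The name has_pair_binary_search is hypothetical. The search starts at i + 1, so an element is never paired with itself.

```python
from bisect import bisect_left


def has_pair_binary_search(A, t):
    """A is sorted ascending; return True if two elements at different indices sum to t."""
    for i, a in enumerate(A):
        # Leftmost position at or after i + 1 where t - a could be inserted.
        j = bisect_left(A, t - a, i + 1)
        if j < len(A) and A[j] == t - a:
            return True
    return False


print(has_pair_binary_search([1, 3, 4, 8, 9], 11))
```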
Let's take the sum 11 and the array 1, 3, 4, 8, 9 for example.
You can already see that (3,8) satisfies the sum. To find that, imagine having two pointers: one pointing at the beginning of the array (1), which we'll call H and mark with square brackets, and another pointing at the end of the array (9), which we'll call T and mark with parentheses.
[1] 3 4 8 (9)
Right now the sum of the two pointers is 1 + 9 = 10.
10 is less than the desired sum (11), and there is no way to reach the desired sum by moving the T pointer, so we'll move the H pointer right:
1 [3] 4 8 (9)
3 + 9 = 12, which is greater than the desired sum. There is no way to reach the desired sum by moving the H pointer: moving it right will further increase the sum, and moving it left brings us back to the initial state. So we'll move the T pointer left:
1 [3] 4 (8) 9
3 + 8 = 11 <-- this is the desired sum, we're done.
So the rules of the algorithm consist of moving the H pointer right or moving the T pointer left; we're finished when the sum of the two pointed-to elements equals the desired sum, or when H and T have met or crossed (T is no longer greater than H).
t = a number
H = 0
T = A.length - 1
S = -1
while H < T && S != t
    S = A[H] + A[T]
    if S < t
        H++
    else if S > t
        T--
return S == t
It's easy to see that this algorithm runs at O(N) because we traverse each element at most once.
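The pointer walk above can be sketched in Python as follows. The function name has_pair_with_sum and the variable names h and tail are illustrative, and the input is assumed to be sorted in ascending order, as in the question.

```python
def has_pair_with_sum(A, t):
    """A is sorted ascending; return True if two elements at different indices sum to t."""
    h, tail = 0, len(A) - 1
    while h < tail:
        s = A[h] + A[tail]
        if s == t:
            return True
        if s < t:
            h += 1      # only a larger left element can raise the sum
        else:
            tail -= 1   # only a smaller right element can lower the sum
    return False


print(has_pair_with_sum([1, 3, 4, 8, 9], 11))
```

Each iteration moves exactly one pointer inward, so the loop runs at most len(A) - 1 times.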
You make two new variables holding index 0 and index n-1; let's call them i and j respectively.
Then you check the sum of A[i] and A[j]: if the sum is smaller than t, increment i (the lower index), and if it is bigger, decrement j (the higher index). Continue until you either find i and j such that A[i] + A[j] = t, in which case you return true, or j <= i, in which case you return false.
int i = 0, j = n-1;
while(i < j) {
    if(A[i] + A[j] == t)
        return true;
    if(A[i] + A[j] < t)
        i++;
    else
        j--;
}
return false;
Given that A[i] is relatively small (maybe less than 10^6), you can create an array B of size 10^6 with each value equal to 0. Then apply the following algorithm:
for i in 1...N:
    B[A[i]] += 1
for i in 1...N:
    r = t - A[i]
    if r > 0 and r < 10^6:
        # when r == A[i], a second copy of the value is needed
        if B[r] > (1 if r == A[i] else 0):
            return True
return False
Edit: well, now that we know that the array is sorted, it may be wiser to find another algorithm. I'll leave the answer here since it still applies to a certain class of related problems.
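For the unsorted case this answer has in mind, a related sketch in Python is shown below. Note that it swaps the fixed-size counting array for a hash set, which avoids the assumption on the value range; the name has_pair_unsorted is made up for illustration.

```python
def has_pair_unsorted(A, t):
    """Return True if two elements at different indices of A sum to t (A need not be sorted)."""
    seen = set()
    for a in A:
        # A matching partner must have appeared earlier in the array.
        if t - a in seen:
            return True
        seen.add(a)
    return False


print(has_pair_unsorted([8, 1, 9, 3, 4], 11))
```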

dynamic programming reduction of brute force

An emoticon consists of an arbitrary positive number of underscores between two semicolons. Hence, the shortest possible emoticon is ;_;. The strings ;__; and ;_____________; are also valid emoticons.
Given a string containing only the characters ; and _, the problem is to divide the string into one or more emoticons and count how many divisions are possible. Each emoticon must be a subsequence of the message, and each character of the message must belong to exactly one emoticon. Note that the subsequences are not required to be contiguous.
The approach I thought of is to write a recursive method as follows:
countDivision(string s){
    //base cases
    if(s.empty()) return 1;
    if(s.length()<=3){
        if(s.length()!=3) return 0;
        return s[0]==';' && s[1]=='_' && s[2]==';';
    }
    result=0;
    //subproblems
    for each valid emoticon in s: remove it from s, and let the rest be w
        result+=countDivision(w);
    return result;
}
The solution above will easily time out when n is large, such as 100. What kind of approach should I use to convert this brute force solution to a dynamic programming solution?
A few examples
1. ";_;;_____;" ans is 2
2. ";;;___;;;" ans is 36
Example 1.
";_;;_____;" Returns: 2
There are two ways to divide this string into two emoticons.
One looks as follows: ;_;|;_____; and the other looks like
this (remember that a subsequence need not be contiguous): ;_ ;|; _____;
I'll describe an O(n^4)-time and -space dynamic programming solution (that can easily be improved to use just O(n^3) space) that should work for up to n=100 or so.
Call a subsequence "fresh" if it consists of a single ;.
Call a subsequence "finished" if it corresponds to an emoticon.
Call a subsequence "partial" if it has nonzero length and is a proper prefix of an emoticon. (So for example, ;, ;_, and ;___ are all partial subsequences, while the empty string, _, ;; and ;___;; are not.)
Finally, call a subsequence "admissible" if it is fresh, finished or partial.
Let f(i, j, k, m) be the number of ways of partitioning the first i characters of the string into exactly j+k+m admissible subsequences, of which exactly j are fresh, k are finished and m are partial (counting a lone ; as fresh rather than partial, so that the three kinds do not overlap). Notice that any prefix of a valid partition into emoticons determines i, j, k and m uniquely -- this means that no prefix of a valid partition will be counted by more than one tuple (i, j, k, m), so if we can guarantee that, for each tuple (i, j, k, m), the partition prefixes within that tuple are all counted once and only once, then we can add together the counts for tuples to get a valid total. Specifically, the answer to the question will then be the sum over all 1 <= j <= n of f(n, 0, j, 0), i.e. over partitions with no fresh, j finished and no partial subsequences.
If s[i] = "_":
    f(i, j, k, m) =
        (j+1) * f(i-1, j+1, k, m-1) // Convert any of the j+1 fresh subsequences to partial
        + m * f(i-1, j, k, m) // Add _ to any of the m partial subsequences
Else if s[i] = ";":
    f(i, j, k, m) =
        f(i-1, j-1, k, m) // Start a fresh subsequence
        + (m+1) * f(i-1, j, k-1, m+1) // Finish any of the m+1 partial subsequences
We also need the base cases
    f(0, 0, 0, 0) = 1
    f(0, _, _, _) = 0
    f(i, j, k, m) = 0 if any of i, j, k or m are negative
My own C++ implementation gives the correct answer of 36 for ;;;___;;; in a few milliseconds, and e.g. for ;;;___;;;_;_; it gives an answer of 540 (also in a few milliseconds). For a string consisting of 66 ;s followed by 66 _s followed by 66 ;s, it takes just under 2s and reports an answer of 0 (probably due to overflow of the long long).
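The recurrence above can be sketched as a memoized recursion in Python. This is not the C++ implementation mentioned in the answer, only an illustration: the function name count_divisions and the parameter names fresh, finished and partial are chosen here, with the parameter order following the recurrence (second index fresh, third finished, fourth partial). Python integers do not overflow, but the plain recursion is only meant for short strings.

```python
from functools import lru_cache


def count_divisions(s):
    """Count the ways to split s (only ';' and '_') into emoticon subsequences."""
    n = len(s)

    @lru_cache(maxsize=None)
    def f(i, fresh, finished, partial):
        if min(i, fresh, finished, partial) < 0:
            return 0
        if i == 0:
            return 1 if (fresh, finished, partial) == (0, 0, 0) else 0
        if s[i - 1] == '_':
            # Either one of fresh + 1 lone semicolons received its first
            # underscore, or one of the partial subsequences grew.
            return ((fresh + 1) * f(i - 1, fresh + 1, finished, partial - 1)
                    + partial * f(i - 1, fresh, finished, partial))
        # s[i - 1] == ';': either it started a fresh subsequence, or it
        # finished one of partial + 1 partial subsequences.
        return (f(i - 1, fresh - 1, finished, partial)
                + (partial + 1) * f(i - 1, fresh, finished - 1, partial + 1))

    return sum(f(n, 0, done, 0) for done in range(1, n + 1))


print(count_divisions(";;;___;;;"))
```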
Here's a fairly straightforward memoized recursion that returns an answer immediately for a string of 66 ;s followed by 66 _s followed by 66 ;s. The function has three parameters: i = index in the string, j = number of accumulating emoticons with only a left semi-colon, and k = number of accumulating emoticons with a left semi-colon and one or more underscores.
An array is also constructed for how many underscores and semi-colons are available to the right of each index, to help decide on the next possibilities.
Complexity is O(n^3) and the problem constrains the search space, where j is at most n/2 and k at most n/4.
Commented JavaScript code:
var s = ';_;;__;_;;';
// record the number of semi-colons and
// underscores to the right of each index
var cs = new Array(s.length);
cs.push(0);
var us = new Array(s.length);
us.push(0);
for (var i=s.length-1; i>=0; i--){
  if (s[i] == ';'){
    cs[i] = cs[i+1] + 1;
    us[i] = us[i+1];
  } else {
    us[i] = us[i+1] + 1;
    cs[i] = cs[i+1];
  }
}
// memoize
var h = {};
function f(i,j,k){
  // memoization
  var key = [i,j,k].join(',');
  if (h[key] !== undefined){
    return h[key];
  }
  // base case
  if (i == s.length){
    return 1;
  }
  var a = 0,
      b = 0;
  if (s[i] == ';'){
    // if there are still enough semi-colons to start an emoticon
    if (cs[i] > j + k){
      // start a new emoticon
      a = f(i+1,j+1,k);
    }
    // close any of k partial emoticons
    if (k > 0){
      b = k * f(i+1,j,k-1);
    }
  }
  if (s[i] == '_'){
    // if there are still extra underscores
    if (j < us[i] && k > 0){
      // apply them to partial emoticons
      a = k * f(i+1,j,k);
    }
    // convert started emoticons to partial
    if (j > 0){
      b = j * f(i+1,j-1,k+1);
    }
  }
  return h[key] = a + b;
}
console.log(f(0,0,0)); // 52

Longest common sub-sequence with a certain property?

We say that a sequence of numbers x(1),x(2),...,x(k) is zigzag if no three of its consecutive elements form a nonincreasing or nondecreasing sequence. More precisely, for all i=2,3,...,k-1 either
x(i) > max( x(i-1), x(i+1) )
or
x(i) < min( x(i-1), x(i+1) )
I have two sequences of numbers a(1),a(2),...,a(n) and b(1),b(2),...,b(m). The problem is to compute the length of their longest common zigzag subsequence. In other words, you're going to delete elements from the two sequences so that they are equal, and so that they're a zigzag sequence. If the minimum total number of elements that must be deleted to do this is k, then the answer is (m+n-k)/2.
Note: sequences of length one or two are trivially zigzag.
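To pin down the definition, here is a small Python checker. The name is_zigzag is made up for illustration; it tests every interior element against both of its neighbours, and sequences with fewer than three elements pass trivially.

```python
def is_zigzag(x):
    """True if every interior element is a strict local maximum or minimum."""
    return all(
        (x[i - 1] < x[i] > x[i + 1]) or (x[i - 1] > x[i] < x[i + 1])
        for i in range(1, len(x) - 1)
    )


print(is_zigzag([1, 3, 2, 4]), is_zigzag([1, 2, 3]))
```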
I tried writing a memoized recursive solution using the state variables below:
i = current position in sequence 1.
j = current position in sequence 2.
last = last number taken into the zigzag sequence currently being considered.
direction = current requirement on the next number, i.e. whether it should be greater than the previous one, less than it, or the same.
I call the function below with
magic(0,0,Integer.MIN_VALUE,0);
Here Integer.MIN_VALUE is used as a sentinel value denoting that no numbers have been taken into the sequence yet.
The function is given below:
// hm (the memo map), seq1 and seq2 are static fields defined elsewhere.
static int magic(int i, int j, int last, int direction) {
    if (hm.containsKey(i + " " + j + " " + last + " " + direction))
        return hm.get(i + " " + j + " " + last + " " + direction);
    if (i == seq1.length || j == seq2.length) {
        return 0;
    }
    int take_both = 0, leave_both = 0, leave1 = 0, leave2 = 0;
    if (seq1[i] == seq2[j] && last == Integer.MIN_VALUE)
        take_both = 1 + magic(i + 1, j + 1, seq1[i], direction); // this is the first digit hence direction is 0.
    else if (seq1[i] == seq2[j] && (direction == 0 || direction == 1 && seq1[i] > last || direction == -1 && seq1[i] < last))
        take_both = 1 + magic(i + 1, j + 1, seq1[i], last != seq1[i] ? (last > seq1[i] ? 1 : -1) : 2);
    leave_both = magic(i + 1, j + 1, last, direction);
    leave1 = magic(i + 1, j, last, direction);
    leave2 = magic(i, j + 1, last, direction);
    int ans;
    ans = Math.max(Math.max(Math.max(take_both, leave_both), leave1), leave2);
    hm.put(i + " " + j + " " + last + " " + direction, ans);
    return ans;
}
The above code works for as many test cases as I could come up with, but the complexity is high.
How do I reduce the time complexity? Can I eliminate some state variables here? Is there an efficient way to do this?
First let's reduce the number of states: Let f(i, j, d) be the length of the longest common zig-zag sequence starting at position i in the first string and position j in the second string and starting with direction d (up or down).
We have the recurrence
f(i, j, up) >= MAX(i' > i, j' > j : f(i', j', up))
if s1[i] = s2[j]:
    f(i, j, up) >= 1 + MAX(i' > i, j' > j, s1[i'] > s1[i] : f(i', j', down))
and similarly for the down direction. Solving this in a straightforward way
will lead to a runtime of something like O(n^4 · W), where W is the range of integers in the array. W is not polynomially bounded, so we definitely want to get rid of this factor, and ideally a couple of n factors along the way.
To solve the first part, you have to find the maximum f(i', j', up) with
i' > i and j' > j. This is a standard 2-d orthogonal range maximum query.
For the second case, you need to find the maximum f(i', j', down) with i' > i, j' > j and s1[i'] > s1[i]. That is a range maximum query in the box (i, ∞) x (j, ∞) x (s1[i], ∞).
Now having 3 dimensions here looks scary. However, if we process the states in say, decreasing order of i, then we can get rid of one dimension.
We thus reduced the problem to a range query in the rectangle (j, ∞) x (s1[i], ∞). Coordinate compression gets the dimension of values down to O(n).
You can use a 2-d data structure such as a range tree or binary-indexed tree to solve both kinds of range queries in O(log^2 n). The total runtime will be O(n^2 · log^2 n).
You can get rid of one log factor using fractional cascading, but that is associated with a high constant factor. The runtime is then only one log-factor short of that for finding the longest common subsequence, which seems like a lower-bound for our problem.

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers and I want to remove only those elements whose removal leaves a list in which no more than K remaining numbers are adjacent to each other. Multiple lists can be created under this restriction, so I want the one in which the sum of the remaining numbers is maximal, and as output I print only that sum.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get a better algorithm for this problem?
Here's my attempt:
#include <cstdio>

int main()
{
    //Total Number of elements in the list
    int count = 6;
    //Maximum number of elements that can be together
    int maxTogether = 1;
    //The list of numbers
    int billboards[] = {4, 7, 2, 0, 8, 9};
    int maxSum = 0;
    for(int k = 0; k<=maxTogether ; k++){
        int sum=0;
        int size= k;
        for (int i = 0; i< count; i++) {
            if(size != maxTogether){
                sum += billboards[i];
                size++;
            }else{
                size = 0;
            }
        }
        printf("%i\n", sum);
        if(sum > maxSum)
        {
            maxSum = sum;
        }
    }
    return 0;
}
The O(NK) dynamic programming solution is fairly easy:
Let A[i] be the best sum of the elements to the left subject to the not-k-consecutive constraint (assuming we're removing the i-th element as well).
Then we can calculate A[i] by looking back K elements:
A[i] = 0;
for j = 1 to k
    A[i] = max(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last k elements from A, adding the elements to the right to each and picking the best one.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K+1], A[i-K].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K+1].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K because of A[i]'s calculation, but then we find the best of all of those except i-K (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K] and insert A[i]. My choice - A binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
High-level pseudo-code:
for each i in input
    add (i + the smallest value in the BST) to the BST
    add the above node to the circular array
    if it wrapped around, remove the overwritten element from the BST
// now the remaining nodes in the BST are the last k elements
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
// Node is not shown in the answer; it is assumed here to hold an int `value`
// and to be ordered by it, with ties broken so that equal values can coexist.
int getBestSum(int[] input, int K)
{
    Node[] array = new Node[K+1];
    TreeSet<Node> nodes = new TreeSet<Node>();
    Node n = new Node(0);
    nodes.add(n);
    array[0] = n;
    int arrPos = 0;
    int sum = 0;
    for (int i: input)
    {
        sum += i;
        Node oldNode = nodes.first();
        Node newNode = new Node(oldNode.value + i);
        arrPos = (arrPos + 1) % array.length;
        if (array[arrPos] != null)
            nodes.remove(array[arrPos]);
        array[arrPos] = newNode;
        nodes.add(newNode);
    }
    return sum - nodes.first().value;
}
getBestSum(new int[]{1,2,3,1,6,10}, 2) returns 21, as required.
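As an alternative sketch in Python, the same "minimum sum of removed elements" idea can be written with a monotonic deque in place of the BST plus circular array. This is a substitution made here for illustration, not the answer's data structure; it brings the sliding-window minimum down to amortized O(1) per element. The name best_sum and the virtual removed position 0 in front of the array are assumptions of this sketch.

```python
from collections import deque


def best_sum(values, K):
    """Maximum sum after removals such that no more than K kept elements are adjacent."""
    # Each entry is (position, cost): cost is the cheapest total of removed
    # elements given that the element at this 1-based position is removed.
    # Position 0 is a virtual removed element in front of the array.
    window = deque([(0, 0)])  # costs strictly increasing from front to back
    total = 0
    for pos, v in enumerate(values, start=1):
        total += v
        # The previous removed position can be at most K + 1 steps back.
        while window[0][0] < pos - (K + 1):
            window.popleft()
        cost = window[0][1] + v
        # Older entries that are not cheaper can never be the minimum again.
        while window and window[-1][1] >= cost:
            window.pop()
        window.append((pos, cost))
    # The last removed position must leave at most K elements after it.
    n = len(values)
    while window[0][0] < n - K:
        window.popleft()
    return total - window[0][1]


print(best_sum([1, 2, 3, 1, 6, 10], 2))
```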
Let f[i] be the maximum total value you can get with the first i numbers, while you don't choose the last(i.e. the i-th) one. Then we have
f[i] = max{
    f[i-1],
    max{f[j] + sum(j + 1, i - 1) | (i - j) <= k}
}
You can use a heap-like data structure to maintain the options and get the maximum one in log(n) time; keep a global delta (or similar) for the running sum, and pay attention to the range i - j <= k.
The following algorithm is of O(N*K) complexity.
Examine the 1st K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would not be any reason to have the lower (earlier gap).
For each index i of these K gap options, the following holds true:
1. The sum up to i-1 is the present score of each option.
2. If the next gap is at a distance of d, then the options for d are (K - i) to K.
For every possible position of the gap, calculate the best sum up to that position among the options.
The latter part of the array can be traversed similarly independently from the past gap history.
Traverse the array further till the end.

Resources