Finding the longest zero-sum subsequence - algorithm

WARNING: this is NOT an instance of the "finding the longest subarray which sums to zero" problem
I'm wondering if there's any algorithm to find the length of the longest subsequence (i.e. elements can be contiguous or not) which sums to zero in a sequence, e.g.
S = {1, 4, 6, -1, 2, 8, -2}
     ^         ^  ^      ^
maximum length = 4
I searched for one but couldn't find any.

It's a slight variation on the subset sum problem.
Let d[i] = maximum length of a subsequence that sums to i. Initially d[0] = 0 and every other entry is marked unreachable (for example -infinity), so that only sums that can actually be formed are extended. If your numbers were all positive, you could do:
s = 0
for i = 1 to n:
    s += a[i]
    for j = s down to a[i]:
        d[j] = max(d[j],               <- keep j as it is
                   d[j - a[i]] + 1     <- add a[i] to j - a[i], obtaining sum j
                  )
return ???
However, this does not account for the possibility of having negative elements. In order to handle those, you can use two dictionaries instead of an array:
a = [1, 4, 6, -1, 2, 8, -2]  # the input array
d1 = {0: 0}                  # first dictionary: explicitly initialize d[0] = 0
d2 = {0: 0}                  # second dictionary is the same initially
n = len(a)                   # the length of the input array
for i in range(n):           # for each index of the input array
    for j in d1:             # for each value in the first dictionary
        x = 0
        if j + a[i] in d2:   # if we already have answer for j + a[i]
                             # in the second dictionary, we store it
            x = d2[j + a[i]]
        d2[j + a[i]] = max(x, d1[j] + 1)  # add a[i] to the j in the first dictionary
                                          # and get a new value in the second one,
                                          # or keep the existing one in the second dictionary,
                                          # if it leads to a longer subsequence
    d1 = dict(d2)  # copy the second dictionary into the first.
                   # We need two dictionaries to make sure that
                   # we don't use the same element twice
print(d1[0])  # prints 4
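You can also implement this with arrays if you add a constant offset so you don't access negative indexes, but dictionaries are cleaner.
Time complexity is O(n*S), where n is the number of elements in the array and S is the number of distinct sums that can occur, which is at most the sum of the absolute values of the elements plus one. The algorithm is therefore pseudo-polynomial.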
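The array-based variant mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the function name longest_zero_sum_subsequence and the UNREACHABLE marker are made up here, and the offset is taken to be the sum of the negative elements, so that index 0 corresponds to the smallest reachable sum.

```python
def longest_zero_sum_subsequence(a):
    """Length of the longest (not necessarily contiguous) subsequence summing to 0."""
    lowest = sum(x for x in a if x < 0)    # smallest sum any subsequence can reach
    highest = sum(x for x in a if x > 0)   # largest sum any subsequence can reach
    offset = -lowest                       # array index of sum 0
    size = highest - lowest + 1
    UNREACHABLE = -1                       # marker for sums not formed so far
    d = [UNREACHABLE] * size
    d[offset] = 0                          # the empty subsequence sums to 0
    for x in a:
        nxt = d[:]                         # copy, so x is used at most once
        for s in range(size):
            if d[s] != UNREACHABLE and 0 <= s + x < size:
                nxt[s + x] = max(nxt[s + x], d[s] + 1)
        d = nxt
    return d[offset]


print(longest_zero_sum_subsequence([1, 4, 6, -1, 2, 8, -2]))
```

Copying the array once per element plays the same role as the two dictionaries: updates for the current element never feed back into the same pass.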

There is also a recursive solution written in a functional programming style (it is not memoized, so it is plain recursion rather than true dynamic programming). In JavaScript:
function maxSumZeroAcc(arr, sumS, nbElt) {
    //returns nbElt + the size of the longest sequence summing to sumS
    if (arr.length === 0) {
        return (sumS === 0) ? nbElt : 0;
    }
    else {
        return Math.max(maxSumZeroAcc(arr.slice(1), sumS, nbElt),
                        maxSumZeroAcc(arr.slice(1), sumS - arr[0], nbElt + 1));
    }
}
function maxSumZero(arr) {
    //simply calls the previous function with proper initial parameters
    return maxSumZeroAcc(arr, 0, 0);
}
var myS = [1,4,6,-1,2,8,-2];
console.log(maxSumZero(myS));//prints 4
The core of the code is the function maxSumZeroAcc(arr, sumS, nbElt), which returns nbElt plus the size of the longest subsequence of arr summing to sumS (or 0 if there is none) -- sumS and nbElt being two auxiliary parameters set up in the function maxSumZero.
The idea behind maxSumZeroAcc is that the maximum we are looking for is either the result of maxSumZeroAcc applied to the tail of the array (we simply discard the first element) or the result of maxSumZeroAcc(., sumS-arr[0], nbElt+1) applied to the tail of the array (we take the first element into account and, instead of looking for elements summing to sumS, we look for elements summing to sumS minus the first element).
This solution is rather short to write down and quite easy to understand, but its complexity is pretty bad: O(2^n), where n is the size of the array.
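If the exponential running time is a concern, the same recursion can be memoized on the pair (position, remaining sum). The following is a minimal sketch in Python rather than JavaScript; the names max_sum_zero and best are hypothetical, and it uses an index instead of slicing so that the cached arguments stay hashable.

```python
from functools import lru_cache


def max_sum_zero(arr):
    """Length of the longest subsequence of arr that sums to zero."""
    n = len(arr)
    NO_SOLUTION = float("-inf")

    @lru_cache(maxsize=None)
    def best(i, target):
        # Longest subsequence of arr[i:] summing to target, or -inf if none exists.
        if i == n:
            return 0 if target == 0 else NO_SOLUTION
        skip = best(i + 1, target)                # discard arr[i]
        take = best(i + 1, target - arr[i]) + 1   # use arr[i]
        return max(skip, take)

    return best(0, 0)


print(max_sum_zero([1, 4, 6, -1, 2, 8, -2]))
```

The number of distinct (i, target) pairs is bounded by n times the number of reachable sums, so this has the same pseudo-polynomial flavor as the dictionary solution. Very long inputs may still hit Python's recursion limit.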

Related

Iterative algorithm for N choose K, without repetitions, order matters

I'm looking for an iterative algorithm to obtain all of the permutations of K elements extracted from a set of N elements, without repetitions (i.e., without substitutions), and for which order matters.
I know the amount of permutations has to be N!/(N-K)!.
Have you any ideas?
Thank you.
Ivan
Approach 1:
Simpler Idea:
Since the order matters, we can use the next_permutation algorithm. The next_permutation function produces the next lexicographically greater permutation and skips duplicates when elements repeat.
Algorithm:
Create a new array A of the same size as the original array.
Assign the last k entries the values 1, 2, 3, ..., k, so that k comes last. Set the remaining entries to 0.
Now, while iterating through the permutations of A using next_permutation, select only the indices i of the original array where A[i] > 0, and output the selected elements ordered by their value in A.
Explanation for correctness:
The newly created array has N numbers, of which N-k are the repeated value 0, so the total number of unique permutations is N!/(N-k)!, which is exactly the desired count.
Example:
Input: X = [1,2,3], k=2
Now, we create A = [0,1,2].
All permutations:
[0, 1, 2],
[0, 2, 1],
[1, 0, 2],
[1, 2, 0],
[2, 0, 1],
[2, 1, 0].
For each of these permutations, choosing from the original array only the indices i where A[i] > 0, ordered by A[i], yields:
[2,3],
[3,2],
[1,3],
[1,2],
[3,1],
[2,1].
If you want the results above in sorted order, use negative numbers: initialize the first k entries with -k, -(k-1), ..., -1 and the remaining entries with 0, then apply the algorithm with a slight modification, selecting the indices i of the original array where A[i] < 0 while maintaining order.
Sidenote:
If order doesn't matter, initialize A with k copies of -1 at the beginning and 0 for the remaining entries, and use the same iterative permutation algorithm; it will generate each possible selection of k items out of N exactly once.
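Approach 1 can be sketched in Python as follows. Python has no built-in next_permutation, so the sketch includes a hand-written one based on the standard lexicographic-successor algorithm; the function names next_permutation and k_permutations and the variable marks (the array A above) are choices made for this illustration. It assumes k <= len(items).

```python
def next_permutation(seq):
    """Rearrange seq in place into its next lexicographic permutation.

    Returns False (and leaves seq sorted ascending) if seq was the last one.
    """
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        seq.reverse()
        return False
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])
    return True


def k_permutations(items, k):
    """Yield every ordered selection of k elements from items (assumes k <= len(items))."""
    n = len(items)
    marks = [0] * (n - k) + list(range(1, k + 1))   # the array A from the text
    while True:
        chosen = [None] * k
        for idx, rank in enumerate(marks):
            if rank > 0:
                chosen[rank - 1] = items[idx]       # rank gives the output position
        yield chosen
        if not next_permutation(marks):
            return


for selection in k_permutations([1, 2, 3], 2):
    print(selection)
```

Because the zeros in marks are indistinguishable, next_permutation visits each distinct arrangement once, which is what keeps the count at N!/(N-k)!.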
Approach 2:(better than Approach 1)
Algorithm:
Start by selecting the first K numbers and storing their indices in array A; A holds the indices of the chosen elements from the original array. We mark it as the first selection.
Now get all remaining permutations in lexicographical order.
How to get the next combination considering order? We permute the current selection if possible; otherwise we move to the lexicographically next selection.
Idea for getting the next selection in lexicographic order once the permutations are exhausted:
We consider our current combination, and find the rightmost element
that has not yet reached its highest possible value. Once finding this
element, we increment it by 1, and assign the lowest valid value to
all subsequent elements.
from: https://cp-algorithms.com/combinatorics/generating_combinations.html
Example:
Input: X = [1,2,3,4], k=3
Now, we create A = [0,1,2].
All permutations:
[0,1,2] // initialization
[0,2,1] // next permutation
... // all permutations of [0,1,2]
...
[2,1,0] // reached last permutation of current selection
[0,1,3] // get next selection in lexicographic order as permutations are exhausted
...
[3,1,0] // reached last permutation of current selection
[0,2,3] // get next selection in lexicographic order as permutations are exhausted
...
[3,2,0] // reached last permutation of current selection
[1,2,3] // get next selection in lexicographic order as permutations are exhausted
...
[3,2,1] // reached last permutation of current selection
Code (0-based indexing, so initialize a with 0, 1, ..., k-1):
#include <algorithm>
#include <vector>
using std::vector;

bool next_combination_with_order(vector<int>& a, int n, bool order_matters = true) {
    int k = (int)a.size();
    // check if a next permutation exists, otherwise move forward and get the next unique combination
    if (order_matters && std::next_permutation(a.begin(), a.end()))
        return true;
    // if `a` was already in descending order,
    // next_permutation returned false and changed the array to sorted order.
    // now find next combination if possible
    for (int i = k - 1; i >= 0; i--) {
        if (a[i] < n - k + i) {  // with 0-based values, position i can hold at most n - k + i
            a[i]++;
            for (int j = i + 1; j < k; j++)
                a[j] = a[j - 1] + 1;
            return true;
        }
    }
    return false;
}
References:
Next permutations:
https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
std::next_permutation Implementation Explanation
Next combination without order:
https://cp-algorithms.com/combinatorics/generating_combinations.html

How to make this Longest Increasing Subsequence program return this subsequence

I have this code for Longest Increasing Subsequence. Right now it returns length of the longest increasing subsequence, but I don't know how to make it return this exact subsequence. For ex. in this case it should return [3,6,7,8,9]. Any ideas? I would appreciate not using very complicated syntax :D
a = [3, 6, 7, 2, 1, 8, 9, 5]
n = len(a)
q = [0] * n
for k in range(n):
    max = 0
    for j in range(k):
        if a[k] > a[j]:
            if q[j] > max:
                max = q[j]
    q[k] = max + 1
return(max(q))
The outer loop iterates over all elements of a, and the inner loop checks whether element k is greater than each item at index 0 to k-1.
Thanks to that, the q table for this example looks like this: [1,2,3,1,1,4,5,2]. We can see that the first element makes a subsequence of length 1, the second one makes a subsequence of length 2 (with the first element), the third element makes a subsequence of length 3 (with the first and second elements), the fourth element makes a subsequence of length 1 because none of the previous elements is smaller than it, and so on. So basically at each iteration it gets the length of the longest increasing subsequence ending at index k.
Shorter version of the same program (here q must be initialized as q = [1] * n):
for i in range(n):
    for j in range(i):
        if a[i] > a[j]:
            q[i] = max(q[i], 1 + q[j])
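It would have been great if you had described what your code is doing, but as I understand it, at each iteration it gets the length of the longest increasing subsequence ending at index k.
To trace back the actual indices in the array, just add another array previous[].
Initialize previous = [-1] * n.
And then, inside the inner loop, record the predecessor whenever the maximum is updated:
if q[j] > max:
    max = q[j]
    previous[k] = j
Afterwards, start from the index with the largest q value and follow previous[] until you reach -1; this yields the subsequence in reverse order.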
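Putting the pieces together, a complete version might look like the sketch below. It is an illustration, not the asker's exact program: the function name longest_increasing_subsequence is made up, q starts at 1 for every index, and the helper variable max from the question is folded into q[k] so that Python's built-in max stays usable.

```python
def longest_increasing_subsequence(a):
    """Return one longest strictly increasing subsequence of a."""
    n = len(a)
    if n == 0:
        return []
    q = [1] * n            # q[k]: length of the longest increasing subsequence ending at k
    previous = [-1] * n    # previous[k]: index of the element before a[k] in that subsequence
    for k in range(n):
        for j in range(k):
            if a[k] > a[j] and q[j] + 1 > q[k]:
                q[k] = q[j] + 1
                previous[k] = j
    # Start at the index with the largest q value and walk the chain backwards.
    k = max(range(n), key=lambda idx: q[idx])
    result = []
    while k != -1:
        result.append(a[k])
        k = previous[k]
    result.reverse()
    return result


print(longest_increasing_subsequence([3, 6, 7, 2, 1, 8, 9, 5]))
```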

Counting valid sequences with dynamic programming

I am pretty new to Dynamic Programming, but I am trying to get better. I have an exercise from a book, which asks me the following question (slightly abridged):
You want to construct sequence of length N from numbers from the set {1, 2, 3, 4, 5, 6}. However, you cannot place the number i (i = 1, 2, 3, 4, 5, 6) more than A[i] times consecutively, where A is a given array. Given the sequence length N (1 <= N <= 10^5) and the constraint array A (1 <= A[i] <= 50), how many sequences are possible?
For instance if A = {1, 2, 1, 2, 1, 2} and N = 2, this would mean you can only have one consecutive 1, two consecutive 2's, one consecutive 3, etc. Here, something like "11" is invalid since it has two consecutive 1's, whereas something like "12" or "22" are both valid. It turns out that the actual answer for this case is 33 (there are 36 total two-digit sequences, but "11", "33", and "55" are all invalid, which gives 33).
Somebody told me that one way to solve this problem is to use dynamic programming with three states. More specifically, they say to keep a 3d array dp(i, j, k) with i representing the current position we are at in the sequence, j representing the element put in position i - 1, and k representing the number of times that this element has been repeated in the block. They also told me that for the transitions, we can put in position i every element different from j, and we can only put j in if A[j] > k.
It all makes sense to me in theory, but I've been struggling with implementing this. I have no clue how to begin with the actual implementation other than initializing the matrix dp. Typically, most of the other exercises had some sort of "base case" that were manually set in the matrix, and then a loop was used to fill in the other entries.
I guess I am particularly confused because this is a 3D array.
For a moment let's just not care about the array. Let's implement this recursively. Let dp(i, j, k) be the number of sequences with length i, last element j, and exactly k consecutive occurrences of j at the end of the sequence.
The question now becomes how to write the solution of dp(i, j, k) recursively.
If k > 1, we are adding a j for the kth time in a row, so we take each sequence of length i - 1 that ends with exactly k - 1 occurrences of j and append another j to it. Notice that this is simply dp(i - 1, j, k - 1).
But what if k == 1? In that case we can append one occurrence of j to every valid sequence of length i - 1 that doesn't end with j. Essentially we need the sum of all dp(i - 1, x, c) such that x != j and 1 <= c <= A[x].
This gives our recurrence relation:
def dp(i, j, k):
    # this is the base case, the number of sequences of length 1
    # one if k is valid, otherwise zero
    if i == 1: return int(k == 1)
    if k > 1:
        # get all the valid sequences [0...i-1] and add j to them
        return dp(i - 1, j, k - 1)
    if k == 1:
        # get all valid sequences that don't end with j
        res = 0
        for last in range(len(A)):
            if last == j: continue
            for n_consec in range(1, A[last] + 1):
                res += dp(i - 1, last, n_consec)
        return res
We know that our answer will be all valid sequences of length N, so our final answer is sum(dp(N, j, k) for j in range(len(A)) for k in range(1, A[j] + 1))
Believe it or not, this is the basis of dynamic programming. We just broke our main problem down into a set of subproblems. Of course, right now our running time is exponential because of the recursion. We have two ways to lower this:
Caching: we can simply keep track of the result of each (i, j, k) and return the stored value when it is requested again.
Use an array: we can reimplement this idea with bottom-up dp and have an array dp[i][j][k]. All of our function calls just become array accesses in a for loop. Note that this method forces us to iterate over the array in topological order, which may be tricky.
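The caching option can be sketched with functools.lru_cache. The wrapper name count_sequences is hypothetical, and A is 0-indexed here as in the recursive function above. Because the recursion goes one level deeper per position, this sketch is only meant for small N; for N near 10^5 it would exceed Python's default recursion limit.

```python
from functools import lru_cache


def count_sequences(N, A):
    """Count length-N sequences where value j never appears more than A[j] times in a row."""
    m = len(A)

    @lru_cache(maxsize=None)
    def dp(i, j, k):
        # Sequences of length i ending in exactly k consecutive copies of j.
        if i == 1:
            return int(k == 1)
        if k > 1:
            return dp(i - 1, j, k - 1)
        # k == 1: append j to any valid sequence of length i - 1 not ending in j.
        return sum(dp(i - 1, last, c)
                   for last in range(m) if last != j
                   for c in range(1, A[last] + 1))

    return sum(dp(N, j, k) for j in range(m) for k in range(1, A[j] + 1))


print(count_sequences(2, [1, 2, 1, 2, 1, 2]))
```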
There are 2 kinds of dp approaches: top-down and bottom-up.
In bottom-up, you fill the terminal cases in the dp table and then use for loops to build up from that. Let's consider a bottom-up algorithm to generate the Fibonacci sequence. We set dp[0] = 1 and dp[1] = 1 and run a for loop from i = 2 to n, setting dp[i] = dp[i - 1] + dp[i - 2].
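As a small illustration of the bottom-up style, the Fibonacci loop just described could be written like this (the function name fib_bottom_up is made up for the example, and it keeps the convention dp[0] = dp[1] = 1 used here):

```python
def fib_bottom_up(n):
    """Bottom-up Fibonacci with dp[0] = dp[1] = 1."""
    dp = [1] * (n + 1)              # terminal cases dp[0] and dp[1] are already 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


print(fib_bottom_up(5))
```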
In top down approach, we start from the "top" view of the problem and go down from there. Consider the recursive function to get n-th Fibonacci number:
def fib(n):
    if n <= 1:
        return 1
    if dp[n] != -1:
        return dp[n]
    dp[n] = fib(n - 1) + fib(n - 2)
    return dp[n]
Here we don't fill the complete table, but only the cases we encounter.
The reason I am talking about these 2 types is that when you start learning dp, it is often difficult to come up with bottom-up approaches (like you are trying to). When this happens, first come up with a top-down approach, and then try to derive a bottom-up solution from that.
So let's create a recursive dp function first:
# let m be size of A (A is 1-indexed here: A[1..m], A[0] is unused)
# initialize dp table with all values -1
def solve(i, j, k, n, m):
    # state: position i holds element j, which now occurs k times in a row
    # first write terminal cases
    if k > A[j]:
        # this means sequence is invalid. so return 0
        return 0
    if i >= n:
        # this means a valid sequence.
        return 1
    if dp[i][j][k] != -1:
        return dp[i][j][k]
    result = 0
    for num in range(1, m + 1):
        if num == j:
            result += solve(i + 1, num, k + 1, n, m)
        else:
            result += solve(i + 1, num, 1, n, m)
    dp[i][j][k] = result
    return dp[i][j][k]
So we know what the terminal cases are. For the bottom-up version we create a dp table of size dp[n + 1][m + 1][52] (indices are 1-based, and the loop below reads dp[i + 1][num][occur + 1] with occur up to 50). Initialize it with all values 0, not -1.
So we can do bottom-up as:
# initially all values in table are zero. With loop below, we set the valid endings as 1.
# So any state trying to reach valid terminal states will get 1, but invalid states will
# return the values 0
for num in range(1, m + 1):
    for occur in range(1, A[num] + 1):
        dp[n][num][occur] = 1
# now to build up from bottom, we start by filling n-1 th position
for i in range(n - 1, 0, -1):
    for num in range(1, m + 1):
        for occur in range(1, A[num] + 1):
            for next_num in range(1, m + 1):
                if next_num != num:
                    dp[i][num][occur] += dp[i + 1][next_num][1]
                else:
                    # this entry is 0 when occur + 1 > A[num], because it was never set
                    dp[i][num][occur] += dp[i + 1][num][occur + 1]
The answer will be:
total = 0
for num in range(1, m + 1):
    total += dp[1][num][1]
I am sure there must be some more elegant dp solution, but I believe this answers your question. Note that I considered that k is the number of times j-th number has been repeated consecutively, correct me if I am wrong with this.
Edit:
With the given constraints the size of the table will be, in the worst case, 10^5 * 6 * 50 = 3e7 entries, which is more than 100 MB with 4-byte integers. It is workable, but can be considered too much space (I think some environments don't allow a process that much memory). One way to reduce it would be to use a hash map instead of an array with the top-down approach, since top-down doesn't visit all the states. That would mostly help in this case: for example, if A[1] is 2, then the states where 1 has occurred more than twice need not be stored. Of course this would not save much space if A[i] has large values, say [50, 50, 50, 50, 50, 50]. Another option is to modify our approach a bit. We don't actually need to store the dimension k, i.e. the number of times j has appeared consecutively:
dp[i][j] = number of ways to fill positions i..n if the (i - 1)-th position doesn't hold j and the i-th position holds j (that is, a new run of j starts at position i).
Then, we would need to modify our algorithm to be like:
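def solve(i, j):
    if i > n:
        return 0
    if dp[i][j] != -1:
        return dp[i][j]
    result = 0
    # we will first try 1 consecutive j, then 2 consecutive j's then 3 and so on
    for count in range(1, A[j] + 1):
        if i + count - 1 > n:
            break                      # the run would go past the last position
        if i + count - 1 == n:
            result += 1                # the run ends exactly at the last position
            break
        for num in range(1, m + 1):
            if num != j:
                result += solve(i + count, num)
    dp[i][j] = result
    return dp[i][j]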
This approach reduces the space to O(N * m) entries (6 * 10^5 here, i.e. a few MB), while the time complexity stays of the same order as before: O(N * 6 * 50), up to the factor of 6 for choosing the next number.
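The same idea of dropping the run-length dimension can also be written iteratively, which avoids deep recursion for large N. The sketch below is a variation rather than a transcription of solve: it works forwards over prefixes instead of backwards over suffixes, uses 0-based values, and the names count_sequences_by_runs, ends and total are assumptions made for the example.

```python
def count_sequences_by_runs(n, limits):
    """Count length-n sequences where value j repeats at most limits[j] times in a row."""
    m = len(limits)
    # ends[i][j]: valid sequences of length i whose final run consists of value j
    ends = [[0] * m for _ in range(n + 1)]
    # total[i]: all valid sequences of length i (the empty sequence counts once)
    total = [0] * (n + 1)
    total[0] = 1
    for i in range(1, n + 1):
        for j in range(m):
            ways = 0
            for run in range(1, min(limits[j], i) + 1):
                # the prefix before the final run must be valid and must not end in j
                ways += total[i - run] - ends[i - run][j]
            ends[i][j] = ways
        total[i] = sum(ends[i])
    return total[n]


print(count_sequences_by_runs(2, [1, 2, 1, 2, 1, 2]))
```

The table has (n + 1) * m entries, and the work is proportional to n * m * max(limits), matching the estimate above.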

Finding largest sum in an unsorted array using divide and conquer algorithm

I have a sequence of n real numbers stored in an array, A[1], A[2], …, A[n]. I am trying to implement a divide and conquer algorithm to find two numbers A[i] and A[j], where i < j, such that A[i] ≤ A[j] and their sum is the largest.
For example, {2, 5, 9, 3, -2, 7} gives the output 14 (5+9, not 16=9+7). Can anyone suggest some ideas on how to do it?
Thanks in advance.
This problem is not really suited to a divide and conquer approach. It's easy to observe that if (i, j) is a solution for this problem, then A[j] >= A[k] for every k > j, i.e. A[j] is the maximum in A[j..n].
Proof: if there exists a k > j with A[k] > A[j], then (j, k) is a better solution than (i, j), because A[k] > A[j] >= A[i].
So we only need to consider values of j that satisfy this criterion.
Algorithm (pseudo-code)
maxj = n
for (j = n - 1 down to 1):
    if (a[j] > a[maxj]) then:
        maxj = j
    else:
        check if (j, maxj) is a better solution
Complexity: O(n)
C++ implementation: http://ideone.com/ENp5WR (The implementation use an integer array, but it should be the same for floats)
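For readers who prefer Python, the right-to-left scan could be sketched as below. This is not the linked C++ code; the function name best_pair and the choice to return None when no valid pair exists are assumptions made for the example.

```python
def best_pair(a):
    """Return (a[i], a[j]) with i < j, a[i] <= a[j] and the largest sum, or None."""
    best = None
    max_j = len(a) - 1                  # index of the largest element seen so far from the right
    for j in range(len(a) - 2, -1, -1):
        if a[j] > a[max_j]:
            max_j = j                   # a[j] has no valid partner to its right
        elif best is None or a[j] + a[max_j] > best[0] + best[1]:
            best = (a[j], a[max_j])
    return best


print(best_pair([2, 5, 9, 3, -2, 7]))
```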
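Declare two variables; as you scan the array, check whether the current number is bigger than either of the two stored values. If it is, replace the smaller one; if not, continue. Note that this finds the two largest values overall and ignores the requirements i < j and A[i] ≤ A[j], so for the example it gives 9 + 7 = 16 rather than 14.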
Here's a recursive solution in Python. I wouldn't exactly call it "divide and conquer" but then again, this problem isn't very suited to a divide and conquer approach.
def recurse(lst, pair):  # lst: the remaining list left to process
    if not lst: return   # if lst is empty, return
    for i in lst[1:]:    # for each element in lst starting from index 1
        # accept (lst[0], i) if it is a valid pair and beats the max sum so far
        if lst[0] <= i and (pair[0] is None or lst[0] + i > pair[0] + pair[1]):
            pair[0] = lst[0]
            pair[1] = i  # update pair to contain the new pair of values that give the max sum
    recurse(lst[1:], pair)  # recurse on the sub list from index 1 to the end
def find_pair(s):
    pair = [None, None]  # no valid pair found yet
    recurse(s, pair)     # pair is a list, so recurse updates it in place
    return pair
Sample output:
s = [2, 5, 9, 3, -2, 7]
find_pair(s) # ============> [5, 9]
I think you can use a divide and conquer algorithm as described below.
(The merge part scans the left half, so it takes linear time and the whole algorithm runs in O(n log n).)
Here is the outline of the algorithm:
Divide the problem into two halves: LHS & RHS
Each half should return the largest answer meeting the requirement in that half (or "no answer" if the half contains no valid pair) AND the largest element in that half
Merge and return the answer to the upper level: the answer is the maximum of LHS's answer, RHS's answer, and the cross sum of RHS's largest element and the largest LHS element that does not exceed it (if there is one). Comparing only the two largest elements is not enough: in {10, 1, 5} the only valid pair is (1, 5).
Here is the demonstration of the algorithm using your example: {2, 5, 9, 3, -2, 7}
Divide into {2,5,9}, {3,-2,7}
Divide into {2,5}, {9}, {3,-2}, {7}
{2,5} return 2+5 = 7, largest element = 5
{9} return no answer (a single element is not a pair), largest element = 9
{3,-2} return no answer (3 > -2), largest element = 3
{7} return no answer, largest element = 7
{2,5,9} merged from {2,5} & {9}: cross sum = 5+9 = 14, return max(7,14) = 14, largest element = max(9,5) = 9
{3,-2,7} merged from {3,-2} & {7}: cross sum = 3+7 = 10, return 10, largest element = max(7,3) = 7
{2,5,9,3,-2,7} merged from {2,5,9} and {3,-2,7}: cross sum = 5+7 = 12, return max(14,10,12) = 14, largest element = max(9,7) = 9
ans = 14
Special cases like {5,4,3,2,1}, which yields no answer, need extra handling, but this does not affect the core part or the complexity of the algorithm.
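A divide and conquer version along these lines might be sketched as follows. One caveat is built into the sketch: pairing only the two halves' largest elements can miss valid cross pairs (for example in {10, 1, 5}), so the merge step here scans the left half for the largest element not exceeding the right half's maximum, which makes the merge linear. The names largest_pair_sum and solve are hypothetical, and None stands for "no answer".

```python
def largest_pair_sum(a):
    """Largest a[i] + a[j] with i < j and a[i] <= a[j], or None if no such pair exists."""

    def solve(lo, hi):
        # Returns (best sum or None, largest element) for the non-empty slice a[lo:hi].
        if hi - lo == 1:
            return None, a[lo]          # a single element cannot form a pair
        mid = (lo + hi) // 2
        left_best, left_max = solve(lo, mid)
        right_best, right_max = solve(mid, hi)
        # Cross pair: the right maximum with the largest left element that is <= it.
        partners = [x for x in a[lo:mid] if x <= right_max]
        cross = max(partners) + right_max if partners else None
        candidates = [c for c in (left_best, right_best, cross) if c is not None]
        best = max(candidates) if candidates else None
        return best, max(left_max, right_max)

    if not a:
        return None
    return solve(0, len(a))[0]


print(largest_pair_sum([2, 5, 9, 3, -2, 7]))
```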

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of the minimum elements of all possible contiguous subsequences of A. I know that if N is small we can enumerate all possible subsequences, but as N is up to 10^5, what is the best way to find this sum?
Example: Let N=3 and A = [1,2,3]; then the answer is 10, as the possible contiguous subsequences are {(1),(2),(3),(1,2),(1,2,3),(2,3)}, so the sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10
Let's fix one element a[i]. We want to know L, the position of the rightmost element smaller than a[i] located to the left of i (or -1 if there is none). We also need to know R, the position of the leftmost element smaller than a[i] located to the right of i (or n if there is none).
If we know L and R, then a[i] is the minimum of exactly (i - L) * (R - i) contiguous subsequences, so we should add (i - L) * (R - i) * a[i] to the answer.
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
    while !s.isEmpty() && s.top().first > a[i]:
        s.pop()
    if !s.isEmpty():
        L[i] = s.top().second
    s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Treat each element as the pair <a[i], i> and compare pairs instead of values. All elements are distinct now.
The time complexity is O(n).
Here is the full pseudo code (I assume that int can hold any integer value here; you should choose a suitable type to avoid overflow in real code. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
    s = new Stack
    L = new int[n]
    fill(L, -1)
    for i <- 0 ... n - 1:
        while !s.isEmpty() && s.top().first > a[i]:
            s.pop()
        if !s.isEmpty():
            L[i] = s.top().second
        s.push(pair(a[i], i))
    return L

int[] getRightSmallerElementPositions(int[] a):
    R = getLeftSmallerElementPositions(reversed(a))
    for i <- 0 ... n - 1:
        R[i] = n - 1 - R[i]
    return reversed(R)

int findSum(int[] a):
    L = getLeftSmallerElementPositions(a)
    R = getRightSmallerElementPositions(a)
    int res = 0
    for i <- 0 ... n - 1:
        res += (i - L[i]) * (R[i] - i) * a[i]
    return res
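A runnable Python sketch of this approach is given below. Unlike the pseudo code, it does not assume distinct elements: ties are broken by popping on > in the left pass and on >= in the right pass, which has the same effect as comparing the pairs <a[i], i>. The function name sum_of_subarray_minimums is made up for the example.

```python
def sum_of_subarray_minimums(a):
    """Sum of min(a[l..r]) over all contiguous ranges, in O(n) using a monotonic stack."""
    n = len(a)
    left = [-1] * n    # nearest index to the left whose value is <= a[i]
    right = [n] * n    # nearest index to the right whose value is < a[i]

    stack = []         # indices whose values are non-decreasing from bottom to top
    for i in range(n):
        while stack and a[stack[-1]] > a[i]:
            stack.pop()
        if stack:
            left[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        if stack:
            right[i] = stack[-1]
        stack.append(i)

    # a[i] is the minimum of (i - left[i]) * (right[i] - i) ranges.
    return sum((i - left[i]) * (right[i] - i) * a[i] for i in range(n))


print(sum_of_subarray_minimums([1, 2, 3]))
```

Python integers do not overflow, so the caveat about choosing a wide enough integer type only applies when porting this to a language with fixed-width integers.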
If the list is sorted in ascending order, you can consider all contiguous subsequences of size 1, then 2, then 3, up to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N
    total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element {{1}, {2}, {3}}: sum each of the elements.
Then, sets of two elements {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a Python question, but sometimes code helps:
import numpy as np

A = [1, 2, 3]
# This is [3, 2, 1]
scale = list(range(len(A), 0, -1))
# Take the element-wise product of the vectors, and sum
sum(a*b for (a,b) in zip(A, scale))
# Or just use the dot product
np.dot(A, scale)

Resources