Find largest continuous sum such that the minimum of it and it's complement is largest - algorithm

I'm given a sequence of numbers a_1,a_2,...,a_n. It's sum is S=a_1+a_2+...+a_n and I need to find a subsequence a_i,...,a_j such that min(S-(a_i+...+a_j),a_i+...+a_j) is the largest possible (both sums must be non-empty).
Example:
1,2,3,4,5 the sequence is 3,4, because then min(S-(a_i+...+a_j),a_i+...+a_j)=min(8,7)=7 (and it's the largest possible which can be checked for other subsequences).
I tried to do this the hard way.
I load all values into the array tab[n].
I do this n-1 times tab[i]+=tab[i-j]. So that tab[j] is the sum from the beginning till j.
I check all possible sums a_i+...+a_j=tab[j]-tab[i-1] and substract it from the sum, take the minimum and see if it's larger than before.
It takes O(n^2). This makes me very sad and miserable. Is there a better way?

Seems like this can be done in O(n) time.
Compute the sum S. The ideal subsequence sum is the longest one which gets closest to S/2.
Start with i=j=0 and increase j until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note which ever is closer and save the values of i_best,j_best,sum_best.
Increment i and then increase j again until sum(a_i..a_j) and sum(a_i..a_{j+1}) are as close as possible to S/2. Note which ever is closer and replace the values of i_best,j_best,sum_best if they are better. Repeat this step until done.
Note that both i and j are never decremented, so they are changed a total of at most O(n) times. Since all other operations take only constant time, this results in an O(n) runtime for the entire algorithm.

Let's first do some clarifications.
A subsequence of a sequence is actually a subset of the indices of the sequence. Haivng said that, and specifically int he case where you sequence has distinct elements, your problem will reduce to the famous Partition problem, which is known to be NP-complete. If that is the case, you can manage to solve the problem in O(Sn) where "n" is the number of elements and "S" is the total sum. This is not polynomial time as "S" can be arbitrarily large.
So lets consider the case with a contiguous subsequence. You need to observe array elements twice. First run sums them up into some "S". In the second run you carefully adjust array length. Lets assume you know that a[i] + a[i + 1] + ... + a[j] > S / 2. Then you let i = i + 1 to reduce the sum. Conversely, if it was smaller, you would increase j.
This code runs in O(n).
Python code:
from math import fabs
a = [1, 2, 3, 4, 5]
i = 0
j = 0
S = sum(a)
s = 0
while s + a[j] <= S / 2:
s = s + a[j]
j = j + 1
s = s + a[j]
best_case = (i, j)
best_difference = fabs(S / 2 - s)
while True:
if fabs(S / 2 - s) < best_difference:
best_case = (i, j)
best_difference = fabs(S / 2 - s)
if s > S / 2:
s -= a[i]
i += 1
else:
j += 1
if j == len(a):
break
s += a[j]
print best_case
i = best_case[0]
j = best_case[1]
print "Best subarray = ", a[i:j + 1]
print "Best sum = " , sum(a[i:j + 1])

Related

Get highest score in this game: choosing and removing elements in an array

Given an array arr of n integers, what is the highest score that a player can reach, playing the following game?
Choose an index 0 < i < n-1 in the array
Add arr[i-1] * arr[i+1] points to the score (initially the score is 0)
Shrink the array by removing element i (forall j >= i: arr[j] = arr[j+1]; then n = n - 1
Repeat steps 1-3 until n == 2.
Do the above until there are only 2 elements (which are the first and the last element because you can't remove them).
What is the highest score you can get ?
Example
arr = [1 2 3 4]
Choose i=2, get: 2*4 = 8 points, remove 3
Remaining: arr = [1 2 4]
Choose i=1, get 1*4 = 4 points, remove 2
Remaining: arr = [1 4].
The sum of points is 8 + 4 = 12, which is the highest possible score on this example.
I think it is related to Dynamic programming but I'm not sure how to solve it.
This problem has a dynamic programming approach similar to Matrix-chain multiplication problem. You can find further explanation in the book "Introduction to Algorithms", 3rd Edition (Cormen, page 370).
Let's find the optimal substructure property and then use it to construct an optimal solution to the problem from optimal solutions to subproblems.
Notation: Ci..j, where i ≤ j, stands for elements Ci,Ci+1,...,Cj.
Definition: A removal sequence for Ci..j is a permutation of i+1,i+2,...,j-1.
A removal sequence for Ci..j is optimal if the score achieved by removing the elements of Ci..j in that order is maximum among all possible removal sequences for Ci..j.
1. Characterize the structure of an optimal solution
If the problem is nontrivial, i.e. i + 1 < j, then any solution has a last removed element which corresponding index is k in the range
i < k < j. Such k split the problem into Ci..k and Ck..j. That is, for some value k, we first remove non extremal elements of Ci..k and Ck..j and then we remove element k. As removing non extremal elements of Ci..k doesn't affect score obtained by removing non extremal elements of Ck..j and an analogous reasoning for removing non extremal elements of Ck..j is also true we state that both subproblems are independent. Then, for a given removal sequence where kth-element is last, the score of Ci..j is equal to the sum of scores of Ci..k and Ck..j, plus the score of removing kth-element (C[i] * C[j]).
The optimal substructure of this problem is as follows. Suppose there is an optimal removal sequence O for Ci..j that ends at kth-element, then the ordering of removed elements from Ci..k must be optimal too. We can prove it by contradiction: If there was a removal sequence for Ci..k that scored higher than removal subsequence extracted from O for Ci..k then we can produce another removal sequence for Ci..j with higher score than optimal removal sequence (contradiction). A similar observation holds for the ordering of removed elements from Ck..j in the optimal removal sequence for Ci..j: it must be optimal too.
We can build an optimal solution for nontrivial instances of the problem by splitting the problem into two subproblems, finding optimal solutions to subproblem instances, and them combining these optimal subproblem solutions.
2. Recursively define the value of an optimal solution.
For this problem our subproblems are the maximum score obtained in Ci..j for 1 ≤ i ≤ j ≤ N. Let S[i, j] be the maximum score obtained in Ci..j; for the full problem, the highest score when evaluating the given rules is S[1, N].
We can define S[i, j] recursively as follows:
If j ≤ i + 1 then S[i, j] = 0
If i + 1 < j then S[i, j] = maxi < k < j{S[i, k] + S[k, j] + C[i] * C[j]}
We ensure that we search for the correct place to split because we consider all possible places, so that we are sure of having examined the optimal one.
3. Compute the value of an optimal solution
You can use your favorite method to compute S:
top-down approach (recursive)
bottom-up approach (iterative)\
I would use bottom-up for computing the solution since it would be < 5 lines long in almost any programming language.
Example in C++11:
for(int l = 2; l <= N; ++l) \\ increasing length intervals
for(int i = 1, j = i + l; j <= N; ++i, ++j)
for(int k = i + 1; k < j; ++k)
S[i, j] = max(S[i, j], S[i, k] + S[k, j] + C[i] * C[j])
4. Time Complexity and Space Complexity
There are nC2 + n = Θ(n2) subproblems and every subproblem do an operation which running time is Θ(l) where l is length of the subproblem so the math yield a running time of Θ(n3) for the algorithm (it's easy to spot the O(n3) part :-)). Also, the algorithm requires Θ(n2) space to store the S table.

What is the minimum cost of arranging the range a (n) (a [i] <= 20) such that the equal values will form a continuous segment?

You provide 1 string: a1, a2..an (a [i] <= 20)
Requirement: The minimum cost (number of steps) to swap any two elements in the sequence so that the final sequence obtained has equal values ​​that lie in succession:
Each step you can only choose 2 adjacent values to swap: swap (a [i], a [i + 1]) = 1steps
example:
1 1 3 1 3 2 3 2
Swap (a [3], a [4])
Swap (a [6], a [7])
-> 1 1 1 3 3 3 2 2
minimum = 2
I need your help.
Note that since A[i] <= 20 we can go ahead and enumerate every subset of all A[i] and fit comfortably within any time constraints.
Let M be the number of unique A[i], then there is a O(NM + M * 2^M) dynamic programming solution with bitmasks.
note that when I say moving an A[i] I mean moving every element with value A[i]
To understand how we do this let's first consider the brute force solution. We have some set of unique A[i] moved to the front of the string, and then at each step we pick the next A[i] to move behind what we had originally. This is O(M! * N).
There's one important observation to be made here: if we have some set of A[i] at the start of the string, and then we move the next one, the order of our original set of A[i] doesn't actually matter. Any move will cost the same regardless of the order.
Let cost(subset, A[i]) be the cost of moving all A[i] behind that subset of A[i] at the front of the string. Then we can write the following:
dp = [float('inf')] * (1 << M) # every subset of A[i]
dp[0] = 0
for mask in range(len(dp)):
for bit in range(M):
# if this A[i] hasn't been moved to the front, we move it to the front
if (mask >> bit) & 1 == 0:
dp[mask^(1 << bit)] = min(dp[mask^(1 << bit)], dp[mask] + cost(mask, bit))
If we compute cost naively then we have O(M * 2^M * N). However we can precompute every value of cost with O(1) per value.
Here's how we can do this:
Idea: The number of swaps needed to sort an array is the number of inversions.
Let's define a new array inversions[M][M], where inversions[i][j] is the number of times j comes after i in the arr. For clarity here's how we would compute it naively:
for i in range(len(arr)):
for j in range(i + 1, len(arr)):
if arr[i] != arr[j]: inversions[arr[i]][arr[j]] += 1
Assume that we have inversions, then we can compute cost(subset, A[i]) like so:
cost = 0
for bit in range(M):
# if bit isn't in the mask and thus needs to get swapped with A[i]
if (subset >> bit) & 1 == 0:
cost += inversions[bit][A[i]]
What's left is the following:
Compute inversions in O(NM). This can be done with keeping a count of each M at each index in N.
Currently cost is O(M) and not O(1). We can run a separate dynamic programming on cost to build an array cost[(1 << M)][M], where cost[i][j] is the cost to move item j to subset i.
For sake of completeness here is a complete code written in C++. It's my submission for the same problem on codeforces. Note that in that code cost is named contribution.

Finding best algorithm for sum of a section of an array's values

Given an array of n integers in the locations A[1], A[2], …, A[n], describe an O(n^2) time algorithm to
compute the sum A[i] + A[i+1] + … + A[j] for all i, j, 1 ≤ i < j ≤ n.
I've tried multiple ways of solving this problem but none have in O(n^2) time.
So for an array containing {1,2,3,4}
You would output:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
2+3 = 5
2+3+4 = 9
3+4 = 7
The answer does not need to be in a specific language, pseudocode is preferred.
A good preperation is everything.
You could create an array of integrals:
I[0..n] = (0, I[0] + A[1], I[1] + A[2], ..., I[n-1]+A[n]);
This will cost you O(n) * O(1) (looping over all elements and doing one addition);
Now you can calculate each Sum(A, i, j) with just a single subtraction: I[j] - I[i-1];
so this has O(1)
Looping over all combinations of i and j with 1 <= (i,j) <= n has O(n^2).
So you end up with O(n) * O(1) + O(n^2) * O(1) = O(n^2) .
Edit:
Your array A starts at 1 - adapted to this - this also solves the little quirk with i-1
So the integral array I starts with index 0 and is 1 element larger than A
Edit:
First you'll maybe have thought about the most naive idea:
Naive idea
Create a function that for given values of i and of j will return the sum A[i] + ... + A[j].
function sumRange(A, i, j):
sum = 0
for k = i to j
sum = sum + A[k]
return sum
Then generate all pairs of i and j (with i < j) and call the above function for each pair:
for i = 1 to n
for j = i+1 to n
output sumRange(A, i, j)
This is not O(n²), because already the two loops on i and j represent O(n²) iterations, and then the function will perform yet another loop, making it O(n³).
Better idea
The above can be improved. Look at the repetition it performs. The sum that was calculated for given values of i and j could be reused to calculate the sum for when j has increased with 1, without starting from scratch and summing the values between i and (now) j-1 again, only to add that one more value to it.
We should just remember what the previous sum was, and add A[j] to it.
So without a separate function:
for i = 1 to n
sum = A[i]
for j = i+1 to n
sum = sum + A[j]
output sum
Note how the sum is not reset to 0 once it is output. It is preserved, so that when j is incremented, only one value needs to be added to it.
Now it is O(n²). Note also how it does not require an extra array for storage. It only needs the memory for a few variables (i, j, sum), so its space complexity is O(1).
As the number of sums you need to output is O(n²), there is no way to improve this time complexity any further.
NB: I assume here that single array values do not constitute a "sum". As you stated in your question, i < j, and also in your example you only showed sums of at least two array values. The above can be easily adapted to also include single value "sums" if ever that were needed.

Counting valid sequences with dynamic programming

I am pretty new to Dynamic Programming, but I am trying to get better. I have an exercise from a book, which asks me the following question (slightly abridged):
You want to construct sequence of length N from numbers from the set {1, 2, 3, 4, 5, 6}. However, you cannot place the number i (i = 1, 2, 3, 4, 5, 6) more than A[i] times consecutively, where A is a given array. Given the sequence length N (1 <= N <= 10^5) and the constraint array A (1 <= A[i] <= 50), how many sequences are possible?
For instance if A = {1, 2, 1, 2, 1, 2} and N = 2, this would mean you can only have one consecutive 1, two consecutive 2's, one consecutive 3, etc. Here, something like "11" is invalid since it has two consecutive 1's, whereas something like "12" or "22" are both valid. It turns out that the actual answer for this case is 33 (there are 36 total two-digit sequences, but "11", "33", and "55" are all invalid, which gives 33).
Somebody told me that one way to solve this problem is to use dynamic programming with three states. More specifically, they say to keep a 3d array dp(i, j, k) with i representing the current position we are at in the sequence, j representing the element put in position i - 1, and k representing the number of times that this element has been repeated in the block. They also told me that for the transitions, we can put in position i every element different from j, and we can only put j in if A[j] > k.
It all makes sense to me in theory, but I've been struggling with implementing this. I have no clue how to begin with the actual implementation other than initializing the matrix dp. Typically, most of the other exercises had some sort of "base case" that were manually set in the matrix, and then a loop was used to fill in the other entries.
I guess I am particularly confused because this is a 3D array.
For a moment let's just not care about the array. Let's implement this recursively. Let dp(i, j, k) be the number of sequences with length i, last element j, and k consecutive occurrences of j at the end of the array.
The question now becomes how do we write the solution of dp(i, j, k) recursively.
Well we know that we are adding a j the kth time, so we have to take each sequence of length i - 1, and has j occurring k - 1 times, and add another j to that sequence. Notice that this is simply dp(i - 1, j, k - 1).
But what if k == 1? If that's the case we can add one occurence of j to every sequence of length i - 1 that doesn't end with j. Essentially we need the sum of all dp(i, x, k), such that A[x] >= k and x != j.
This gives our recurrence relation:
def dp(i, j, k):
# this is the base case, the number of sequences of length 1
# one if k is valid, otherwise zero
if i == 1: return int(k == 1)
if k > 1:
# get all the valid sequences [0...i-1] and add j to them
return dp(i - 1, j, k - 1)
if k == 1:
# get all valid sequences that don't end with j
res = 0
for last in range(len(A)):
if last == j: continue
for n_consec in range(1, A[last] + 1):
res += dp(i - 1, last, n_consec)
return res
We know that our answer will be all valid subsequences of length N, so our final answer is sum(dp(N, j, k) for j in range(len(A)) for k in range(1, A[j] + 1))
Believe it or not this is the basis of dynamic programming. We just broke our main problem down into a set of subproblems. Of course, right now our time is exponential because of the recursion. We have two ways to lower this:
Caching, we can simply keep track of the result of each (i, j, k) and then spit out what we originally computed when it's called again.
Use an array. We can reimplement this idea with bottom-up dp, and have an array dp[i][j][k]. All of our function calls just become array accesses in a for loop. Note that using this method forces us iterate over the array in topological order which may be tricky.
There are 2 kinds of dp approaches: top-down and bottom-up
In bottom up, you fill the terminal cases in dp table and then use for loops to build up from that. Lets consider bottom-up algo to generate Fibonacci sequence. We set dp[0] = 1 and dp[1] = 1 and run a for loop from i = 2 to n.
In top down approach, we start from the "top" view of the problem and go down from there. Consider the recursive function to get n-th Fibonacci number:
def fib(n):
if n <= 1:
return 1
if dp[n] != -1:
return dp[n]
dp[n] = fib(n - 1) + fib(n - 2)
return dp[n]
Here we don't fill the complete table, but only the cases we encounter.
Why I am talking about these 2 types is because when you start learning dp, it is often difficult to come up with bottom-up approaches (like you are trying to). When this happens, first you want to come up with a top-down approach, and then try to get a bottom up solution from that.
So let's create a recursive dp function first:
# let m be size of A
# initialize dp table with all values -1
def solve(i, j, k, n, m):
# first write terminal cases
if k > A[j]:
# this means sequence is invalid. so return 0
return 0
if i >= n:
# this means a valid sequence.
return 1
if dp[i][j][k] != -1:
return dp[i][j][k]
result = 0
for num = 1 to m:
if num == j:
result += solve(i + 1, num, k + 1, n)
else:
result += solve(i + 1, num, 1, n)
dp[i][j][k] = result
return dp[i][j][k]
So we know what terminal cases are. We create a dp table of size dp[n + 1][m][50]. Initialize it with all values 0, not -1.
So we can do bottom-up as:
# initially all values in table are zero. With loop below, we set the valid endings as 1.
# So any state trying to reach valid terminal states will get 1, but invalid states will
# return the values 0
for num = 1 to m:
for occour = 1 to A[num]:
dp[n][num][occour] = 1
# now to build up from bottom, we start by filling n-1 th position
for i = n-1 to 1:
for num = 1 to m:
for occour = 1 to A[num]:
for next_num = 1 to m:
if next_num != num:
dp[i][num][occour] += dp[i + 1][next_num][1]
else:
dp[i][num][occour] += dp[i + 1][num][occour + 1]
The answer will be:
sum = 0
for num = 1 to m:
sum += dp[1][num][1]
I am sure there must be some more elegant dp solution, but I believe this answers your question. Note that I considered that k is the number of times j-th number has been repeated consecutively, correct me if I am wrong with this.
Edit:
With the given constraints the size of the table will be, in the worst case, 10^5 * 6 * 50 = 3e7. This would be > 100MB. It is workable, but can be considered too much space use (I think some kernels doesn't allow that much stack space to a process). One way to reduce it would be to use a hash-map instead of an array with top down approach since top-down doesn't visit all the states. That would be mostly true in this case, for example if A[1] is 2, then all the other states where 1 has occoured more that twice need not be stored. Ofcourse this would not save much space if A[i] has large values, say [50, 50, 50, 50, 50, 50]. Another approach would be to modify our approach a bit. We dont actually need to store the dimension k, i.e. the times j has appeared consecutively:
dp[i][j] = no of ways from i-th position if (i - 1)th position didn't have j and i-th position is j.
Then, we would need to modify our algo to be like:
def solve(i, j):
if i == n:
return 1
if i > n:
return 0
if dp[i][j] != -1
return dp[i][j]
result = 0
# we will first try 1 consecutive j, then 2 consecutive j's then 3 and so on
for count = 1 to A[j]:
for num = 1 to m:
if num != j:
result += solve(i + count, num)
dp[i][j] = result
return dp[i][j]
This approach will reduce our space complexity to O(10^6) ~= 2mb, while time complexity is still the same : O(N * 6 * 50)

Maximum sum of all contiguous subarrays of prime length

I was recently asked this question in an interview,
Given an array of non-negative integers find the
maximum cumulative sum that could be obtained such that the length of all the
participating subarray is a prime number. I tried to come up with a solution for this using Dynamic Programming but unfortunately could not.
Eg: If the array is [9,8,7,6,5,4,3,1,2,2] it should return 46 (sum of the subarray [9,8,7,6,5,4,3] of length 7 and [2,2] of length 2). You cannot combine [9,8,7,6,5,4,3] and [1,2,2] since it would result in a contiguous subarray (idempotency) of length 10 which is non prime.
Can anyone explain how to solve such problems using DP? Thanks.
What you can do:
take the length of the list and go back until you find a prime number
get a window of elements and sum them
check if it's the maximum sum and in case it is, store it
go to the next window
This works because of the constraint that all integers are positive, it would not work otherwise.
Basically something like this (very roughly, in pseudo-python, obviously not tested):
input_list = (8, 1, 3, 4, 5, 2)
list_size = len(input_list)
while (list_size):
if (is_prime(list_size)):
window_size = list_size
break
list_size--
max_sum = -1
for i in xrange(0, list_size - window_size):
current_sum = sum(input_list[i:i+window_size])
if (max_sum < current_sum):
max_sum = current_sum
print max_sum
How about something like this (approximately) O(n * n / log n) time, O(n) space, DP?
Let f(i) represent the greatest sum up to index i where a[i] is either excluded from a contiguous subset or is the last of a subset of prime length. Then:
f(i) = sum(a[0]...a[i]) if (i + 1) is prime, otherwise
max(
// a[i] excluded
f(i-1),
f(i-2),
// a[i] is last of a subset
sum(a[i - primes[j] + 1]...a[i]) + f(i - primes[j] - 1)
for primes[j] <= i
)
(Summing the intervals can be done in O(1) time with O(n) preprocessing of prefix-sums.)
Since others have solved the problem for non-negative integers.
But if you have -ve numbers also, then also this algorithm will work.
I think you have to slightly tweak Kadane's Algo.
Following are the changes for modified Kadane's Algo. All lines marked ** are changes.
Initialize:
max_so_far = 0
max_ending_here = 0
** MAX_SO_FAR_FOR_PRIMES =0
** SUB_ARRAY_SIZE = 0
Loop for each element of the array
max_ending_here = max_ending_here + a[i]
** SUB_ARRAY_SIZE = SUB_ARRAY_SIZE + 1 // since a[i] included in subarray, increase sub_array_size
if(max_ending_here < 0)
max_ending_here = 0
** SUB_ARRAY_SIZE = 0
if(max_so_far < max_ending_here)
max_so_far = max_ending_here
** if(MAX_SO_FAR_FOR_PRIMES < max_ending_here && isPrime(SUB_ARRAY_SIZE)) // comparing when SUB_ARRAY_SIZE is Prime.
** MAX_SO_FAR_FOR_PRIMES = max_ending_here.
return MAX_SO_FAR_FOR_PRIMES
Basically, take one more variable MAX_SO_FAR_FOR_PRIMES, which keeps the maximum sum subarray for prime sized subarray so far.
Also store the SUB_ARRAY_SIZE, which stores the size of the sub-array anytime during looping.
Now just compare you MAX_SO_FAR_FOR_PRIMES with your max_ending_here whenever the SUBARRAY_SIZE is prime. And update `MAX_SO_FAR_FOR_PRIMES1 accordingly.

Resources