Algorithm for subarray of length k with maximum sum

I was asked this question in an interview:
Find the subarray of length k with the maximum sum
For example:
Input: [1,-5,4,3,6,8,2,4], k = 3
Output: [3,6,8]
I thought to just take all possible slices of the input array and calculate the sum of each, and then keep the greatest sum. It turns out this is not efficient.
How can this be done more efficiently?

The idea is that you slide a window over the array.
Use two indices into the array: one at 0, one at k. Calculate the sum of the values in that range (i.e. the elements at indices 0 through k-1). Then move the window one step at a time and adapt the sum by subtracting the element that leaves the window and adding the one that enters it. Remember the greatest sum seen so far and where it started. This is O(n) overall, instead of O(n*k) for summing every slice from scratch.
Here is a JavaScript implementation:
function greatestSum(arr, k) {
  const n = arr.length;
  // Initialise with the sum of the first k elements
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum += arr[i];
  }
  let maxSum = sum;
  let maxStart = 0;
  // Look at all alternative subarrays
  for (let i = 0; i < n - k; i++) {
    sum = sum - arr[i] + arr[i + k]; // Move the window
    if (sum > maxSum) { // Improvement found
      maxSum = sum;
      maxStart = i + 1;
    }
  }
  // Extract the slice that has the maximum sum
  return arr.slice(maxStart, maxStart + k);
}

const arr = [1, -5, 4, 3, 6, 8, 2, 4];
const k = 3;
const output = greatestSum(arr, k);
console.log(output); // [3, 6, 8]

Related

Find a subsequence of length k whose sum is equal to given sum

Given an array A and a sum, I want to find out if there exists a subsequence of length K such that the sum of all elements in the subsequence equals the given sum.
Code:
for i in (1, N):
    for len in (i-1, 0):
        for sum in (0, sum of all elements):
            Possible[len+1][sum] |= Possible[len][sum - A[i]]
Time complexity is O(N^2 * Sum). Is there any way to improve it to O(N * Sum)?
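For reference, here is one way the (length, sum) recurrence above can be written out as runnable Python. This is only a sketch of the DP from the question (so still roughly O(N * K * Sum) work), not the asked-for O(N * Sum) improvement; the function name and the set-based representation are my own:

def k_subsequence_sum_exists(A, K, target):
    """True if some subsequence of exactly K elements of A sums to target."""
    # reachable[l] = set of sums attainable by picking exactly l of the elements seen so far
    reachable = [set() for _ in range(K + 1)]
    reachable[0].add(0)
    for x in A:
        # go from larger lengths to smaller so x is not reused within one pass
        for l in range(K - 1, -1, -1):
            for s in reachable[l]:
                reachable[l + 1].add(s + x)
    return target in reachable[K]

print(k_subsequence_sum_exists([2, 7, 3, 5], 2, 10))  # True: 7 + 3
print(k_subsequence_sum_exists([2, 7, 3, 5], 2, 4))   # False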
My function shifts a window of k adjacent array items across the array A and keeps the sum up to date until it matches or the search fails.
#include <assert.h>
#include <stddef.h>

int getSubSequenceStart(int A[], size_t len, int sum, size_t k)
{
    int sumK = 0;

    assert(len > 0);
    assert(k <= len);

    // compute the sum of the first k items
    for (size_t i = 0; i < k; i++)
    {
        sumK += A[i];
    }
    // shift the k-window up to the end of A
    for (size_t j = k; j < len; j++)
    {
        if (sumK == sum)
        {
            return (int)(j - k);
        }
        sumK += A[j] - A[j - k];
    }
    // the window ending at the last element has not been tested yet
    return (sumK == sum) ? (int)(len - k) : -1;
}
Complexity is linear with the length of array A.
Update for the non-contiguous general subsequence case:
To find a possibly non-contiguous subsequence, you could transform your problem into a subset sum problem by subtracting sum/k from every element of A and looking for a subset of k elements with sum zero. Subset sum is NP-complete, so you cannot hope for a linear algorithm, unless your array A has special properties.
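As a small worked example of that transformation (my numbers, chosen so that sum/k is an integer): take A = [2, 7, 3, 5], sum = 10, k = 2. Subtracting sum/k = 5 from every element gives A' = [-3, 2, -2, 0]; the pair {7, 3} of A, which sums to 10, corresponds to the pair {2, -2} of A', which sums to zero. Note that the zero-sum subset still has to contain exactly k elements for the correspondence to hold.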
Edit:
This could actually be solved without the queue in linear time (negative numbers allowed).
C# code:
bool SubsequenceExists(int[] a, int k, int sum)
{
    int currentSum = 0;

    if (a.Length < k) return false;

    for (int i = 0; i < a.Length; i++)
    {
        if (i < k)
        {
            currentSum += a[i];
            continue;
        }

        if (currentSum == sum) return true;

        currentSum += a[i] - a[i - k];
    }
    // the window ending at the last element has not been checked yet
    return currentSum == sum;
}
Original answer:
Assuming you can use a queue of length K, something like this should do the job in linear time.
C# code:
bool SubsequenceExists(int[] a, int k, int sum)
{
    int currentSum = 0;
    var queue = new Queue<int>();

    for (int i = 0; i < a.Length; i++)
    {
        if (i < k)
        {
            queue.Enqueue(a[i]);
            currentSum += a[i];
            continue;
        }

        if (currentSum == sum) return true;

        currentSum -= queue.Dequeue();
        queue.Enqueue(a[i]);
        currentSum += a[i];
    }
    // the window ending at the last element has not been checked yet
    return a.Length >= k && currentSum == sum;
}
The logic behind it is pretty straightforward:
We populate a queue with the first K elements while also keeping track of their sum.
If the resulting sum is not equal to sum, then we dequeue an element from the queue and add the next one from A (while updating the sum).
We repeat step 2 until we either reach the end of the sequence or find a matching subsequence.
Ta-daa!
Let is_subset_sum(int set[], int n, int sum) be the function to find whether there is a subset of set[] with sum equal to sum. n is the number of elements in set[].
The is_subset_sum problem can be divided into two subproblems
Include the last element, recur for n = n-1, sum = sum – set[n-1]
Exclude the last element, recur for n = n-1.
If any of the above subproblems return true, then return true.
Following is the recursive formula for is_subset_sum() problem.
is_subset_sum(set, n, sum) = is_subset_sum(set, n-1, sum) || is_subset_sum(set, n-1, sum-set[n-1])
Base Cases:
is_subset_sum(set, n, sum) = false, if sum > 0 and n == 0
is_subset_sum(set, n, sum) = true, if sum == 0
We can solve the problem in pseudo-polynomial time using dynamic programming. We create a boolean 2D table subset[][] and fill it in a bottom-up manner. The value of subset[i][j] will be true if there is a subset of set[0..j-1] with sum equal to i, otherwise false. Finally, we return subset[sum][n].
The time complexity of the solution is O(sum*n).
Implementation in C
// A Dynamic Programming solution for the subset sum problem
#include <stdio.h>
#include <stdbool.h>

// Returns true if there is a subset of set[] with sum equal to the given sum
bool is_subset_sum(int set[], int n, int sum) {
    // The value of subset[i][j] will be true if there is a
    // subset of set[0..j-1] with sum equal to i
    bool subset[sum+1][n+1];

    // If sum is 0, then the answer is true
    for (int i = 0; i <= n; i++)
        subset[0][i] = true;

    // If sum is not 0 and the set is empty, then the answer is false
    for (int i = 1; i <= sum; i++)
        subset[i][0] = false;

    // Fill the subset table in bottom-up manner
    for (int i = 1; i <= sum; i++) {
        for (int j = 1; j <= n; j++) {
            subset[i][j] = subset[i][j-1];
            if (i >= set[j-1])
                subset[i][j] = subset[i][j] || subset[i - set[j-1]][j-1];
        }
    }

    /* // uncomment this code to print the table
    for (int i = 0; i <= sum; i++) {
        for (int j = 0; j <= n; j++)
            printf("%4d", subset[i][j]);
        printf("\n");
    } */

    return subset[sum][n];
}

// Driver program to test the function above
int main() {
    int set[] = {3, 34, 4, 12, 5, 2};
    int sum = 9;
    int n = sizeof(set)/sizeof(set[0]);
    if (is_subset_sum(set, n, sum) == true)
        printf("Found a subset with given sum");
    else
        printf("No subset with given sum");
    return 0;
}

Maximum subarray sum modulo M

Most of us are familiar with the maximum sum subarray problem. I came across a variant of this problem which asks the programmer to output the maximum of all subarray sums modulo some number M.
The naive approach to solve this variant would be to find all possible subarray sums (which would be of the order of N^2 where N is the size of the array). Of course, this is not good enough. The question is - how can we do better?
Example: Let us consider the following array:
6 6 11 15 12 1
Let M = 13. In this case, subarray 6 6 (or 12 or 6 6 11 15 or 11 15 12) will yield maximum sum ( = 12 ).
We can do this as follows:
Maintain an array sum where sum[i] contains the prefix sum modulo M of the elements from index 0 to i.
For each index i, we need to find the maximum subarray sum that ends at this index:
For each subarray (start + 1, i), we know that the mod sum of this subarray is
int a = (sum[i] - sum[start] + M) % M
So, we can only achieve a sub-sum larger than sum[i] if sum[start] is larger than sum[i] and as close to sum[i] as possible.
This can be done easily if you use a binary search tree.
Pseudo code:

int[] sum;
sum[0] = A[0];
Tree tree;
tree.add(sum[0]);
int result = sum[0];

for(int i = 1; i < n; i++){
    sum[i] = sum[i - 1] + A[i];
    sum[i] %= M;
    int a = tree.getMinimumValueLargerThan(sum[i]);
    result = max((sum[i] - a + M) % M, result);
    tree.add(sum[i]);
}
print result;

Time complexity: O(n log n)
Let A be our input array with zero-based indexing. We can reduce A modulo M without changing the result.
First of all, let's reduce the problem to a slightly easier one by computing an array P representing the prefix sums of A, modulo M:
A = 6 6 11 2 12 1
P = 6 12 10 12 11 12
Now let's process the possible left borders of our solution subarrays in decreasing order. This means that we will first determine the optimal solution that starts at index n - 1, then the one that starts at index n - 2 etc.
In our example, if we chose i = 3 as our left border, the possible subarray sums are represented by the suffix P[3..n-1] plus a constant a = A[i] - P[i]:
a = A[3] - P[3] = 2 - 12 = 3 (mod 13)
P + a = * * * 2 1 2
The global maximum will occur for one of these left borders as well. Since we can insert the suffix values from right to left, we have now reduced the problem to the following:
Given a set of values S and integers x and M, find the maximum of S + x modulo M
This one is easy: Just use a balanced binary search tree to manage the elements of S. Given a query x, we want to find the largest value in S that is smaller than M - x (that is the case where no overflow occurs when adding x). If there is no such value, just use the largest value of S. Both can be done in O(log |S|) time.
Total runtime of this solution: O(n log n)
Here's some C++ code to compute the maximum sum. It would need some minor adaptions to also return the borders of the optimal subarray:
#include <bits/stdc++.h>
using namespace std;

int max_mod_sum(const vector<int>& A, int M) {
    vector<int> P(A.size());
    for (int i = 0; i < A.size(); ++i)
        P[i] = (A[i] + (i > 0 ? P[i-1] : 0)) % M;
    set<int> S;
    int res = 0;
    for (int i = A.size() - 1; i >= 0; --i) {
        S.insert(P[i]);
        int a = (A[i] - P[i] + M) % M;
        auto it = S.lower_bound(M - a);
        if (it != begin(S))
            res = max(res, *prev(it) + a);
        res = max(res, (*prev(end(S)) + a) % M);
    }
    return res;
}

int main() {
    // random testing to the rescue
    for (int i = 0; i < 1000; ++i) {
        int M = rand() % 1000 + 1, n = rand() % 1000 + 1;
        vector<int> A(n);
        for (int i = 0; i < n; ++i)
            A[i] = rand() % M;
        int should_be = 0;
        for (int i = 0; i < n; ++i) {
            int sum = 0;
            for (int j = i; j < n; ++j) {
                sum = (sum + A[j]) % M;
                should_be = max(should_be, sum);
            }
        }
        assert(should_be == max_mod_sum(A, M));
    }
}
For me, all the explanations here were hard to follow, since I didn't get the searching/sorting part: how exactly we search/sort was unclear.
We all know that we need to build a prefixSum, meaning the sum of all elements from 0 to i, modulo m.
I guess what we are looking for is clear.
Knowing that subarray[i][j] = (prefix[i] - prefix[j] + m) % m (the modulo sum of the elements after index j up to index i), the maximum for a given prefix[i] is always the prefix[j] which is as close as possible to prefix[i], but slightly bigger.
E.g. for m = 8 and prefix[i] = 5, we are looking for the next value after 5 that is present in our prefix array.
For efficient search (binary search) we sort the prefixes.
What we cannot do is build the prefixSum first and then iterate again from 0 to n looking for an index in the sorted prefix array, because we could find an endIndex which is smaller than our startIndex, which is no good.
Therefore, what we do is iterate from 0 to n with endIndex denoting the end of our potential max subarray sum, and look into our sorted prefix array (which is empty at the beginning), which contains the sorted prefixes between 0 and endIndex.
import bisect

def maximumSum(coll, m):
    n = len(coll)
    maxSum, prefixSum = 0, 0
    sortedPrefixes = []
    for endIndex in range(n):
        prefixSum = (prefixSum + coll[endIndex]) % m
        maxSum = max(maxSum, prefixSum)
        startIndex = bisect.bisect_right(sortedPrefixes, prefixSum)
        if startIndex < len(sortedPrefixes):
            maxSum = max(maxSum, prefixSum - sortedPrefixes[startIndex] + m)
        bisect.insort(sortedPrefixes, prefixSum)
    return maxSum
From your question, it seems that you have created an array to store the cumulative sums (Prefix Sum Array), and are calculating the sum of the sub-array arr[i:j] as (sum[j] - sum[i] + M) % M. (arr and sum denote the given array and the prefix sum array respectively)
Calculating the sum of every sub-array results in a O(n*n) algorithm.
The question that arises is -
Do we really need to consider the sum of every sub-array to reach the desired maximum?
No!
For a value of j the value (sum[j] - sum[i] + M) % M will be maximum when sum[i] is just greater than sum[j] or the difference is M - 1.
This would reduce the algorithm to O(nlogn).
You can take a look at this explanation! https://www.youtube.com/watch?v=u_ft5jCDZXk
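As a rough illustration of that idea (only a sketch of the approach described above, not the code from the linked video; names are mine), keeping the prefix sums seen so far in a sorted list and using bisect to find the value just greater than the current one:

import bisect

def max_subarray_sum_mod(arr, M):
    seen = [0]            # prefix sums seen so far (mod M); 0 stands for the empty prefix
    best, prefix = 0, 0
    for x in arr:
        prefix = (prefix + x) % M
        best = max(best, prefix)                 # subarray starting at index 0
        pos = bisect.bisect_right(seen, prefix)  # smallest previous prefix > current one
        if pos < len(seen):
            best = max(best, prefix - seen[pos] + M)
        bisect.insort(seen, prefix)
    return best

print(max_subarray_sum_mod([6, 6, 11, 15, 12, 1], 13))  # 12, matching the example above

Because of the list insertions this sketch is O(n^2) in the worst case; replacing the sorted list with a balanced tree gives the full O(n log n).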
There are already a bunch of great solutions listed here, but I wanted to add one that has O(nlogn) runtime without using a balanced binary tree, which isn't in the Python standard library. This solution isn't my idea, but I had to think a bit as to why it worked. Here's the code, explanation below:
def maximumSum(a, m):
    prefixSums = [(0, -1)]
    for idx, el in enumerate(a):
        prefixSums.append(((prefixSums[-1][0] + el) % m, idx))
    prefixSums = sorted(prefixSums)
    maxSeen = prefixSums[-1][0]
    for (a, a_idx), (b, b_idx) in zip(prefixSums[:-1], prefixSums[1:]):
        if a_idx > b_idx and b > a:
            maxSeen = max((a - b) % m, maxSeen)
    return maxSeen
As with the other solutions, we first calculate the prefix sums, but this time we also keep track of the index of the prefix sum. We then sort the prefix sums, as we want to find the smallest difference between prefix sums modulo m - sorting lets us just look at adjacent elements as they have the smallest difference.
At this point you might think we're neglecting an essential part of the problem - we want the smallest difference between prefix sums, but the larger prefix sum needs to appear before the smaller prefix sum (meaning it has a smaller index). In the solutions using trees, we ensure that by adding prefix sums one by one and recalculating the best solution.
However, it turns out that we can look at adjacent elements and just ignore the ones that don't satisfy our index requirement. This confused me for some time, but the key realization is that the optimal solution will always come from two adjacent elements. I'll prove this via a contradiction. Let's say that the optimal solution comes from two non-adjacent prefix sums x and z with indices k and i respectively, where z > x (it's sorted!) and k > i:
x ... z
k ... i
Let's consider one of the numbers between x and z, and let's call it y with index j. Since the list is sorted, x < y < z.
x ... y ... z
k ... j ... i
The prefix sum y must have index j < i, otherwise it would be part of a better solution with z. But if j < i, then j < k and y and x form a better solution than z and x! So any elements between x and z must form a better solution with one of the two, which contradicts our original assumption. Therefore the optimal solution must come from adjacent prefix sums in the sorted list.
Here is Java code for maximum subarray sum modulo M. We handle the case where we cannot find the least element in the tree strictly greater than s[i].
public static long maxModulo(long[] a, final long k) {
    long[] s = new long[a.length];
    TreeSet<Long> tree = new TreeSet<>();

    s[0] = a[0] % k;
    tree.add(s[0]);
    long result = s[0];

    for (int i = 1; i < a.length; i++) {
        s[i] = (s[i - 1] + a[i]) % k;
        // find the least element in the tree strictly greater than s[i]
        Long v = tree.higher(s[i]);
        if (v == null) {
            // no such element; s[i] itself is the best candidate ending here
            result = Math.max(s[i], result);
        } else {
            result = Math.max((s[i] - v + k) % k, result);
        }
        tree.add(s[i]);
    }
    return result;
}
A few points from my side that will hopefully help someone understand the problem better.
You do not need to add +M in the modulo calculation here: in Python the % operator handles negative numbers the way we need, so a % M == (a + M) % M.
As mentioned, the trick is to build the proxy sum table such that
proxy[n] = (a[1] + ... + a[n]) % M
This then allows one to represent the maxSubarraySum[i, j] as
maxSubarraySum[i, j] = (proxy[j] - proxy[i]) % M
The implementation trick is to build the proxy table as we iterate through the elements, instead of first pre-building it and then using it. This is because for each new element a[i] we want to compute proxy[i] and find a proxy[j] that is bigger than, but as close as possible to, proxy[i] (ideally bigger by 1, because this results in a remainder of M - 1). For this we need a clever data structure that keeps the proxy table sorted as we build it and
lets us quickly find the closest bigger element to proxy[i]. bisect.bisect_right is a good choice in Python.
See my Python implementation below (hope this helps but I am aware this might not necessarily be as concise as others' solutions):
import bisect

def maximumSum(a, m):
    prefix_sum = [a[0] % m]
    prefix_sum_sorted = [a[0] % m]
    current_max = prefix_sum_sorted[0]
    for elem in a[1:]:
        prefix_sum_next = (prefix_sum[-1] + elem) % m
        prefix_sum.append(prefix_sum_next)
        idx_closest_bigger = bisect.bisect_right(prefix_sum_sorted, prefix_sum_next)
        if idx_closest_bigger >= len(prefix_sum_sorted):
            current_max = max(current_max, prefix_sum_next)
            bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
            continue
        if prefix_sum_sorted[idx_closest_bigger] > prefix_sum_next:
            current_max = max(current_max, (prefix_sum_next - prefix_sum_sorted[idx_closest_bigger]) % m)
        bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
    return current_max
Full Java implementation with O(n log n):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.TreeSet;
import java.util.stream.Stream;

public class MaximizeSumMod {

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        Long times = Long.valueOf(in.readLine());
        while (times --> 0) {
            long[] pair = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
            long mod = pair[1];
            long[] numbers = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
            printMaxMod(numbers, mod);
        }
    }

    private static void printMaxMod(long[] numbers, Long mod) {
        Long maxSoFar = (numbers[numbers.length - 1] + numbers[numbers.length - 2]) % mod;
        maxSoFar = (maxSoFar > (numbers[0] % mod)) ? maxSoFar : numbers[0] % mod;
        numbers[0] %= mod;
        for (Long i = 1L; i < numbers.length; i++) {
            long currentNumber = numbers[i.intValue()] % mod;
            maxSoFar = maxSoFar > currentNumber ? maxSoFar : currentNumber;
            numbers[i.intValue()] = (currentNumber + numbers[i.intValue() - 1]) % mod;
            maxSoFar = maxSoFar > numbers[i.intValue()] ? maxSoFar : numbers[i.intValue()];
        }

        if (mod.equals(maxSoFar + 1) || numbers.length == 2) {
            System.out.println(maxSoFar);
            return;
        }

        long previousNumber = numbers[0];
        TreeSet<Long> set = new TreeSet<>();
        set.add(previousNumber);
        for (Long i = 2L; i < numbers.length; i++) {
            Long currentNumber = numbers[i.intValue()];
            Long ceiling = set.ceiling(currentNumber);
            if (ceiling == null) {
                set.add(numbers[i.intValue() - 1]);
                continue;
            }
            if (ceiling.equals(currentNumber)) {
                set.remove(ceiling);
                Long greaterCeiling = set.ceiling(currentNumber);
                if (greaterCeiling == null) {
                    set.add(ceiling);
                    set.add(numbers[i.intValue() - 1]);
                    continue;
                }
                set.add(ceiling);
                ceiling = greaterCeiling;
            }
            Long newMax = (currentNumber - ceiling + mod);
            maxSoFar = maxSoFar > newMax ? maxSoFar : newMax;
            set.add(numbers[i.intValue() - 1]);
        }
        System.out.println(maxSoFar);
    }
}
Adding STL C++11 code based on the solution suggested by @Pham Trung. Might be handy.
#include <iostream>
#include <set>

int main() {
    int N;
    std::cin >> N;
    for (int nn = 0; nn < N; nn++){
        long long n, m;
        std::set<long long> mSet;
        long long maxVal = 0; //positive input values
        long long sumVal = 0;

        std::cin >> n >> m;
        mSet.insert(m);
        for (long long q = 0; q < n; q++){
            long long tmp;
            std::cin >> tmp;
            sumVal = (sumVal + tmp) % m;
            auto itSub = mSet.upper_bound(sumVal);
            maxVal = std::max(maxVal, (m + sumVal - *itSub) % m);
            mSet.insert(sumVal);
        }
        std::cout << maxVal << "\n";
    }
}
As you can read on Wikipedia, there is a solution called Kadane's algorithm, which computes the maximum subarray sum by watching the maximum subarray ending at position i, for all positions i, while iterating once over the array. This solves the problem with runtime complexity O(n).
Unfortunately, I think that Kadane's algorithm isn't able to find all possible solutions when more than one solution exists.
An implementation in Java; I haven't tested it:
public int[] kadanesAlgorithm(int[] array) {
    int start_old = 0;
    int start = 0;
    int end = 0;
    int found_max = 0;
    int max = array[0];

    for (int i = 0; i < array.length; i++) {
        max = Math.max(array[i], max + array[i]);
        found_max = Math.max(found_max, max);
        if (max < 0)
            start = i + 1;
        else if (max == found_max) {
            start_old = start;
            end = i;
        }
    }
    return Arrays.copyOfRange(array, start_old, end + 1);
}
I feel my thoughts are aligned with what has been posted already, but just in case - here is a Kotlin O(N log N) solution:
val seen = sortedSetOf(0L)
var prev = 0L
return max(a.map { x ->
    val z = (prev + x) % m
    prev = z
    seen.add(z)
    seen.higher(z)?.let { y ->
        (z - y + m) % m
    } ?: z
})
Implementation in Java using a TreeSet:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.TreeSet;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
        String[] str = read.readLine().trim().split(" ");
        int n = Integer.parseInt(str[0]);
        long m = Long.parseLong(str[1]);

        str = read.readLine().trim().split(" ");
        long[] arr = new long[n];
        for (int i = 0; i < n; i++) {
            arr[i] = Long.parseLong(str[i]);
        }

        long maxCount = 0L;
        TreeSet<Long> tree = new TreeSet<>();
        tree.add(0L);
        long prefix = 0L;

        for (int i = 0; i < n; i++) {
            prefix = (prefix + arr[i]) % m;
            maxCount = Math.max(prefix, maxCount);
            Long temp = tree.higher(prefix);
            if (temp != null) {
                maxCount = Math.max((prefix - temp + m) % m, maxCount);
            }
            tree.add(prefix);
        }
        System.out.println(maxCount);
    }
}
Here is an implementation of a solution in Java for this problem, using a TreeSet for an optimized solution!
public static long maximumSum2(long[] arr, long n, long m)
{
    long prefix = 0;
    long maxim = 0;

    TreeSet<Long> S = new TreeSet<Long>();
    S.add((long) 0);

    // Traversing the array.
    for (int i = 0; i < n; i++)
    {
        // Finding prefix sum.
        prefix = (prefix + arr[i]) % m;

        // Finding maximum of prefix sum.
        maxim = Math.max(maxim, prefix);

        // Finding the first element in the set that is not less than
        // "prefix + 1", i.e., greater than or equal to this value.
        long it = S.higher(prefix) != null ? S.higher(prefix) : 0;

        // boolean isFound = false;
        // for (long j : S)
        // {
        //     if (j >= prefix + 1)
        //         if (isFound == false) {
        //             it = j;
        //             isFound = true;
        //         }
        //         else {
        //             if (j < it) {
        //                 it = j;
        //             }
        //         }
        // }

        if (it != 0)
        {
            maxim = Math.max(maxim, prefix - it + m);
        }

        // adding prefix to the set.
        S.add(prefix);
    }
    return maxim;
}
public static int MaxSequence(int[] arr)
{
    int maxSum = 0;
    int partialSum = 0;
    int negative = 0;

    for (int i = 0; i < arr.Length; i++)
    {
        if (arr[i] < 0)
        {
            negative++;
        }
    }
    if (negative == arr.Length)
    {
        return 0;
    }

    foreach (int item in arr)
    {
        partialSum += item;
        maxSum = Math.Max(maxSum, partialSum);
        if (partialSum < 0)
        {
            partialSum = 0;
        }
    }
    return maxSum;
}
Modify Kadane's algorithm to keep track of the number of occurrences. Below is the code.
#python3
#source: https://github.com/harishvc/challenges/blob/master/dp-largest-sum-sublist-modulo.py
#Time complexity: O(n)
#Space complexity: O(n)
def maxContiguousSum(a, K):
    sum_so_far = 0
    max_sum = 0
    count = {}  # keep track of occurrence
    for i in range(0, len(a)):
        sum_so_far += a[i]
        sum_so_far = sum_so_far % K
        if sum_so_far > 0:
            max_sum = max(max_sum, sum_so_far)
            if sum_so_far in count.keys():
                count[sum_so_far] += 1
            else:
                count[sum_so_far] = 1
        else:
            assert sum_so_far < 0, "Logic error"
            # IMPORTANT: reset sum_so_far
            sum_so_far = 0
    return max_sum, count[max_sum]

a = [6, 6, 11, 15, 12, 1]
K = 13
max_sum, count = maxContiguousSum(a, K)
print("input >>> %s max sum=%d #occurrence=%d" % (a, max_sum, count))

Longest positive sum substring

I was wondering how I could get the longest positive-sum subsequence in a sequence:
For example, I have -6 3 -4 4 -5, so the longest positive subsequence is 3 -4 4. In fact the sum is positive (3), and we couldn't add -6 or -5, or it would have become negative.
It can easily be solved in O(N^2), but I think something much faster could exist, like O(N log N).
Do you have any idea?
EDIT: the order must be preserved, and you can skip any number from the substring
EDIT2: I'm sorry if I caused confusion using the term "subsequence", as @beaker pointed out I meant substring
O(n) space and time solution. I'll start with the code (sorry, Java ;-) and try to explain it afterwards:
public static int[] longestSubarray(int[] inp) {
    // array containing prefix sums up to a certain index i
    int[] p = new int[inp.length];
    p[0] = inp[0];
    for (int i = 1; i < inp.length; i++) {
        p[i] = p[i - 1] + inp[i];
    }
    // array Q from the description below
    int[] q = new int[inp.length];
    q[inp.length - 1] = p[inp.length - 1];
    for (int i = inp.length - 2; i >= 0; i--) {
        q[i] = Math.max(q[i + 1], p[i]);
    }

    int a = 0;
    int b = 0;
    int maxLen = 0;
    int curr;
    int[] res = new int[] {-1, -1};
    while (b < inp.length) {
        curr = a > 0 ? q[b] - p[a - 1] : q[b];
        if (curr >= 0) {
            if (b - a > maxLen) {
                maxLen = b - a;
                res = new int[] {a, b};
            }
            b++;
        } else {
            a++;
        }
    }
    return res;
}
We are operating on an input array A of size n.
Let's define an array P as the array containing the prefix sums up to index i, so P[i] = sum(0, i) where i = 0, 1, ..., n-1.
Let's notice that if u < v and P[u] <= P[v], then u will never be our ending point.
Because of the above we can define an array Q which has Q[n-1] = P[n-1] and Q[i] = max(P[i], Q[i+1]).
Now let's consider M_{a,b}, which gives us the maximum sum of a subarray starting at a and ending at b or beyond. We know that M_{0,b} = Q[b] and that M_{a,b} = Q[b] - P[a-1].
With the above information we can now initialise a = b = 0 and start moving them. If the current value of M is greater than or equal to 0, then we know we will find (or have already found) a subarray with sum >= 0; we then just need to compare b-a with the previously found length. Otherwise there's no subarray that starts at a and adheres to our constraints, so we need to increment a.
Let's make a naive implementation and then improve it.
We move from left to right calculating partial sums, and for each position we find the left-most earlier partial sum such that the current partial sum is greater than it.
input a

int partialSums[len(a)]
for i in range(len(a)):
    partialSums[i] = (i == 0 ? 0 : partialSums[i - 1]) + a[i]
    if partialSums[i] > 0:
        answer = max(answer, i + 1)
    else:
        for j in range(i):
            if partialSums[i] - partialSums[j] > 0:
                answer = max(answer, i - j)
                break
This is O(n^2). Now the part of finding the left-most "good" sum could actually be maintained via a BST, where each node would be represented as a pair (partial sum, index) with comparison by partial sum. Also each node should support a special field min that would be the minimum of the indices in its subtree.
Now instead of the straightforward search of an appropriate partial sum we could descend the BST using the current partial sum as a key following the next three rules (assuming C is the current node, L and R are the roots of the left and the right subtrees respectively):
Maintain the current minimal index of "good" partial sums found in curMin, initially +∞.
If C.partial_sum is "good" then update curMin with C.index.
If we go to R then update curMin with L.min.
And then update the answer with i - curMin, also add the current partial sum to the BST.
That would give us O(n * log n).
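If implementing an index-augmented BST feels heavy, the same "left-most good partial sum" query can also be answered with a Fenwick tree over the compressed prefix sums. Below is a rough Python sketch of that variant (my own function names; it returns only the length, and it uses a sentinel prefix 0 so subarrays starting at index 0 are covered):

import bisect

def longest_positive_subarray(a):
    # prefix[j] = a[0] + ... + a[j-1]; prefix[0] = 0 is the sentinel for the empty prefix
    n = len(a)
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    values = sorted(set(prefix))        # coordinate compression of the prefix sums
    m = len(values)
    INF = n + 1
    fen = [INF] * (m + 1)               # Fenwick tree of "minimum prefix index per value range"

    def update(v, idx):                 # record that compressed value v occurs at index idx
        while v <= m:
            fen[v] = min(fen[v], idx)
            v += v & -v

    def query(v):                       # minimum index among compressed values 1..v
        best = INF
        while v > 0:
            best = min(best, fen[v])
            v -= v & -v
        return best

    answer = 0
    for j in range(n + 1):
        rank = bisect.bisect_left(values, prefix[j]) + 1   # 1-based rank of prefix[j]
        left = query(rank - 1)          # left-most index with a strictly smaller prefix sum
        if left < j:
            answer = max(answer, j - left)
        update(rank, j)
    return answer

print(longest_positive_subarray([-6, 3, -4, 4, -5]))  # 3, i.e. the subarray 3 -4 4

Each of the n+1 prefix sums triggers one query and one update, both O(log n), so this is O(n log n) overall, matching the BST approach.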
We can easily have a O(n log n) solution for longest subsequence.
First, sort the array, remember their indexes.
Pick the largest numbers one by one, stop when adding the next would make the sum non-positive, and you have your answer.
Recover their original order.
Pseudo code
sort(data);
int length = 0;
long sum = 0;
boolean[] result = new boolean[n];
for(int i = n; i >= 1; i--){
    if(sum + data[i] <= 0)
        break;
    sum += data[i];
    result[data[i].index] = true;
    length++;
}
for(int i = 1; i <= n; i++)
    if(result[i])
        print i;
So, rather than waiting, I will propose a O(n log n) solution for longest positive substring.
First, we create an array prefix which is the prefix sum of the array.
Second, we use binary search to look for the longest length that has a positive sum.
Pseudocode
int[] prefix = new int[n + 1];
for(int i = 1; i <= n; i++){
    prefix[i] = data[i];
    if(i - 1 >= 1)
        prefix[i] += prefix[i - 1];
}

int min = 0;
int max = n;
int result = 0;
while(min <= max){
    int mid = (min + max)/2;
    boolean ok = false;
    for(int i = 1; i <= n; i++){
        if(i > mid && prefix[i] - prefix[i - mid] > 0){ // sum of the segment of length mid ending at index i
            ok = true;
            break;
        }
    }
    if(ok){
        result = max(result, mid);
        min = mid + 1;
    } else {
        max = mid - 1;
    }
}
Ok, so the above algorithm is wrong, as pointed out by piotrekg2. What we need to do is:
create an array prefix which is the prefix sum of the array.
Sort the prefix array, and we need to remember the index of the prefix array.
Iterate through the sorted prefix array, storing the minimum index we have met so far; the maximum difference between the current index and that minimum is the answer.
Note: when comparing values in prefix, if two indexes have equal values, the one with the smaller index is considered larger; this avoids counting the case when the sum is 0.
Pseudo code:

class Node{
    int val, index;
}

Node[] prefix = new Node[n + 1];
prefix[0] = new Node(0, 0); // sentinel for the empty prefix, so substrings starting at index 1 are covered
for(int i = 1; i <= n; i++)
    prefix[i] = new Node(prefix[i - 1].val + data[i], i);
sort(prefix);

int min = prefix[0].index;
int result = 0;
for(int i = 1; i <= n; i++){
    if(prefix[i].index > min)
        result = max(prefix[i].index - min, result);
    min = min(min, prefix[i].index);
}

Finding minimal absolute sum of a subarray

There's an array A containing (positive and negative) integers. Find a (contiguous) subarray whose elements' absolute sum is minimal, e.g.:
A = [2, -4, 6, -3, 9]
|(−4) + 6 + (−3)| = 1 <- minimal absolute sum
I've started by implementing a brute-force algorithm which was O(N^2) or O(N^3), though it produced correct results. But the task specifies:
complexity:
- expected worst-case time complexity is O(N*log(N))
- expected worst-case space complexity is O(N)
After some searching I thought that maybe Kadane's algorithm can be modified to fit this problem but I failed to do it.
My question is - is Kadane's algorithm the right way to go? If not, could you point me in the right direction (or name an algorithm that could help me here)? I don't want a ready-made code, I just need help in finding the right algorithm.
If you compute the partial sums
such as
2, 2 +(-4), 2 + (-4) + 6, 2 + (-4) + 6 + (-3)...
Then the sum of any contiguous subarray is the difference of two of the partial sums. So to find the contiguous subarray whose absolute value is minimal, I suggest that you sort the partial sums and then find the two values which are closest together, and use the positions of these two partial sums in the original sequence to find the start and end of the sub-array with smallest absolute value.
The expensive bit here is the sort, so I think this runs in time O(n * log(n)).
This is a C++ implementation of Saksow's algorithm.
int solution(vector<int> &A) {
    vector<int> P;
    int min = 20000;
    int dif = 0;
    P.resize(A.size() + 1);
    P[0] = 0;
    for(int i = 1; i < P.size(); i++)
    {
        P[i] = P[i-1] + A[i-1];
    }
    sort(P.begin(), P.end());
    for(int i = 1; i < P.size(); i++)
    {
        dif = P[i] - P[i-1];
        if(dif < min)
        {
            min = dif;
        }
    }
    return min;
}
I was doing this test on Codility and I found mcdowella's answer quite helpful, but not enough I have to say: so here is a 2015 answer, guys!
We need to build the prefix sums of array A (called P here) like: P[0] = 0, P[1] = P[0] + A[0], P[2] = P[1] + A[1], ..., P[N] = P[N-1] + A[N-1]
The "min abs sum" of A will be the minimum absolute difference between 2 elements in P. So we just have to .sort() P and loop through it, taking 2 successive elements each time. This way we have O(N + N log N + N), which equals O(N log N).
That's it!
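A minimal Python sketch of exactly those steps (prefix sums, sort, minimum difference between neighbours); the function name is mine:

def min_abs_sum_of_slice(A):
    # P[i] = A[0] + ... + A[i-1]; every slice sum is a difference of two entries of P
    P = [0]
    for x in A:
        P.append(P[-1] + x)
    P.sort()
    # after sorting, the closest pair of prefix sums are neighbours
    return min(P[i + 1] - P[i] for i in range(len(P) - 1))

print(min_abs_sum_of_slice([2, -4, 6, -3, 9]))  # 1, i.e. |(-4) + 6 + (-3)|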
The answer is yes, Kadane's algorithm is definitely the way to go for solving your problem.
http://en.wikipedia.org/wiki/Maximum_subarray_problem
Source - I've worked closely with a PhD student whose entire PhD thesis was devoted to the maximum subarray problem.
def min_abs_subarray(a):
    s = [a[0]]
    for e in a[1:]:
        s.append(s[-1] + e)
    s = sorted(s)
    min = abs(s[0])
    t = s[0]
    for x in s[1:]:
        cur = abs(x)
        min = cur if cur < min else min
        cur = abs(t - x)
        min = cur if cur < min else min
        t = x
    return min
You can run Kadane's algorithm twice (or do it in one go) to find the minimum and maximum sum, where finding the minimum works the same way as the maximum with reversed signs, and then calculate a new maximum by comparing their absolute values.
Source: someone's comment (I don't remember whose) on this site.
Here is an iterative solution in Python. It's 100% correct.
def solution(A):
    memo = []
    if not len(A):
        return 0
    for ind, val in enumerate(A):
        if ind == 0:
            memo.append([val, -1 * val])
        else:
            newElem = []
            for i in memo[ind - 1]:
                newElem.append(i + val)
                newElem.append(i - val)
            memo.append(newElem)
    return min(abs(n) for n in memo.pop())
Short, sweet and works like a charm. JavaScript / Node.js solution:
function solution(A, i = 0, sum = 0) {
    // Edge case if the array is empty
    if (A.length == 0) return 0;
    // Base case: for the last array element, add it to and subtract it from the sum
    // and find the min of their absolute values
    if (A.length - 1 === i) {
        return Math.min(Math.abs(sum + A[i]), Math.abs(sum - A[i]));
    }
    // Absolute value obtained by adding the elem to the sum,
    // recursively moving to the next elem
    let plus = Math.abs(solution(A, i + 1, sum + A[i]));
    // Absolute value obtained by subtracting the elem from the sum
    let minus = Math.abs(solution(A, i + 1, sum - A[i]));
    return Math.min(plus, minus);
}

console.log(solution([-100, 3, 2, 4]))
Here is a C solution based on Kadane's algorithm.
Hopefully it's helpful.
#include <stdio.h>

int min(int a, int b)
{
    return (a >= b) ? b : a;
}

int min_slice(int A[], int N) {
    if (N == 0 || N > 1000000)
        return 0;
    int minTillHere = A[0];
    int minSoFar = A[0];
    int i;
    for (i = 1; i < N; i++) {
        minTillHere = min(A[i], minTillHere + A[i]);
        minSoFar = min(minSoFar, minTillHere);
    }
    return minSoFar;
}

int main() {
    int A[] = {3, 2, -6, 4, 0}, N = 5;
    //int A[] = {3, 2, 6, 4, 0}, N = 5;
    //int A[] = {-4, -8, -3, -2, -4, -10}, N = 6;
    printf("Minimum slice = %d \n", min_slice(A, N));
    return 0;
}
public static int solution(int[] A) {
    int minTillHere = A[0];
    int absMinTillHere = A[0];
    int minSoFar = A[0];
    int i;
    for (i = 1; i < A.length; i++) {
        absMinTillHere = Math.min(Math.abs(A[i]), Math.abs(minTillHere + A[i]));
        minTillHere = Math.min(A[i], minTillHere + A[i]);
        minSoFar = Math.min(Math.abs(minSoFar), absMinTillHere);
    }
    return minSoFar;
}
#include <bits/stdc++.h>
using namespace std;

int main()
{
    int n; cin >> n;
    vector<int> a(n);
    for(int i = 0; i < n; i++) cin >> a[i];

    long long local_min = 0, global_min = LLONG_MAX;
    for(int i = 0; i < n; i++)
    {
        if(abs(local_min + a[i]) > abs(a[i]))
        {
            local_min = a[i];
        }
        else local_min += a[i];
        global_min = min(global_min, abs(local_min));
    }
    cout << global_min << endl;
}

Optimal algorithm

I am given an input, "N", i have to find the number of list of length N, which starts with 1, such that the next number to be added is at most 1 more than the max number added till now. For Example,
N = 3, possible lists => (111, 112, 121, 122, 123), [113, or 131 is not possible as while adding '3' to the list, the maximum number present in the list would be '1', thus we can add only 1 or 2].
N = 4, the list 1213 is possible as while adding 3, the maximum number in the list is '2', thus 3 can be added.
Problem is to count the number of such lists possible for a given input "N".
My code is :-
public static void Main(string[] args)
{
    var noOfTestCases = Convert.ToInt32(Console.ReadLine());
    var listOfOutput = new List<long>();
    for (int i = 0; i < noOfTestCases; i++)
    {
        var requiredSize = Convert.ToInt64(Console.ReadLine());
        long result;
        const long listCount = 1;
        const long listMaxTillNow = 1;

        if (requiredSize < 3)
            result = requiredSize;
        else
        {
            SeqCount.Add(requiredSize, 0);
            AddElementToList(requiredSize, listCount, listMaxTillNow);
            result = SeqCount[requiredSize];
        }
        listOfOutput.Add(result);
    }
    foreach (var i in listOfOutput)
    {
        Console.WriteLine(i);
    }
}

private static Dictionary<long, long> SeqCount = new Dictionary<long, long>();

private static void AddElementToList(long requiredSize, long listCount, long listMaxTillNow)
{
    if (listCount == requiredSize)
    {
        SeqCount[requiredSize] = SeqCount[requiredSize] + 1;
        return;
    }
    var listMaxTillNowNew = listMaxTillNow + 1;
    for (var i = listMaxTillNowNew; i > 0; i--)
    {
        AddElementToList(requiredSize, listCount + 1,
            i == listMaxTillNowNew ? listMaxTillNowNew : listMaxTillNow);
    }
    return;
}
Which is the brute force method. I wish to know what might be the best algorithm for this problem.
PS: I only wish to know the number of such lists, so I am sure creating all the lists won't be required. (The way I am doing it in the code.)
I am not at all good in algorithms, so please excuse for the long question.
This problem is a classic example of a dynamic programming problem:
If you define a function dp(k, m) to be the number of lists of length k for which the maximum number is m, then you have a recurrence relation:
dp(1, 1) = 1
dp(1, m) = 0, for m > 1
dp(k, m) = dp(k-1, m) * m + dp(k-1, m-1)
Indeed, there is only one list of length 1 and its maximum element is 1.
When you are building a list of length k with max element m, you can take any of the (k-1)-lists with max = m and append 1 or 2 or .... or m. Or you can take a (k-1)-list with max element m-1 and append m. If you take a (k-1)-list with max element less than m-1 then by your rule you can't get a max of m by appending just one element.
You can compute dp(k,m) for all k = 1,...,N and m = 1,...,N+1 using dynamic programming in O(N^2) and then the answer to your question would be
dp(N,1) + dp(N,2) + ... + dp(N,N+1)
Thus the algorithm is O(N^2).
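As a quick sanity check for N = 3: dp(2,1) = 1 and dp(2,2) = 1, so dp(3,1) = 1, dp(3,2) = dp(2,2)*2 + dp(2,1) = 3 and dp(3,3) = dp(2,2) = 1, which sums to 1 + 3 + 1 = 5, matching the five lists enumerated in the question.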
See below for the implementation of dp calculation in C#:
int[] arr = new int[N + 2];
for (int m = 1; m < N + 2; m++)
    arr[m] = 0;
arr[1] = 1;

int[] newArr = new int[N + 2];
int[] tmp;
for (int k = 1; k < N; k++)
{
    for (int m = 1; m < N + 2; m++)
        newArr[m] = arr[m] * m + arr[m - 1];
    tmp = arr;
    arr = newArr;
    newArr = tmp;
}

int answer = 0;
for (int m = 1; m < N + 2; m++)
    answer += arr[m];

Console.WriteLine("The answer for " + N + " is " + answer);
Well, I got interrupted by a fire this afternoon (really!) but FWIW, here's my contribution:
/*
 * Counts the number of possible integer lists of length N, with the
 * property that no integer in a list (starting with one) may be more
 * than one greater than the greatest integer preceding it in the list.
 *
 * I am calling this "Semi-Factorial" since it is somewhat similar to
 * the factorial function and its constituent integer combinations.
 */
public int SemiFactorial(int N)
{
    int sumCounts = 0;

    // get a list of the counts of all valid lists of length N,
    // whose maximum integer is listCounts[maxInt].
    List<int> listCounts = SemiFactorialCounts(N);

    for (int maxInt = 1; maxInt <= N; maxInt++)
    {
        // Get the number of lists of length N whose maximum integer
        // is (maxInt):
        int maxIntCnt = listCounts[maxInt];

        // just sum them up
        sumCounts += maxIntCnt;
    }
    return sumCounts;
}

// Returns a list of the counts of all valid lists of length N, and
// whose maximum integer is [i], where [i] is also its index in this
// returned list. (0 is not used.)
public List<int> SemiFactorialCounts(int N)
{
    List<int> cnts;
    if (N == 0)
    {
        // no valid lists,
        cnts = new List<int>();
        // (zero isn't used)
        cnts.Add(0);
    }
    else if (N == 1)
    {
        // the only valid list is {1},
        cnts = new List<int>();
        // (zero isn't used)
        cnts.Add(0);
        // so that's one list of length 1
        cnts.Add(1);
    }
    else
    {
        // start with the maxInt counts of lists whose length is N-1:
        cnts = SemiFactorialCounts(N - 1);
        // add an entry for (N)
        cnts.Add(0);

        // (reverse order because we overwrite the list using values
        // from the next lower index.)
        for (int K = N; K > 0; K--)
        {
            // The number of lists of length N and maxInt K { SF(N,K) }
            // equals K times the number of lists one shorter with the same maxInt,
            // plus the number of lists one shorter with maxInt-1.
            cnts[K] = K * cnts[K] + cnts[K - 1];
        }
    }
    return cnts;
}
Pretty similar to the others. Though I wouldn't call this "classic dynamic programming" so much as just "classic recursion".

Resources