O(n) solution to counting sub-arrays with sum constraints - algorithm

I'm trying to improve my intuition around the following two sub-array problems.
Problem one
Return the length of the shortest, non-empty, contiguous sub-array of A with sum at least
K. If there is no non-empty sub-array with sum at least K, return -1
I've come across an O(N) solution online.
import java.util.Deque;
import java.util.LinkedList;
public int shortestSubarray(int[] A, int K) {
    int N = A.length;
    // P[i] is the sum of the first i elements, so sum(A[x..y-1]) = P[y] - P[x]
    long[] P = new long[N+1];
    for (int i = 0; i < N; ++i)
        P[i+1] = P[i] + (long) A[i];
    // Want smallest y-x with P[y] - P[x] >= K
    int ans = N+1; // N+1 is impossible
    Deque<Integer> monoq = new LinkedList<>(); //opt(y) candidates, as indices of P
    for (int y = 0; y < P.length; ++y) {
        // Want opt(y) = largest x with P[x] <= P[y] - K;
        while (!monoq.isEmpty() && P[y] <= P[monoq.getLast()])
            monoq.removeLast();
        while (!monoq.isEmpty() && P[y] >= P[monoq.getFirst()] + K)
            ans = Math.min(ans, y - monoq.removeFirst());
        monoq.addLast(y);
    }
    return ans < N+1 ? ans : -1;
}
It seems to be maintaining a sliding window with a deque. It looks like a variant of Kadane's algorithm.
Problem two
Given an array of N integers (positive and negative), find the number of
contiguous sub-arrays whose sum is greater than or equal to K (K may also be
positive or negative).
The best solution I've seen to this problem is O(nlogn) as described in the following answer.
tree = an empty search tree
result = 0
// This sum corresponds to an empty prefix.
prefixSum = 0
tree.add(prefixSum)
// Iterate over the input array from left to right.
for elem <- array:
    prefixSum += elem
    // Add the number of subarrays that have this element as the last one
    // and their sum is not less than K.
    result += tree.getNumberOfLessOrEqual(prefixSum - K)
    // Add the current prefix sum to the tree.
    tree.add(prefixSum)
print result
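For reference, here is a minimal runnable sketch of the pseudocode above in Python. The "search tree" is swapped for a Fenwick (binary indexed) tree over the coordinate-compressed prefix sums, which is my own choice and not part of the linked answer; the function name count_subarrays_sum_at_least is hypothetical. It still runs in O(n log n).

```python
from bisect import bisect_right


def count_subarrays_sum_at_least(arr, k):
    """Count contiguous sub-arrays of arr whose sum is >= k (sketch)."""
    # All prefix sums, including the empty prefix (0).
    prefix = [0]
    for x in arr:
        prefix.append(prefix[-1] + x)

    # Coordinate compression: each prefix sum gets a 1-based rank.
    sorted_vals = sorted(set(prefix))
    tree = [0] * (len(sorted_vals) + 1)  # Fenwick tree of counts, 1-based

    def add(rank):
        # Record one more occurrence of the value with this rank.
        while rank < len(tree):
            tree[rank] += 1
            rank += rank & -rank

    def count_up_to(rank):
        # Number of recorded values whose rank is <= rank.
        total = 0
        while rank > 0:
            total += tree[rank]
            rank -= rank & -rank
        return total

    result = 0
    add(bisect_right(sorted_vals, prefix[0]))  # the empty prefix
    for p in prefix[1:]:
        # Every earlier prefix q with q <= p - k closes a sub-array with sum >= k.
        result += count_up_to(bisect_right(sorted_vals, p - k))
        add(bisect_right(sorted_vals, p))
    return result
```

For example, count_subarrays_sum_at_least([1, -1, 2], 1) counts [1], [1, -1, 2], [-1, 2] and [2].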
My questions
Is my intuition that algorithm one is a variant of Kadane's algorithm correct?
If so, is there a variant of this algorithm (or another O(n) solution) that can be used to solve problem two?
Why can problem two only be solved in O(nlogn) time when they look so similar?

Related

Length of Longest Subarray with all same elements

I have this problem:
You are given an array of integers A and an integer k.
You can decrement elements of A up to k times, with the goal of producing a consecutive subarray whose elements are all equal. Return the length of the longest possible consecutive subarray that you can produce in this way.
For example, if A is [1,7,3,4,6,5] and k is 6, then you can produce [1,7,3,4-1,6-1-1-1,5-1-1] = [1,7,3,3,3,3], so you will return 4.
What is the optimal solution?
The subarray must be made equal to its lowest member since the only allowed operation is reduction (and reducing the lowest member would add unnecessary cost). Given:
a1, a2, a3...an
the cost to reduce is:
sum(a1..an) - n * min(a1..an)
For example,
3, 4, 6, 5
sum = 18
min = 3
cost = 18 - 4 * 3 = 6
One way to reduce the complexity from O(n^2) to O(n log n) is: for each element as the rightmost (or leftmost) element of the candidate best subarray, binary search the longest length within cost. To do that, we only need the sum, which we can get from a prefix sum in O(1), the length (which we are searching on already), and a range minimum query, which is well-studied.
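A rough Python sketch of that idea, assuming k >= 0 and using a sparse table for the range minimum query (the sparse table and the name longest_equalizable are my own choices, any O(1) or O(log n) range-minimum structure would do):

```python
def longest_equalizable(a, k):
    """Longest subarray that can be made all-equal with at most k decrements."""
    n = len(a)
    if n == 0:
        return 0

    # prefix[i] = a[0] + ... + a[i-1]
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    # Sparse table: table[p][i] = min(a[i : i + 2**p])
    table = [list(a)]
    p = 1
    while (1 << p) <= n:
        prev = table[-1]
        half = 1 << (p - 1)
        table.append(
            [min(prev[i], prev[i + half]) for i in range(n - (1 << p) + 1)]
        )
        p += 1

    def range_min(i, j):
        # Minimum of a[i..j], both ends inclusive.
        q = (j - i + 1).bit_length() - 1
        return min(table[q][i], table[q][j - (1 << q) + 1])

    def cost(i, j):
        # Decrements needed to bring a[i..j] down to its minimum.
        return prefix[j + 1] - prefix[i] - (j - i + 1) * range_min(i, j)

    best = 0
    for j in range(n):
        # cost(i, j) never decreases as i moves left, so binary search
        # for the smallest i with cost(i, j) <= k.
        lo, hi = 0, j
        while lo < hi:
            mid = (lo + hi) // 2
            if cost(mid, j) <= k:
                hi = mid
            else:
                lo = mid + 1
        best = max(best, j - lo + 1)
    return best
```

With the example from the question, longest_equalizable([1, 7, 3, 4, 6, 5], 6) returns 4.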
In response to comments below this post, here is a demonstration that the sequence of costs as we extend a subarray from each element as rightmost increases monotonically and can therefore be queried with binary search.
JavaScript code:
function cost(A, i, j){
  const n = j - i + 1;
  let sum = 0;
  let min = Infinity;
  for (let k=i; k<=j; k++){
    sum += A[k];
    min = Math.min(min, A[k]);
  }
  return sum - n * min;
}
function f(A){
  for (let j=0; j<A.length; j++){
    const rightmost = A[j];
    const sequence = [];
    for (let i=j; i>=0; i--)
      sequence.push(cost(A, i, j));
    console.log(rightmost + ': ' + sequence);
  }
}
var A = [1,7,3,1,4,6,5,100,1,4,6,5,3];
f(A);
def cost(a, i, j):
    # cost of levelling the half-open slice a[i:j] down to its minimum
    n = j - i
    s = 0
    m = a[i]
    for k in range(i,j):
        s += a[k]
        m = min(m, a[k])
    return s - n * m
def solve(n,k,a):
    m=1
    for i in range(n):
        for j in range(i,n+1):
            if cost(a,i,j)<=k:
                x = j - i
                if x>m:
                    m=x
    return m
This is my python3 solution as per your specifications.

Smallest missing integer algorithm that runs in O(n)?

What algorithm might find a missing integer in O(n) time, from an array?
Say we have an array A with elements in a value-range {1,2,3...2n}. Half the elements are missing so length of A = n.
E.g:
A = [1,2,5,3,10] , n=5
Output = 4
The smallest missing integer must be in the range [1, ..., n+1]. So create an array of flags, all initially false, indicating the presence of that integer. Then an algorithm is:
1. Scan the input array, setting flags to true as you encounter values in the range. This operation is O(n). (That is, set flag[A[i]] to true for each position i in the input array, provided A[i] <= n.)
2. Scan the flag array for the first false flag. This operation is also O(n). The index of the first false flag is the smallest missing integer.
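As a small illustration of the two scans, here is a Python sketch (the function name smallest_missing is made up; values are assumed to be positive integers):

```python
def smallest_missing(a):
    """Smallest positive integer not present in a, using a flag array."""
    n = len(a)
    # Indices 1..n+1 are used; index n+1 is never set, so the second
    # scan always finds an answer.
    present = [False] * (n + 2)
    for x in a:
        if 1 <= x <= n:
            present[x] = True
    for v in range(1, n + 2):
        if not present[v]:
            return v
```

For the example in the question, smallest_missing([1, 2, 5, 3, 10]) returns 4.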
EDIT: O(n) time algorithm with O(1) extra space:
If A is writable and there are some extra bits available in the elements of A, then a constant-extra-space algorithm is possible. For instance, if the elements of A are signed values, and since all the numbers are positive, we can use the sign bit of the numbers in the original array as the flags, rather than creating a new flag array. So the algorithm would be:
1. For each position i of the original array, if abs(A[i]) < n+1, make the value at A[abs(A[i])] negative. (This assumes array indexes are based at 1. Adjust in the obvious way if you are using 0-based arrays.) Don't just negate the value, in case there are duplicate values in A.
2. Find the index of the first element of A that is positive. That index is the smallest missing number in A. If all positions are negative, then A must be a permutation of {1, ..., n} and hence the smallest missing number is n+1.
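Adjusted for 0-based indexing, the sign-bit trick might look like this in Python (a sketch only: it overwrites the input list, assumes every value is a positive integer, and the name smallest_missing_in_place is hypothetical):

```python
def smallest_missing_in_place(a):
    """Smallest missing positive integer, using signs of a as flags."""
    n = len(a)
    for i in range(n):
        v = abs(a[i])  # the original value, even if already flagged
        if v <= n:
            # Force the slot for value v negative; using -abs(...) instead
            # of plain negation keeps duplicates from flipping it back.
            a[v - 1] = -abs(a[v - 1])
    for i in range(n):
        if a[i] > 0:
            return i + 1  # value i+1 was never seen
    return n + 1
```

Note that the list is left with flipped signs afterwards; take abs() of each element if the original values are needed again.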
If the elements are unsigned, but can hold values as high as 4 n + 1, then in step 1, instead of making the element negative, add 2 n + 1 (provided the element is <= 2 n) and use (A[i] mod (2n+1)) instead of abs(A[i]). Then in step 2, find the first element < 2 n + 1 instead of the first positive element. Other such tricks are possible as well.
You can do this in O(1) additional space, assuming that the only valid operations on the array are to read elements and to swap pairs of elements.
First note that the specification of the problem excludes the possibility of the array containing duplicates: it contains half of the numbers from 1 to 2N.
We perform a quick-select type algorithm. Start with m=1, M=2N+1, and pivot the array on (m + M)/2. If the size of the left part of the array (elements <= (m+M)/2) is less than (m + M)/2 - m + 1, then the first missing number must be there. Otherwise, it must be in the right part of the array. Repeat on the left or right side accordingly until you find the missing number.
The size of the slice of the array under consideration halves each time and pivoting an array of size n can be done in O(n) time and O(1) space. So overall, the time complexity is 2N + N + N/2 + ... + 1 <= 4N = O(N).
An implementation of Paul Hankin's idea in C++
#include <iostream>
using namespace std;
const int MAX = 1000;
int a[MAX];
int n;
void swap(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}
// Rearranges elements of a[l..r] in such a way that first come elements
// lower or equal to M, next come elements greater than M. Elements in each group
// come in no particular order.
// Returns an index of the first element among a[l..r] which is greater than M.
int rearrange(int l, int r, int M) {
    int i = l, j = r;
    while (i <= j)
        if (a[i] <= M) i++;
        else swap(a[i], a[j--]);
    return i;
}
int main() {
    cin >> n;
    for (int i = 0; i < n; i++) cin >> a[i];
    int L = 1, R = 2 * n;
    int l = 0, r = n - 1;
    while (L < R) {
        int M = (L + R) / 2; // pivot element
        int m = rearrange(l, r, M);
        if (m - l == M - L + 1)
            l = m, L = M + 1;
        else
            r = m - 1, R = M;
    }
    cout << L;
    return 0;
}

How to solve "fixed size maximum subarray" using divide and conquer approach?

Disclaimer: I know this problem can be solved very efficiently with a single pass over the array, but I am interested in doing it with divide and conquer because it is a bit different from the typical problems we tackle with divide and conquer.
Suppose you are given a floating point array X[1:n] of size n and interval length l. The problem is to design a divide and conquer algorithm to find the sub-array of length l from the array that has the maximum sum.
Here is what I came up with. For an array of length n there are n-l+1 sub-arrays of l consecutive elements. For example, for an array of length n = 10 and l = 3, there will be 8 sub-arrays of length 3.
Now, to divide the problem into two halves, I decided to break the array at (n-l+1)/2 so that an equal number of sub-arrays is distributed to both halves of my division, as depicted in the algorithm below. Again, for n = 10, l = 3, n-l+1 = 8, so I divided the problem at (n-l+1)/2 = 4. But for the 4th sub-array I need array elements up to 6, i.e. (n+l-1)/2.
void FixedLengthMS(input: X[1:n], l, output: k, max_sum)
{
    if(l==n){//only one sub-array
        max_sum = Sumof(X[1:n]);
        k=1;
        return;
    }
    int kl, kr;
    float sum_l, sum_r;
    FixedLengthMS(X[1:(n+l-1)/2], l, kl, sum_l);
    FixedLengthMS(X[(n-l+3)/2:n], l, kr, sum_r);
    if(sum_l >= sum_r){
        max_sum = sum_l;
        k = kl;
    }
    else{
        max_sum = sum_r;
        k = (n-l+1)/2 + kr;
    }
}
Note: to clear out array indexing
for sub-array starting at (n-l+1)/2 we need array elements up-to (n-l+1)/2 + l-1 = (n+l-1)/2
My concern:
To apply divide and conquer I have used some data elements in both halves, so I am looking for another method that avoids the extra storage.
Faster method will be appreciated.
Please ignore the syntax of code section, I am just trying to give overview of algorithm.
You don't need divide and conquer. A simple one-pass algorithm can do the task. Assuming the array has at least l elements:
double sum = 0;
for (size_t i = 0; i < l; ++i)
    sum += X[i];
size_t max_index = 0;
double max_sum = sum;
for (size_t i = 0; i + l < n; ++i) {
    // slide the window one step right: it now covers X[i+1 .. i+l]
    sum += X[i + l] - X[i];
    if (sum > max_sum) {
        max_sum = sum;
        max_index = i + 1;
    }
}

Maximize sum of list with no more than k consecutive elements from input

I have an array of N numbers and I want to remove elements from the list so that, in the new list, no more than K of the original numbers remain adjacent to each other. Many lists can be created under this restriction; I want the one in which the sum of the remaining numbers is maximum, and I only need to print that sum.
The algorithm that I have come up with so far has a time complexity of O(n^2). Is it possible to get better algorithm for this problem?
Link to the question.
Here's my attempt:
#include <stdio.h>
int main()
{
    //Total Number of elements in the list
    int count = 6;
    //Maximum number of elements that can be together
    int maxTogether = 1;
    //The list of numbers
    int billboards[] = {4, 7, 2, 0, 8, 9};
    int maxSum = 0;
    for(int k = 0; k<=maxTogether ; k++){
        int sum=0;
        int size= k;
        for (int i = 0; i< count; i++) {
            if(size != maxTogether){
                sum += billboards[i];
                size++;
            }else{
                size = 0;
            }
        }
        printf("%i\n", sum);
        if(sum > maxSum)
        {
            maxSum = sum;
        }
    }
    return 0;
}
The O(NK) dynamic programming solution is fairly easy:
I swapped the problem around to make it slightly simpler - instead of finding the maximum of remaining elements, I find the minimum of removed elements and then return total sum - removed sum.
Let A[i] be the smallest removed sum for the first i elements subject to the not-more-than-K-consecutive constraint, assuming we're removing the i-th element as well (with A[0] = 0 standing for a removed element just before the array).
At most K elements may be kept between two removed ones, so the previous removed element is one of the K+1 positions before i, and we can calculate A[i] by looking back K+1 elements:
A[i] = infinity
for j = 1 to min(K+1, i)
    A[i] = min(A[i], A[i-j])
A[i] += input[i]
And, at the end, just look through the last K+1 elements of A, pick the smallest one and subtract it from the total sum.
But this is too slow.
Let's do better.
So A[i] finds the best from A[i-1], A[i-2], ..., A[i-K], A[i-K-1].
So A[i+1] finds the best from A[i], A[i-1], A[i-2], ..., A[i-K].
There's a lot of redundancy there - we already know the best from indices i-1 through i-K-1 because of A[i]'s calculation, but then we find the best of all of those except i-K-1 (with i) again in A[i+1].
So we can just store all of them in an ordered data structure and then remove A[i-K-1] and insert A[i]. My choice - a binary search tree to find the minimum, along with a circular array of size K+1 of tree nodes, so we can easily find the one we need to remove.
High-level pseudo-code:
for each i in input
    add (i + the smallest value in the BST) to the BST
    add the above node to the circular array
    if it wrapped around, remove the overwritten element from the BST
// now the remaining nodes in the BST are the last K+1 values
return (the total sum - the smallest value in the BST)
Running time:
O(n log k)
Java code:
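// Node is not shown here; it is assumed to wrap an int value and to be
// Comparable by value with a tie-break, so the TreeSet keeps equal values apart.
int getBestSum(int[] input, int K)
{
    Node[] array = new Node[K+1];
    TreeSet<Node> nodes = new TreeSet<Node>();
    Node n = new Node(0);
    nodes.add(n);
    array[0] = n;
    int arrPos = 0;
    int sum = 0;
    for (int i: input)
    {
        sum += i;
        // smallest removed sum among the last K+1 positions
        Node oldNode = nodes.first();
        Node newNode = new Node(oldNode.value + i);
        arrPos = (arrPos + 1) % array.length;
        // the slot being overwritten has just left the window
        if (array[arrPos] != null)
            nodes.remove(array[arrPos]);
        array[arrPos] = newNode;
        nodes.add(newNode);
    }
    return sum - nodes.first().value;
}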
getBestSum(new int[]{1,2,3,1,6,10}, 2) returns 21, as required.
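The same "minimum removed sum over the last K+1 positions" recurrence can also be sketched in Python with a monotonic deque in place of the BST, which brings the running time down to O(n). The deque is a swapped-in technique, not what the Java code above uses, and the name best_sum is made up; K >= 0 is assumed.

```python
from collections import deque


def best_sum(values, k):
    """Max sum of kept elements with at most k consecutive kept ones."""
    total = sum(values)
    n = len(values)
    # Each entry is (index, A[index]) where A[i] is the smallest removed
    # sum if element i (1-based) is removed. A values increase from front
    # to back, so the front is always the window minimum.
    # (0, 0) is a sentinel: a "removed" element just before the array.
    window = deque([(0, 0)])
    for i, v in enumerate(values, start=1):
        # The previous removed element must be within k+1 positions of i.
        while window[0][0] < i - k - 1:
            window.popleft()
        cur = v + window[0][1]
        # Entries that are older and not smaller can never be the minimum.
        while window and window[-1][1] >= cur:
            window.pop()
        window.append((i, cur))
    # Pretend element n+1 is removed: look at indices n-k .. n.
    while window[0][0] < n - k:
        window.popleft()
    return total - window[0][1]
```

best_sum([1, 2, 3, 1, 6, 10], 2) gives 21, matching the Java version.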
Let f[i] be the maximum total value you can get with the first i numbers, while you don't choose the last(i.e. the i-th) one. Then we have
f[i] = max{
f[i-1],
max{f[j] + sum(j + 1, i - 1) | (i - j) <= k}
}
You can use a heap-like data structure to maintain the options and get the maximum one in O(log n) time; keep a global delta (so the stored values need not be updated as i advances), and pay attention to the range i - j <= k.
The following algorithm is of O(N*K) complexity.
Examine the 1st K elements (0 to K-1) of the array. There can be at most 1 gap in this region.
Reason: If there were two gaps, then there would not be any reason to have the lower (earlier gap).
For each index i of these K gap options, following holds true:
1. Sum up to i-1 is the present score of each option.
2. If the next gap is after a distance of d, then the options for d are (K - i) to K
For every possible position of gap, calculate the best sum up to that position among the options.
The latter part of the array can be traversed similarly independently from the past gap history.
Traverse the array further till the end.

number of subarrays where sum of numbers is divisible by K

Given an array, find how many subsequences (not necessarily contiguous) exist whose sum of elements is divisible by K.
I know an approach with complexity 2^n, given below. It amounts to enumerating all nCi selections for i=[0,n] and checking whether each sum is divisible by K.
Please provide pseudocode for something faster, e.g. linear, quadratic or n^3.
static int numways = 0;
void findNumOfSubArrays(int [] arr,int index, int sum, int K) {
    if(index==arr.length) {
        // one complete selection of elements has been made
        if(sum%K==0) numways++;
    }
    else {
        // skip arr[index], then take arr[index]
        findNumOfSubArrays(arr, index+1, sum, K);
        findNumOfSubArrays(arr, index+1, sum+arr[index], K);
    }
}
Input - array A of length n, and natural number k.
The algorithm:
Construct array B: for each 1 <= i <= n: B[i] = (A[i] modulo k).
Now we can use dynamic programming:
We define D[i,j] = number of subsequences of B[i..n] (including the empty one) whose sum of elements modulo k equals j.
1 <= i <= n.
0 <= j <= k-1.
D[n,0] = if (B[n] == 0), 2. Otherwise, 1.
if j > 0 :
D[n,j] = if B[n] == j, then 1. Otherwise, 0.
for i < n and 0 <= j <= k-1:
D[i,j] = D[i+1,j] + D[i+1,(j-B[i]+k) modulo k].
Construct D.
Return D[1,0] (subtract 1 if the empty subsequence should not be counted).
Overall running time: O(n*k)
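A compact Python sketch of this kind of O(n*k) counting DP, processing the array left to right and keeping only one row of the table (the name count_divisible_subsequences is hypothetical, k is assumed to be a positive integer, and the empty subsequence is included in the count):

```python
def count_divisible_subsequences(a, k):
    """Count subsequences of a (empty one included) with sum divisible by k."""
    # counts[r] = number of subsequences of the elements seen so far
    # whose sum is congruent to r modulo k.
    counts = [0] * k
    counts[0] = 1  # the empty subsequence
    for x in a:
        r = x % k
        new_counts = counts[:]  # subsequences that skip x
        for j in range(k):
            # subsequences that take x move from remainder j to (j + r) % k
            new_counts[(j + r) % k] += counts[j]
        counts = new_counts
    return counts[0]
```

For instance, with a = [1, 2, 3] and k = 3 the counted subsequences are the empty one, [3], [1, 2] and [1, 2, 3].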
Actually, I don't think this problem is likely to be solvable in O(n^3) or even polynomial time if the range of K and the range of the numbers in the array are unknown. Here is what I think:
Consider the following case: the N numbers in arr is something like
[1,2,4,8,16,32,...,2^(N-1)]
,
in this way, the sums of 2^N "subarrays" (that does not require to be contiguous) of arr, is exactly all the integer numbers in [0,2^N)
and asking how many of them is divisible by K, is equivalent to asking how many of integers are divisible by K in [0, 2^N).
I know the answer can be calculated directly like (2^N-1)/K (or something) in the above case. But , if we just change a few ( maybe 3? 4? ) numbers in arr randomly, to "dig some random holes" in the perfect-contiguous-integer-range [0,2^N), that makes it looks impossible to calculate the answer without going through almost every number in [0,2^N).
ok just some stupid thoughts ... could be totally wrong.
Use an auxiliary array A
1) While taking input, store the current grand total in the corresponding index (this executes in O(n)):
int sum = 0;
for (int i = 0; i < n; i++)
{
    cin >> arr[i];
    sum += arr[i];
    A[i] = sum;
}
2) now,
for (int i = 0; i < n; i++)
    for (int j = i; j < n; j++)
        check that (A[j] - A[i] + arr[i]) is divisible by k
There you go: O(n^2)...
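If only contiguous subarrays are wanted, the running-total idea can be pushed to O(n + k) by counting how often each remainder of the running total has been seen. This is a different technique from the double loop above; here is a Python sketch of it (the name count_divisible_subarrays is made up, k is assumed positive):

```python
def count_divisible_subarrays(arr, k):
    """Count non-empty contiguous subarrays whose sum is divisible by k."""
    # seen[r] = how many prefixes so far have running total % k == r
    seen = [0] * k
    seen[0] = 1  # the empty prefix
    running = 0
    result = 0
    for x in arr:
        running = (running + x) % k
        # Each earlier prefix with the same remainder bounds a subarray
        # ending here whose sum is a multiple of k.
        result += seen[running]
        seen[running] += 1
    return result
```

Two prefixes with the same remainder differ by a multiple of k, which is why matching remainders are enough.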

Resources