Count the number of subarrays in a given array whose average is k

Given an integer array a and an integer k, we want to design an algorithm that counts the number of subarrays whose average is k. The most naive method is to traverse all possible subarrays and calculate the corresponding averages. The time complexity of this naive method is O(n^2), where n is the length of a. I wonder whether it is possible to do better than O(n^2).
Usually for this kind of problem, one uses prefix sum together with a hashmap, but this technique does not seem to apply here.

Consider the prefix sum array of a; call it p, with p[0] = 0 and p[j] = a[0] + ... + a[j-1], so that the subarray a[i..j-1] has sum p[j]-p[i] and length j-i.
You want to find all pairs (i, j) with i < j such that (p[j]-p[i])/(j-i) == k.
Now rearrange:
(p[j]-p[i])/(j-i) == k
p[j]-p[i] == k*(j-i)
p[j]-p[i] == k*j-k*i
p[j]-k*j == p[i]-k*i
So if you subtract k*j from the jth element of the prefix sum array, you are left with the task of counting pairs of equal elements, which a hashmap does in O(n).
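Here is a minimal Python sketch of this idea. It assumes integer elements and an integer k; with a non-integer k, floating-point keys could fail to compare equal, and fractions.Fraction would avoid that. The function name is ours. The function keeps a running prefix sum and a hash map from p[i]-k*i to the number of indices i that produced it, so the whole count takes O(n) expected time.

```python
from collections import defaultdict


def count_subarrays_with_average(a, k):
    """Count subarrays of a whose average is exactly k (integer a and k assumed)."""
    seen = defaultdict(int)  # maps p[i] - k*i to how many indices i produced it
    seen[0] = 1              # i = 0: p[0] - k*0 = 0
    prefix = 0
    count = 0
    for j, x in enumerate(a, start=1):
        prefix += x                # prefix == p[j]
        key = prefix - k * j       # p[j] - k*j
        count += seen[key]         # every earlier i with the same key closes a valid subarray
        seen[key] += 1
    return count


print(count_subarrays_with_average([1, 2, 3], 2))  # 2: [2] and [1, 2, 3]
```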

Related

How can I create an array of 'n' positive integers where none of the subsequences of the array have equal sum?

I was watching a lecture on the problem "subsequence sum equals k", in which you're given an array of n positive integers and a target sum k. Your task is to check whether any subsequence of the array has a sum equal to the target sum. The recursive solution works in O(2^N). The lecturer said that if we memoize the recursive solution, the time complexity will drop to O(N*K). But as far as I understand, memoization simply removes overlapping subproblems. So if all of the subsequences have different sums, won't the time complexity of the solution still be O(2^N)? Just to test this hypothesis, I was trying to create an array of n positive integers where no two subsequences have equal sums.
Also, I tried the tabulation method and was unable to understand why the time complexity drops in the case of tabulation. Please point me to any resource where I can learn exactly which subproblems tabulation avoids.
Note that O(NK) is not always smaller than O(2^N). If K = 2^N, for example, then O(KN) = O(N * 2^N), which is larger.
Furthermore, this is the sort of range you're dealing with when every subsequence sum is different.
If your N integers are powers of 2, for example [2^0, 2^1, 2^2, ...], then every subsequence has a different sum, and K = 2^N is the smallest positive integer that isn't a subsequence sum.
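The claim can be checked by brute force for N = 5. The check below enumerates all 2^N subsequences with itertools.combinations; the helper name subsequence_sums is ours.

```python
from itertools import combinations


def subsequence_sums(values):
    """Sums of all 2^N subsequences of values, including the empty one."""
    return [sum(c) for r in range(len(values) + 1) for c in combinations(values, r)]


N = 5
powers = [2 ** i for i in range(N)]  # [1, 2, 4, 8, 16]
sums = subsequence_sums(powers)
print(len(sums), len(set(sums)))             # 32 32: all 2^N sums are distinct
print(sorted(sums) == list(range(2 ** N)))   # True: they are exactly 0 .. 2^N - 1
```

Because the sums fill 0 .. 2^N - 1 exactly, 2^N is the first positive integer that cannot be reached.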
The tabulation method is only an improvement when K is known to be relatively small.
If each value in the array is a different power (b^0, b^1, b^2, ...) of the same base b ≥ 2, no two subsequence sums will be equal, because each sum is a base-b number whose digits are all 0 or 1.
Python code:
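def f(A):
    # Build the set of all subsequence sums; stop at the first repeated sum.
    sums = set([0])
    for a in A:
        new_sums = set()
        for s in sums:
            new_sum = s + a
            if new_sum in sums:
                print(new_sum)  # two different subsequences share this sum
                return None
            new_sums.add(new_sum)
        sums = sums.union(new_sums)
    return sums

# Try bases 2..9, each with the powers b^0 .. b^4.
for b in range(2, 10):
    A = []
    for p in range(5):
        A.append(b**p)
    sums = f(A)
    print(A)
    print(len(sums))
    print(sums)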
In the recursive case without memoization, you'll compute the sums for all subsequences, which has O(2^N) complexity.
Now consider the memoization case.
Let dp[i][j] = 1 if there exists a subsequence in the array arr[i:] that has sum j, else dp[i][j] = 0.
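The algorithm is (written as Python, assuming positive integers and k ≥ 0):
def subset_sum_exists(arr, k):
    n = len(arr)
    # dp[i][x] is True iff some subsequence of arr[i:] has sum x (0 <= x <= k)
    dp = [[False] * (k + 1) for _ in range(n + 1)]
    dp[n][0] = True  # arr[n:] is empty; only the empty subsequence, with sum 0
    for i in range(n - 1, -1, -1):
        j = arr[i]
        for x in range(k + 1):
            if dp[i + 1][x]:
                dp[i][x] = True          # arr[i] not used
                if x + j <= k:
                    dp[i][x + j] = True  # arr[i] added to the sum
    return dp[0][k]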
For each index, we traverse the subsequence sums seen so far (in the range 0..k) and mark them True for the current index. For each such sum, we also add the value of the current element and mark that sum True.
Which sub-problems were reduced?
We only track whether sum x is achievable by some subsequence of arr[i:]. In the recursive case, there could be 100 subsequences that have sum x. Here, since we use a bool to track whether x is possible, we avoid going over all those subsequences just to check whether the sum is possible.
For each index, since we do a O(k) traversal of going through all sums, the complexity becomes O(N*k).
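For contrast with the table, here is a sketch of the memoized recursion the question asks about. It uses functools.lru_cache, the function names are ours, and it assumes positive integers. Each cached state is a pair (i, remaining) with 0 ≤ remaining ≤ k, so at most (N+1)*(k+1) distinct states exist, however many subsequences share a sum. That bound is where the O(N*k) comes from; when k is as large as 2^N, the bound gives no saving.

```python
from functools import lru_cache


def subsequence_sum_exists(arr, k):
    """Return (found, number_of_cached_states) for target k; arr holds positive ints."""

    @lru_cache(maxsize=None)
    def can(i, remaining):
        # state (i, remaining): can some subsequence of arr[i:] sum to `remaining`?
        if remaining == 0:
            return True
        if i == len(arr):
            return False
        # take arr[i] only if it does not overshoot, so `remaining` stays in [0, k]
        if arr[i] <= remaining and can(i + 1, remaining - arr[i]):
            return True
        return can(i + 1, remaining)  # skip arr[i]

    found = can(0, k)
    return found, can.cache_info().currsize


print(subsequence_sum_exists((3, 34, 4, 12, 5, 2), 9)[0])  # True (4 + 5)
```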

What is the complexity of this approach to finding the K largest of N numbers?

In this post on how to find the K largest of N elements, the second method proposed is:
1) Store the first k elements in a temporary array temp[0..k-1].
2) Find the smallest element in temp[]; let the smallest element be min.
3) For each element x in arr[k] to arr[n-1]: if x is greater than min, remove min from temp[] and insert x.
4) Print the final k elements of temp[].
While I understand the approach, I do not understand their computed time complexity of O((n-k)*k).
From my perspective, you are making a linear traversal of n-k elements and doing a single comparison on each element, and then perhaps replacing one element of the temporary array of k elements.
More specifically, where does the *k factor in their computed time complexity of O((n-k)*k) come from? Why do they multiply n-k by k?
Let's consider the kth iteration, where:
arr[k] > min(temp[0..k-1])
Now you will replace min(temp[0..k-1]) with arr[k].
And now you again need to compute the updated min of temp[0..k-1], because it may have changed. It can be any number in your updated temp[0..k-1], so finding it takes a scan over all k elements.
So in the worst case, you update the min every time, and each update costs O(k).
Thus, time complexity = O((n-k)*k)
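Below is a sketch of the method in Python, assuming 1 ≤ k ≤ len(arr); the name k_largest is ours. The rescan after each replacement is the O(k) step that produces the (n-k)*k term.

```python
def k_largest(arr, k):
    """Return the k largest elements of arr (in no particular order); 1 <= k <= len(arr)."""
    temp = list(arr[:k])
    min_idx = min(range(k), key=temp.__getitem__)  # O(k) to find the initial min
    for x in arr[k:]:                               # n - k iterations
        if x > temp[min_idx]:
            temp[min_idx] = x  # replace the current minimum with x
            # the new minimum can be anywhere in temp, so rescan: O(k)
            min_idx = min(range(k), key=temp.__getitem__)
    return temp


print(sorted(k_largest([1, 23, 12, 9, 30, 2, 50], 3)))  # [23, 30, 50]
```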

Finding the kth smallest element in a sequence where duplicates are compressed?

I've been asked to write a program to find the kth order statistic of a data set consisting of characters and their numbers of occurrences. For example, I have a data set consisting of
B,A,C,A,B,C,A,D
Here I have A with 3 occurrences, B with 2 occurrences, C with 2 occurrences and D with one occurrence. They can be grouped in pairs (character, number of occurrences), so, for example, we could represent the above sequence as
(A,3), (B,2), (C,2) and (D,1).
Assuming that k is the number of these pairs, I am asked to find the kth element of the data set in O(n), where n is the number of pairs.
I thought I could sort the elements based on their number of occurrences and find the kth smallest element, but that won't work within the time bound. Can I please have some help on the algorithm for this problem?
Assuming that you have access to a linear-time selection algorithm, here's a simple divide-and-conquer algorithm for solving the problem. I'm going to let k denote the total number of pairs and m be the index you're looking for.
If there's just one pair, return the key in that pair.
Otherwise:
Using a linear-time selection algorithm, find the pair whose key is the median key, and partition the pairs around it. Let medFreq be its frequency.
Sum up the frequencies of the pairs whose keys are less than the median. Call this less. Note that the number of elements less than or equal to the median is less + medFreq.
If less < m ≤ less + medFreq, return the key of the median pair.
Otherwise, if m ≤ less, recursively search for the mth element among the pairs with keys less than the median (the first half of the array).
Otherwise (m > less + medFreq), recursively search for the (m - less - medFreq)th element among the pairs with keys greater than the median (the second half of the array).
The key insight here is that each iteration of this algorithm tosses out half of the pairs, so each recursive call is on an array half as large as the original array. This gives us the following recurrence relation:
T(k) = T(k / 2) + O(k)
Using the Master Theorem, this solves to O(k).
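A Python sketch of this algorithm follows, with two stated substitutions. The linear-time selection is replaced by a randomized quickselect, which is linear in expectation rather than in the worst case. The tail recursion is written as a loop. Keys are assumed distinct, m is 1-based, and the function names are ours.

```python
import random


def quickselect(items, idx, key):
    """Return the item of 0-based rank idx by key; expected O(len(items))."""
    while True:
        pivot = key(random.choice(items))
        lows = [x for x in items if key(x) < pivot]
        highs = [x for x in items if key(x) > pivot]
        n_eq = len(items) - len(lows) - len(highs)
        if idx < len(lows):
            items = lows
        elif idx < len(lows) + n_eq:
            return next(x for x in items if key(x) == pivot)
        else:
            idx -= len(lows) + n_eq
            items = highs


def kth_from_pairs(pairs, m):
    """pairs: list of (key, freq) with distinct keys; return the mth smallest element."""
    while True:
        if len(pairs) == 1:
            return pairs[0][0]
        med_key, med_freq = quickselect(pairs, (len(pairs) - 1) // 2, key=lambda p: p[0])
        lower = [p for p in pairs if p[0] < med_key]   # "first half"
        upper = [p for p in pairs if p[0] > med_key]   # "second half"
        less = sum(f for _, f in lower)
        if less < m <= less + med_freq:
            return med_key
        if m <= less:
            pairs = lower
        else:
            m -= less + med_freq
            pairs = upper


print(kth_from_pairs([("A", 3), ("B", 2), ("C", 2), ("D", 1)], 4))  # B
```

Each pass keeps at most half of the pairs, so the work follows the same T(k) = T(k/2) + O(k) recurrence, here in expectation because of the randomized selection.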

Finding sub-array sum in an integer array

Given an array of n positive integers, there are n*(n+1)/2 sub-arrays, including single-element sub-arrays. Each sub-array has a sum S. Finding S for all sub-arrays is obviously O(n^2), as the number of sub-arrays is O(n^2). Many of the sums may also be repeated. Is there any way to find the count of all distinct sums (not the exact values of the sums, only the count) in O(n log n)?
I tried an approach but got stuck along the way. I iterated over the array from index 1 to n.
Say a[i] is the given array. For each index i, a[i] is added to all the sums ending at a[i-1], and it also forms a sum on its own as a single element. But a duplicate will emerge if, among the sums ending at a[i-1], two differ by exactly a[i]. That is, say sums Sp and Sq both end at a[i-1] and their difference is a[i]. Then Sp + a[i] equals Sq, giving Sq as a duplicate.
Say C[i] is the count of distinct sums of sub-arrays ending at a[i].
So C[i] = C[i-1] + 1 - the number of pairs of sums ending at a[i-1] whose difference is a[i].
But the problem is to find that number of pairs in O(log n). Please give me a hint about this, or, if I am on the wrong track and a completely different approach is required, please point that out.
When S (the sum of the whole array) is not too large, we can count the distinct sums with one (fast) polynomial multiplication. When S is larger, N is hopefully small enough to use a quadratic algorithm.
Let x_1, x_2, ..., x_n be the array elements. Let y_0 = 0 and y_i = x_1 + x_2 + ... + x_i. Let P(z) = z^{y_0} + z^{y_1} + ... + z^{y_n}. Compute the product of polynomials P(z) * P(z^{-1}); the coefficient of z^k with k > 0 is nonzero if and only if k is a sub-array sum, so we just have to read off the number of nonzero coefficients of positive powers. The powers of z, moreover, range from -S to S, so the multiplication takes time on the order of S log S.
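The sketch below implements this polynomial product in pure Python with a hand-rolled recursive radix-2 FFT, so it has no dependencies; in practice one would use a library FFT such as numpy.fft. Instead of P(z^{-1}) it multiplies by z^S * P(z^{-1}) so that all exponents are non-negative. A difference d = y_j - y_i then shows up at index S + d. It assumes positive integers, and the names are ours.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for t in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * t / n) * odd[t]
        out[t] = even[t] + w
        out[t + n // 2] = even[t] - w
    return out


def count_distinct_subarray_sums(xs):
    """Count distinct sub-array sums of positive integers xs via P(z) * z^S * P(1/z)."""
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)    # y_0 .. y_n
    S = prefix[-1]
    size = 1
    while size < 2 * S + 1:              # room for the full linear convolution
        size *= 2
    p = [0j] * size
    q = [0j] * size
    for y in prefix:
        p[y] = 1                         # coefficients of P(z)
        q[S - y] = 1                     # coefficients of z^S * P(z^{-1})
    prod = fft([u * v for u, v in zip(fft(p), fft(q))], invert=True)
    # index S + d counts pairs i < j with y_j - y_i = d
    return sum(1 for d in range(1, S + 1) if round(prod[S + d].real / size) > 0)


print(count_distinct_subarray_sums([1, 2, 1]))  # 4: the sums are 1, 2, 3, 4
```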
You can view the sub-arrays as a kind of tree, in the sense that sub-array [0,3] can be divided into [0,1] and [2,3].
So build up a tree, where nodes are defined by the length of the sub-array and its starting offset in the original array, and whenever you compute a sub-array, store the result in this tree.
When computing a sub-array, you can check this tree for existing pre-computed values.
Also, when dividing, parts of the array can be computed on different CPU cores, if that matters.
This solution assumes that you don't need all values at once, but rather ad hoc.
For the former, there could be some smarter solution.
Also, I assume that we're talking about element counts in the tens of thousands and more. Otherwise, such work is a nice exercise but has little practical value.
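Here is a small sketch of the caching idea. Each node is identified by (starting offset, length), as described above, and is split into two halves. functools.lru_cache stands in for the explicit tree, and the names make_range_sum and range_sum are hypothetical. The sketch only memoizes sub-array sums; it does not by itself count the distinct ones.

```python
from functools import lru_cache


def make_range_sum(a):
    a = tuple(a)

    @lru_cache(maxsize=None)
    def range_sum(start, length):
        """Sum of a[start:start + length], built from cached halves (length >= 1)."""
        if length == 1:
            return a[start]
        half = length // 2
        return range_sum(start, half) + range_sum(start + half, length - half)

    return range_sum


range_sum = make_range_sum([3, 1, 4, 1, 5])
print(range_sum(0, 5))  # 14
print(range_sum(1, 3))  # 6 (1 + 4 + 1)
```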

Efficiently calculate next permutation of length k from n choices

I need to efficiently calculate the next permutation of length k from n choices. Wikipedia lists a great algorithm for computing the next permutation of length n from n choices.
The best thing I can come up with is using that algorithm (or the Steinhaus–Johnson–Trotter algorithm), and then just only considering the first k items of the list, and iterating again whenever the changes are all above that position.
Constraints:
The algorithm must calculate the next permutation given nothing more than the current permutation. If it needs to generate a list of all permutations, it will take up too much memory.
It must be able to compute a permutation of only length k of n (this is where the other algorithm fails).
Non-constraints:
I don't care if it's in-place or not.
I don't care if it's in lexicographical order, or any order for that matter.
I don't care too much how efficiently it computes the next permutation, within reason of course; it can't give me the next permutation by making a list of all possible ones each time.
You can break this problem down into two parts:
1) Find all subsets of size k from a set of size n.
2) For each such subset, find all permutations of a subset of size k.
The referenced Wikipedia article provides an algorithm for part 2, so I won't repeat it here. The algorithm for part 1 is pretty similar. For simplicity, I'll describe it for "find all subsets of size k of the integers [0...n-1]".
1) Start with the subset [0...k-1]
2) To get the next subset, given a subset S:
2a) Find the smallest j such that j ∈ S ∧ j+1 ∉ S. If j == n-1, there is no next subset; we're done.
2b) The elements of S less than j form a contiguous sequence i...j-1 (since if any of those values were missing, j wouldn't be minimal). If i is not 0, replace these elements with i-i...j-i-1, that is, 0...j-i-1. Then replace element j with j+1.
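A sketch of the combined scheme in Python follows, assuming the elements are the integers 0..n-1 and 1 ≤ k ≤ n; the function names are ours. next_permutation is the standard lexicographic step described on the Wikipedia page. A permutation in descending order has no successor, so at that point we advance to the next subset using steps 2a/2b and start again from its sorted order. Only the current permutation (plus n) is needed.

```python
def next_permutation(p):
    """Lexicographic successor of p, or None if p is in descending order."""
    p = list(p)
    i = len(p) - 2
    while i >= 0 and p[i] >= p[i + 1]:
        i -= 1
    if i < 0:
        return None
    j = len(p) - 1
    while p[j] <= p[i]:
        j -= 1
    p[i], p[j] = p[j], p[i]
    p[i + 1:] = reversed(p[i + 1:])
    return p


def next_subset(s, n):
    """Steps 2a/2b on a sorted k-subset s of range(n); None if s is the last one."""
    members = set(s)
    idx = 0
    while s[idx] + 1 in members:  # 2a: smallest j in S with j+1 not in S
        idx += 1
    j = s[idx]
    if j == n - 1:
        return None
    # 2b: the idx elements below j become 0..idx-1, and j becomes j+1
    return list(range(idx)) + [j + 1] + s[idx + 1:]


def next_k_permutation(perm, n):
    nxt = next_permutation(perm)
    if nxt is not None:
        return nxt
    return next_subset(sorted(perm), n)  # sorted = first permutation of the new subset


perm, out = [0, 1], []
while perm is not None:
    out.append(tuple(perm))
    perm = next_k_permutation(perm, 3)
print(out)  # [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]
```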
