Finding largest from each subarray of length k - algorithm

Interview Question :- Given an array and an integer k , find the maximum for each and every contiguous sub array of size k.
Sample Input :
1 2 3 1 4 5 2 3 6
3 [ value of k ]
Sample Output :
3
3
4
5
5
5
6
I cant think of anything better than brute force. Worst case is O(nk) when array is sorted in decreasing order.

Just iterate over the array and keep k last elements in a self-balancing binary tree.
Adding element to such tree, removing element and finding current maximum costs O(logk).
Most languages provide standard implementations for such trees. In STL, IIRC, it's MultiSet. In Java you'd use TreeMap (map, because you need to keep count, how many times each element occurs, and Java doesn't provide Multi- collections).
Pseudocode
for (int i = 0; i < n; ++i) {
tree.add(a[i]);
if (tree.size() > k) {
tree.remove(a[i - k]);
}
if (tree.size() == k) {
print(tree.max());
}
}

You can actually do this in O(n) time with O(n) space.
Split the array into blocks of each.
[a1 a2 ... ak] [a(k+1) ... a2k] ...
For each block, maintain two more blocks, the left block and the right block.
The ith element of the left block will be the max of the i elements from the left.
The ith element of the right block will be the max of the i elements from the right.
You will have two such blocks for each block of k.
Now if you want to find the max in range a[i... i+k], say the elements span two of the above blocks of k.
[j-k+1 ... i i+1 ... j] [j+1 ... i+k ... j+k]
All you need to do is find the max of RightMax of i to j of the first block and the left max of j+1 to i+k of the second block.

Hope this is the solution which you are looking for:
def MaxContigousSum(lst, n):
m = [0]
if lst[0] > 0:
m[0] = lst[0]
maxsum = m[0]
for i in range(1, n):
if m[i - 1] + lst[i] > 0:
m.append(m[i - 1] + lst[i])
else:
m.append(0)
if m[i] > maxsum:
maxsum = m[i]
return maxsum
lst = [-2, 11, -4, 13, -5, 2, 1, -3, 4, -2, -1, -6, -9]
print MaxContigousSum(lst, len(lst))
**Output**
20 for [11, -4, 13]

Related

Find maximum sum of subarray with length less than or equal to k in Python

Given n= 8, pn/ = [2, 5, -7, 8, -6, 4, 1, -9], k= 5. We can select the subarray [2, 5, -7, 8] with sum = 8 and size 4 which is less than k= 5.Hence, the answer is 8. It can be shown that the answer cannot be greater than 8.
Solvable with a deque in O(n). Approximate code, not tested and definitely missing some edge cases, but presents the right idea:
deque = []
s = prefix sums of p, s[i] = p[0] + p[1] + ... + p[i]
for i = 0, n:
if not deque.empty and i - deque.first > k + 1: # otherwise we would have subarrays longer than k (s[a] - s[b] gives the sum pf p[b+1]+...+s[a])
deque.pop_first()
while not deque.empty and s[i] < s[ deque.last ]: # otherwise our queue wouldn't be sorted anymore
deque.pop_last()
if s[i] - s[ deque.first ] > global_max:
global_max = s[i] - s[ deque.first ]
deque.push_last(i)
The idea is to use the deque to store the index of the minimum prefix sum for the last k positions. That one will contribute to the maximum possible subarray for the current position, and it will be the first in the deque. We keep the deque sorted to make sure that the first element is always the minimum.
Because each element can enter and leave the deque at most once, even if we use nested loops, the complexity is linear.

Finding largest sum in an unsorted array using divide and conquer algorithm

I have a sequence of n real numbers stored in a array, A[1], A[2], …, A[n]. I am trying to implement a divide and conquer algorithm to find two numbers A[i] and A[j], where i < j, such that A[i] ≤ A[j] and their sum is the largest.
For eg. {2, 5, 9, 3, -2, 7} will give the output of 14 (5+9, not 16=9+7). Can anyone suggest me some ideas on how to do it?
Thanks in advance.
This problem is not really suited to a divide and conquer approach. It's easy to observe that if (i, j) is a solution for this problem, then A[j] >= A[k] for every k > j, i.e A[j] is the maximum in A[j..n]
Prove: if there exists such k > j and A[k] > A[j], then (j, k) is a better solution than (i, j)
So we only need to consider js that satisfies that criteria.
Algorithm (pseudo-code)
maxj = n
for (j = n - 1 down to 1):
if (a[j] > a[maxj]) then:
maxj = j
else:
check if (j, maxj) is a better solution
Complexity: O(n)
C++ implementation: http://ideone.com/ENp5WR (The implementation use an integer array, but it should be the same for floats)
Declare two variables, during your algorithm check if the current number is bigger than either of the two values currently be stored in the variables, if yes replace the smallest, if not, continue.
Here's a recursive solution in Python. I wouldn't exactly call it "divide and conquer" but then again, this problem isn't very suited to a divide and conquer approach.
def recurse(lst, pair): # the remaining list left to process
if not lst: return # if lst is empty, return
for i in lst[1:]: # for each elements in lst starting from index 1
curr_sum = lst[0] + i
if lst[0] < i and curr_sum > pair[0]+pair[1]: # if the first value is less than the second and curr_sum is greater than the max sum so far
pair[0] = lst[0]
pair[1] = i # update pair to contain the new pair of values that give the max sum
recurse(lst[1:], pair) # recurse on the sub list from index 1 to the end
def find_pair(s):
if len(s) < 2: return s[0]
pair = [s[0],s[1]] # initialises pair array
recurse(s, pair) # passed by reference
return pair
Sample output:
s = [2, 5, 9, 3, -2, 7]
find_pair(s) # ============> (5,9)
I think you can just use an algorithm in O(n) as described follow
(The merge part uses constant time)
Here is the outline of the algorithm:
Divide the problem into two half: LHS & RHS
Each half should returned the largest answer meeting the requirement in that half AND the largest element in that half
Merge and return the answer to upper level: answer is the maximum of LHS's answer, RHS's answer, and the sum of the largest element in both half (consider this only if RHS's largest element >= LHS's largest element)
Here is the demonstration of the algorithm using your example: {2, 5, 9, 3, -2, 7}
Divide into {2,5,9}, {3,-2,7}
Divide into {2,5}, {9}, {3,-2}, {7}
{2,5} return max(2,5, 5+2) = 7, largest element = 5
{9} return 9, largest element = 9
{3,-2} return max(3,-2) = 3, largest element = 3
{7} return 7, largest element = 7
{2,5,9} merged from {2,5} & {9}: return max(7,9,9+5) = 14, largest element = max(9,5) = 9
{3,-2,7} merged from {3,-2} & {7}: return max(3,7,7+3) = 10, largest element = max(7,3) = 7
{2,5,9,3,-2,7} merged from {2,5,9} and {3,-2,7}: return max(14,10) = 14, largest element = max(9,7) = 9
ans = 14
Special cases like {5,4,3,2,1} which yields no answer needs extra handling but not affecting the core part and the complexity of the algorithm.

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of minimum elements in all the possible contiguous sub-sequences of A. I know if N is small we can look for all possible sub sequences but as N is upto 10^5 what can be best way to find this sum?
Example: Let N=3 and A[1,2,3] then ans is 10 as Possible contiguous sub sequences {(1),(2),(3),(1,2),(1,2,3),(2,3)} so Sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10
Let's fix one element(a[i]). We want to know the position of the rightmost element smaller than this one located to the left from i(L). We also need to know the position of the leftmost element smaller than this one located to the right from i(R).
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer.
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
while !s.isEmpty() && s.top().first > a[i]:
s.pop()
if !s.isEmpty():
L[i] = s.top().second
s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is a full pseudo code(I assume that int can hold any integer value here, you should
choose a feasible type to avoid an overflow in a real code. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
while !s.isEmpty() && s.top().first > a[i]:
s.pop()
if !s.isEmpty():
L[i] = s.top().second
s.push(pair(a[i], i))
return L
int[] getRightSmallerElementPositions(int[] a):
R = getLeftSmallerElementPositions(reversed(a))
for i <- 0 ... n - 1:
R[i] = n - 1 - R[i]
return reversed(R)
int findSum(int[] a):
L = getLeftSmallerElementPositions(a)
R = getRightSmallerElementPositions(a)
int res = 0
for i <- 0 ... n - 1:
res += (i - L[i]) * (R[i] - i) * a[i]
return res
If the list is sorted, you can consider all subsets for size 1, then 2, then 3, to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N
total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element:{{1}, {2}, {3}}: sum each of the elements.
Then, sets of two element {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
A = [1, 2, 3]
# This is [3, 2, 1]
scale = range(len(A), 0, -1)
# Take the element-wise product of the vectors, and sum
sum(a*b for (a,b) in zip(A, scale))
# Or just use the dot product
np.dot(A, scale)

count no. of pairs such that absolute difference is less than K

Given an array A of size N, how do I count the number of pairs(A[i], A[j]) such that the absolute difference between them is less than or equal to K where K is any positive natural number? (i, j<=N and i!=j)
My approach:
Sort the array.
Create another array that stores the absolute difference between two consecutive numbers.
Am I heading in the right direction? If yes, then how do I proceed further?
Here is a O(nlogn) algorithm :-
1. sort input
2. traverse the sorted array in ascending order.
3. for A[i] find largest index A[j]<=A[i]+k using binary search.
4. count = count+j-i
5. do 3 to 4 all i's
Time complexity :-
Sorting : O(n)
Binary Search : O(logn)
Overall : O(nlogn)
This is O(n^2):
Sort the array
For each item_i in array,
For each item_j in array such that j > i
If item_j - item_i <= k, print (item_j, item_i)
Else proceed with the next item_i
Your approach is partially correct. You first sort the array. Then keep two pointers i and j.
1. Initialize i = 0, j = 1.
2. Check if A[j] - A[i] <= K.
- If yes, then increment j,
- else
- **increase the count of pairs by C(j-i,2)**.
- increment i.
- if i == j, then increment j.
3. Do this till pointer j goes past the end of the array. Then just add C(j-1,2) to the count and stop.
By i and j, you are basically maintaining a window within which the difference between elements is <= K.
EDIT: This is the basic idea, you will have to check for boundary conditions. Also you will have to keep track of the past interval that was added to the count. You will need to subtract the overlap with the current interval to avoid double counting.
Complexity: O(NlogN), for the sort operation, linear for the array traversal
Once your array is sorted, you can compute the sum in O(N) time.
Here's some code. The O(N) algorithm is pair_sums, and pair_sums_slow is the obviously correct, but O(N^2) algorithm. I run through some test cases at the end to make sure that the two algorithms returns the same results.
def pair_sums(A, k):
A.sort()
counts = 0
j = 0
for i in xrange(len(A)):
while j < len(A) and A[j] - A[i] <= k:
j+=1
counts += j - i - 1
return counts
def pair_sums_slow(A, k):
counts = 0
for i in xrange(len(A)):
for j in xrange(i+1, len(A)):
if A[j] - A[i] <= k:
counts+=1
return counts
cases = [
([0, 1, 2, 3, 4, 5], 10),
([0, 0, 0, 0, 0], 1),
([0, 1, 2, 4, 8, 16], 9),
([0, -1, -2, 1, 2], 2)
]
for A, k in cases:
want = pair_sums_slow(A, k)
got = pair_sums(A, k)
if want != got:
print A, k, want, got
The idea behind pair_sums is that for each i, we find the smallest j such that A[j] - A[i] > K (or j=N). Then j-i-1 is the number of pairs with i as the first value.
Because the array is sorted, j only ever increases as i increases, so the overall complexity is linear since although there's nested loops the inner operation j+=1 can occur at most N times.

Rank the suffixes of a list

ranking an element x in an array/list is just to find out how many elements in the array/list that strictly smaller than x.
So ranking a list is just get ranks of all elements in the list.
For example, rank [51, 38, 29, 51, 63, 38] = [3, 1, 0, 3, 5, 1], i.e., there are 3 elements smaller than 51, etc.
Ranking a list can be done in O(NlogN). Basically, we can sort the list while remembering the original index of each element, and then see for each element, how many before it.
The question here is How to rank the suffixes of a list, in O(NlogN)?
Ranking the suffixes of a list means:
for list [3; 1; 2], rank [[3;1;2]; [1;2]; [2]]
note that elements may not be distinct.
edit
We don't need to print out all elements for all suffixes. You can image that we just need to print out a list/array, where each element is a rank of a suffix.
For example, rank suffix_of_[3;1;2] = rank [[3;1;2]; [1;2]; [2]] = [2;0;1] and you just print out [2;0;1].
edit 2
Let me explain what is all suffixes and what means sorting/ranking all suffixes more clearly here.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
for example, all suffixes of [4;2;3;1;0] are
[4;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0]
Sorting above 5 suffixes implies lexicographic sort. sorting above all suffixes, you get
[0]
[1;0]
[2;3;1;0]
[3;1;0]
[4;2;3;1;0]
by the way, if you can't image how 5 lists/arrays can be sorted among them, just think of sorting strings in lexicographic order.
"0" < "10" < "2310" < "310" < "42310"
It seems sorting all suffixes is actually sorting all elements of the original array.
However, please be careful that all elements may not be distinct, for example
for [4;2;2;1;0], all suffixes are:
[4;2;2;1;0]
[2;2;1;0]
[2;1;0]
[1;0]
[0]
then the order is
[0]
[1;0]
[2;1;0]
[2;2;1;0]
[4;2;2;1;0]
As MBo noted correctly, your problem is that of constructing the suffix array of your input list. The fast and complicated algorithms to do this are actually linear time, but since you only aim for O(n log n), I will try to propose a simpler version that is much easier to implement.
Basic idea and an initial O(n log² n) implementation
Let's take the sequence [4, 2, 2, 1] as an example. Its suffixes are
0: 4 2 2 1
1: 2 2 1
2: 2 1
3: 1
I numbered the suffixes with their starting index in the original sequence. Ultimately we want to sort this set of suffixes lexicographically, and fast. We know we can represent each suffix using its starting index in constant space and we can sort in O(n log n) comparisons using merge sort, heap sort or a similar algorithm. So the question remains, how can we compare two suffixes fast?
Let's say we want to compare the suffixes [2, 2, 1] and [2, 1]. We can pad those with negative infinity values changing the result of the comparison: [2, 2, 1, -∞] and [2, 1, -∞, -∞].
Now the key idea here is the following divide-and-conquer observation: Instead of comparing the sequences character by character until we find a position where the two differ, we can instead split both lists in half and compare the halves lexicographically:
[a, b, c, d] < [e, f, g, h]
<=> ([a, b], [c, d]) < ([e, f], [g, h])
<=> [a, b] < [e, f] or ([a, b,] = [e, f] and [c, d] < [g, h])
Essentially we have decomposed the problem of comparing the sequences into two problems of comparing smaller sequences. This leads to the following algorithm:
Step 1: Sort the substrings (contiguous subsequences) of length 1. In our example, the substrings of length 1 are [4], [2], [2], [1]. Every substring can be represented by the starting position in the original list. We sort them by a simple comparison sort and get [1], [2], [2], [4]. We store the result by assigning to every position it's rank in the sorted lists of lists:
position substring rank
0 [4] 2
1 [2] 1
2 [2] 1
3 [1] 0
It is important that we assign the same rank to equal substrings!
Step 2: Now we want to sort the substrings of length 2. The are only really 3 such substrings, but we assign one to every position by padding with negative infinity if necessary. The trick here is that we can use our divide-and-conquer idea from above and the ranks assigned in step 1 to do a fast comparison (this isn't really necessary yet but will become important later).
position substring halves ranks from step 1 final rank
0 [4, 2] ([4], [2]) (2, 1) 3
1 [2, 2] ([2], [2]) (1, 1) 2
2 [2, 1] ([2], [2]) (1, 0) 1
3 [1, -∞] ([1], [-∞]) (0, -∞) 0
Step 3: You guessed it, now we sort substrings of length 4 (!). These are exactly the suffixes of the list! We can use the divide-and-conquer trick and the results from step 2 this time:
position substring halves ranks from step 2 final rank
0 [4, 2, 2, 1] ([4, 2], [2, 1]) (3, 1) 3
1 [2, 2, 1, -∞] ([2, 2], [1, -∞]) (2, 0) 2
2 [2, 1, -∞, -∞] ([2, 1], [-∞,-∞]) (1, -∞) 1
3 [1, -∞, -∞, -∞] ([1,-∞], [-∞,-∞]) (0, -∞) 0
We're done! If our initial sequence would have had size 2^k, we would have needed k steps. Or put the other way round, we need log_2 n steps to process a sequence of size n. If its length is not a power of two, we just pad with negative infinity.
For an actual implementation we just need to remember the sequence "final rank" for every step of the algorithm.
An implementation in C++ could look like this (compile with -std=c++11):
#include <algorithm>
#include <iostream>
using namespace std;
int seq[] = {8, 3, 2, 4, 2, 2, 1};
const int n = 7;
const int log2n = 3; // log2n = ceil(log_2(n))
int Rank[log2n + 1][n]; // Rank[i] will save the final Ranks of step i
tuple<int, int, int> L[n]; // L is a list of tuples. in step i,
// this will hold pairs of Ranks from step i - 1
// along with the substring index
const int neginf = -1; // should be smaller than all the numbers in seq
int main() {
for (int i = 0; i < n; ++i)
Rank[1][i] = seq[i]; // step 1 is actually simple if you think about it
for (int step = 2; step <= log2n; ++step) {
int length = 1 << (step - 1); // length is 2^(step - 1)
for (int i = 0; i < n; ++i)
L[i] = make_tuple(
Rank[step - 1][i],
(i + length / 2 < n) ? Rank[step - 1][i + length / 2] : neginf,
i); // we need to know where the tuple came from later
sort(L, L + n); // lexicographical sort
for (int i = 0; i < n; ++i) {
// we save the rank of the index, but we need to be careful to
// assign equal ranks to equal pairs
Rank[step][get<2>(L[i])] = (i > 0 && get<0>(L[i]) == get<0>(L[i - 1])
&& get<1>(L[i]) == get<1>(L[i - 1]))
? Rank[step][get<2>(L[i - 1])]
: i;
}
}
// the suffix array is in L after the last step
for (int i = 0; i < n; ++i) {
int start = get<2>(L[i]);
cout << start << ":";
for (int j = start; j < n; ++j)
cout << " " << seq[j];
cout << endl;
}
}
Output:
6: 1
5: 2 1
4: 2 2 1
2: 2 4 2 2 1
1: 3 2 4 2 2 1
3: 4 2 2 1
0: 8 3 2 4 2 2 1
The complexity is O(log n * (n + sort)), which is O(n log² n) in this implementation because we use a comparison sort of complexity O(n log n)
A simple O(n log n) algorithm
If we manage to do the sorting parts in O(n) per step, we get a O(n log n) bound. So basically we have to sort a sequence of pairs (x, y), where 0 <= x, y < n. We know that we can sort a sequence of integers in the given range in O(n) time using counting sort. We can intepret our pairs (x, y) as numbers z = n * x + y in base n. We can now see how to use LSD radix sort to sort the pairs.
In practice, this means we sort the pairs by increasing y using counting sort, and then use counting sort again to sort by increasing x. Since counting sort is stable, this gives us the lexicographical order of our pairs in 2 * O(n) = O(n). The final complexity is thus O(n log n).
In case you are interested, you can find an O(n log² n) implementation of the approach at my Github repo. The implementation has 27 lines of code. Neat, ain't it?
This is exactly suffix array construction problem, and wiki page contains links to the linear-complexity algorithms (probably, depending on alphabet)

Resources