Arrange n items in k nonempty groups such that the difference between the minimum element and the maximum element of each group is minimized - algorithm

Given N items with values x[1], ..., x[N] and an integer K, find a linear-time algorithm to arrange these N items into K nonempty groups such that the range of each group (the difference between its minimum and maximum values/keys) is minimized, and therefore the sum of the ranges is minimal.
For example, given N=4, K=2 and the elements 1 1 4 3, the minimum range is 1 for the groups (1,1) and (4,3).

You can binary search the answer.
Suppose the candidate answer is x. Now verify whether the items can be split into k groups so that the difference between the largest and the smallest item in each group is at most x. This can be done in O(n) after sorting the array. Traverse the sorted array and keep adding consecutive items to the current group while the difference between the smallest and the largest item picked for this group does not exceed x. Then start a new group and repeat the process. At the end, count how many groups you have made. If the number of groups is more than k, the items cannot be grouped into k groups with x as the answer, so x must be increased. Otherwise x is feasible and we can try a smaller value. Binary searching on x finds the minimum feasible x.
The overall complexity is O(N log N) for sorting plus O(N log R) for the binary search, where R is the range of the values.
Here is a sample implementation in C++:
    #include <algorithm>
    #include <iostream>
    #include <vector>
    using namespace std;

    int main()
    {
        int k = 2;
        vector<int> v = {1, 1, 4, 3};
        sort(v.begin(), v.end());
        // The answer lies between 0 and the full range of the sorted values.
        int low = 0, high = v.back() - v.front();
        while (low < high) {
            int x = (low + high) / 2;
            int groups = 0;
            size_t left = 0;
            while (left < v.size()) {
                // Greedily extend the current group while its range stays within x.
                size_t right = left;
                while (right < v.size() && v[right] - v[left] <= x) {
                    ++right;
                }
                ++groups;
                left = right;
            }
            // printf("x:%d groups:%d\n", x, groups );
            if (groups > k) {
                low = x + 1;
            } else {
                high = x;
            }
        }
        cout << "result is " << low << endl;
    }

Alright, I'll assume that we want to minimize the sum of differences over all groups.
Let's sort the numbers. There's an optimal answer where each group is a consecutive segment in the sorted array (proof: suppose two groups interleave, with A1 < B1 < A2 < B2. We can exchange A2 and B1, and the answer will not increase).
Let a[l], a[l + 1], ..., a[r] be a group. Its cost is a[r] - a[l] = (a[r] - a[r - 1]) + (a[r - 1] - a[r - 2]) + ... + (a[l + 1] - a[l]). This leads us to the key insight: splitting into k groups means cutting at k - 1 gaps between adjacent elements, and the answer is a[n - 1] - a[0] minus the sum of the gaps we cut. Thus, we just need to maximize the sum of those gaps.
Here is a final solution:
sort the array
compute differences between adjacent numbers
take k - 1 largest differences. That's exactly where the groups split.
We can find the (k - 1)th largest difference in linear time with a selection algorithm (or, if we are fine with O(N log N) time, we can just sort them). That's it.
Here is an example:
x = [1, 1, 4, 3], k = 2
sorted: [1, 1, 3, 4]
differences: [0, 2, 1]
taking k - 1 = 1 largest gaps: it's 2. Thus the groups are [1, 1] and [3, 4].
A slightly more contrived one:
x = [8, 2, 0, 3], k = 3
sorted: [0, 2, 3, 8]
differences: [2, 1, 5]
taking k - 1 = 2 largest gaps: they're 2 and 5. Thus, the groups are [0], [2, 3], [8] with the total cost of 1.
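The gap-based solution above can be sketched in Python as follows. This is only a minimal sketch: the function name `min_total_range` is made up for illustration, and it sorts the gaps (O(N log N)) instead of using linear-time selection.

```python
def min_total_range(values, k):
    """Minimum possible sum of (max - min) over k nonempty groups."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    a = sorted(values)
    # Differences between adjacent elements of the sorted array.
    gaps = sorted((a[i + 1] - a[i] for i in range(len(a) - 1)), reverse=True)
    # Cutting at the k - 1 largest gaps removes them from the total span.
    return (a[-1] - a[0]) - sum(gaps[:k - 1])


print(min_total_range([1, 1, 4, 3], 2))  # 1
print(min_total_range([8, 2, 0, 3], 3))  # 1
```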

Related

Find maximum sum of subarray with length less than or equal to k in Python

Given n = 8, p = [2, 5, -7, 8, -6, 4, 1, -9], k = 5. We can select the subarray [2, 5, -7, 8] with sum = 8 and size 4, which is less than k = 5. Hence, the answer is 8. It can be shown that the answer cannot be greater than 8.
Solvable with a deque in O(n). Approximate code, not tested and definitely missing some edge cases, but presents the right idea:
    s = prefix sums of p with a leading zero: s[0] = 0, s[i] = p[0] + p[1] + ... + p[i-1]
    deque = [0]
    global_max = -infinity
    for i = 1 .. n:
        if i - deque.first > k: # otherwise we would have subarrays longer than k (s[a] - s[b] gives the sum p[b] + ... + p[a-1])
            deque.pop_first()
        if s[i] - s[ deque.first ] > global_max:
            global_max = s[i] - s[ deque.first ]
        while not deque.empty and s[i] < s[ deque.last ]: # otherwise our queue wouldn't be sorted anymore
            deque.pop_last()
        deque.push_last(i)
The idea is to use the deque to store the index of the minimum prefix sum for the last k positions. That one will contribute to the maximum possible subarray for the current position, and it will be the first in the deque. We keep the deque sorted to make sure that the first element is always the minimum.
Because each element can enter and leave the deque at most once, even if we use nested loops, the complexity is linear.
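For reference, here is how the deque idea might look as runnable Python. It is a sketch under my own conventions, not the answerer's exact code: the prefix sums carry a leading zero (`s[0] = 0`), the subarray must be nonempty, and the name `max_sum_at_most_k` is hypothetical.

```python
from collections import deque
from itertools import accumulate


def max_sum_at_most_k(p, k):
    """Maximum sum of a nonempty contiguous subarray of p with length <= k."""
    if not p or k < 1:
        raise ValueError("need a nonempty list and k >= 1")
    s = [0] + list(accumulate(p))  # s[i] = p[0] + ... + p[i-1]
    best = float("-inf")
    window = deque([0])  # indices j into s, with s[j] increasing front to back
    for i in range(1, len(s)):
        # s[i] - s[j] covers i - j elements, so j must be at least i - k.
        if window[0] < i - k:
            window.popleft()
        best = max(best, s[i] - s[window[0]])
        # Drop candidates that can never beat s[i] as a minimum later on.
        while window and s[window[-1]] >= s[i]:
            window.pop()
        window.append(i)
    return best


print(max_sum_at_most_k([2, 5, -7, 8, -6, 4, 1, -9], 5))  # 8
```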

Find a subset with sum within a range

How can I find a subset of an array such that the sum of its elements is within a given range?
For example:
let a = [ 1, 1, 3, 6, 7, 50]
let b = getSubsetSumRange(3, 5)
so b could potentially be [1, 1, 3], [1, 3], [3], [1, 3]. The order doesn't matter; I only need one of them.
You would probably want to use a dynamic programming approach to solve this problem (this assumes the elements are non-negative integers).
Let F[i][j] be true if it is possible to select some of the numbers a[1..i] so that their sum is equal to j.
i varies from 0 to the length of a, and j from 0 to max inclusive, where max is the upper bound of your given range.
F[i][0] = true for all i, including i = 0, by definition (you can always select the empty subset), and F[0][j] = false for j > 0.
Then F[i][j] = F[i - 1][j] | (j >= a[i] and F[i - 1][j - a[i]])
Logically it means that if you can select a subset with sum j from elements 1..i-1, then you obviously can do it with the subset 1..i, and if you can select a subset with sum j - a[i] from elements 1..i-1, then by adding your new element a[i] to that subset, you can get your desired sum j.
After you have calculated the values of F, you can find any F[n][j] that is true for values j lying in your desired range.
Say you have found such a number k. Then the algorithm to find the required set would look like this:
    for i = n..1:
        if k >= a[i] and F[i - 1][k - a[i]] == True then
            output a[i] to the answer
            k -= a[i]
        if k == 0
            break
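A possible Python version of the table and the backtracking step is sketched below. It assumes the elements are positive integers; the name `subset_with_sum_in_range` and the choice to pick the smallest reachable sum in the range are mine, not part of the question.

```python
def subset_with_sum_in_range(a, lo, hi):
    """Return some subset of a whose sum lies in [lo, hi], or None."""
    if hi < 0:
        return None
    n = len(a)
    # reachable[i][j] is True if some subset of a[:i] sums to exactly j.
    reachable = [[False] * (hi + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        reachable[i][0] = True  # the empty subset
    for i in range(1, n + 1):
        x = a[i - 1]
        for j in range(1, hi + 1):
            reachable[i][j] = reachable[i - 1][j] or (
                j >= x and reachable[i - 1][j - x]
            )
    # Any reachable sum inside the range will do; take the smallest one.
    target = next(
        (j for j in range(max(lo, 0), hi + 1) if reachable[n][j]), None
    )
    if target is None:
        return None
    subset = []
    for i in range(n, 0, -1):
        x = a[i - 1]
        if target >= x and reachable[i - 1][target - x]:
            subset.append(x)
            target -= x
        if target == 0:
            break
    return subset


result = subset_with_sum_in_range([1, 1, 3, 6, 7, 50], 3, 5)
print(result, sum(result))
```

The table takes O(n * max) time and memory, so this only makes sense when the upper bound of the range is reasonably small.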

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of the minimum elements of all possible contiguous subsequences of A. I know that if N is small we can enumerate all possible subsequences, but as N is up to 10^5, what is the best way to find this sum?
Example: Let N=3 and A = [1,2,3]; then the answer is 10, as the possible contiguous subsequences are {(1),(2),(3),(1,2),(1,2,3),(2,3)}, so the sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10.
Let's fix one element a[i]. We want to know the position L of the rightmost element smaller than a[i] located to the left of i (or -1 if there is none). We also need to know the position R of the leftmost element smaller than a[i] located to the right of i (or n if there is none).
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer, because a[i] is the minimum of exactly those subarrays that start in (L, i] and end in [i, R).
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
    s = new Stack
    L = new int[n]
    fill(L, -1)
    for i <- 0 ... n - 1:
        while !s.isEmpty() && s.top().first > a[i]:
            s.pop()
        if !s.isEmpty():
            L[i] = s.top().second
        s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is the full pseudocode (I assume that int can hold any integer value here; you should choose a suitable type to avoid an overflow in real code. I also assume that all elements are distinct):
    int[] getLeftSmallerElementPositions(int[] a):
        s = new Stack
        L = new int[n]
        fill(L, -1)
        for i <- 0 ... n - 1:
            while !s.isEmpty() && s.top().first > a[i]:
                s.pop()
            if !s.isEmpty():
                L[i] = s.top().second
            s.push(pair(a[i], i))
        return L

    int[] getRightSmallerElementPositions(int[] a):
        R = getLeftSmallerElementPositions(reversed(a))
        for i <- 0 ... n - 1:
            R[i] = n - 1 - R[i]
        return reversed(R)

    int findSum(int[] a):
        L = getLeftSmallerElementPositions(a)
        R = getRightSmallerElementPositions(a)
        int res = 0
        for i <- 0 ... n - 1:
            res += (i - L[i]) * (R[i] - i) * a[i]
        return res
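The same idea can be written compactly in Python. This sketch is a variation rather than a line-by-line port: it computes both boundaries in one pass and breaks ties by using a strict comparison on the left and a non-strict one on the right, so equal elements are handled without the pair trick. The function name is hypothetical.

```python
def sum_of_subarray_minimums(a):
    """Sum of min(a[l..r]) over all contiguous subarrays, in O(n)."""
    n = len(a)
    left = [-1] * n   # nearest index to the left with a strictly smaller value
    right = [n] * n   # nearest index to the right with a smaller or equal value
    stack = []        # indices whose values are strictly increasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] >= x:
            right[stack.pop()] = i
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    # a[i] is counted as the minimum of (i - left[i]) * (right[i] - i) subarrays.
    return sum((i - left[i]) * (right[i] - i) * a[i] for i in range(n))


print(sum_of_subarray_minimums([1, 2, 3]))  # 10
```

Python integers do not overflow, so the remark about choosing a wide enough type only matters when porting this to a fixed-width language.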
If the list is sorted, you can consider all subsets for size 1, then 2, then 3, to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
    let A = {1, 2, 3}
    let total_sum = 0
    for set_size <- 1 to N
        total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element:{{1}, {2}, {3}}: sum each of the elements.
Then, sets of two element {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
    import numpy as np

    A = [1, 2, 3]
    # This is [3, 2, 1]
    scale = list(range(len(A), 0, -1))
    # Take the element-wise product of the vectors, and sum
    sum(a*b for (a,b) in zip(A, scale))
    # Or just use the dot product
    np.dot(A, scale)

Rank the suffixes of a list

Ranking an element x in an array/list just means finding out how many elements in the array/list are strictly smaller than x.
So ranking a list just means getting the ranks of all elements in the list.
For example, rank [51, 38, 29, 51, 63, 38] = [3, 1, 0, 3, 5, 1], i.e., there are 3 elements smaller than 51, etc.
Ranking a list can be done in O(NlogN). Basically, we can sort the list while remembering the original index of each element, and then see, for each element, how many elements come before it.
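To make the plain (non-suffix) ranking concrete, here is a small Python sketch of the sort-with-indices approach. The function name `rank` mirrors the notation above; equal elements get the rank of the first of them in sorted order.

```python
def rank(values):
    """For each element, count the elements strictly smaller than it."""
    # Indices of values in sorted order (ties keep their original order).
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for pos, idx in enumerate(order):
        if pos > 0 and values[idx] == values[order[pos - 1]]:
            # Same value as the previous one: share its rank.
            ranks[idx] = ranks[order[pos - 1]]
        else:
            ranks[idx] = pos
    return ranks


print(rank([51, 38, 29, 51, 63, 38]))  # [3, 1, 0, 3, 5, 1]
```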
The question here is: how can we rank the suffixes of a list in O(NlogN)?
Ranking the suffixes of a list means:
for list [3; 1; 2], rank [[3;1;2]; [1;2]; [2]]
note that elements may not be distinct.
edit
We don't need to print out all elements for all suffixes. You can imagine that we just need to print out a list/array, where each element is the rank of a suffix.
For example, rank suffix_of_[3;1;2] = rank [[3;1;2]; [1;2]; [2]] = [2;0;1] and you just print out [2;0;1].
edit 2
Let me explain more clearly what all the suffixes are and what sorting/ranking them means.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
for example, all suffixes of [4;2;3;1;0] are
[4;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0]
Sorting the above 5 suffixes implies a lexicographic sort. Sorting all the suffixes above, you get
[0]
[1;0]
[2;3;1;0]
[3;1;0]
[4;2;3;1;0]
By the way, if you can't imagine how 5 lists/arrays can be sorted among themselves, just think of sorting strings in lexicographic order.
"0" < "10" < "2310" < "310" < "42310"
It seems sorting all suffixes is actually sorting all elements of the original array.
However, please be careful that all elements may not be distinct, for example
for [4;2;2;1;0], all suffixes are:
[4;2;2;1;0]
[2;2;1;0]
[2;1;0]
[1;0]
[0]
then the order is
[0]
[1;0]
[2;1;0]
[2;2;1;0]
[4;2;2;1;0]
As MBo noted correctly, your problem is that of constructing the suffix array of your input list. The fast and complicated algorithms to do this are actually linear time, but since you only aim for O(n log n), I will try to propose a simpler version that is much easier to implement.
Basic idea and an initial O(n log² n) implementation
Let's take the sequence [4, 2, 2, 1] as an example. Its suffixes are
0: 4 2 2 1
1: 2 2 1
2: 2 1
3: 1
I numbered the suffixes with their starting index in the original sequence. Ultimately we want to sort this set of suffixes lexicographically, and fast. We know we can represent each suffix using its starting index in constant space and we can sort in O(n log n) comparisons using merge sort, heap sort or a similar algorithm. So the question remains, how can we compare two suffixes fast?
Let's say we want to compare the suffixes [2, 2, 1] and [2, 1]. We can pad those with negative infinity values without changing the result of the comparison: [2, 2, 1, -∞] and [2, 1, -∞, -∞].
Now the key idea here is the following divide-and-conquer observation: Instead of comparing the sequences character by character until we find a position where the two differ, we can instead split both lists in half and compare the halves lexicographically:
    [a, b, c, d] < [e, f, g, h]
    <=> ([a, b], [c, d]) < ([e, f], [g, h])
    <=> [a, b] < [e, f] or ([a, b] = [e, f] and [c, d] < [g, h])
Essentially we have decomposed the problem of comparing the sequences into two problems of comparing smaller sequences. This leads to the following algorithm:
Step 1: Sort the substrings (contiguous subsequences) of length 1. In our example, the substrings of length 1 are [4], [2], [2], [1]. Every substring can be represented by the starting position in the original list. We sort them by a simple comparison sort and get [1], [2], [2], [4]. We store the result by assigning to every position its rank in the sorted list of lists:
position  substring  rank
0         [4]        2
1         [2]        1
2         [2]        1
3         [1]        0
It is important that we assign the same rank to equal substrings!
Step 2: Now we want to sort the substrings of length 2. There are only really 3 such substrings, but we assign one to every position by padding with negative infinity if necessary. The trick here is that we can use our divide-and-conquer idea from above and the ranks assigned in step 1 to do a fast comparison (this isn't really necessary yet but will become important later).
position  substring  halves        ranks from step 1  final rank
0         [4, 2]     ([4], [2])    (2, 1)             3
1         [2, 2]     ([2], [2])    (1, 1)             2
2         [2, 1]     ([2], [1])    (1, 0)             1
3         [1, -∞]    ([1], [-∞])   (0, -∞)            0
Step 3: You guessed it, now we sort substrings of length 4 (!). These are exactly the suffixes of the list! We can use the divide-and-conquer trick and the results from step 2 this time:
position  substring          halves                ranks from step 2  final rank
0         [4, 2, 2, 1]       ([4, 2], [2, 1])      (3, 1)             3
1         [2, 2, 1, -∞]      ([2, 2], [1, -∞])     (2, 0)             2
2         [2, 1, -∞, -∞]     ([2, 1], [-∞, -∞])    (1, -∞)            1
3         [1, -∞, -∞, -∞]    ([1, -∞], [-∞, -∞])   (0, -∞)            0
We're done! If our initial sequence had size 2^k, we would have needed k + 1 steps (for substring lengths 1, 2, 4, ..., 2^k). Or put the other way round, we need about log_2 n steps to process a sequence of size n. If its length is not a power of two, we just pad with negative infinity.
For an actual implementation we just need to remember the sequence "final rank" for every step of the algorithm.
An implementation in C++ could look like this (compile with -std=c++11):
    #include <algorithm>
    #include <iostream>
    #include <tuple>
    using namespace std;

    int seq[] = {8, 3, 2, 4, 2, 2, 1};
    const int n = 7;
    const int log2n = 3; // log2n = ceil(log_2(n))
    int Rank[log2n + 2][n]; // Rank[i] will save the final Ranks of step i
    tuple<int, int, int> L[n]; // L is a list of tuples. in step i,
                               // this will hold pairs of Ranks from step i - 1
                               // along with the substring index
    const int neginf = -1; // should be smaller than all the numbers in seq

    int main() {
        for (int i = 0; i < n; ++i)
            Rank[1][i] = seq[i]; // step 1 is actually simple if you think about it
        // step i sorts substrings of length 2^(i - 1), so the last step
        // (log2n + 1) covers length 2^log2n >= n, i.e. whole suffixes
        for (int step = 2; step <= log2n + 1; ++step) {
            int length = 1 << (step - 1); // length is 2^(step - 1)
            for (int i = 0; i < n; ++i)
                L[i] = make_tuple(
                    Rank[step - 1][i],
                    (i + length / 2 < n) ? Rank[step - 1][i + length / 2] : neginf,
                    i); // we need to know where the tuple came from later
            sort(L, L + n); // lexicographical sort
            for (int i = 0; i < n; ++i) {
                // we save the rank of the index, but we need to be careful to
                // assign equal ranks to equal pairs
                Rank[step][get<2>(L[i])] = (i > 0 && get<0>(L[i]) == get<0>(L[i - 1])
                                            && get<1>(L[i]) == get<1>(L[i - 1]))
                                               ? Rank[step][get<2>(L[i - 1])]
                                               : i;
            }
        }
        // the suffix array is in L after the last step
        for (int i = 0; i < n; ++i) {
            int start = get<2>(L[i]);
            cout << start << ":";
            for (int j = start; j < n; ++j)
                cout << " " << seq[j];
            cout << endl;
        }
    }
Output:
6: 1
5: 2 1
4: 2 2 1
2: 2 4 2 2 1
1: 3 2 4 2 2 1
3: 4 2 2 1
0: 8 3 2 4 2 2 1
The complexity is O(log n * (n + sort)), which is O(n log² n) in this implementation because we use a comparison sort of complexity O(n log n)
A simple O(n log n) algorithm
If we manage to do the sorting parts in O(n) per step, we get an O(n log n) bound. So basically we have to sort a sequence of pairs (x, y), where 0 <= x, y < n. We know that we can sort a sequence of integers in the given range in O(n) time using counting sort. We can interpret our pairs (x, y) as numbers z = n * x + y in base n. We can now see how to use LSD radix sort to sort the pairs.
In practice, this means we sort the pairs by increasing y using counting sort, and then use counting sort again to sort by increasing x. Since counting sort is stable, this gives us the lexicographical order of our pairs in 2 * O(n) = O(n). The final complexity is thus O(n log n).
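As an illustration of the two counting-sort passes, here is a Python sketch of the doubling scheme. It is not the implementation from the answer: the helper names `counting_sort` and `suffix_array` are made up, ranks are kept dense in 1..n with 0 standing in for negative infinity, and the very first ranking still uses one ordinary comparison sort.

```python
def counting_sort(items, key, key_range):
    """Stable sort of items by key(item), an integer in [0, key_range)."""
    counts = [0] * (key_range + 1)
    for item in items:
        counts[key(item) + 1] += 1
    for v in range(key_range):
        counts[v + 1] += counts[v]  # counts[v] = first output slot for key v
    out = [None] * len(items)
    for item in items:
        k = key(item)
        out[counts[k]] = item
        counts[k] += 1
    return out


def suffix_array(seq):
    """Start indices of the suffixes of seq in lexicographic order."""
    n = len(seq)
    if n == 0:
        return []
    # Ranks of single elements, starting at 1; rank 0 is the padding.
    first_rank = {v: r + 1 for r, v in enumerate(sorted(set(seq)))}
    rank = [first_rank[v] for v in seq]
    length = 1
    while True:
        def first(i):
            return rank[i]

        def second(i):
            return rank[i + length] if i + length < n else 0

        # LSD radix sort on the pair (first, second): minor key first.
        order = counting_sort(range(n), second, n + 1)
        order = counting_sort(order, first, n + 1)
        new_rank = [0] * n
        new_rank[order[0]] = 1
        for prev, cur in zip(order, order[1:]):
            same = first(prev) == first(cur) and second(prev) == second(cur)
            new_rank[cur] = new_rank[prev] + (0 if same else 1)
        rank = new_rank
        if rank[order[-1]] == n or length >= n:
            return order  # all ranks distinct: the order is final
        length *= 2


print(suffix_array([8, 3, 2, 4, 2, 2, 1]))  # [6, 5, 4, 2, 1, 3, 0]
```

The rank of each suffix, as asked in the question, is then the position of its start index in the returned list.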
In case you are interested, you can find an O(n log² n) implementation of the approach at my Github repo. The implementation has 27 lines of code. Neat, ain't it?
This is exactly the suffix array construction problem, and the wiki page contains links to linear-complexity algorithms (probably depending on the alphabet).

Finding largest from each subarray of length k

Interview question: given an array and an integer k, find the maximum of each and every contiguous subarray of size k.
Sample Input :
1 2 3 1 4 5 2 3 6
3 [ value of k ]
Sample Output :
3
3
4
5
5
5
6
I can't think of anything better than brute force. The worst case is O(nk) when the array is sorted in decreasing order.
Just iterate over the array and keep k last elements in a self-balancing binary tree.
Adding element to such tree, removing element and finding current maximum costs O(logk).
Most languages provide standard implementations for such trees. In the C++ STL it's std::multiset. In Java you'd use TreeMap (a map, because you need to keep count of how many times each element occurs, and Java doesn't provide Multi- collections).
Pseudocode
    for (int i = 0; i < n; ++i) {
        tree.add(a[i]);
        if (tree.size() > k) {
            tree.remove(a[i - k]);
        }
        if (tree.size() == k) {
            print(tree.max());
        }
    }
You can actually do this in O(n) time with O(n) space.
Split the array into blocks of k elements each.
[a1 a2 ... ak] [a(k+1) ... a2k] ...
For each block, maintain two more blocks, the left block and the right block.
The ith element of the left block will be the max of the first i elements of the block, counting from the left.
The ith element of the right block will be the max of the first i elements of the block, counting from the right.
You will have two such blocks for each block of k.
Now if you want to find the max of the window a[i ... i+k-1], say the elements span two of the above blocks of k.
[j-k+1 ... i i+1 ... j] [j+1 ... i+k-1 ... j+k]
All you need to do is take the larger of the right max covering i to j in the first block and the left max covering j+1 to i+k-1 in the second block.
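The block idea can be sketched in Python like this. It is a sketch under one simplification: the left and right maxima are stored in two flat arrays indexed by array position instead of per-block arrays, and the function name is hypothetical.

```python
def sliding_window_max(a, k):
    """Maximum of every contiguous window of size k, in O(n) time."""
    n = len(a)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(a)")
    left = [0] * n   # left[i]: max from the start of i's block up to i
    right = [0] * n  # right[i]: max from i up to the end of i's block
    for i in range(n):
        left[i] = a[i] if i % k == 0 else max(left[i - 1], a[i])
    for i in range(n - 1, -1, -1):
        at_block_end = i % k == k - 1 or i == n - 1
        right[i] = a[i] if at_block_end else max(right[i + 1], a[i])
    # A window a[i .. i+k-1] is a suffix of one block plus a prefix of the next.
    return [max(right[i], left[i + k - 1]) for i in range(n - k + 1)]


print(sliding_window_max([1, 2, 3, 1, 4, 5, 2, 3, 6], 3))
# [3, 3, 4, 5, 5, 5, 6]
```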
Hope this is the solution which you are looking for:
    def MaxContigousSum(lst, n):
        m = [0]
        if lst[0] > 0:
            m[0] = lst[0]
        maxsum = m[0]
        for i in range(1, n):
            if m[i - 1] + lst[i] > 0:
                m.append(m[i - 1] + lst[i])
            else:
                m.append(0)
            if m[i] > maxsum:
                maxsum = m[i]
        return maxsum

    lst = [-2, 11, -4, 13, -5, 2, 1, -3, 4, -2, -1, -6, -9]
    print(MaxContigousSum(lst, len(lst)))
**Output**
20 for [11, -4, 13]

Resources