The largest number of subsets for a set - algorithm

Given a set of X numbers less than or equal to Y, which may contain repeated numbers:
which algorithm gives you the maximum number of subsets whose sum of their elements is greater than or equal to Y, where none of the elements of one subset can be contained in another, and each subset cannot repeat the same element.
(note: if in the initial set two numbers are repeated, each counts as a distinct element)
subsets can group elements into duos, trios, quartets or any other number.
doing two for loops to search for the best combination for the highest number worked for doubles, but given that it is possible to do trios and so on, cases like "1 1 1 1 1 7 8" would be suboptimized

You could implement a 'brute force' method and go through every possible partitioning and check if it satisfies your requirements. This would be quite simple, but horribly inefficient except for trivial cases.
Suppose you have N elements e_i in your set S, with 0 <= e_i <= Y. Choose numparts as the number of partitions you are going to try to create with element sum >= Y. Assuming sum e_i >= Y, we can set numparts = 1 initially, otherwise, obviously, the answer is zero..
Then you can generate partitions by creating an array of N elements p_i where 0 <= p_i < numparts. There are not more than numparts^N possible such partitions!! Now, you have to try to find one in which for all 0 <= j < numparts, sum {e_i : p_i = j} >= Y. If you find one, increment numparts, if you don't, then you have your answer which is the largest numparts value for which you did find a qualifying partition.
You could improve the efficiency of this approach significantly by avoiding lots of partitions that don't have a sum >= Y. There are 'only' 2^N distinct subsets of S so the number of subsets with sums >=Y must be less than or equal to 2^N. For each such subset S_k, you can try to find the maximum number of partitions of S - S_k each with sums >= Y which is just a recursion of the same problem. This would give you the absolute maximum result you're looking for, but would still be a computational nightmare for non-trivial problems.
A quick-but-suboptimal algorithm is simply to sort the array in ascending order, then partition according to the partition sum as you process the sorted elements sequentially. e.g.
Suppose s[i] are the elements in the sorted array,...
partitionno = 0;
partitionsum = 0;
for (i=0; i<N; i++) {
partitionsum += s[i];
if (partitionsum >= Y) {
partitionsum = 0;
partitionno++;
}
}
giving partitionno subsets each having a sum of at least Y. Sorting can be done in O(N) time, and the algorithm above is also O(N) so you could use this for N in the 1000000s or more.

This is strongly NP hard since it contains as a special case the special case of the 3 partition problem of dividing a set into triplets that each have the same sum where all numbers are in the range of that sum/4 to that sum/2. And that special case is known to be strongly NP hard.
Therefore there is no known algorithm to solve it, and finding one would be a really big deal.

Related

How to find pair with kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
The pairs with largest sums are
13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20
So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?
Also, what is the fastest algorithm? The arrays are already sorted and sizes M and N.
I am already aware of the O(Klogk) solution , using Max-Heap given here .
It also is one of the favorite Google interview question , and they demand a O(k) solution .
I've also read somewhere that there exists a O(k) solution, which i am unable to figure out .
Can someone explain the correct solution with a pseudocode .
P.S.
Please DON'T post this link as answer/comment.It DOESN'T contain the answer.
I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: pointer to the first array incremented when sum is too large and pointer to the second array decremented when sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search) we could find Kth largest sum in O(N log R) time, where N is size of the largest array and R is number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by small constant.
Previous algorithm may be improved if we stop binary search as soon as number of pair sums in binary search range decreases from O(N2) to O(N). Then we fill auxiliary array with these pair sums (this may be done with slightly modified two-pointers algorithm). And then we use quickselect algorithm to find Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get proper value range) we use something better than binary search?
We could estimate value range with the following trick: get every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for needed value range. And in fact slightly improved variant of this trick gives range containing only O(N) elements. This is proven in following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with Median of medians algorithm.
This algorithm has complexity O(N). If one of the arrays is shorter than other array (M < N) we could assume that this shorter array is extended to size N with some very small elements so that all calculations in the algorithm use size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes algorithm a little bit faster but does not improve asymptotic complexity.
If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in OP. If k > N(N-1), we'd better solve the opposite problem: k'th smallest sum.
I uploaded simple C++11 implementation to ideone. Code is not optimized and not thoroughly tested. I tried to make it as close as possible to pseudo-code in linked paper. This implementation uses std::nth_element, which allows linear complexity only on average (not worst-case).
A completely different approach to find K'th sum in linear time is based on priority queue (PQ). One variation is to insert largest pair to PQ, then repeatedly remove top of PQ and instead insert up to two pairs (one with decremented index in one array, other with decremented index in other array). And take some measures to prevent inserting duplicate pairs. Other variation is to insert all possible pairs containing largest element of first array, then repeatedly remove top of PQ and instead insert pair with decremented index in first array and same index in second array. In this case there is no need to bother about duplicates.
OP mentions O(K log K) solution where PQ is implemented as max-heap. But in some cases (when array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.
Advantage of this approach is a possibility to provide first K elements in sorted order. Disadvantages are limited choice of array element type, more complex and slower algorithm, worse asymptotic complexity: O(K) > O(N).
EDIT: This does not work. I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below.
A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).
Let x and y be the two arrays, sorted in decreasing order; we want to construct the K-th largest sum.
The variables are: i the index in the first array (element x[i]), j the index in the second array (element y[j]), and k the "order" of the sum (k in 1..K), in the sense that S(k)=x[i]+y[j] will be the k-th greater sum satisfying your conditions (this is the loop invariant).
Start from (i, j) equal to (0, 0): clearly, S(1) = x[0]+y[0].
for k from 1 to K-1, do:
if x[i+1]+ y[j] > x[i] + y[j+1], then i := i+1 (and j does not change) ; else j:=j+1
To see that it works, consider you have S(k) = x[i] + y[j]. Then, S(k+1) is the greatest sum which is lower (or equal) to S(k), and such as at least one element (i or j) changes. It is not difficult to see that exactly one of i or j should change.
If i changes, the greater sum you can construct which is lower than S(k) is by setting i=i+1, because x is decreasing and all the x[i'] + y[j] with i' < i are greater than S(k). The same holds for j, showing that S(k+1) is either x[i+1] + y[j] or x[i] + y[j+1].
Therefore, at the end of the loop you found the K-th greater sum.
tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in O(K) time.
Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).
Let's see: first of all, the arrays are sorted. So, if the arrays are a and b with lengths M and N, and as you have arranged them, the largest items are in slots M and N respectively, the largest pair will always be a[M]+b[N].
Now, what's the second largest pair? It's going to have perhaps one of {a[M],b[N]} (it can't have both, because that's just the largest pair again), and at least one of {a[M-1],b[N-1]}. BUT, we also know that if we choose a[M-1]+b[N-1], we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.
Consider the following two arrays: a = [1, 2, 53]; b = [66, 67, 68]. Our highest pair is 53+68. If we lose the smaller of those two, our pair is 68+2; if we lose the larger, it's 53+67. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three because we need to deal with the case where the sums are equal);let's call that cost Q).
At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from {{a[M],b[N]}, {a[M-1],b[N-1]}. So, we also need to look behind.
So, let's code (python, should be 2/3 compatible):
def kth(a,b,k):
M = len(a)
N = len(b)
if k > M*N:
raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % M*N,k)
(ia,ib) = M-1,N-1 #0 based arrays
# we need this for lookback
nottakenindices = (0,0) # could be any value
nottakensum = float('-inf')
for i in range(k-1):
optionone = a[ia]+b[ib-1]
optiontwo = a[ia-1]+b[ib]
biggest = max((optionone,optiontwo))
#first deal with look behind
if nottakensum > biggest:
if optionone == biggest:
newnottakenindices = (ia,ib-1)
else: newnottakenindices = (ia-1,ib)
ia,ib = nottakenindices
nottakensum = biggest
nottakenindices = newnottakenindices
#deal with case where indices hit 0
elif ia <= 0 and ib <= 0:
ia = ib = 0
elif ia <= 0:
ib-=1
ia = 0
nottakensum = float('-inf')
elif ib <= 0:
ia-=1
ib = 0
nottakensum = float('-inf')
#lookahead cases
elif optionone > optiontwo:
#then choose the first option as our next pair
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
elif optionone < optiontwo: # choose the second
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#next two cases apply if options are equal
elif a[ia] > b[ib]:# drop the smallest
nottakensum,nottakenindices = optiontwo,(ia-1,ib)
ib-=1
else: # might be equal or not - we can choose arbitrarily if equal
nottakensum,nottakenindices = optionone,(ia,ib-1)
ia-=1
#+2 - one for zero-based, one for skipping the 1st largest
data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
if ia <= 0 and ib <= 0:
raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
return data, narrative
For those without python, here's an ideone: http://ideone.com/tfm2MA
At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.
Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.
Here's a reference implementation (not O(K), but will always work, unless there's a corner case with cases where pairs have equal sums):
import itertools
def refkth(a,b,k):
(rightia,righta),(rightib,rightb) = sorted(itertools.product(enumerate(a),enumerate(b)), key=lamba((ia,ea),(ib,eb):ea+eb)[k-1]
data = k,righta,rightb,righta+rightb,rightia,rightib
narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
print (narrative) #this will work in both versions of python
return data, narrative
This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The enumerate function decorates each item with its index.
The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. https://stackoverflow.com/a/5212618/284795
Might be there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).
(No it's not, still not sure whether that's possible.)
[2, 3, 5, 8, 13]
[4, 8, 12, 16]
Merge the 2 arrays and note down the indexes in the sorted array. Here is the index array looks like (starting from 1 not 0)
[1, 2, 4, 6, 8]
[3, 5, 7, 9]
Now start from end and make tuples. sum the elements in the tuple and pick the kth largest sum.
public static List<List<Integer>> optimization(int[] nums1, int[] nums2, int k) {
// 2 * O(n log(n))
Arrays.sort(nums1);
Arrays.sort(nums2);
List<List<Integer>> results = new ArrayList<>(k);
int endIndex = 0;
// Find the number whose square is the first one bigger than k
for (int i = 1; i <= k; i++) {
if (i * i >= k) {
endIndex = i;
break;
}
}
// The following Iteration provides at most endIndex^2 elements, and both arrays are in ascending order,
// so k smallest pairs must can be found in this iteration. To flatten the nested loop, refer
// 'https://stackoverflow.com/questions/7457879/algorithm-to-optimize-nested-loops'
for (int i = 0; i < endIndex * endIndex; i++) {
int m = i / endIndex;
int n = i % endIndex;
List<Integer> item = new ArrayList<>(2);
item.add(nums1[m]);
item.add(nums2[n]);
results.add(item);
}
results.sort(Comparator.comparing(pair->pair.get(0) + pair.get(1)));
return results.stream().limit(k).collect(Collectors.toList());
}
Key to eliminate O(n^2):
Avoid cartesian product(or 'cross join' like operation) of both arrays, which means flattening the nested loop.
Downsize iteration over the 2 arrays.
So:
Sort both arrays (Arrays.sort offers O(n log(n)) performance according to Java doc)
Limit the iteration range to the size which is just big enough to support k smallest pairs searching.

Given a set of n integers, list all possible subsets with sum>=k

Given an unsorted set of integers in the form of array, find all possible subsets whose sum is greater than or equal to a const integer k,
eg:- Our set is {1,2,3} and k=2
Possible subsets:-
{2},
{3},
{1,2},
{1,3},
{2,3},
{1,2,3}
I can only think of a naive algorithm which lists all the subsets of set and checks if sum of subset is >=k or not, but its an exponential algorithm and listing all subsets requires O(2^N). Can I use dynamic programming to solve it in polynomial time?
Listing all the subsets is going to be still O(2^N) because in the worst case you may still have to list all subsets apart from the empty one.
Dynamic programming can help you count the number of sets that have sum >= K
You go bottom-up keeping track of how many subsets summed to some value from range [1..K]. An approach like this will be O(N*K) which is going to be only feasible for small K.
The idea with the dynamic programming solution is best illustrated with an example. Consider this situation. Assume you know that out of all the sets composed of the first i elements you know that t1 sum to 2 and t2 sum to 3. Let's say that the next i+1 element is 4. Given all the existing sets we can build all the new sets by either appending the element i+1 or leaving it out. If we leave it out we get t1 subsets that sum to 2 and t2 subsets that sum to 3. If we append it then we obtain t1 subsets that sum to 6 (2 + 4) and t2 that sum to 7 (3 + 4) and one subset which contains just i+1 which sums to 4. That gives us the numbers of subsets that sum to (2,3,4,6,7) consisting of the first i+1 elements. We continue until N.
In pseudo-code this could look something like this:
int DP[N][K];
int set[N];
//go through all elements in the set by index
for i in range[0..N-1]
//count the one element subset consisting only of set[i]
DP[i][set[i]] = 1
if (i == 0) continue;
//case 1. build and count all subsets that don't contain element set[i]
for k in range[1..K-1]
DP[i][k] += DP[i-1][k]
//case 2. build and count subsets that contain element set[i]
for k in range[0..K-1]
if k + set[i] >= K then break inner loop
DP[i][k+set[i]] += DP[i-1][k]
//result is the number of all subsets - number of subsets with sum < K
//the -1 is for the empty subset
return 2^N - sum(DP[N-1][1..K-1]) - 1
Can I use dynamic programming to solve it in polynomial time?
No. The problem is even harder than #amit (in the comments) mentions. Finding if there exists a subset that sums to a specific k is the subset-sum problem, which is NP-hard. Instead you are asking for how many solutions are equal to a specific k, which is in the much more difficult class of P#. In addition, your exact problem is slightly more difficult since you want to not only count, but enumerate all the possible subsets for k and targets < k.
If k is 0, and every element of the set is positive then you have no choice but to output every possible subset, so the lower-bound to this problem is O(2N) -- the time taken to produce the output.
Unless you know something more about the value k that you haven't told us, there's no faster general solution that to just check every subset.

Finding sub-array sum in an integer array

Given an array of N positive integers. It can have n*(n+1)/2 sub-arrays including single element sub-arrays. Each sub-array has a sum S. Find S's for all sub-arrays is obviously O(n^2) as number of sub-arrays are O(n^2). Many sums S's may be repeated also. Is there any way to find count of all distinct sum (not the exact values of sums but only count) in O(n logn).
I tried an approach but stuck on the way. I iterated the array from index 1 to n.
Say a[i] is the given array. For each index i, a[i] will add to all the sums in which a[i-1] is involved and will include itself also as individual element. But duplicate will emerge if among sums in which a[i-1] is involved, the difference of two sums is a[i]. I mean that, say sums Sp and Sq end up at a[i-1] and difference of both is a[i]. Then Sp + a[i] equals Sq, giving Sq as a duplicate.
Say C[i] is count of the distinct sums in which end up at a[i].
So C[i] = C[i-1] + 1 - numbers of pairs of sums in which a[i-1] is involved whose difference is a[i].
But problem is to find the part of number of pairs in O(log n). Please give me some hint about this or if I am on wrong way and completely different approach is required problem point that out.
When S is not too large, we can count the distinct sums with one (fast) polynomial multiplication. When S is larger, N is hopefully small enough to use a quadratic algorithm.
Let x_1, x_2, ..., x_n be the array elements. Let y_0 = 0 and y_i = x_1 + x_2 + ... + x_i. Let P(z) = z^{y_0} + z^{y_1} + ... + z^{y_n}. Compute the product of polynomials P(z) * P(z^{-1}); the coefficient of z^k with k > 0 is nonzero if and only if k is a sub-array sum, so we just have to read off the number of nonzero coefficients of positive powers. The powers of z, moreover, range from -S to S, so the multiplication takes time on the order of S log S.
You can look at the sub-arrays as a kind of tree. In the sense that subarray [0,3] can be divided to [0,1] and [2,3].
So build up a tree, where nodes are defined by length of the subarray and it's staring offset in the original array, and whenever you compute a subarray, store the result in this tree.
When computing a sub-array, you can check this tree for existing pre-computed values.
Also, when dividing, parts of the array can be computed on different CPU cores, if that matters.
This solution assumes that you don't need all values at once, rather ad-hoc.
For the former, there could be some smarter solution.
Also, I assume that we're talking about counts of elements in 10000's and more. Otherwise, such work is a nice excercise but has not much of a practical value.

Generate list of real values, which sum up to fixed value and satisfy some constraints

I need to generate n random real values P[0], P[1], ..., P[n-1] which satisfy the following constraints:
Pmin[0] <= P[0] <= Pmax[0]
Pmin[1] <= P[1] <= Pmax[1]
...
Pmin[n-1] <= P[n-1] <= Pmax[n-1]
P[0] + P[1] + ... + P[n-1] = S
Any idea how to do this efficiently?
In general, it is not possible to solve this problem if choosing elements uniformly at random from the given ranges.
Example 1: Say that Pmin[i] = 0 and Pmax[i] = 1. Say that n = 10 and S = 100. Then there is no solution, since the greatest possible sum is 10.
Example 2: Say that Pmin[i] = 0 and Pmax[i] = 1. Say that n = 10 and S = 10. Then there is exactly one solution: choose P[i] = 1.
It is possible to write an algorithm such that the resulting sequence is chosen uniformly at random from the set of possible solutions; this is quite different from saying that the P[i] are uniformly distributed between Pmin[i] and Pmax[i].
The basic idea is to, at each stage, further restrict your range, as follows:
The beginning of the range ought to be the larger of the following two quantities: Pmin[i], or S - Smax[i] - P, where Smax[i] is the sum Pmax[i+1] + ... + Pmax[n] and P is the sum P[0] + ... + P[i]. This guarantees that you're picking a number large enough to eventually work.
The end of the range ought to be the smaller of the following two quantities:
Pmax[i], or S - Smin[i] - P, where Smin[i] is the sum Pmin[i+1] + ... + Pmin[n] and P is as before. This guarantees that you're picking a number small enough to eventually work.
If you are able to obey those rules when picking each P[i], there's a solution, and you will find one at random. Otherwise, there is not a solution.
Note that to actually make this select solutions at random, it's probably best to shuffle the indices, perform this algorithm, and then rearrange the sequence so that it's in the proper order. You can shuffle in O(n), do this algorithm (recommend dynamic programming here, since you can build solutions bottom-up) and then spit out the sequence by "unshuffling" the resulting sequence.
For every i, assign P[i] := Pmin[i]
Compute the sum
If sum>S, then stop (it's impossible)
For every i:
If P[i]+S-sum <= Pmax[i]
P[i] = P[i]+S-sum
Stop (it's done :-)
sum = sum+Pmax[i]-P[i]
P[i] = Pmax[i]
Go for next i
Stop (it's impossible)
Ooops, sorry, you said random... that's not so trivial. Let me think about it...
Run the previous algorithm to have a starting point. Now compute the total margin above and below. The margin above is the sum of individual margins Pmax[i]-P[i] for every i. The margin below is the sum of individual margins P[i]-Pmin[i] for every i.
Traverse all the elements but one in a random order, visiting each one of them exactly once. For every one of them:
Update the margin above and the margin below subtracting from them the contribution of the current element.
Establish a min and max for the current value taking into account that:
They must be in the interval [Pmin[i], Pmax[i]] AND
These min and max are near enough to P[i], so that changing other elements later can compensate changing P[i] to this min or max (that's what the margins above and below indicate)
Change P[i] to a random value in the calculated interval [min, max] and update the sum and the margins (I'm not 100% sure of how the margins should be updated here...)
Then adjust the remaining element to fit the sum S.
Regarding the traversal in random order, see the Knuth shuffles.

Find subset with elements that are furthest apart from eachother

I have an interview question that I can't seem to figure out. Given an array of size N, find the subset of size k such that the elements in the subset are the furthest apart from each other. In other words, maximize the minimum pairwise distance between the elements.
Example:
Array = [1,2,6,10]
k = 3
answer = [1,6,10]
The bruteforce way requires finding all subsets of size k which is exponential in runtime.
One idea I had was to take values evenly spaced from the array. What I mean by this is
Take the 1st and last element
find the difference between them (in this case 10-1) and divide that by k ((10-1)/3=3)
move 2 pointers inward from both ends, picking out elements that are +/- 3 from your previous pick. So in this case, you start from 1 and 10 and find the closest elements to 4 and 7. That would be 6.
This is based on the intuition that the elements should be as evenly spread as possible. I have no idea how to prove it works/doesn't work. If anyone knows how or has a better algorithm please do share. Thanks!
This can be solved in polynomial time using DP.
The first step is, as you mentioned, sort the list A. Let X[i,j] be the solution for selecting j elements from first i elements A.
Now, X[i+1, j+1] = max( min( X[k,j], A[i+1]-A[k] ) ) over k<=i.
I will leave initialization step and memorization of subset step for you to work on.
In your example (1,2,6,10) it works the following way:
1 2 6 10
1 - - - -
2 - 1 5 9
3 - - 1 4
4 - - - 1
The basic idea is right, I think. You should start by sorting the array, then take the first and the last elements, then determine the rest.
I cannot think of a polynomial algorithm to solve this, so I would suggest one of the two options.
One is to use a search algorithm, branch-and-bound style, since you have a nice heuristic at hand: the upper bound for any solution is the minimum size of the gap between the elements picked so far, so the first guess (evenly spaced cells, as you suggested) can give you a good baseline, which will help prune most of the branches right away. This will work fine for smaller values of k, although the worst case performance is O(N^k).
The other option is to start with the same baseline, calculate the minimum pairwise distance for it and then try to improve it. Say you have a subset with minimum distance of 10, now try to get one with 11. This can be easily done by a greedy algorithm -- pick the first item in the sorted sequence such that the distance between it and the previous item is bigger-or-equal to the distance you want. If you succeed, try increasing further, if you fail -- there is no such subset.
The latter solution can be faster when the array is large and k is relatively large as well, but the elements in the array are relatively small. If they are bound by some value M, this algorithm will take O(N*M) time, or, with a small improvement, O(N*log(M)), where N is the size of the array.
As Evgeny Kluev suggests in his answer, there is also a good upper bound on the maximum pairwise distance, which can be used in either one of these algorithms. So the complexity of the latter is actually O(N*log(M/k)).
You can do this in O(n*(log n) + n*log(M)), where M is max(A) - min(A).
The idea is to use binary search to find the maximum separation possible.
First, sort the array. Then, we just need a helper function that takes in a distance d, and greedily builds the longest subarray possible with consecutive elements separated by at least d. We can do this in O(n) time.
If the generated array has length at least k, then the maximum separation possible is >=d. Otherwise, it's strictly less than d. This means we can use binary search to find the maximum value. With some cleverness, you can shrink the 'low' and 'high' bounds of the binary search, but it's already so fast that sorting would become the bottleneck.
Python code:
def maximize_distance(nums: List[int], k: int) -> List[int]:
"""Given an array of numbers and size k, uses binary search
to find a subset of size k with maximum min-pairwise-distance"""
assert len(nums) >= k
if k == 1:
return [nums[0]]
nums.sort()
def longest_separated_array(desired_distance: int) -> List[int]:
"""Given a distance, returns a subarray of nums
of length k with pairwise differences at least that distance (if
one exists)."""
answer = [nums[0]]
for x in nums[1:]:
if x - answer[-1] >= desired_distance:
answer.append(x)
if len(answer) == k:
break
return answer
low, high = 0, (nums[-1] - nums[0])
while low < high:
mid = (low + high + 1) // 2
if len(longest_separated_array(mid)) == k:
low = mid
else:
high = mid - 1
return longest_separated_array(low)
I suppose your set is ordered. If not, my answer will be changed slightly.
Let's suppose you have an array X = (X1, X2, ..., Xn)
Energy(Xi) = min(|X(i-1) - Xi|, |X(i+1) - Xi|), 1 < i <n
j <- 1
while j < n - k do
X.Exclude(min(Energy(Xi)), 1 < i < n)
j <- j + 1
n <- n - 1
end while
$length = length($array);
sort($array); //sorts the list in ascending order
$differences = ($array << 1) - $array; //gets the difference between each value and the next largest value
sort($differences); //sorts the list in ascending order
$max = ($array[$length-1]-$array[0])/$M; //this is the theoretical max of how large the result can be
$result = array();
for ($i = 0; i < $length-1; $i++){
$count += $differences[i];
if ($length-$i == $M - 1 || $count >= $max){ //if there are either no more coins that can be taken or we have gone above or equal to the theoretical max, add a point
$result.push_back($count);
$count = 0;
$M--;
}
}
return min($result)
For the non-code people: sort the list, find the differences between each 2 sequential elements, sort that list (in ascending order), then loop through it summing up sequential values until you either pass the theoretical max or there arent enough elements remaining; then add that value to a new array and continue until you hit the end of the array. then return the minimum of the newly created array.
This is just a quick draft though. At a quick glance any operation here can be done in linear time (radix sort for the sorts).
For example, with 1, 4, 7, 100, and 200 and M=3, we get:
$differences = 3, 3, 93, 100
$max = (200-1)/3 ~ 67
then we loop:
$count = 3, 3+3=6, 6+93=99 > 67 so we push 99
$count = 100 > 67 so we push 100
min(99,100) = 99
It is a simple exercise to convert this to the set solution that I leave to the reader (P.S. after all the times reading that in a book, I've always wanted to say it :P)

Resources