How to partition an array into N subsets minimizing the largest subset sum?

How do I partition an integer array into N subsets such that the largest subset sum is as small as possible?
For example, the array consists of 11 elements and I need 6 subsets out of it.
{2,1,1,3,4,4,3,2,1,2,3}
The subsets: {2,1,1,3}, {4}, {4,3}, {3,2}, {1,2}, {3}. Minimum sum = 7.
An alternative answer: {2,1,1}, {3,4}, {4}, {3,2}, {1,2}, {3}. Minimum sum = 7.
Note: The order in which the numbers appear in the original set, must be maintained while partitioning.

One possible approach is to binary search for the answer.
We need a procedure that checks whether we can partition the sequence using only subset sums equal to or below a parameter S. Let's call this procedure onlySumsBelow(S).
We can use a greedy solution to implement onlySumsBelow(S): always add as many elements as possible to each subset, starting a new subset just before the sum would exceed S (I am assuming here that we don't have negative elements, which would complicate the discussion). If we cannot reach the end of the sequence using the allowed number of subsets, then the sum S is not valid (it is too small).
function onlySumsBelow(S) {
    partitionsUsed = 1;
    currentSum = 0;
    for each value in sequence {
        if (value > S) return false;
        if (currentSum + value > S) {
            // start a new partition
            currentSum = value;
            partitionsUsed++;
        } else {
            currentSum += value;
        }
    }
    return partitionsUsed <= N;
}
Once we have the onlySumsBelow(S) procedure, we can binary search for the answer, starting with an interval whose left end is a value the answer cannot lie below (e.g. 0) and whose right end is a number large enough that the answer cannot lie above it (e.g. the sum of all numbers in the sequence).
If efficiency is not a concern, then instead of binary searching you can simply try candidate answers one by one, starting from a small enough value, e.g. the sum of all numbers divided by N, and increasing by one until reaching a feasible solution.
Remark: without the note at the end of the question (which restricts us to subsets of numbers appearing at neighboring positions in the input), the problem is NP-complete, since it is a generalization of the Partition problem, which uses only two sets.

Related

Algorithm: Count the minimum number of distinct sorted subsequence for each update query

This question was asked to me in an interview:
A distinct sorted subsequence containing adjacent values is defined as one whose length is one, or which contains only adjacent numbers when sorted. Each element can belong to only one such subsequence. I have Q queries, each updating a single value in A, and for each query I have to answer how many parts the partition of A into distinct sorted subsequences would have if the number of parts were minimized.
For example, the number of parts for A = [1,2,3,4,3,5] can be minimized by partitioning it in the following two ways, both of which contain only two parts:
1) [1,2,3] && [4,3,5] ==> answer 2 (4,3,5 sorted is 3,4,5 all adjacent values)
2) [1,2,3,4,5] && [3] ==> answer 2
Approach I tried: Hashing and forming sets but all test cases were not cleared because of Timeout.
Problem Statement PDF: PDF
Constraints:
1 <= N, Q <= 3*10^5
1 < A(i) < 10^9
Preprocessing
First you can preprocess A before all queries and build a table (say times_of) such that, given a number n, one can efficiently obtain the number of times n appears in A through an expression like times_of[n]. In the following example, assuming A is of type int[N], we use a std::map to implement the table. Its construction costs O(N log N) time.
#include <map>

auto preprocess(int *begin, int *end)
{
    std::map<int, std::size_t> times_of;
    while (begin != end)
    {
        ++times_of[*begin];
        ++begin;
    }
    return times_of;
}
Let min and max be the minimum and maximum elements of A respectively. Then the following lemma applies:
The minimum number of distinct sorted subsequences is equal to max{0, times_of[min] - times_of[min-1]} + ... + max{0, times_of[max] - times_of[max-1]}.
A rigorous proof is a bit technical, so I omit it from this answer. Roughly speaking, consider the numbers from small to large: if n appears more times than n-1 does, it has to start an extra times_of[n] - times_of[n-1] subsequences.
With this lemma, we can compute the initial minimum number of distinct sorted subsequences, result, in O(N) time (by iterating through times_of, not by iterating from min to max). The following is a sample code:
std::size_t result = 0;
auto prev = std::make_pair(min - 1, static_cast<std::size_t>(0));
for (auto &cur : times_of)
{
    // times_of[cur.first - 1] == 0
    if (cur.first != prev.first + 1) result += cur.second;
    // times_of[cur.first - 1] == times_of[prev.first]
    else if (cur.second > prev.second) result += cur.second - prev.second;
    prev = cur;
}
Queries
To handle a query A[u] = v, we first update times_of[A[u]] and times_of[v], which costs O(log N) time. Then, according to the lemma, we need only recompute a constant number (namely 4) of related terms to update result. Each recomputation costs O(log N) time (to find the previous or next element in times_of), so a query takes O(log N) time in total.
Keep a list of clusters on the first pass. Each has a collection of values, with a minimum and maximum value. These clusters could very well be stored in a segment tree (making it easy to merge in case they ever touch).
Loop through your N numbers, and for each number, either add it to an existing cluster (possibly triggering a merge), or create a new cluster. This may be easier if your clusters store min-1 and max+1.
Now you are done with the initial input of N numbers, and you have several clusters, all of which are likely to be of a reasonable size for radix sort.
However, you do not want to finish the radix sort. Generate the list of counts, then use this to count adjacent clusters. Loop through this, and every time the count decreases, you have found (difference) many of your final distinct sorted subsequences. (Using max+1 pays off again, because the guaranteed zero at the end means you don't have to add a special case after the loop.)

Given a set of n integers, list all possible subsets with sum>=k

Given an unsorted set of integers in the form of an array, find all possible subsets whose sum is greater than or equal to a constant integer k.
E.g., our set is {1,2,3} and k = 2.
Possible subsets:
{2},
{3},
{1,2},
{1,3},
{2,3},
{1,2,3}
I can only think of a naive algorithm which lists all subsets of the set and checks whether each subset's sum is >= k, but that is exponential: listing all subsets alone requires O(2^N). Can I use dynamic programming to solve it in polynomial time?
Listing all the subsets is still going to be O(2^N), because in the worst case you may have to list every subset apart from the empty one.
Dynamic programming can help you count the number of sets whose sum is >= K.
You go bottom-up, keeping track of how many subsets sum to each value in the range [1..K-1]. An approach like this is O(N*K), which is only feasible for small K.
The idea behind the dynamic programming solution is best illustrated with an example. Assume that, among all the sets composed of the first i elements, you know that t1 of them sum to 2 and t2 sum to 3. Let's say the next element, i+1, is 4. Given all the existing sets, we can build all the new sets by either appending element i+1 or leaving it out. If we leave it out, we get t1 subsets that sum to 2 and t2 subsets that sum to 3. If we append it, we obtain t1 subsets that sum to 6 (2 + 4), t2 that sum to 7 (3 + 4), and one subset containing just element i+1, which sums to 4. That gives us the numbers of subsets summing to 2, 3, 4, 6 and 7 among those consisting of the first i+1 elements. We continue until N.
In pseudo-code this could look something like this:
int DP[N][K];  // assumed zero-initialized
int set[N];

// go through all elements in the set by index
for i in range[0..N-1]
    // count the one-element subset consisting only of set[i]
    // (sums >= K need not be tracked)
    if set[i] < K then DP[i][set[i]] = 1
    if i == 0 then continue
    // case 1: build and count all subsets that don't contain element set[i]
    for k in range[1..K-1]
        DP[i][k] += DP[i-1][k]
    // case 2: build and count subsets that contain element set[i]
    for k in range[0..K-1]
        if k + set[i] >= K then break inner loop
        DP[i][k+set[i]] += DP[i-1][k]

// result is the number of all subsets minus the number of subsets with sum < K;
// the -1 is for the empty subset
return 2^N - sum(DP[N-1][1..K-1]) - 1
Can I use dynamic programming to solve it in polynomial time?
No. The problem is even harder than #amit (in the comments) mentions. Deciding whether there exists a subset that sums to a specific k is the subset-sum problem, which is NP-hard. You instead ask how many subsets hit a given target, which lies in the much more difficult class #P. In addition, your exact problem is slightly harder still, since you want not only to count but to enumerate all the subsets for k and every larger target.
If k is 0 and every element of the set is positive, then you have no choice but to output every possible subset, so the lower bound for this problem is O(2^N), the time taken to produce the output.
Unless you know something more about the value k that you haven't told us, there is no faster general solution than to just check every subset.

Algorithm on interview

Recently I was asked the following interview question:
You have two sets of numbers of the same length N, for example A = [3, 5, 9] and B = [7, 5, 1]. Next, for each position i in range 0..N-1, you can pick either number A[i] or B[i], so at the end you will have another array C of length N which consists of elements from A and B. If the sum of all elements in C is less than or equal to K, then such an array is good. Please write an algorithm to figure out the total number of good arrays given arrays A, B and number K.
The only solution I've come up with is a dynamic programming approach, where we have a matrix of size NxK and M[i][j] represents how many combinations we could have for element X[i] if the current sum is equal to j. But it looks like they expected me to come up with a formula. Could you please help me with that? At least, what direction should I look in? I will appreciate any help. Thanks.
After some consideration, I believe this is an NP-complete problem. Consider:
A = [0, 0, 0, ..., 0]
B = [b1, b2, b3, ..., bn]
Note that every construction of the third set C = ( A[i] or B[i] for i = 0..n ) is just the union of some subset of A and some subset of B. In this case, since every subset of A sums to 0, the sum of C is the same as the sum of some subset of B.
Now your question "How many ways can we construct C with a sum less than K?" can be restated as "How many subsets of B sum to less than K?". Solving this problem for K = 1 and K = 0 yields the solution to the subset sum problem for B (the difference between the two solutions is the number of subsets that sum to 0).
By similar argument, even in the general case where A contains nonzero elements, we can construct an array S = [b1-a1, b2-a2, b3-a3, ..., bn-an], and the question becomes "How many subsets of S sum to less than K - sum(A)?"
Since the subset sum problem is NP-complete, this problem must be also. So with that in mind, I would venture that the dynamic programming solution you proposed is the best you can do, and certainly no magic formula exists.
" Please write an algorithm to figure out the total number of good
arrays by given arrays A, B and number K."
Is it not the goal?
int A[];
int B[];
int N;
int K;
int Solutions = 0;

void FindSolutions(int Depth, int theSumSoFar) {
    if (theSumSoFar > K) return;
    if (Depth >= N) {
        Solutions++;
        return;
    }
    FindSolutions(Depth + 1, theSumSoFar + A[Depth]);
    FindSolutions(Depth + 1, theSumSoFar + B[Depth]);
}
Invoke FindSolutions with both arguments set to zero. On return, Solutions will equal the number of good arrays.
This is how I would try to solve the problem (sorry if it's stupid). Think of the arrays

A = [3, 5, 9, 8, 2]
B = [7, 5, 1, 8, 2]

With elements at positions 0..N-1, the number of choices is 2^N. First set C1 = 0, C2 = 0 and run:

for all i with A[i] == B[i]
{
    C1++
    C2 += A[i]    // A[i] == B[i], so this position contributes a fixed amount
}

Then create two new arrays from the positions where A and B differ:

A1 = [3, 5, 9]
B1 = [7, 5, 1]

In this example C2 is now 10. The number of choices is reduced to 2^(N-C1), and you can count the good arrays for A1 and B1 using K' = K - C2. Unfortunately, no matter what method you use, you still have to calculate up to 2^(N-C1) sums.
So there are 2^N choices, since at each position you either pick from A or from B. In the specific example you give, where N happens to be 3, there are 8. For discussion, you can characterise each set of decisions as a bit pattern.
A brute-force approach would try every single bit pattern.
But what should be obvious is that if the first few bits already produce a number that is too large, then every possible group of tail bits will also produce a number that is too large. So it is probably better to model the search as a tree where you don't bother walking down the limbs that have already grown beyond your limit.
You can also compute the maximum total reachable from each bit position to the end of the table. If at any point your running total plus the maximum you could still obtain from here on down is no greater than K, then every subtree below where you are is acceptable without any need for traversal. The case, as discussed in the comments, where every single combination is acceptable is a special case of this observation.
As pointed out by Serge below, a related observation is to use minimums and the converse logic to cancel whole subtrees without traversal.
A potential further optimisation rests behind the observation that, as long as we shuffle each in the same way, changing the order of A and B has no effect because addition is commutative. You can therefore make an effort to ensure either that the maximums grow as quickly as possible or the minimums grow as slowly as possible, to try to get the earliest possible exit from traversal. In practice you'd probably want to apply a heuristic comparing the absolute maximum and minimum (both of which you've computed anyway) to K.
That being the case, a recursive implementation is easiest, e.g. (in C)
/* assume A, B and N are known globals */
unsigned int numberOfGoodArraysFromBit(
unsigned int bit,
unsigned int runningTotal,
unsigned int limit)
{
// have we ended up in an unacceptable subtree?
if(runningTotal > limit) return 0;
// have we reached the leaf node without at any
// point finding this subtree to be unacceptable?
if(bit >= N) return 1;
// maybe every subtree is acceptable?
if(runningTotal + MAXV[bit] <= limit)
{
return 1 << (N - bit);
}
// maybe no subtrees are acceptable?
if(runningTotal + MINV[bit] > limit)
{
return 0;
}
// if we can't prima facie judge the subtreees,
// we'll need specifically to evaluate them
return
numberOfGoodArraysFromBit(bit+1, runningTotal+A[bit], limit) +
numberOfGoodArraysFromBit(bit+1, runningTotal+B[bit], limit);
}
// work out the minimum and maximum values at each position
for(int i = 0; i < N; i++)
{
MAXV[i] = MAX(A[i], B[i]);
MINV[i] = MIN(A[i], B[i]);
}
// hence work out the cumulative totals from right to left
for(int i = N-2; i >= 0; i--)
{
MAXV[i] += MAXV[i+1];
MINV[i] += MINV[i+1];
}
// to kick it off
printf("Total valid combinations is %u", numberOfGoodArraysFromBit(0, 0, K));
I'm just thinking extemporaneously; it's likely that better solutions exist.

Generating permutations with sub-linear memory

I am wondering if there is a sufficiently simple algorithm for generating permutations of N elements, say 1..N, which uses less than O(N) memory. It does not have to compute the n-th permutation, but it must be able to produce all permutations.
Of course, such an algorithm would have to be a generator of some kind, or use some internal data structure taking less than O(N) memory, since returning the result as a vector of size N already violates the restriction on sub-linear memory.
Let's assume the permutation is generated one entry at a time. The state of the generator must encode the set of elements remaining (run it to completion to see why), and since no possibility can be excluded, the generator state takes at least n bits.
Maybe you can, with factoradic numbers. You can extract the resulting permutation from it step by step, so you never have to have the entire result in memory.
But the reason I started with maybe is that I'm not sure about the growth behaviour of the size of the factoradic number itself. If it fit in a 32-bit integer or something like that, N would be limited to a constant and O(N) would equal O(1), so we have to use an array for it, and I'm unsure how big it will be in terms of N.
I think the answer has to be "no".
Consider the generator for N-element permutations as a state machine: it must contain at least as many states as there are permutations, else it will start repeating before it finishes generating all of them.
There are N! such permutations, which require at least ceil(log2(N!)) bits to represent. Stirling's approximation tells us log2(N!) is Θ(N log N), so we will be unable to create such a generator with sub-linear memory.
The C++ algorithm next_permutation rearranges a sequence in place into its next permutation, returning false when no further permutation exists. The algorithm is as follows:
template <class BidirectionalIterator>
bool next_permutation(BidirectionalIterator first, BidirectionalIterator last) {
    if (first == last) return false;  // false for empty ranges
    BidirectionalIterator i = first;
    ++i;
    if (i == last) return false;      // false for single-element ranges
    i = last;
    --i;
    for (;;) {
        BidirectionalIterator ii = i--;
        // find an element *n < *(n + 1)
        if (*i < *ii) {
            BidirectionalIterator j = last;
            // find the last *m > *n
            while (!(*i < *--j)) {}
            // swap *n and *m, and reverse from m to the end
            iter_swap(i, j);
            reverse(ii, last);
            return true;
        }
        // no n was found
        if (i == first) {
            // reverse the sequence back to its original order
            reverse(first, last);
            return false;
        }
    }
}
This uses constant space (the iterators) for each permutation generated. Do you consider that linear?
I think that even storing your result (which will be an ordered list of N items) is O(N) in memory, no?
Anyhow, to answer your later question about picking a permutation at random, here is a technique that beats producing all N! possibilities in a list and then picking an index randomly: if we can pick the index randomly and generate the associated permutation directly from it, we're much better off.
We can imagine the dictionary order on your words/permutations, and associate a unique number to these based on the word's/permutation's order of appearance in the dictionary. E.g., words on three characters would be
perm. index
012 <----> 0
021 <----> 1
102 <----> 2
120 <----> 3
201 <----> 4
210 <----> 5
You'll see later why it was easiest to use the numbers we did, but others could be accommodated with a bit more work.
To choose one at random, you could pick its associated index uniformly at random from the range 0 ... N!-1 (the simplest implementation of this is clearly out of the question for even moderately large N, I know, but I think there are decent workarounds) and then determine the associated permutation. Notice that the list begins with all the permutations of the last N-1 elements, with the first digit fixed at 0. After those possibilities are exhausted, we generate all those that start with 1. After these next (N-1)! permutations are exhausted, we generate those that start with a 2, and so on. Thus the leading digit is Floor[R / (N-1)!], where R is the index in the sense shown above. See now why we zero-indexed, too?
To generate the remaining N-1 digits of the permutation, say we determined Floor[R / (N-1)!] = a0. Start with the list {0, ..., N-1} - {a0} (set subtraction). We want the Qth permutation of this list, for Q = R mod (N-1)!. Apart from accounting for the missing digit, this is the same problem we have just solved. Recurse.

Algorithm for finding all combinations of k elements from stack of n "randomly"

I have a problem resembling the one described here:
Algorithm to return all combinations of k elements from n
I am looking for something similar that covers all possible combinations of k from n. However, I need each subset to vary a lot from the one drawn previously. For example, if I were to draw subsets of 3 elements from a set of 8, the following algorithm wouldn't be useful to me, since every subset is very similar to the one drawn before it:
11100000,
11010000,
10110000,
01110000,
...
I am looking for an algorithm that picks the subsets in a more "random"-looking fashion, i.e. where the majority of elements in one subset are not reused in the next:
11100000,
00010011,
00101100,
...
Does anyone know of such an algorithm?
I hope my question made sense and that someone can help me out =)
Kind regards,
Christian
How about first generating all possible combinations of k from n, and then rearranging them with the help of a random function?
If you have the result in a vector, loop through the vector: for each element, let it change place with the element at a random position.
This of course becomes slow for large k and n.
This is not really random, but depending on your needs it might suit you.
Calculate the number of possible combinations. Let's name it N.
Calculate a large number which is coprime to N. Let's name it P.
Order the combinations and number them from 1 to N. Let's name them C1 to CN.
Iterate to produce output combinations. The first one will be C(1*P mod N), the second C(2*P mod N), the third C(3*P mod N), and so on. In essence, Output_i = C(i*P mod N), where mod is the modulus operator.
If P is picked carefully, you will get seemingly random combinations. Values of P close to 1 or to N will produce consecutive outputs that differ little; better pick values close to, say, N/4 or N/5. You can also randomize the generation of P for every iteration that you need.
As a follow-up to my comment on this answer, here is some code that allows one to determine the composition of a subset from its "index", in colex order.
Shamelessly stolen from my own assignments.
//////////////////////////////////////
// NChooseK
//
// computes      n!
//           ----------
//           k! (n-k)!
//
// using Pascal's identity,
// i.e. (n,k) = (n-1,k-1) + (n-1,k)
//
// easily optimizable by memoization
long long NChooseK(int n, int k)
{
    if (k >= 0 && k <= n && n >= 1)
    {
        if (k > n / 2)
            k = n - k;
        if (k == 0 || n == 0)
            return 1;
        else
            return NChooseK(n - 1, k - 1) + NChooseK(n - 1, k);
    }
    else
        return 0;
}
#include <cassert>

///////////////////////////////////////////////////////////////////////
// SubsetColexUnrank
// The unranking works by finding each element
// in turn, beginning with the biggest, leftmost one.
// We just have to find, for each element, how many subsets there are
// before the one beginning with the elements we have already found.
//
// It stores its results (the indices of the elements present in the
// subset) into T, in ascending order; T is 1-based, i.e. the subset
// ends up in T[1..subsetSize].
void SubsetColexUnrank(long long r, int * T, int subsetSize)
{
    assert( subsetSize >= 1 );

    // for each element in the k-subset to be found
    for (int i = subsetSize; i >= 1; i--)
    {
        // T[i] cannot be less than i
        int x = i;

        // find the smallest element such that, of all the k-subsets
        // that contain it, none has a rank that exceeds r
        while (NChooseK(x, i) <= r)
            x++;

        // update T with the newly found element
        T[i] = x;

        // if the subset we have to find is not the first one
        // containing this element
        if (r > 0)
        {
            // finding the next element of our k-subset
            // is like finding the first one of the same subset
            // minus {T[i]}
            r -= NChooseK(x - 1, i);
        }
    }
}
Random-in, random-out.
The colex order is such that its unranking function does not need to know the size of the set from which the elements are picked; the total number of subsets is NChooseK(size of the set, size of the subset).
How about randomly choosing the k elements? I.e., choose the p-th, where p is random between 1 and n, then reorder what's left and choose the q-th, where q is between 1 and n-1, etc.
Or maybe I misunderstood. Do you still want all possibilities? In that case you can always generate them first and then choose random entries from your list.
By "random looking" I think you mean lexicographically distant.. does this apply to combination i vs. i-1, or i vs. all previous combinations?
If so, here are some suggestions:
Since most of the combinators yield ordered output, there are two options:
1. design or find a generator which somehow yields non-ordered output
2. enumerate and store enough/all combinations in a tie'd array file/db
If you decide to go with door #2, then you can access randomly ordered combinations by picking random integers between 1 and the number of combinations.
Just as a final check, compare the current and previous combination using a measure of difference/distance between combinations, e.g. for an unsigned Bit::Vector in Perl:
$vec1->Lexicompare($vec2) >= $MIN_LEX_DIST
You might take another look behind door #1, though, since even for moderate values of n and k the table of combinations can get big.
EDIT: Just saw your comment to AnnK; maybe the Lexicompare might still help you skip similar combinations?
Depending on what you are trying to do, you could do something like playing cards. Keep two lists: Source is your (unused) source list, and Used is the "already picked" list. As you randomly pick k items from Source, you move them to the Used list.
If there are exactly k items left in Source when you need to pick again, you pick them all and swap the lists. If there are fewer than k items, you pick j items from Used, add them to Source to make k items in Source, then pick them all and swap the lists.
This is kind of like picking k cards from a deck. You discard them to the used pile. Once you reach the end or need more cards, you shuffle the old ones back into play.
This is just to make sure each set is definitely different from the previous subsets.
Also, this does not really guarantee that all possible subsets are picked before old ones start being repeated.
The good part is that you don't need to worry about pre-calculating all the subsets, and your memory requirements are linear in your data (two n-sized lists).
