I want to calculate how many pairs of disjoint subsets S1 and S2 (S1 U S2 need not be all of S) of a set S exist for which the sum of the elements in S1 equals the sum of the elements in S2.
Say I have calculated the subset sums for all 2^n possible subsets.
How do I find how many disjoint subsets have equal sums?
For a sum value A, can we use the count of subsets having sum A/2 to solve this?
As an example :
S ={1,2,3,4}
Various S1 and S2 sets possible are:
S1 = {1,2} and S2 = {3}
S1 = {1,3} and S2 = {4}
S1 = {1,4} and S2 = {2,3}
Here is the link to the problem :
http://www.usaco.org/index.php?page=viewproblem2&cpid=139
[EDIT: Fixed stupid complexity mistakes. Thanks kash!]
Actually I believe you'll need to use the O(3^n) algorithm described here to answer this question -- the O(2^n) partitioning algorithm is only good enough to enumerate all pairs of disjoint subsets whose union is the entire ground set.
As described at the answer I linked to, for each element you are essentially deciding whether to:
Put it in the first set,
Put it in the second set, or
Ignore it.
Considering every possible way to do this generates a tree where each vertex has 3 children: hence O(3^n) time. One thing to note is that if you generate a solution (S1, S2) then you should not also count the solution (S2, S1): this can be achieved by always maintaining an asymmetry between the two sets as you build them up, e.g. enforcing that the smallest element in S1 must always be smaller than the smallest element in S2. (This asymmetry enforcement has the nice side-effect of halving the execution time :))
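To make the three-way recursion concrete, here is a minimal Python sketch (my own illustration with hypothetical names, assuming the usual reading where both subsets must be nonempty). Processing the elements in increasing order and forcing the first chosen element into S1 enforces min(S1) < min(S2), so each unordered pair is counted exactly once:

def count_equal_sum_pairs(items):
    # Count unordered pairs (S1, S2) of disjoint nonempty subsets
    # having equal sums, in O(3^n) time.
    items = sorted(items)  # process in increasing order

    def go(i, sum1, sum2, s1_nonempty, s2_nonempty):
        if i == len(items):
            return 1 if s1_nonempty and s2_nonempty and sum1 == sum2 else 0
        x = items[i]
        total = go(i + 1, sum1, sum2, s1_nonempty, s2_nonempty)   # ignore x
        total += go(i + 1, sum1 + x, sum2, True, s2_nonempty)     # put x in S1
        if s1_nonempty:   # S2 only opens up once S1 is nonempty (asymmetry)
            total += go(i + 1, sum1, sum2 + x, True, True)        # put x in S2
        return total

    return go(0, 0, 0, False, False)

print(count_equal_sum_pairs([1, 2, 3, 4]))   # -> 3, matching the example above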
A speedup for a special (but perhaps common in practice) case
If you expect that there will be many small numbers in the set, there is another possible speedup available to you: First, sort all the numbers in the list in increasing order. Choose some maximum value m, the larger the better, but small enough that you can afford an m-size array of integers. We will now break the list of numbers into 2 parts that we will process separately: an initial list of numbers that sum to at most m (this list may be quite small), and the rest. Suppose the first k <= n numbers fit into the first list, and call this first list Sk. The rest of the original list we will call S'.
First, initialise a size-m array d[] of integers to all 0, and solve the problem for Sk as usual -- but instead of only recording the number of disjoint subsets having equal sums, increment d[abs(|Sk1| - |Sk2|)] for every pair of disjoint subsets Sk1 and Sk2 formed from these first k numbers, where |X| denotes the sum of the elements of X. (Also increment d[0] to count the case when Sk1 = Sk2 = {}.) The idea is that after this first phase has finished, d[i] will record the number of ways that 2 disjoint subsets whose sums differ by i can be generated from the first k elements of S.
Second, process the remainder (S') as usual -- but instead of only recording the number of disjoint subsets having equal sums, whenever abs(|S1'| - |S2'|) < m, add d[abs(|S1'| - |S2'|)] to the total number of solutions. This is because we know that there are that many ways of building a pair of disjoint subsets from the first k elements having this difference -- and for each of these subset pairs (Sk1, Sk2), we can add the smaller-sum one of Sk1 and Sk2 to the larger-sum one of S1' and S2', and the remaining one to the remaining one, to wind up with a pair of disjoint subsets having equal sums.
Here is a clojure solution.
It defines s to be a set of 1, 2, 3, 4
Then all-subsets is defined to be a list of all subsets of sizes 1 through 3.
Once all the subsets are defined, it looks at all pairs of subsets and selects only the pairs that are not equal, whose union is not the original set, and whose sums are equal.
(require 'clojure.set)
(use 'clojure.math.combinatorics)
(def s #{1, 2, 3, 4})
(def all-subsets (mapcat #(combinations s %) (range 1 4)))
(for [x all-subsets
      y all-subsets
      :when (and (= (reduce + x) (reduce + y))
                 (not= s (clojure.set/union (set x) (set y)))
                 (not= x y))]
  [x y])
Produces the following:
([(3) (1 2)] [(4) (1 3)] [(1 2) (3)] [(1 3) (4)])
Given the following explanation:
Problem: We need an efficient and unbiased way to generate random pairs of vertices to perform random vertex swaps. Propose an efficient algorithm to generate elements from the C(n,2) unordered pairs on {1, ..., n} uniformly at random.
Solution: Uniformly generating random structures is a surprisingly
subtle problem. Consider the following procedure to generate random unordered pairs: i = random_int(1, n-1); j = random_int(i+1, n).
It is clear that this indeed generates unordered pairs, since i < j.
Further, it is clear that all C(n,2) unordered pairs can indeed be generated, assuming that random_int generates integers uniformly between its two arguments.
But are they uniform? The answer is no. What is the probability that pair (1,2) is generated? There is a 1/(n−1) chance of getting the 1, and then a 1/(n−1) chance of getting the 2, which yields p(1,2) = 1/(n−1)^2. But what is the probability of getting (n−1, n)? Again, there is a 1/(n−1) chance of getting the first number, but now there is only one possible choice for the second candidate! This pair will occur n−1 times more often than the first! The problem is that fewer pairs start with big numbers than little numbers. We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i)) and appropriately bias the probability. The second value could then be selected uniformly at random from i + 1 to n.
But instead of working through the math, let's exploit the fact that randomly generating the n^2 ordered pairs uniformly is easy. Just pick two integers independently of each other. Ignoring the ordering (i.e., permuting the ordered pair to unordered pair (x,y) so that x < y) gives us a 2/n^2 probability of generating each unordered pair of distinct elements. If we happen to generate a pair (x,x), we discard it and try again. We will get unordered pairs uniformly at random in constant expected time using the following algorithm:
In the above paragraph, "The problem is that fewer pairs start with big numbers than little numbers." -- shouldn't this be more pairs instead of fewer pairs?
In the above paragraph, "We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i))" -- shouldn't this be how many unordered pairs rather than how unordered pairs?
EDIT
In the above paragraph, "Ignoring the ordering (i.e., permuting the ordered pair to unordered pair (x,y) so that x < y) gives us a 2/n^2 probability of generating each unordered pair of distinct elements." -- how is the probability 2/n^2 derived?
Thanks
In the above paragraph, "The problem is that fewer pairs start with big numbers than little numbers." -- shouldn't this be more pairs instead of fewer pairs?
No, it's fewer:
n - 1 pairs start with 1 (1 2; 1 3; ...; 1 n)
n - 2 pairs start with 2 (2 3; 2 4; ...; 2 n)
n - 3 pairs start with 3
...
In the above paragraph, "We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i))" -- shouldn't this be how many unordered pairs rather than how unordered pairs?
Yes, there is a missing "many" there.
In the above paragraph, "Ignoring the ordering (i.e., permuting the ordered pair to unordered pair (x,y) so that x < y) gives us a 2/n^2 probability of generating each unordered pair of distinct elements." -- how is the probability 2/n^2 derived?
There are n*n possibilities of generating pairs where order does matter (1 2 and 2 1 are different pairs). Since you then proceed to ignore the ordering, both 1 2 and 2 1 will be the same, so you have two favorable cases.
This does not account for the fact that you discard x x pairs though. Then it would be 2 / (n*(n - 1)), because if you pick x once, you only have n - 1 possibilities for the second pick.
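As a sketch of that discard-and-retry scheme in Python (the function name is mine):

import random

def random_pair_rejection(n):
    # Draw ordered pairs uniformly from the n*n possibilities and discard
    # the n diagonal (x, x) pairs; every surviving unordered pair has the
    # same 2/n^2 probability per attempt, hence the result is uniform.
    # Expected number of attempts is n/(n-1), i.e. constant.
    while True:
        x, y = random.randrange(n), random.randrange(n)
        if x != y:
            return (min(x, y), max(x, y))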
Assuming the indices for your n items are 0..(n-1), and random(n) gives a random number ≥ 0 and < n :
i = random(n)
j = random(n-1)
j = (i+j+1) % n
Now every pair (i,j) with i≠j has exactly probability 1/(n(n-1)). Obviously, swapping (i,j) has the same result as swapping (j,i).
You could also do:
i = random(n)
j = random(n)
And ignore the fact that this may result in (i,i) pairs (swapping them will have no effect).
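In Python, the first version might look like this (a sketch; random.randrange(n) plays the role of random(n)):

import random

def random_pair_shift(n):
    # i is uniform over 0..n-1; the shifted j is uniform over the
    # remaining n-1 indices, so each ordered pair (i, j) with i != j
    # has probability exactly 1/(n*(n-1)) -- no rejection needed.
    i = random.randrange(n)
    j = (i + random.randrange(n - 1) + 1) % n
    return i, j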
I'm trying to divide a linked list into 2 sublists with equal sums. These sublists do not need to consist of consecutive elements.
I have a linked list as
Eg.1
LinkedList={1,7,5,5,4}
should be divided into
LinkedList1={1,5,5}
LinkedList2={7,4}
Both have the same sum of elements as 11.
Eg.2
LinkedList={42,2,3,2,2,2,5,20,2,20}
This should be divided into two list of equal sum i.e 50.
LinkedList1={42,3,5}
LinkedList2={2,2,2,2,20,2,20}
Can someone provide some pseudocode to solve this problem?
This is what I've thought so far:
Sum the elements of the linked list and divide by 2.
While the sum of linkedlist1 is less than sum(linkedlist)/2, keep pushing elements onto linkedlist1.
If adding the current element would push the sum of linkedlist1 over sum(linkedlist)/2, skip it (push it onto linkedlist2 instead) and move on to the next element.
But this would only work if the elements are in a particular order.
This is known as the partition problem.
There are a few approaches to solving the problem, but I'll just mention the most common 2 below (see Wikipedia for more details on either approach or other approaches).
This can be solved with a dynamic programming approach, which basically comes down to, for each element and value, either including or excluding that element, and looking up whether there's a subset summing to the corresponding value. More specifically, we have the following recurrence relation:
p(i, j) is True if a subset of { x_1, ..., x_j } sums to i, and False otherwise.
p(i, j) is True if either p(i, j − 1) is True or p(i − x_j, j − 1) is True.
p(i, j) is False otherwise.
The base cases are p(0, j) = True for every j (the empty subset) and p(i, 0) = False for i > 0. Then p(N/2, n) tells us whether a suitable subset exists (if N is odd, no equal-sum partition is possible).
The running time is O(Nn) where n is the number of elements in the input set and N is the sum of elements in the input set.
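As an illustration, here is a hedged Python sketch of that DP using the common one-dimensional form of the table (reachable[i] collapses p(i, j) over j); it assumes non-negative integers:

def can_partition(nums):
    total = sum(nums)
    if total % 2:                   # odd total: no equal-sum split exists
        return False
    target = total // 2
    # reachable[i] is True iff some subset of the numbers seen so far sums to i
    reachable = [True] + [False] * target
    for x in nums:
        for i in range(target, x - 1, -1):   # iterate downward so each x is used once
            reachable[i] = reachable[i] or reachable[i - x]
    return reachable[target]

print(can_partition([1, 7, 5, 5, 4]))   # -> True: {1, 5, 5} and {7, 4}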
The "approximate" greedy approach (doesn't necessarily find an equal-sum partition) is pretty straight-forward - it just involves putting each element in the set with the smallest sum. Here's the pseudo-code:
INPUT: A list of integers S
OUTPUT: An attempt at a partition of S into two sets of equal sum
function find_partition(S):
    A ← {}
    B ← {}
    sort S in descending order
    for i in S:
        if sum(A) <= sum(B)
            add element i to set A
        else
            add element i to set B
    return {A, B}
The running time is O(n log n).
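For reference, the same greedy heuristic as runnable Python (a sketch; it keeps running sums rather than recomputing them):

def find_partition(S):
    A, B = [], []
    sum_a = sum_b = 0
    for x in sorted(S, reverse=True):   # largest elements first
        if sum_a <= sum_b:              # put x into the lighter set
            A.append(x); sum_a += x
        else:
            B.append(x); sum_b += x
    return A, B

print(find_partition([1, 7, 5, 5, 4]))  # -> ([7, 4], [5, 5, 1]), an equal split here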
Given an unsorted set of integers in the form of an array, find all possible subsets whose sum is greater than or equal to a constant integer k.
eg:- Our set is {1,2,3} and k=2
Possible subsets:-
{2},
{3},
{1,2},
{1,3},
{2,3},
{1,2,3}
I can only think of a naive algorithm which lists all the subsets of the set and checks if the sum of each subset is >= k or not, but it's an exponential algorithm; listing all subsets requires O(2^N). Can I use dynamic programming to solve it in polynomial time?
Listing all the subsets is still going to be O(2^N), because in the worst case you may have to list every subset apart from the empty one.
Dynamic programming can help you count the number of sets that have sum >= K
You go bottom-up, keeping track of how many subsets sum to each value in the range [1..K]. An approach like this will be O(N*K), which is only going to be feasible for small K.
The idea behind the dynamic programming solution is best illustrated with an example. Consider this situation: assume you know that, out of all the sets composed of the first i elements, t1 sum to 2 and t2 sum to 3. Say the next element, i+1, is 4. Given all the existing sets, we can build all the new sets by either appending element i+1 or leaving it out. If we leave it out, we still have t1 subsets that sum to 2 and t2 subsets that sum to 3. If we append it, we obtain t1 subsets that sum to 6 (2 + 4) and t2 that sum to 7 (3 + 4), plus one new subset which contains just element i+1 and sums to 4. That gives us the numbers of subsets that sum to (2, 3, 4, 6, 7) consisting of the first i+1 elements. We continue until i = N.
In pseudo-code this could look something like this:
int DP[N][K];   // all entries initialised to 0
int set[N];
//go through all elements in the set by index
for i in range[0..N-1]
    //count the one-element subset consisting only of set[i]
    if set[i] < K then DP[i][set[i]] = 1
    if (i == 0) continue;
    //case 1. build and count all subsets that don't contain element set[i]
    for k in range[1..K-1]
        DP[i][k] += DP[i-1][k]
    //case 2. build and count subsets that contain element set[i]
    for k in range[0..K-1]
        if k + set[i] >= K then break inner loop
        DP[i][k+set[i]] += DP[i-1][k]
//result is the number of all subsets - number of subsets with sum < K
//the -1 is for the empty subset
return 2^N - sum(DP[N-1][1..K-1]) - 1
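For reference, a runnable Python version of the same counting idea (my sketch; it assumes positive integers, counts the subsets with sum < K, and subtracts):

def count_subsets_at_least(nums, K):
    # small[s] = number of subsets (including the empty one) summing to exactly s < K
    small = [0] * K
    small[0] = 1
    for x in nums:
        for s in range(K - 1, x - 1, -1):   # iterate downward so each element is used once
            small[s] += small[s - x]
    # subsets with sum >= K  =  all 2^N subsets  -  those with sum < K
    return 2 ** len(nums) - sum(small)

print(count_subsets_at_least([1, 2, 3], 2))   # -> 6, matching the example above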
Can I use dynamic programming to solve it in polynomial time?
No. The problem is even harder than #amit (in the comments) mentions. Finding whether there exists a subset that sums to a specific k is the subset-sum problem, which is NP-hard. Instead you are asking how many solutions sum to a specific k, which is in the much more difficult class #P. In addition, your exact problem is slightly more difficult since you want to not only count, but enumerate, all the possible subsets for k and targets < k.
If k is 0, and every element of the set is positive, then you have no choice but to output every possible subset, so the lower bound for this problem is Ω(2^N) -- the time taken to produce the output.
Unless you know something more about the value k that you haven't told us, there's no faster general solution than to just check every subset.
I'm looking for an algorithm (or better yet, code!) for the generation of powers, specifically numbers with an odd exponent greater than 1: third powers, fifth powers, seventh powers, and so forth. My desired output is then
8, 27, 32, 125, 128, 216, 243, 343, 512, 1000
and so forth up to a specified limit.
I don't want to store the powers in a list and sort them, because I'm making too many to fit in memory -- hopefully the limit will be 10^30 or so, corresponding to a memory requirement of ≈ 1 TB.
My basic idea is to have an array holding the current number (starting at 2) for each exponent, starting with 3 and going up to the binary log of the limit. At each step I loop through the exponent array, finding the one which yields the smallest power (finding either pow(base, exponent) or more likely exponent * log(base), probably memoizing these values). At that point call the 'output' function, which will actually do calculations with the number but of course you don't need to worry about that.
Of course, because of the range of the numbers involved, bignums must be used -- built into the language, in a library, or self-rolled. Relevant code or code snippets would be appreciated: I feel that this task is similar to some classic problems (e.g., Hamming's problem of generating numbers that are of the form 2^x 3^y 5^z) and can be solved efficiently. I'm fairly language-agnostic here: all I'll need for my 'output' function are arrays, subtraction, bignum-word comparison, and a bignum integer square root function.
Your example is missing 64 = 4^3 and 729 = 9^3.
You want the set of all { n^m } traversed in numerical order, with m odd, m > 1, n integral and n > 1. We know (for n > 1) that increasing either n or m will increase the value, but short of calculation we can't compare much else.
There are two obvious "dual" ways to do this: keep track of the highest base n you have considered and, for all bases less than that, the next exponent m to consider; then pick the smallest such power and compare it to n^3. Or, the other way around -- keep track of the highest exponent m and, for each exponent smaller than that, the next base to use; find the smallest such power and compare it to adding 2^m.
To keep track of these numbers efficiently, you'll want to keep them in a priority queue. Now, you still want to minimize the number of entries in the priority queue at a time, so we need to figure out which of these two methods does a better job of this. It turns out that far more base values than exponent values are needed to reach a given point: at number k, the largest value of m seen will be log_2 of k, whereas the largest value of n seen will be k^(1/3), so the per-exponent queue stays much smaller.
So, we have a priority queue with elements (v, n, m), where the value v=n^m.
add_priority_queue(2^3, 2, 3)
for m in 5, 7, ...
    v = 2^m
    while value(peek(queue)) <= v:
        (v1, n1, m1) = pop(queue)
        if v1 != v: print v1
        add_priority_queue((n1+1)^m1, n1+1, m1)
    add_priority_queue(2^m, 2, m)
Note that we need to check for v1 = v: we can have 2^9 = 512 = 8^3, and only one should be printed out, right?
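A possible Python rendering of this pseudocode using heapq (a sketch; it de-duplicates against the previously emitted value, since two queue entries can also collide with each other, e.g. 4^9 = 64^3 = 2^18):

import heapq

def odd_powers():
    # Heap entries are (value, base, exponent); one entry per odd exponent seen so far.
    heap = [(2 ** 3, 2, 3)]
    m, last = 5, None
    while True:
        v = 2 ** m
        while heap[0][0] <= v:
            v1, n1, m1 = heapq.heappop(heap)
            if v1 != last:          # skip duplicates like 512 = 2^9 = 8^3
                yield v1
            last = v1
            heapq.heappush(heap, ((n1 + 1) ** m1, n1 + 1, m1))  # advance the base
        heapq.heappush(heap, (2 ** m, 2, m))   # open up the next odd exponent
        m += 2

# e.g. import itertools; list(itertools.islice(odd_powers(), 10))
# -> [8, 27, 32, 64, 125, 128, 216, 243, 343, 512]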
A Haskell implementation, with a random priority queue grabbed off of hackage.
import Data.MeldableHeap

dropMin q = maybe empty snd (extractMin q)

numbers = generate_values (insert (2^3, 2, 3) empty) 5

generate_values q m = case findMin q of
    Nothing -> []
    Just (v1, n1, m1) -> case compare v1 (2^m) of
        EQ -> generate_values (insert ((n1+1)^m1, n1+1, m1) (dropMin q)) m
        LT -> v1 : generate_values (insert ((n1+1)^m1, n1+1, m1) (dropMin q)) m
        GT -> 2^m : generate_values (insert (3^m, 3, m) q) (m + 2)

main = sequence_ (map print numbers)
I have a run currently at 177403008736354688547625 (that's 24 digits) and 1.3 GB of plaintext output, after 8 minutes.
deque numbers   // stores tuples (base, current odd power value), sorted by the current odd power value

for i = 2 .. infinity
    numbers.push_back (i, i^3)   // i^3 is the highest possible value so far
    while numbers.peek_front[1] == i   // front always has the lowest next value
        print i
        node = numbers.pop_front
        node[1] = node[1] * (node[0]^2)   // advance to the next odd power of this base
        // put the node back into the numbers deque, keeping it sorted by the
        // second value -- it will end up somewhere in the middle
at 2, numbers will be [2,8]
at 3, numbers will be [2,9], [3, 27]
...
at 8, numbers will be [2,8], [3,27].....[8,8^3]
You'll take off the first node, print its value out, then put it back into the middle of numbers as [2,32].
I think this will work and has a reasonable memory usage.
There's a special case for 1, since 1^N never changes. This will also print out duplicate values -- 512 (= 2^9 = 8^3) for instance -- and there are fairly simple ways to slightly alter the algorithm to remove those.
This solution takes constant time to check each number, but requires quite a bit of RAM.
Consider k lists, one for each of the numbers 2 .. k+1. List i holds the powers of the number i+1. Since each list is sorted, use k-way merging with a min-heap to achieve what you need.
The min-heap is built from the first element of each list; after the minimum is extracted, we remove that list's first element, making its second element the key, and rearrange the heap to get the next minimum.
This procedure is repeated until we have produced all the numbers.
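A possible Python sketch of this k-way merge, using heapq.merge with one lazy generator of odd powers per base (the names and the bound handling are mine):

import heapq

def odd_powers_upto(limit):
    def powers_of(base):
        e = 3
        while base ** e <= limit:   # odd exponents 3, 5, 7, ... while in range
            yield base ** e
            e += 2
    max_base = int(round(limit ** (1 / 3))) + 1   # bases above cbrt(limit) contribute nothing
    merged = heapq.merge(*(powers_of(b) for b in range(2, max_base + 1)))
    last = None
    for v in merged:
        if v != last:        # drop duplicates such as 512 = 2^9 = 8^3
            yield v
        last = v

print(list(odd_powers_upto(1000)))
# -> [8, 27, 32, 64, 125, 128, 216, 243, 343, 512, 729, 1000]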
Specifically in the domain of one-dimensional sets of items of the same type, such as a vector of integers.
Say, for example, you had a vector of size 32,768 containing the sorted integers 0 through 32,767.
What I mean by "next permutation" is performing the next permutation in a lexical ordering system.
Wikipedia lists two, and I'm wondering if there are any more (besides something bogo :P)
O(N) implementation
This is based on Eyal Schneider's mapping Z_{n!} -> P(n)
from math import factorial as f

def get_permutation(k, lst):
    N = len(lst)
    while N:
        next_item = k // f(N-1)   # integer division: index to swap into position N-1
        lst[N-1], lst[next_item] = lst[next_item], lst[N-1]
        k = k - next_item * f(N-1)
        N = N - 1
    return lst
It reduces his O(N^2) algorithm by integrating the conversion step with finding the permutation. It essentially has the same form as Fisher-Yates, but replaces a call to random with the next step of the mapping. If the mapping is in fact a bijection (which I'm working to prove), then this is a better algorithm than Fisher-Yates because it only calls out to the pseudo-random number generator once, and so will be more efficient. Note also that this returns the action of permutation (N! - k) rather than permutation k, but that's of little consequence: if k is uniform on [0, N!], then so is N! - k.
old answer
This is slightly related to the idea of "next" permutation. If the items can be well ordered, then one can construct lexicographical ordering on the permutations. This allows you to construct a map from the integers into the space of permutations.
Then finding a random permutation is equivalent to choosing a random integer between 0 and N! and constructing the corresponding permutation. This algorithm will be as efficient as (and as difficult to implement) as calculating the n'th permutation of the set in question. This trivially gives a uniform choice of permutation if our choice of n is uniform.
A little more detail about ordering the permutations. Given a set S = {a b c d}, mathematicians view the set of permutations of S as a group with the operation of composition. If p is one permutation, let's say (b a c d), then p operates on S by taking b to a, a to c, c to d and d to b. If q is another permutation, let's say (d b c a), then pq is obtained by first applying q and then p, which gives (d a b)(c). For example, q takes d to b and p takes b to a, so that pq takes d to a. You'll see that pq has two cycles because it takes b to d and fixes c. It's customary to omit 1-cycles, but I left it in for clarity.
We're going to use some facts from group theory.
Disjoint cycles commute: (a b)(c d) is the same as (c d)(a b).
We can arrange the elements of a cycle in any cyclic order: (a b c) = (b c a) = (c a b).
So given a permutation, order the cycles so that the largest cycles come first. When two cycles are the same length, arrange their items so that the largest (we can always order a denumerable set, even if arbitrarily so) item comes first. Then we just have a lexicographical ordering first on the length of the cycles, then on their contents. This is well ordered because two permutations that consist of the same cycles must be the same permutation so if p > q and q > p then p = q.
This algorithm can be trivially executed in O(N! log N! + N!) time: just construct all the permutations (EDIT: just to be clear, I had my mathematician hat on when I proposed this, and it was tongue in cheek anyway), quicksort them and find the n'th. It is a different algorithm than the two you mention, though.
Here is an idea on how to improve aaronasterling's answer. It avoids generating all N! permutations and sorting them according to their lexicographic order, and therefore has a much better time complexity.
Internally it uses an unusual permutation representation, that simulates a selection & removal process from a shrinking array. For example, the sequence <0,1,0> represents a permutation resulting from removing item #0 from [0,1,2], then removing item #1 from [1,2], and then removing item #0 from [1]. The resulting permutation is <0,2,1>. With this representation, the first permutation will always be <0,0,...0>, and the last one will always be <N-1,N-2,...0>. I will call this special representation the "array representation".
Clearly, an array representation of size N can be converted to a standard permutation representation in O(N^2) time, by using an array and shrinking it when necessary.
The following function can be used to return the Kth permutation on {0,1,2...,N-1}, in the array representation:
getPermutation(k, N) {
    while (N > 0) {
        nextItem = floor(k / (N-1)!)
        output nextItem
        k = k - nextItem * (N-1)!
        N = N - 1
    }
}
This algorithm works in O(N^2) time (due to the representation conversion), instead of O(N! log N) time.
--Example--
getPermutation(4,3) returns <2,0,0>. This array representation corresponds to <C,A,B>, which is really the permutation at index 4 in the ordered list of permutations on {A,B,C}:
ABC
ACB
BAC
BCA
CAB
CBA
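For illustration, a small Python sketch of both steps -- the index-to-array-representation map and the O(N^2) conversion by actual removal (function names are mine):

from math import factorial

def kth_permutation_rep(k, N):
    # Kth permutation of {0..N-1} in the "array representation":
    # a sequence of removal indices into a shrinking array.
    rep = []
    for i in range(N, 0, -1):
        f = factorial(i - 1)
        rep.append(k // f)
        k %= f
    return rep

def to_standard(rep, items):
    # Simulate the selection & removal process to get the usual form.
    pool = list(items)
    return [pool.pop(i) for i in rep]

print(kth_permutation_rep(4, 3))                     # -> [2, 0, 0]
print(to_standard(kth_permutation_rep(4, 3), "ABC")) # -> ['C', 'A', 'B']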
You can adapt merge sort such that it will shuffle the input randomly instead of sorting it.
In particular, when merging two lists, you choose the new head element at random instead of choosing the smallest head element. For this to work, the probability of choosing the head of the first list must be n/(n+m), where n is the number of elements remaining in the first list and m the number remaining in the second.
I've written a detailed explanation here: Random Permutations and Sorting.
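Here's a Python sketch of that merging idea (my own illustration, not the code from the linked article):

import random

def merge_shuffle(lst):
    if len(lst) <= 1:
        return list(lst)
    mid = len(lst) // 2
    a = merge_shuffle(lst[:mid])   # recursively shuffle each half,
    b = merge_shuffle(lst[mid:])   # then merge the halves at random
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # take from a with probability (remaining in a) / (total remaining)
        if random.randrange(len(a) - i + len(b) - j) < len(a) - i:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

print(merge_shuffle(list(range(10))))   # a uniformly random permutation of 0..9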
Another possibility is to build an LFSR or PRNG with a period equal to the number of items you want.
Start with a sorted array. Pick 2 random indexes, switch the elements at those indexes. Repeat O(n lg n) times.
You need to repeat O(n lg n) times to ensure that the distribution approaches uniform. (You need to make sure that each index is picked at least once, which is a balls-in-bins problem.)