I have two sets:
set1 = {i1, i2, i3, ..., iN1}
set2 = {k1, k2, k3, ..., kN2}
For a single set of n items I can represent all possible subsets using bit masks 0 to 2^n - 1.
Similarly, how can I represent all possible subsets drawn from set1 and set2 together, where at least one item comes from each set?
For example,
{i1,i2,k1} is valid,
but {i1,i2} is invalid, as it has no item from set2.
I am trying to work out two things:
An equation that gives the count of all such subsets, like the 2^n count for a single set of n items.
A bit encoding/mask scheme with which I can represent the above type of subsets.
This will be easier if we introduce a few extra sets of interest. Let's call the two input sets S1 and S2, and define sets L, C, and R (for left, center, and right). Think of these as the regions of the Venn diagram: L = S1 \ S2 is the set of elements in S1 that aren't in S2 at all, C = S1 ∩ S2 is the set of elements in both S1 and S2, and R = S2 \ S1. Let's also write l, c, and r for the sizes of these sets, respectively.
Now we're ready to answer question 1: how many subsets of S1 ∪ S2 have an element from S1 and an element from S2? There are two cases to consider: either the subset has an element in C (which satisfies both the "an element from S1" and the "an element from S2" clauses), or it has no elements in C and at least one each from L and R. In the first case, there are (2^c - 1) non-empty subsets of C and 2^(l+r) subsets of the remainder, so there are (2^c - 1) * 2^(l+r) sets in that case. In the second case, there are (2^l - 1) non-empty subsets of L and (2^r - 1) non-empty subsets of R, so there are (2^l - 1) * (2^r - 1) subsets in that case. Adding up the two cases, we have a total of 2^(c+l+r) - 2^l - 2^r + 1 subsets satisfying the condition.
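As a quick sanity check, here's a brute-force verification of that count in Python (the concrete S1 and S2 below are made-up example sets):

S1 = {1, 2, 3, 4}
S2 = {3, 4, 5}

l = len(S1 - S2)  # elements only in S1
c = len(S1 & S2)  # elements in both
r = len(S2 - S1)  # elements only in S2

elements = list(S1 | S2)
count = 0
for mask in range(2 ** len(elements)):
    subset = {e for i, e in enumerate(elements) if mask >> i & 1}
    if subset & S1 and subset & S2:  # at least one element from each set
        count += 1

print(count)                                   # 27 by brute force
print(2 ** (c + l + r) - 2 ** l - 2 ** r + 1)  # 27 by the formula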
If you have a fancy representation of non-empty subsets, the case analysis above also immediately suggests a data structure for storing these: a single bit tag for which case you're in, plus the appropriate representations of subsets and non-empty subsets in each case.
But I would probably just use a single bitmask of size c+l+r, even though there are a few "invalid" bitmasks: it's very compact, it's easy to check validity, and there are many cheap operations on bitmasks.
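For instance, here is a minimal sketch of the validity check under one assumed bit layout (low l bits for L, then c bits for C, then r bits for R; the layout and function name are my own choice):

def is_valid(mask, l, c, r):
    L_bits = mask & ((1 << l) - 1)                # elements chosen from L
    C_bits = (mask >> l) & ((1 << c) - 1)         # elements chosen from C
    R_bits = (mask >> (l + c)) & ((1 << r) - 1)   # elements chosen from R
    # Valid iff the subset meets S1 (= L ∪ C) and S2 (= C ∪ R),
    # i.e. it contains a C element, or both an L and an R element.
    return bool(C_bits or (L_bits and R_bits))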
Related
There are two sets, s1 and s2, each containing pairs of letters. A pair is only equivalent to another pair if their letters are in the same order, so they're essentially strings (of length 2). The sets s1 and s2 are disjoint, neither set is empty, and each pair of letters only appears once.
Here is an example of what the two sets might look like:
s1 = { ax, bx, cy, dy }
s2 = { ay, by, cx, dx }
The set of all letters in (s1 ∪ s2) is called sl. The set sr is a set of letters of your choice, but must be a subset of sl. Your goal is to define a mapping m from letters in sl to letters in sr, which, when applied to s1 and s2, will generate the sets s1' and s2', which also contain pairs of letters and must also be disjoint.
The most obvious m just maps each letter to itself. In this example (shown below), s1 is equivalent to s1', and s2 is equivalent to s2' (but given any other m, that would not be the case).
a -> a
b -> b
c -> c
d -> d
x -> x
y -> y
The goal is to construct m such that sr (the set of letters on the right-hand side of the mapping) has the fewest number of letters possible. To accomplish this, you can map multiple letters in sl to the same letter in sr. Note that depending on s1 and s2, and depending on m, you could potentially break the rule that s1' and s2' must be disjoint. For example, you would obviously break that rule by mapping every letter in sl to a single letter in sr.
So, given s1 and s2, how can someone construct an m that minimizes sr, while ensuring that s1' and s2' are disjoint?
This problem is NP-hard; to show this, we can reduce graph coloring to it.
Proof:
Let G = (V,E) be the graph for which we want to compute a minimal coloring. Formally, we want to compute the chromatic number of the graph, which is the lowest k for which G is k-colorable.
To reduce the graph coloring problem to the problem described here, define
s1 = { zu : (u,v) ∈ E }
s2 = { zv : (u,v) ∈ E }
where z is a magic letter that appears nowhere other than in constructing s1 & s2.
By construction of the sets above, for any mapping m and any edge (u,v) we must have m(u) != m(v); otherwise the disjointness of s1' and s2' would be violated. Thus, any optimal sr is an optimal set of colors (with the exception of z) for the graph G, and m is the mapping that defines which node is assigned which color. QED.
The proof above may give the intuition that researching graph coloring approximations would be a good start, and indeed it probably would, but there is a confounding factor. This confounding factor is that for two elements ab ∈ s1 and cd ∈ s2, if m(a) = m(c) then m(b) != m(d). Logically, this is equivalent to the statement m(a) != m(c) or m(b) != m(d). These types of constraints, in isolation, do not map naturally to an analogous graph problem (because of the "or").
There are ways to formulate this problem as a (binary) ILP and solve it as such. This would likely give you (slightly) inferior results compared to a custom-designed and tuned branch-and-bound implementation (assuming you want the optimal solution), but it would work with turn-key solvers.
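For concreteness, here is one possible formulation as a sketch in Python with the PuLP modelling library; the shape of the model (an upper bound K on |sr|, assignment variables x, and usage variables y) is my own illustration rather than a canonical encoding:

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

s1 = {"ax", "bx", "cy", "dy"}  # the example instance from the question
s2 = {"ay", "by", "cx", "dx"}

letters = sorted({ch for pair in s1 | s2 for ch in pair})
K = len(letters)  # |sl| colors always suffice

prob = LpProblem("minimize_sr", LpMinimize)
x = LpVariable.dicts("x", (letters, range(K)), cat=LpBinary)  # x[a][i]: m(a) = color i
y = LpVariable.dicts("y", range(K), cat=LpBinary)             # y[i]: color i is used

prob += lpSum(y[i] for i in range(K))  # minimize |sr| = number of colors used

for a in letters:
    prob += lpSum(x[a][i] for i in range(K)) == 1  # each letter gets one color
    for i in range(K):
        prob += x[a][i] <= y[i]                    # used colors are counted

# Disjointness: for ab in s1 and cd in s2, forbid m(a) = m(c) and m(b) = m(d)
# holding simultaneously, for every pair of colors (i, j).
for a, b in s1:
    for c, d in s2:
        for i in range(K):
            for j in range(K):
                prob += x[a][i] + x[c][i] + x[b][j] + x[d][j] <= 3

prob.solve()
m = {a: next(i for i in range(K) if x[a][i].value() > 0.5) for a in letters}
print(m)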
If you are more interested in approximations (possibly with guaranteed optimality ratios), I would investigate an SDP relaxation of the problem and an appropriate rounding scheme. This level of work would likely be the kind one would invest in a small-to-medium-sized research paper.
I have an array list of distinct positive integers representing a set L, and an integer S. What's the fastest way to count all subsets of L which have the sum of their elements equal to S, instead of iterating over all subsets and just checking whether each subset's sum is equal to S?
This can be solved in O(NS) using a simple dynamic programming approach, similar to the knapsack problem. For each i and each Q, let's solve the following problem: how many subsets of the first i elements of L have a sum equal to Q? Denote the number of such subsets by C[i,Q].
Obviously, C[0,0]=1 and C[0,Q]=0 for Q!=0. (Note that i=0 denotes first 0 elements, that is no elements.)
For a bigger i we have two possibilities: either the last available element (L[i-1]) is taken into our set, in which case there are C[i-1, Q-L[i-1]] such sets, or it is not taken, in which case there are C[i-1, Q] such sets. Therefore, C[i,Q] = C[i-1, Q-L[i-1]] + C[i-1, Q]. Iterating over i and Q, we calculate all the values of C.
Note that if all elements in L are non-negative, then you only need to solve the problem for non-negative Q, and the first term disappears when Q < L[i-1]. If negative elements are allowed, then you need to consider negative Q too.
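In Python, the recurrence looks like this (collapsing the table to one dimension over Q, and assuming non-negative elements as in the note above):

def count_subsets_with_sum(L, S):
    # C[Q] = number of subsets of the elements processed so far with sum Q
    C = [0] * (S + 1)
    C[0] = 1  # the empty subset
    for x in L:
        # iterate Q downwards so each element is used at most once
        for Q in range(S, x - 1, -1):
            C[Q] += C[Q - x]
    return C[S]

print(count_subsets_with_sum([1, 2, 3, 4], 5))  # 2: {1,4} and {2,3}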
I've got the following assignment.
You have a multiset S of 1 <= N <= 22 elements.
Each element has a positive value of up to 10000000.
Assuming that there are two subsets s1 and s2 of S in which the sum of the values of all the elements of one is equal to the sum of the values of all the elements of the other, and that this sum is the highest possible, I have to return which elements of S would not be included in either of the two subsets.
It's probably been solved before; I think it's some variant of the partition problem, but I can't find it. If anyone could point me in the right direction, that'd be great.
EDIT: An element can't be in both subsets.
This is a variation of subset sum, and can be solved similarly, by increasing the dimension of the problem (and the DP matrix) and then applying a solution very similar to the original one for subset sum, which follows the recursive formula:
D(i,x,y) = D(i-1, x, y)          (element i not chosen)
        OR D(i-1, x - l[i], y)   (chosen for the first set)
        OR D(i-1, x, y - l[i])   (chosen for the second set)
and base clauses:
D(0,0,0) = true
D(0,x,y) = false   if x != 0 or y != 0
D(i,x,y) = false   if x < 0 or y < 0
After calculating the DP matrix (a 3D array, actually) for this problem, all you have to do is check whether there is any entry D(n,x,x) == true for some x <= SUM/2 (where SUM is the sum of the entire original set) to determine whether there is a feasible solution.
Since you want the maximal value, the answer is the maximal x such that D(n,x,x) = true (there could be more than one).
Finding the elements themselves can be done after finding the solution (the value of x in D(n,x,x)) by following back the DP matrix and retracing your steps as explained for similar problems such as this: How to find which elements are in the bag, using Knapsack Algorithm [and not only the bag's value]?
The total complexity of this solution is O(SUM^2 * n).
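Here is a compact Python sketch of the same DP; rather than materializing the full 3D boolean array (which, as the O(SUM^2 * n) bound suggests, can be enormous), it keeps only the set of reachable (x, y) sum pairs, and it omits the retracing step:

def best_equal_split(S):
    reachable = {(0, 0)}  # sums (x, y) achievable for the two subsets
    for v in S:
        extended = {(x + v, y) for x, y in reachable} | \
                   {(x, y + v) for x, y in reachable}
        reachable |= extended  # third option: v goes in neither subset
    # the answer is the largest x with D(n, x, x) true
    return max(x for x, y in reachable if x == y)

print(best_equal_split([3, 1, 1, 2, 2, 1]))  # 5, e.g. {3,2} and {1,1,2,1}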
Partition S as evenly as possible into T ∪ U (put the extra element, if any, in U). Loop through the three-way partitions of T into A ∪ B ∪ C (≤ 3^11 = 177,147 of them). Store the item |sum(A) - sum(B)| → C into a map, keeping only the value with the lowest sum in case the key already exists.
Loop through the three-way partitions of U into D ∪ E ∪ F. Look up |sum(D) - sum(E)| in the map; if it exists with value C, then consider C ∪ F as a possibility for the elements left out (the two parts with equal sum are either A ∪ D and B ∪ E, or A ∪ E and B ∪ D).
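Here is a Python sketch of this meet-in-the-middle scheme (the function and variable names are mine); it returns the left-out elements, minimizing their sum and thereby maximizing the shared sum:

from itertools import product

def left_out_elements(S):
    S = list(S)
    T, U = S[: len(S) // 2], S[len(S) // 2 :]

    def three_way_partitions(part):
        # assign each element to set 0 (A/D), set 1 (B/E), or leftover 2 (C/F)
        for assign in product(range(3), repeat=len(part)):
            groups = ([], [], [])
            for x, w in zip(part, assign):
                groups[w].append(x)
            yield groups

    # |sum(A) - sum(B)| -> leftover C with the smallest sum
    table = {}
    for A, B, C in three_way_partitions(T):
        key = abs(sum(A) - sum(B))
        if key not in table or sum(C) < sum(table[key]):
            table[key] = C

    best = None
    for D, E, F in three_way_partitions(U):
        C = table.get(abs(sum(D) - sum(E)))
        if C is not None and (best is None or sum(C) + sum(F) < sum(best)):
            best = C + F
    return best

print(left_out_elements([3, 1, 1, 2, 2, 1]))  # [] -- a 5/5 split uses everything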
I want to calculate how many pairs of disjoint subsets S1 and S2 (S1 ∪ S2 need not be all of S) of a set S exist for which the sum of elements in S1 equals the sum of elements in S2.
Say I have calculated the subset sums of all 2^n possible subsets.
How do I find how many disjoint subsets have equal sums?
For a sum value A, can we use the count of subsets having sum A/2 to solve this?
As an example:
S = {1,2,3,4}
Various S1 and S2 sets possible are:
S1 = {1,2} and S2 = {3}
S1 = {1,3} and S2 = {4}
S1 = {1,4} and S2 = {2,3}
Here is the link to the problem :
http://www.usaco.org/index.php?page=viewproblem2&cpid=139
[EDIT: Fixed stupid complexity mistakes. Thanks kash!]
Actually I believe you'll need to use the O(3^n) algorithm described here to answer this question -- the O(2^n) partitioning algorithm is only good enough to enumerate all pairs of disjoint subsets whose union is the entire ground set.
As described at the answer I linked to, for each element you are essentially deciding whether to:
Put it in the first set,
Put it in the second set, or
Ignore it.
Considering every possible way to do this generates a tree where each vertex has 3 children: hence O(3^n) time. One thing to note is that if you generate a solution (S1, S2) then you should not also count the solution (S2, S1): this can be achieved by always maintaining an asymmetry between the two sets as you build them up, e.g. enforcing that the smallest element in S1 must always be smaller than the smallest element in S2. (This asymmetry enforcement has the nice side-effect of halving the execution time :))
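Here is a direct Python sketch of this enumeration (names are mine), using the smallest-element rule so each unordered pair is counted exactly once:

from itertools import product

def count_equal_sum_disjoint_pairs(S):
    S = list(S)
    count = 0
    # each element is assigned 0 = ignored, 1 = into S1, 2 = into S2
    for assign in product(range(3), repeat=len(S)):
        s1 = [x for x, w in zip(S, assign) if w == 1]
        s2 = [x for x, w in zip(S, assign) if w == 2]
        if s1 and s2 and sum(s1) == sum(s2) and min(s1) < min(s2):
            count += 1
    return count

print(count_equal_sum_disjoint_pairs([1, 2, 3, 4]))  # 3, as in the example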
A speedup for a special (but perhaps common in practice) case
If you expect that there will be many small numbers in the set, there is another possible speedup available to you: First, sort all the numbers in the list in increasing order. Choose some maximum value m, the larger the better, but small enough that you can afford an m-size array of integers. We will now break the list of numbers into 2 parts that we will process separately: an initial list of numbers that sum to at most m (this list may be quite small), and the rest. Suppose the first k <= n numbers fit into the first list, and call this first list Sk. The rest of the original list we will call S'.
First, initialise a size-m array d[] of integers to all 0, and solve the problem for Sk as usual -- but instead of only recording the number of disjoint subsets having equal sums, increment d[abs(sum(Sk1) - sum(Sk2))] for every pair of disjoint subsets Sk1 and Sk2 formed from these first k numbers. (Also increment d[0] to count the case when Sk1 = Sk2 = {}.) The idea is that after this first phase has finished, d[i] will record the number of ways that 2 disjoint subsets whose sums differ by i can be generated from the first k elements of S.
Second, process the remainder (S') as usual -- but instead of only recording the number of disjoint subsets having equal sums, whenever abs(sum(S1') - sum(S2')) < m, add d[abs(sum(S1') - sum(S2'))] to the total number of solutions. This is because we know that there are that many ways of building a pair of disjoint subsets from the first k elements having this difference -- and for each such subset pair (Sk1, Sk2), we can add the smaller-sum one of Sk1 and Sk2 to the larger-sum one of S1' and S2', and the other to the other, to wind up with a pair of disjoint subsets having equal sums.
Here is a Clojure solution.
It defines s to be the set #{1, 2, 3, 4}.
Then all-subsets is defined to be a list of all subsets of size 1 to 3.
Once all the subsets are defined, it looks at all pairs of subsets and keeps only the pairs that are disjoint (and therefore not equal) and whose sums are equal.
(require 'clojure.set)
(use 'clojure.math.combinatorics)

(def s #{1, 2, 3, 4})

;; all subsets of s of size 1 to 3, as lists
(def all-subsets (mapcat #(combinations s %) (range 1 4)))

;; all ordered pairs of disjoint subsets with equal sums
(for [x all-subsets
      y all-subsets
      :when (and (= (reduce + x) (reduce + y))
                 (empty? (clojure.set/intersection (set x) (set y))))]
  [x y])
Produces the following (each unordered pair appears in both orders; the exact ordering depends on set iteration order):
([(3) (1 2)] [(4) (1 3)] [(1 2) (3)] [(1 3) (4)] [(1 4) (2 3)] [(2 3) (1 4)])
Specifically in the domain of one-dimensional sets of items of the same type, such as a vector of integers.
Say, for example, you had a vector of size 32,768 containing the sorted integers 0 through 32,767.
What I mean by "next permutation" is performing the next permutation in a lexical ordering system.
Wikipedia lists two, and I'm wondering if there are any more (besides something bogo :P)
O(N) implementation
This is based on Eyal Schneider's mapping Z_{n!} -> P(n).
from math import factorial as f  # f(N) = N!

def get_permutation(k, lst):
    N = len(lst)
    while N:
        # position of the next item, from the factorial-base digits of k
        next_item = k // f(N - 1)   # integer division
        lst[N - 1], lst[next_item] = lst[next_item], lst[N - 1]
        k = k - next_item * f(N - 1)
        N = N - 1
    return lst
It reduces his O(N^2) algorithm by integrating the conversion step with finding the permutation. It essentially has the same form as Fisher-Yates, but replaces a call to random with the next step of the mapping. If the mapping is in fact a bijection (which I'm working to prove), then this is a better algorithm than Fisher-Yates because it only calls out to the pseudo-random number generator once, and so will be more efficient. Note also that this returns the action of permutation (N! - k) rather than permutation k, but that's of little consequence because if k is uniform on [0, N!], then so is N! - k.
old answer
This is slightly related to the idea of a "next" permutation. If the items can be well ordered, then one can construct a lexicographical ordering on the permutations. This allows you to construct a map from the integers into the space of permutations.
Then finding a random permutation is equivalent to choosing a random integer between 0 and N! - 1 and constructing the corresponding permutation. This algorithm will be as efficient as (and as difficult to implement as) calculating the n'th permutation of the set in question. This trivially gives a uniform choice of permutation if our choice of n is uniform.
A little more detail about ordering the permutations: given a set S = {a b c d}, mathematicians view the set of permutations of S as a group under composition. If p is one permutation, let's say (b a c d), then p acts on S by taking b to a, a to c, c to d, and d to b. If q is another permutation, let's say (d b c a), then pq is obtained by first applying q and then p, which gives (d a b)(c). For example, q takes d to b and p takes b to a, so pq takes d to a. You'll see that pq has two cycles, because it takes b to d and fixes c. It's customary to omit 1-cycles, but I left it in for clarity.
We're going to use some facts from group theory.
Disjoint cycles commute: (a b)(c d) is the same as (c d)(a b).
We can arrange the elements of a cycle in any cyclic order: (a b c) = (b c a) = (c a b).
So given a permutation, order the cycles so that the longest cycles come first. When two cycles are the same length, arrange their items so that the largest (we can always order a denumerable set, even if arbitrarily so) item comes first. Then we just have a lexicographical ordering, first on the lengths of the cycles, then on their contents. This is a well-defined total order because two permutations that consist of the same cycles must be the same permutation, so if p ≥ q and q ≥ p then p = q.
This algorithm can be trivially executed in O(N! log N! + N!) time: just construct all the permutations (EDIT: just to be clear, I had my mathematician hat on when I proposed this, and it was tongue in cheek anyway), quicksort them, and find the n'th. It is a different algorithm than the two you mention, though.
Here is an idea on how to improve aaronasterling's answer. It avoids generating all N! permutations and sorting them according to their lexicographic order, and therefore has a much better time complexity.
Internally it uses an unusual permutation representation, that simulates a selection & removal process from a shrinking array. For example, the sequence <0,1,0> represents a permutation resulting from removing item #0 from [0,1,2], then removing item #1 from [1,2], and then removing item #0 from [1]. The resulting permutation is <0,2,1>. With this representation, the first permutation will always be <0,0,...0>, and the last one will always be <N-1,N-2,...0>. I will call this special representation the "array representation".
Clearly, an array representation of size N can be converted to a standard permutation representation in O(N^2) time, by using an array and shrinking it when necessary.
The following function can be used to return the Kth permutation on {0,1,2...,N-1}, in the array representation:
getPermutation(k, N) {
    while (N > 0) {
        nextItem = floor(k / (N-1)!)
        output nextItem
        k = k - nextItem * (N-1)!
        N = N - 1
    }
}
This algorithm works in O(N^2) time (due to the representation conversion), instead of O(N! log N!) time.
--Example--
getPermutation(4,3) returns <2,0,0>. This array representation corresponds to <C,A,B>, which is really the permutation at index 4 in the ordered list of permutations on {A,B,C}:
ABC
ACB
BAC
BCA
CAB
CBA
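Here is a Python sketch of both steps together, computing the array representation of the Kth permutation and then converting it by simulating the selection & removal process (math.factorial stands in for the (N-1)! above; the function name is mine):

from math import factorial

def kth_permutation(k, items):
    # step 1: the array representation of the k-th permutation
    rep = []
    N = len(items)
    while N > 0:
        next_item = k // factorial(N - 1)
        rep.append(next_item)
        k -= next_item * factorial(N - 1)
        N -= 1
    # step 2: the O(N^2) conversion via a shrinking pool
    pool = list(items)
    return [pool.pop(i) for i in rep]

print(kth_permutation(4, "ABC"))  # ['C', 'A', 'B'], matching the example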
You can adapt merge sort such that it will shuffle the input randomly instead of sorting it.
In particular, when merging two lists, you choose the new head element at random instead of choosing it to be the smallest head element. For this to work, the probability of choosing the element from the first list must be n/(n+m), where n is the length of the first list and m the length of the second.
I've written a detailed explanation here: Random Permutations and Sorting.
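For concreteness, a minimal Python sketch of the randomized merge described above (the n/(n+m) rule decides which run supplies the next element):

import random

def merge_shuffle(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_shuffle(items[:mid])
    right = merge_shuffle(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        n, m = len(left) - i, len(right) - j
        if random.randrange(n + m) < n:  # take from left with probability n/(n+m)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:] or right[j:])
    return merged

print(merge_shuffle(list(range(10))))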
Another possibility is to build an LFSR or PRNG with a period equal to the number of items you want.
Start with a sorted array. Pick 2 random indexes, switch the elements at those indexes. Repeat O(n lg n) times.
You need to repeat O(n lg n) times to ensure that the distribution approaches uniform. (You need to make sure that each index is picked at least once, which is a balls-in-bins problem.)
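A minimal Python sketch of this scheme (with the caveat that, unlike Fisher-Yates, it only approaches a uniform distribution):

import math
import random

def transposition_shuffle(a):
    n = len(a)
    if n < 2:
        return a
    for _ in range(int(n * math.log2(n)) + 1):  # O(n lg n) random swaps
        i, j = random.randrange(n), random.randrange(n)
        a[i], a[j] = a[j], a[i]
    return a

print(transposition_shuffle(list(range(10))))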