Finding least number of bit sequence ORs to achieve all 1's? - algorithm

I'm trying to find anything that may help with this task: I have a variable number of bit sequences (that will all individually be the same length) and I need to find which combination of sequences would OR to all 1's, using as few sequences as possible. I was thinking to start with whichever sequence had the most 1's and try filling in the blanks, but since I haven't worked with bit comparisons really I didn't know if there was some algorithm or property of bit logic that would simplify this. Thanks.

This problem, unfortunately, is NP-hard in the most general case by a reduction from the set cover problem. In the set cover problem, you have a collection of sets of elements, and want to find the smallest number of them whose union contains all the elements. You can easily reduce the set cover problem to your problem by constructing a bitvector for each set that has a 1 in a position if the set contains the corresponding item and a 0 otherwise. The smallest number of bitvectors whose OR gives all 1s is then equivalent to the smallest group of sets whose union contains all elements.
For example, given the sets {a, b, e}, {b, c}, {b, d, f}, and {a, f}, you would get these bitvectors:
{a, b, e} 110010
{b, c} 011000
{b, d, f} 010101
{a, f} 100001
Since the set cover problem is known to be NP-hard, this means that unless P = NP there is no polynomial-time algorithm for your problem. Worse, it is known that, unless P = NP, no polynomial-time algorithm can approximate the optimal solution to within a factor better than Θ(log n), where n is the total number of elements. You are probably best off looking for heuristics, or staying content with the O(log n) approximation that the greedy algorithm provides.
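The greedy algorithm mentioned above can be sketched as follows. This is only an illustration, assuming each bit sequence is stored as a Python int; the name greedy_cover and the width parameter are made up for this example and are not part of the question.

```python
def greedy_cover(vectors, width):
    """Greedy set cover on bit vectors stored as ints.

    Returns the indices of the chosen vectors, or None if even the OR of
    all vectors does not reach all 1s. The result is a valid cover but is
    not guaranteed to be the smallest one.
    """
    full = (1 << width) - 1      # the all-1s target
    covered = 0                  # bits covered so far
    chosen = []
    while covered != full:
        # Pick the vector that adds the most not-yet-covered bits.
        best = max(
            range(len(vectors)),
            key=lambda i: bin(vectors[i] & ~covered).count("1"),
            default=None,
        )
        if best is None or vectors[best] & ~covered == 0:
            return None          # nothing can add a new bit
        chosen.append(best)
        covered |= vectors[best]
    return chosen


# The four bitvectors from the example above.
print(greedy_cover([0b110010, 0b011000, 0b010101, 0b100001], 6))
```

On the example it picks {a, b, e}, then {b, d, f}, then {b, c}, which happens to be optimal here because c, d and e each occur in only one set.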
Hope this helps!

I thought a bit about this problem and here's the idea I came up with:
First, create a list for every bit position; each list holds every sequence that has a '1' at that bit. This takes O(n*m), where n is the number of sequences and m is the length of a sequence.
Then count the occurrences of every bit sequence across all lists, put these [sequence, count] tuples in a structure (AVL tree, heap, or whatever you like) and sort them. (I mean: sequence 'a' occurs 15 times over all lists and sequence 'b' 10 times.) This again takes O(n*m), because O(n log n) < O(n*m) as long as log n < m.
In the next step, take the sequence with the highest priority and remove all lists from step one which contain this sequence. Then go back to step 2 until you have eliminated all lists. In the worst case you'll have to do this m times.
So in total we have a time of O(n * m^2)
Correct me if I misunderstood a part of the question or if I made a mistake ;)
Here is a little example of what I mean:
Bit Strings:
a: 100101
b: 010001
c: 011100
d: 000010
So this will create the Lists:
L1: a
L2: b,c
L3: c
L4: a, c
L5: d
L6: a, b
Then we will count and sort:
a: 3
c: 3
b: 2
d: 1
So we take a in our final list and delete the following Lists:
L1, L4, L6
Now we count again:
c: 2
b: 1
d: 1
so we take c in our list and delete:
L2, L3
so we have only L5 left, which only contains d
So we have found our final set: a, c, d (minimal for this example, although this greedy procedure is not guaranteed to be minimal in general)
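To check a greedy result on small inputs, one option is an exhaustive search that tries subsets in order of increasing size. The sketch below is an assumption of mine rather than part of the answer above; min_cover is a hypothetical helper, it stores each bit string as an int, and it takes exponential time in the worst case.

```python
from functools import reduce
from itertools import combinations
from operator import or_


def min_cover(seqs, width):
    """Return a smallest list of names whose sequences OR to all 1s, or None."""
    full = (1 << width) - 1
    names = sorted(seqs)
    for size in range(1, len(names) + 1):          # smallest subsets first
        for combo in combinations(names, size):
            if reduce(or_, (seqs[name] for name in combo)) == full:
                return list(combo)
    return None


example = {"a": 0b100101, "b": 0b010001, "c": 0b011100, "d": 0b000010}
print(min_cover(example, 6))
```

For the four bit strings of the example this confirms that no cover with fewer than three sequences exists.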


Knuth's Algorithm X for exact cover with restricted block sizes

The famous algorithm for the exact cover problem, given by Donald Knuth, is called Knuth's Algorithm X.
Input: a list of subsets of a universal set
Output: all possible collections of disjoint subsets whose union is the universal set
Suppose the input is {ab, ac, cd, c, d, a, b}. Is it possible to modify Knuth's Algorithm X so that it gives output according to some predefined block sizes? For example, if {2, 2} is the block size set, it will give the output {ab, cd}; if {2,1,1} is the block size set, it will give the outputs {ab, c, d}, {ac, b, d} and {cd, b, a}.
You can (optionally) start by removing from your input list all subsets whose size is not in the set of block sizes.
The original Knuth's Algorithm X can be altered to take the set of block sizes (for example {2, 1, 1}) as a restriction; the extensions are the parts that mention block sizes:
If A is empty and the set of block sizes is empty, the problem is solved; terminate successfully.
Otherwise choose a column, c (deterministically).
Choose a row, r, such that A[r, c] = 1 and the number of 1s in row r is in the set of block sizes (nondeterministically).
Include r in the partial solution.
Remove the number of 1s in row r from the set of block sizes.
For each j such that A[r, j] = 1,
    For each i such that A[i, j] = 1,
        Delete row i from matrix A.
    Delete column j from matrix A.
Repeat this algorithm recursively on the reduced matrix A and the reduced set of block sizes.
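The steps above can be sketched in Python as follows. This is a minimal illustration under my own assumptions: it uses plain frozensets instead of a 0/1 matrix or dancing links, treats the block sizes as a multiset (so {2, 2} means two blocks of size 2), and the name exact_covers is hypothetical.

```python
from collections import Counter


def exact_covers(universe, subsets, block_sizes):
    """Yield every exact cover of `universe` whose block sizes match `block_sizes`."""
    remaining = Counter(block_sizes)          # multiset of sizes still to be used

    def search(uncovered, partial):
        if not uncovered:
            # "A is empty": succeed only if every block size has been used up.
            if sum(remaining.values()) == 0:
                yield list(partial)
            return
        c = min(uncovered)                    # deterministic column choice
        for s in subsets:                     # candidate rows r with A[r, c] = 1
            if c in s and s <= uncovered and remaining[len(s)] > 0:
                remaining[len(s)] -= 1        # remove this size from the block sizes
                partial.append(s)
                yield from search(uncovered - s, partial)
                partial.pop()                 # backtrack
                remaining[len(s)] += 1

    yield from search(frozenset(universe), [])


rows = [frozenset(x) for x in ["ab", "ac", "cd", "c", "d", "a", "b"]]
for cover in exact_covers("abcd", rows, [2, 1, 1]):
    print(sorted("".join(sorted(block)) for block in cover))
```

Requiring s <= uncovered plays the role of deleting the conflicting rows: a subset that overlaps an already covered element is never selected.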

Finding maximum valued subset in which PartitionProblem algorithm returns true

I've got the following assignment.
You have a multiset S of 1<=N<=22 elements.
Each element has a positive value of up to 10000000.
Assuming that there are two subsets s1 and s2 of S such that the sum of the values of all the elements of one equals the sum of the values of all the elements of the other, and that this sum is the highest possible, I have to return which elements of S would not be included in either of the two subsets.
It's probably been solved before; I think it's some variant of the Partition problem, but I can't find it. If anyone could point me in the right direction that'd be great.
EDIT: An element can't be in both subsets.
This is a variation of subset sum, and can be solved similarly, by increasing the dimension of the problem (and the DP matrix), and then applying a solution very similar to the original one for subset-sum, which follows the recursive formula:
D(i,x,y) = D(i-1,x,y) OR D(i-1,x-l[i],y) OR D(i-1,x,y-l[i])
where the three terms correspond, in order, to element i being not chosen, chosen for the first set, and chosen for the second set,
and base clauses:
D(0,0,0) = true
D(0,x,y) = false   if x != 0 or y != 0
D(i,x,y) = false   if x < 0 or y < 0
After calculating the DP matrix (actually a 3D array) for this problem, all you have to do is check whether there is any entry D(n,x,x) == true, for some x <= SUM/2 (where SUM is the sum of the entire original set), to determine whether there is any feasible solution.
Since you want the maximal value, the answer should be the maximal value of such x that D(n,x,x)=true (since there could be more than one)
Finding the elements themselves can be done after finding the solution (the value of x in D(n,x,x)) by following back the DP matrix and retracing your steps as explained for similar problems such as this: How to find which elements are in the bag, using Knapsack Algorithm [and not only the bag's value]?
Total complexity of this solution is O(SUM^2 * n)
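The recurrence above can be sketched with a set of reachable (x, y) pairs instead of a dense 3D array. This is only an illustration under that assumption; best_equal_sum is a hypothetical name, and for values as large as in the question the number of states may still be too large for this approach.

```python
def best_equal_sum(values):
    """Largest x such that two disjoint sub-multisets of `values` both sum to x."""
    half = sum(values) // 2
    # `reachable` holds every (x, y) with D(i, x, y) == true for the items seen so far.
    reachable = {(0, 0)}
    for v in values:
        grown = set(reachable)                 # item not chosen
        for x, y in reachable:
            if x + v <= half:
                grown.add((x + v, y))          # chosen for the first set
            if y + v <= half:
                grown.add((x, y + v))          # chosen for the second set
        reachable = grown
    # Maximal x with D(n, x, x) == true; (0, 0) is always reachable.
    return max(x for x, y in reachable if x == y)


print(best_equal_sum([1, 2, 3, 4]))
```

Sums above SUM/2 are pruned because sums never shrink, so such a state cannot lead to an entry D(n, x, x) with x <= SUM/2.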
Partition S as evenly as possible into T ∪ U (put the extra element, if any, in U). Loop through the three-way partitions of T into A ∪ B ∪ C (≤ 3^11 = 177,147 of them). Store the item |sum(A) - sum(B)| → C into a map, keeping only the value with the lowest sum in case the key already exists.
Loop through the three-way partitions of U into D ∪ E ∪ F. Look up |sum(D) - sum(E)| in the map; if it exists with value C, then consider C ∪ F as a possibility for the elements left out (the two parts with equal sum are either A ∪ D and B ∪ E, or A ∪ E and B ∪ D).
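A minimal sketch of this meet-in-the-middle idea follows. The helper names partitions3 and min_left_out are hypothetical, and the sketch returns only the left-out elements (as a list of values), not the two equal-sum parts.

```python
from itertools import product


def partitions3(items):
    """Yield (|sum(A) - sum(B)|, C) for every split of items into A, B, C."""
    for labels in product((0, 1, 2), repeat=len(items)):
        sums = [0, 0, 0]
        for value, group in zip(items, labels):
            sums[group] += value
        left_out = [v for v, g in zip(items, labels) if g == 2]
        yield abs(sums[0] - sums[1]), left_out


def min_left_out(values):
    """Elements left out of the two equal-sum subsets with the largest common sum."""
    half = len(values) // 2
    t, u = values[:half], values[half:]        # the extra element, if any, goes to U
    best = {}                                  # difference -> cheapest left-out part C
    for diff, c in partitions3(t):
        if diff not in best or sum(c) < sum(best[diff]):
            best[diff] = c
    answer = None
    for diff, f in partitions3(u):
        if diff in best:                       # the two halves can balance each other
            candidate = best[diff] + f
            if answer is None or sum(candidate) < sum(answer):
                answer = candidate
    return answer


print(min_left_out([3, 3, 7]))
```

Minimising the sum of the left-out elements is the same as maximising the common sum, since the two equal parts together hold everything that is not left out.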

Subsets with equal sum

I want to calculate how many pairs of disjoint subsets S1 and S2 of a set S exist (S1 U S2 need not be S) for which the sum of the elements in S1 equals the sum of the elements in S2.
Say I have calculated all the subset sums for all 2^n possible subsets.
How do I find how many pairs of disjoint subsets have equal sums?
For a sum value A, can we use the count of subsets having sum A/2 to solve this?
As an example :
S ={1,2,3,4}
Various S1 and S2 sets possible are:
S1 = {1,2} and S2 = {3}
S1 = {1,3} and S2 = {4}
S1 = {1,4} and S2 = {2,3}
Here is the link to the problem :
[EDIT: Fixed stupid complexity mistakes. Thanks kash!]
Actually I believe you'll need to use the O(3^n) algorithm described here to answer this question -- the O(2^n) partitioning algorithm is only good enough to enumerate all pairs of disjoint subsets whose union is the entire ground set.
As described at the answer I linked to, for each element you are essentially deciding whether to:
Put it in the first set,
Put it in the second set, or
Ignore it.
Considering every possible way to do this generates a tree where each vertex has 3 children: hence O(3^n) time. One thing to note is that if you generate a solution (S1, S2) then you should not also count the solution (S2, S1): this can be achieved by always maintaining an asymmetry between the two sets as you build them up, e.g. enforcing that the smallest element in S1 must always be smaller than the smallest element in S2. (This asymmetry enforcement has the nice side-effect of halving the execution time :))
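The three-way choice and the asymmetry trick can be sketched as below. This is an illustration that assumes all values are positive; count_equal_pairs is a hypothetical name, and the asymmetry used here is that the first element placed in either set always goes to S1 (which matches the smallest-element rule when the input is sorted).

```python
def count_equal_pairs(values):
    """Count unordered pairs of disjoint non-empty subsets with equal sums."""

    def go(i, diff, started):
        # diff = sum(S1) - sum(S2); started = S1 already has an element
        if i == len(values):
            return 1 if started and diff == 0 else 0
        v = values[i]
        total = go(i + 1, diff, started)           # ignore the element
        total += go(i + 1, diff + v, True)         # put it in the first set
        if started:                                # S2 may only grow after S1 exists,
            total += go(i + 1, diff - v, True)     # so (S2, S1) is never counted too
        return total

    return go(0, 0, False)


print(count_equal_pairs([1, 2, 3, 4]))
```

With positive values, a zero difference after at least one placement implies that both sets are non-empty, so no separate emptiness check is needed.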
A speedup for a special (but perhaps common in practice) case
If you expect that there will be many small numbers in the set, there is another possible speedup available to you: First, sort all the numbers in the list in increasing order. Choose some maximum value m, the larger the better, but small enough that you can afford an m-size array of integers. We will now break the list of numbers into 2 parts that we will process separately: an initial list of numbers that sum to at most m (this list may be quite small), and the rest. Suppose the first k <= n numbers fit into the first list, and call this first list Sk. The rest of the original list we will call S'.
First, initialise a size-m array d[] of integers to all 0, and solve the problem for Sk as usual -- but instead of only recording the number of disjoint subsets having equal sums, increment d[abs(|Sk1| - |Sk2|)] for every pair of disjoint subsets Sk1 and Sk2 formed from these first k numbers. (Also increment d[0] to count the case when Sk1 = Sk2 = {}.) The idea is that after this first phase has finished, d[i] will record the number of ways that 2 disjoint subsets having a difference of i can be generated from the first k elements of S.
Second, process the remainder (S') as usual -- but instead of only recording the number of disjoint subsets having equal sums, whenever abs(|S1'| - |S2'|) <= m, add d[abs(|S1'| - |S2'|)] to the total number of solutions. This is because we know that there are that many ways of building a pair of disjoint subsets from the first k elements having this difference -- and for each of these subset pairs (Sk1, Sk2), we can add the smaller of Sk1 or Sk2 to the larger of S1' or S2', and the other one to the other one, to wind up with a pair of disjoint subsets having equal sum.
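A simplified sketch of this two-phase idea follows. It deviates from the description in ways that are my own assumptions: it keeps signed differences in a Counter rather than an array indexed by absolute difference, it counts ordered pairs and halves the result at the end, and it assumes positive values. The names diff_counts and count_pairs_split are hypothetical.

```python
from collections import Counter
from itertools import product


def diff_counts(values):
    """Map each signed difference sum(S1) - sum(S2) to the number of ways to reach it."""
    counts = Counter()
    for signs in product((0, 1, -1), repeat=len(values)):   # ignore / S1 / S2
        counts[sum(s * v for s, v in zip(signs, values))] += 1
    return counts


def count_pairs_split(values, m):
    """Count unordered pairs of disjoint non-empty subsets with equal sums."""
    values = sorted(values)
    k, running = 0, 0
    while k < len(values) and running + values[k] <= m:     # first list sums to <= m
        running += values[k]
        k += 1
    d = diff_counts(values[:k])                             # phase 1: the small numbers
    ordered = 0
    for signs in product((0, 1, -1), repeat=len(values) - k):
        diff = sum(s * v for s, v in zip(signs, values[k:]))
        ordered += d[-diff]                                 # phase 2: cancel the difference
    # Remove the (empty, empty) pair, then merge (S1, S2) with (S2, S1).
    return (ordered - 1) // 2


print(count_pairs_split([1, 2, 3, 4], 3))
```

The work drops from 3^n assignments to roughly 3^k + 3^(n-k), because the table for the first k numbers is built once and then only looked up.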
Here is a Clojure solution.
It defines s to be a set of 1, 2, 3, 4
Then all-subsets is defined to be a list of all sets of size 1 - 3
Once all the subsets are defined, it looks at all pairs of subsets and selects only the pairs that are not equal, do not union to the original set, and whose sum is equal
(require 'clojure.set)
(use 'clojure.math.combinatorics)
(def s #{1, 2, 3, 4})
;; all subsets of size 1 to 3
(def all-subsets (mapcat #(combinations s %) (take 3 (iterate inc 1))))
;; keep pairs with equal sums that are distinct and do not union to s
(for [x all-subsets y all-subsets
      :when (and (= (reduce + x) (reduce + y))
                 (not= s (clojure.set/union (set x) (set y)))
                 (not= x y))]
  [x y])
Produces the following:
([(3) (1 2)] [(4) (1 3)] [(1 2) (3)] [(1 3) (4)])

Algorithm to "transfer water from a set of bottles to another one" (metaphorically speaking)

Ok, I have a problem. I have a set "A" of bottles of various sizes, all full of water.
Then I have another set "B" of bottles of various sizes, all empty.
I want to transfer the water from A to B, knowing that the total capacity of each set is the same. (i.e.: Set A contains the same amount of water as set B).
This is of course trivial in itself: just take the first bottle in A and pour it into the first bottle in B until that one is full. Then, if the bottle from A still has water in it, go on with the second bottle in B, etc.
However, I want to minimize the total number of pours (the action of pouring from a bottle into another, each action counts 1, independently from how much water it involves)
I'd like to find a greedy algorithm to do this, or if not possible at least an efficient one. However, efficiency is secondary to correctness of the algorithm (I don't want a suboptimal solution).
Of course this problem is just a metaphor for a real problem in a computer program to manage personal expenses.
Bad news: this problem is NP-hard by a reduction from subset sum. Given numbers x1, …, xn, S, the object of subset sum is to determine whether or not some subset of the x_i sums to S. We make A-bottles with capacities x1, …, xn and B-bottles with capacities S and (x1 + … + xn - S) and determine whether n pours are sufficient.
Good news: any greedy strategy (i.e., choose any nonempty A, choose any unfilled B, pour until we have to stop) is a 2-approximation (i.e., uses at most twice as many pours as optimal). The optimal solution uses at least max(|A|, |B|) pours, and greedy uses at most |A| + |B|, since every time greedy does a pour, either an A is drained or a B is filled and does not need to be poured out of or into again.
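As a sketch of one such greedy strategy (the choice of "first non-empty A, first unfilled B" is arbitrary, and greedy_pours is a hypothetical name), assuming positive capacities with equal totals:

```python
def greedy_pours(a, b):
    """Return a list of pours (index in A, index in B, amount)."""
    a, b = list(a), list(b)          # water left in each A, room left in each B
    pours = []
    i = j = 0
    while i < len(a) and j < len(b):
        amount = min(a[i], b[j])     # pour until A[i] is drained or B[j] is full
        pours.append((i, j, amount))
        a[i] -= amount
        b[j] -= amount
        if a[i] == 0:
            i += 1                   # this A bottle never needs to be touched again
        if b[j] == 0:
            j += 1                   # this B bottle never needs to be touched again
    return pours


print(greedy_pours([3, 5], [4, 4]))
```

Every pour advances i or j, which is the reason the number of pours stays at most |A| + |B|.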
There might be an approximation scheme (a (1 + ε)-approximation for any ε > 0). I think now it's more likely that there's an inapproximability result – the usual tricks for obtaining approximation schemes don't seem to apply here.
Here are some ideas that might lead to a practical exact algorithm.
Given a solution, draw a bipartite graph with left vertices A and right vertices B and an (undirected) edge from a to b if and only if a is poured into b. If the solution is optimal, I claim that there are no cycles – otherwise we could eliminate the smallest pour in the cycle and replace the lost volume going around the cycle. For example, if I have pours
a1 -> b1: 1
a1 -> b2: 2
a2 -> b1: 3
a2 -> b3: 4
a3 -> b2: 5
a3 -> b3: 6
then I can eliminate the a1 -> b1 pour like so:
a2 -> b1: 4 (+1)
a2 -> b3: 3 (-1)
a3 -> b3: 7 (+1)
a3 -> b2: 4 (-1)
a1 -> b2: 3 (+1)
Now, since the graph has no cycle, we can count the number of edges (pours) as |A| + |B| - #(connected components). The only variable here is the number of connected components, which we want to maximize.
I claim that the greedy algorithm forms graphs that have no cycle. If we knew what the connected components of an optimal solution were, we could use a greedy algorithm on each one and get an optimal solution.
One way to tackle this subproblem would be to use dynamic programming to enumerate all subset pairs X of A and Y of B such that sum(X) == sum(Y) and then feed these into an exact cover algorithm. Both steps are of course exponential, but they might work well on real data.
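For very small inputs, the number of components can also be maximised directly with a dynamic program over bitmasks. Note that this is a different, brute-force technique from the DP-plus-exact-cover pipeline described above; it is a sketch that assumes positive integer capacities with equal totals, and min_pours is a hypothetical name.

```python
def min_pours(a, b):
    """Minimum number of pours, via the maximum number of balanced groups."""
    vals = list(a) + [-x for x in b]          # A bottles positive, B bottles negative
    n = len(vals)
    total = [0] * (1 << n)                    # total[mask]  = sum of the bottles in mask
    groups = [0] * (1 << n)                   # groups[mask] = max number of disjoint
    for mask in range(1, 1 << n):             #                zero-sum groups inside mask
        low = mask & -mask
        total[mask] = total[mask ^ low] + vals[low.bit_length() - 1]
        best = 0
        rest = mask
        while rest:                           # best result after dropping one bottle
            bit = rest & -rest
            best = max(best, groups[mask ^ bit])
            rest ^= bit
        groups[mask] = best + (1 if total[mask] == 0 else 0)
    # A forest on n vertices with c components has n - c edges (pours).
    return n - groups[-1]


print(min_pours([5, 4], [4, 3, 2]))
```

For A = [5, 4] and B = [4, 3, 2] the balanced groups are {4 | 4} and {5 | 3, 2}, giving 5 - 2 = 3 pours. Time and memory grow as 2^(|A| + |B|), so this is only practical for a couple of dozen bottles.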
Here's my take:
1. Identify bottles having the exact same size in both sets. This translates to a one-to-one pour for these same-size bottles.
2. Sort the remaining bottles in A in descending order by capacity, and sort the remaining bottles in B in ascending order. Compute the number of pours you need when pouring the sorted list in A into B.
3. Update: After each pour in step 2, repeat step 1. (Optimization step suggested by Steve Jessop.) Rinse and repeat until all water is transferred.
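I think this gives the minimum number of pours:
import bisect

def pours(A, B):
    """Count pours when always pairing the largest remaining A with the largest remaining B."""
    assert sum(A) == sum(B)
    # Work on sorted copies so that pop() always yields the largest value.
    A, B = sorted(A), sorted(B)
    count = 0
    while A and B:
        i = A.pop()
        j = B.pop()
        count += 1  # every iteration is exactly one pour
        if i > j:
            bisect.insort(A, i - j)  # water left over in the A bottle
        elif i < j:
            bisect.insort(B, j - i)  # room left over in the B bottle
    return count

print(pours(A, B))
# gives 3
print(pours(A, B))
# gives 5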
in English it reads:
assert that both lists have the same sum (I think the algorithm will still work if sum(A) > sum(B) or sum(A) < sum(B) is true)
take the two lists A and B and sort both of them
while A isn't empty and B isn't empty:
    take i (the largest) from A and j (the largest) from B
    if i equals j, pour i into j and count 1 pour
    if i is larger than j, pour i into j, place the i-j remainder back in A (using an insertion sort), count 1 pour
    if i is smaller than j, pour i into j, place the j-i remainder back in B (using an insertion sort), count 1 pour

What shuffling algorithms exist besides Fisher-Yates and finding the "next permutation?"

Specifically in the domain of one-dimensional sets of items of the same type, such as a vector of integers.
Say, for example, you had a vector of size 32,768 containing the sorted integers 0 through 32,767.
What I mean by "next permutation" is performing the next permutation in a lexical ordering system.
Wikipedia lists two, and I'm wondering if there are any more (besides something bogo :P)
O(N) implementation
This is based on Eyal Schneider's mapping Zn! -> P(n)
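from math import factorial as f

def get_permutation(k, lst):
    """Rearrange lst in place according to the integer k, 0 <= k < len(lst)!."""
    N = len(lst)
    while N:
        # k // (N-1)! is an index in 0..N-1: the item to swap into slot N-1
        next_item = k // f(N - 1)
        lst[N - 1], lst[next_item] = lst[next_item], lst[N - 1]
        k = k - next_item * f(N - 1)
        N = N - 1
    return lst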
It reduces his O(N^2) algorithm by integrating the conversion step with finding the permutation. It essentially has the same form as Fisher-Yates but replaces a call to random with the next step of the mapping. If the mapping is in fact a bijection (which I'm working to prove) then this is a better algorithm than Fisher-Yates because it only calls out to pseudo random number generator once and so will be more efficient. Note also that this returns the action of permutation (N! - k) rather than permutation k but that's of little consequence because if k is uniform on [0, N!], then so is N! - k.
old answer
This is slightly related to the idea of "next" permutation. If the items can be well ordered, then one can construct lexicographical ordering on the permutations. This allows you to construct a map from the integers into the space of permutations.
Then finding a random permutation is equivalent to choosing a random integer between 0 and N! and constructing the corresponding permutation. This algorithm will be as efficient (and as difficult to implement) as calculating the n'th permutation of the set in question. This trivially gives a uniform choice of permutation if our choice of n is uniform.
A little more detail about ordering the permutations. Given a set S = {a b c d}, mathematicians view the set of permutations of S as a group with the operation of composition. If p is one permutation, let's say (b a c d), then p operates on S by taking b to a, a to c, c to d and d to b. If q is another permutation, let's say (d b c a), then pq is obtained by first applying q and then p, which gives (d a b)(c). For example, q takes d to b and p takes b to a, so that pq takes d to a. You'll see that pq has two cycles because it takes b to d and fixes c. It's customary to omit 1-cycles but I left it in for clarity.
We're going to use some facts from group theory.
disjoint cycles commute. (a b)(c d) is the same as (c d)(a b)
we can arrange elements in a cycle in any cyclic order. that is (a b c) = (b c a) = (c a b)
So given a permutation, order the cycles so that the largest cycles come first. When two cycles are the same length, arrange their items so that the largest (we can always order a denumerable set, even if arbitrarily so) item comes first. Then we just have a lexicographical ordering first on the length of the cycles, then on their contents. This is well ordered because two permutations that consist of the same cycles must be the same permutation so if p > q and q > p then p = q.
This algorithm can be trivially executed in O(N!logN! + N!) time. just construct all the permutations (EDIT: Just to be clear, I had my mathematician hat on when I proposed this and it was tongue in cheek anyway) , quicksort them and find the n'th. It is a different algorithm than the two you mention though.
Here is an idea on how to improve aaronasterling's answer. It avoids generating all N! permutations and sorting them according to their lexicographic order, and therefore has a much better time complexity.
Internally it uses an unusual permutation representation, that simulates a selection & removal process from a shrinking array. For example, the sequence <0,1,0> represents a permutation resulting from removing item #0 from [0,1,2], then removing item #1 from [1,2], and then removing item #0 from [1]. The resulting permutation is <0,2,1>. With this representation, the first permutation will always be <0,0,...0>, and the last one will always be <N-1,N-2,...0>. I will call this special representation the "array representation".
Clearly, an array representation of size N can be converted to a standard permutation representation in O(N^2) time, by using an array and shrinking it when necessary.
The following function can be used to return the Kth permutation on {0,1,2...,N-1}, in the array representation:
getPermutation(k, N) {
    while(N > 0) {
        nextItem = floor(k / (N-1)!)
        output nextItem
        k = k - nextItem * (N-1)!
        N = N - 1
    }
}
This algorithm works in O(N^2) time (due to the representation conversion), instead of O(N! log N) time.
getPermutation(4,3) returns <2,0,0>. This array representation corresponds to <C,A,B>, which is indeed the permutation at index 4 (counting from 0) in the lexicographically ordered list of permutations on {A,B,C}: ABC, ACB, BAC, BCA, CAB, CBA.
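A runnable Python sketch of the pseudocode above, together with the conversion from the array representation to an ordinary permutation, might look like this. The names array_representation and kth_permutation are made up for the example; the conversion uses list.pop, which is what makes the whole thing O(N^2).

```python
from math import factorial


def array_representation(k, n):
    """Array representation of the k-th permutation of n items (0 <= k < n!)."""
    digits = []
    while n > 0:
        f = factorial(n - 1)
        digits.append(k // f)     # which of the remaining items to remove next
        k %= f
        n -= 1
    return digits


def kth_permutation(k, items):
    """Replay the removals on a shrinking list to get the actual permutation."""
    pool = list(items)
    return [pool.pop(d) for d in array_representation(k, len(pool))]


print(array_representation(4, 3))
print(kth_permutation(4, "ABC"))
```

Choosing k uniformly at random in the range 0 to N! - 1 and calling kth_permutation then yields a uniformly random permutation from a single random number.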
You can adapt merge sort such that it will shuffle the input randomly instead of sorting it.
In particular, when merging two lists, you choose the new head element at random instead of choosing it to be the smallest head element. The probability of choosing the element from the first list must be n/(n+m) where n is the length of the first and m the length of the second list for this to work.
I've written a detailed explanation here: Random Permutations and Sorting.
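A minimal sketch of this merge-based shuffle, assuming Python's random module as the source of randomness (merge_shuffle is a hypothetical name):

```python
import random


def merge_shuffle(items, rng=random):
    """Shuffle by recursively shuffling both halves and merging them at random."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_shuffle(items[:mid], rng)
    right = merge_shuffle(items[mid:], rng)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        n, m = len(left) - i, len(right) - j     # items still remaining in each list
        if rng.random() < n / (n + m):           # take from the first list w.p. n/(n+m)
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])                         # one list is exhausted: copy the rest
    out.extend(right[j:])
    return out


print(merge_shuffle(list(range(10))))
```

Note that n and m are the remaining lengths at each step, not the original lengths of the two lists.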
Another possibility is to build an LFSR or PRNG with a period equal to the number of items you want.
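As a sketch of the LFSR idea: the code below assumes a 16-bit Galois LFSR with the commonly cited tap mask 0xB400 (period 65535) and simply skips states that fall outside the wanted range, so it handles up to 65535 items. The name lfsr_order and the default seed are arbitrary choices for this example.

```python
def lfsr_order(n, seed=0xACE1):
    """Visit every index in 0..n-1 exactly once, in LFSR order (n <= 65535)."""
    assert 0 < n <= 0xFFFF and 0 < seed <= 0xFFFF
    order = []
    state = seed
    for _ in range(0xFFFF):          # one full period visits every non-zero state once
        if state <= n:
            order.append(state - 1)  # states 1..n map to indices 0..n-1
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400          # assumed maximal-length tap mask for 16 bits
    return order


print(lfsr_order(10))
```

The seed only changes where in the fixed cycle the sequence starts, so this produces at most 65535 different orderings and is far from a uniform choice among all permutations.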
Start with a sorted array. Pick 2 random indexes, switch the elements at those indexes. Repeat O(n lg n) times.
You need to repeat O(n lg n) times to ensure that the distribution approaches uniform. (You need to make sure that each index is picked at least once, which is a balls-in-bins problem.)
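A sketch of this random-transposition shuffle is given below. The constant c in front of n lg n is an assumption of mine, since the text only gives the order of growth, and transposition_shuffle is a hypothetical name. The result is only approximately uniform.

```python
import math
import random


def transposition_shuffle(items, rng=random, c=2):
    """Approximate shuffle: swap two random positions about c * n * lg(n) times."""
    a = list(items)
    n = len(a)
    steps = int(c * n * math.log2(n)) if n > 1 else 0
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)   # i == j is allowed (no-op swap)
        a[i], a[j] = a[j], a[i]
    return a


print(transposition_shuffle(list(range(10))))
```

Compared with Fisher-Yates this needs more random numbers and more swaps for a less exact result, so it is mainly of interest as an alternative in the sense of the question.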
