Algorithm for: All possible ways of splitting a set of elements into two sets? - algorithm

I have n elements in a set U (let's assume it is represented by an array of size n). I want to find all possible ways of dividing the set U into two sets A and B, where |A| + |B| = n.
So for example, if U = {a,b,c,d}, the combinations would be:
A = {a} -- B = {b,c,d}
A = {b} -- B = {a,c,d}
A = {c} -- B = {a,b,d}
A = {d} -- B = {a,b,c}
A = {a,b} -- B = {c,d}
A = {a,c} -- B = {b,d}
A = {a,d} -- B = {b,c}
Note that the following two cases are considered equal and only one should be computed:
Case 1: A = {a,b} -- B = {c,d}
Case 2: A = {c,d} -- B = {a,b}
Also note that none of the sets A or B can be empty.
The way I'm thinking of implementing it is by just keeping track of indices in the array and moving them step by step. The number of indices will be equal to the number of elements in the set A, and set B will contain all the remaining un-indexed elements.
I was wondering if anyone knew of a better implementation. I'm looking for better efficiency because this code will be executed on a fairly large set of data.
Thanks!

Take all the integers from 1 to 2^(n-1), excluding 2^(n-1) itself. So if n = 4, that is the integers from 1 to 7.
Each of these numbers, written in binary, marks the elements present in set A; set B consists of the remaining elements. Starting at 1 rather than 0 keeps A non-empty. Note that since we only go up to 2^(n-1), not 2^n, the high bit is never set, so the element assigned to that bit always ends up in set B. Pinning one element to B is what stops each split from being produced twice, since you want order not to matter.
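As a minimal sketch of the bitmask idea above (the function name two_way_splits is made up for illustration, and bit i is assumed to correspond to items[i]):

```python
def two_way_splits(items):
    """Yield each unordered split of items into two non-empty lists (A, B)."""
    n = len(items)
    if n < 2:
        return  # fewer than two elements cannot form two non-empty sets
    # Masks 1 .. 2^(n-1) - 1: the top bit (index n-1) is never set,
    # so items[n-1] always lands in B and no split is produced twice.
    for mask in range(1, 1 << (n - 1)):
        a = [items[i] for i in range(n) if (mask >> i) & 1]
        b = [items[i] for i in range(n) if not (mask >> i) & 1]
        yield a, b


for a, b in two_way_splits(["a", "b", "c", "d"]):
    print(a, "--", b)
```

For four elements this prints seven splits. The number of splits still grows as 2^(n-1) - 1, so the enumeration itself is exponential no matter how it is implemented.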

Related

Algorithm to group items in groups of 3

I am trying to solve a problem where I have pairs like:
A C
B F
A D
D C
F E
E B
A B
B C
E D
F D
and I need to group them in groups of 3, where each group must form a triangle of matching pairs from that list. Basically I need to know whether or not it is possible to group the whole collection.
So the possible groups are (ACD and BFE), or (ABC and DEF) and this collection is groupable since all letters can be grouped in groups of 3 and no one is left out.
I made a script that achieves this for small amounts of input, but for big amounts it gets too slow.
My logic is:
make nested loop to find first match (looping until I find a match)
> remove 3 elements from the collection
> run again
and I do this until I am out of letters. Since there can be different combinations I run this multiple times starting on different letters until I find a match.
I understand that this gives me loops on the order of at least N^N and can get too slow. Is there a better logic for such problems? Can a binary tree be used here?
This problem can be modeled as a graph Clique cover problem. Every letter is a node and every pair is an edge and you want to partition the graph into vertex-disjoint cliques of size 3 (triangles). If you want the partitioning to be of minimum cardinality then you want a minimum clique cover.
Actually this would be a k-clique cover problem, because in the clique cover problem you can have cliques of arbitrary/different sizes.
As Alberto Rivelli already stated, this problem is reducible to the Clique Cover problem, which is NP-hard.
It is also reducible to the problem of finding a clique of a particular/maximum size. Maybe there are other problems, not NP-hard, to which your particular case could be reduced, but I didn't think of any.
However, there are algorithms that find a solution quickly on many inputs, although not in the worst case. One of them is the Bron–Kerbosch algorithm, a widely used algorithm for enumerating maximal cliques; its pivoting variant has a worst-case running time of O(3^(n/3)). I don't know the size of your inputs, but I hope it will be sufficient for your problem.
Here is the code in Python, ready to go:
#!/usr/bin/python3
# #by DeFazer
# Solution to:
# stackoverflow.com/questions/40193648/algorithm-to-group-items-in-groups-of-3
# Input:
# N P - number of vertices and number of pairs
# P pairs, 1 pair per line
# Output:
# "YES" and groups themselves if grouping is possible, and "NO" otherwise
# Input example:
# 6 10
# 1 3
# 2 6
# 1 4
# 4 3
# 6 5
# 5 2
# 1 2
# 2 3
# 5 4
# 6 4
# Output example:
# YES
# 1-2-3
# 4-5-6
# Output commentary:
# There are 2 possible coverages: 1-2-3*4-5-6 and 2-5-6*1-3-4.
# If required, it can be easily modified to return all possible groupings rather than just one.
# Algorithm:
# 1) List *all* existing triangles (1-2-3, 1-3-4, 2-5-6...)
# 2) Build a graph where vertices represent triangles and edges connect these triangles with no common... vertices. Sorry for ambiguity. :)
# 3) Use [this](en.wikipedia.org/wiki/Bron–Kerbosch_algorithm) algorithm (slightly modified) to find a clique of size N/3.
# The grouping is possible if such clique exists.
N, P = map(int, input().split())
assert (N%3 == 0) and (N>0)
cliquelength = N//3
pairs = {} # {a:{b, d, c}, b:{a, c, f}, c:{a, b}...}
# Get input
# [(0, 1), (1, 3), (3, 2)...]
##pairlist = list(map(lambda ab: tuple(map(lambda a: int(a)-1, ab)), (input().split() for pair in range(P))))
pairlist=[]
for pair in range(P):
    a, b = map(int, input().split())
    if a>b:
        b, a = a, b
    a, b = a-1, b-1
    pairlist.append((a, b))
pairlist.sort()
for pair in pairlist:
    a, b = pair
    if a not in pairs:
        pairs[a] = set()
    pairs[a].add(b)
# Make list of triangles
triangles = []
for a in range(N-2):
    for b in pairs.get(a, []):
        for c in pairs.get(b, []):
            if c in pairs[a]:
                # No break here: every triangle on the edge (a, b) must be listed
                triangles.append((a, b, c))
def no_mutual_elements(sortedtupleA, sortedtupleB):
    # Utility function
    # TODO: if too slow, can be improved to O(n) since tuples are sorted. However, there are only 9 comparisons in case of triangles.
    return all((a not in sortedtupleB) for a in sortedtupleA)
# Make a graph out of that list
tgraph = [] # if a<b and (b in tgraph[a]), then triangles[a] has no common elements with triangles[b]
T = len(triangles)
for t1 in range(T):
    s = set()
    for t2 in range(t1+1, T):
        if no_mutual_elements(triangles[t1], triangles[t2]):
            s.add(t2)
    tgraph.append(s)
def connected(a, b):
    if a > b:
        b, a = a, b
    return (b in tgraph[a])
# Finally, the magic algorithm!
CSUB = set()
def extend(CAND:set, NOT:set) -> bool:
    # while CAND is not empty and there is no vertex in NOT connected to *all* vertexes in CAND
    while CAND and all((any(not connected(n, c) for c in CAND)) for n in NOT):
        v = CAND.pop()
        CSUB.add(v)
        newCAND = {c for c in CAND if connected(c, v)}
        newNOT = {n for n in NOT if connected(n, v)}
        if (not newCAND) and (not newNOT) and (len(CSUB)==cliquelength): # the last condition is the algorithm modification
            return True
        elif extend(newCAND, newNOT):
            return True
        else:
            CSUB.remove(v)
            NOT.add(v)
    return False
if extend(set(range(T)), set()):
    print("YES")
    # If the clique itself is not needed, it's enough to remove the following 2 lines
    for a, b, c in [triangles[c] for c in CSUB]:
        print("{}-{}-{}".format(a+1, b+1, c+1))
else:
    print("NO")
If this solution is still too slow, perhaps it may be more efficient to solve the Clique Cover problem instead. If that's the case, I can try to find a proper algorithm for it.
Hope that helps!
Well, I have implemented the job in JS, where I feel most confident. I also tried it with 100000 edges randomly selected from 26 letters. Provided that they are all unique and none is a single point such as ["A","A"], it resolves in about 90~500 msecs. The most convoluted part was obtaining the non-identical groups, i.e. those that differ by more than just the order of the triangles. For the given edges data it resolves within 1 msec.
As a summary the first reduce stage finds the triangles and the second reduce stage groups the disconnected ones.
function getDisconnectedTriangles(edges){
  return edges.reduce(function(p,e,i,a){
    var ce = a.slice(i+1)
              .filter(f => f.some(n => e.includes(n))), // connected edges
        re = []; // resulting edges
    if (ce.length > 1){
      re = ce.reduce(function(r,v,j,b){
        var xv = v.find(n => e.indexOf(n) === -1), // find the external vertex
            xe = b.slice(j+1) // find the external edges
                  .filter(f => f.indexOf(xv) !== -1 );
        return xe.length ? (r.push([...new Set(e.concat(v,xe[0]))]),r) : r;
      },[]);
    }
    return re.length ? p.concat(re) : p;
  },[])
  .reduce((s,t,i,a) => t.used ? s
                              : (s.push(a.map((_,j) => a[(i+j)%a.length])
                                         .reduce((p,c,k) => k-1 ? p.every(t => t.every(n => c.every(v => n !== v))) ? (c.used = true, p.push(c),p) : p
                                                                : [p].every(t => t.every(n => c.every(v => n !== v))) ? (c.used = true, [p,c]) : [p])),s)
          ,[]);
}
var edges = [["A","C"],["B","F"],["A","D"],["D","C"],["F","E"],["E","B"],["A","B"],["B","C"],["E","D"],["F","D"]],
ps = 0,
pe = 0,
result = [];
ps = performance.now();
result = getDisconnectedTriangles(edges);
pe = performance.now();
console.log("Disconnected triangles are calculated in",pe-ps, "msecs and the result is:");
console.log(result);
You may generate random edges in different lengths and play with the code here

Four nested for loops optimization - I promise I searched

I've tried to find a good way to speed up the code for a problem I've been working on. The basic idea of the code is very simple. There are five inputs:
Four 1xm (for some m < n, they can be different sizes) matrices (A, B, C, D) that are pairwise-disjoint subsets of {1,2,...,n} and one nxn symmetric binary matrix (M). The basic idea for the code is to check an inequality for every combination of elements and, if the inequality holds, return the values that cause it to hold, i.e.:
for a = A
    for b = B
        for c = C
            for d = D
                if M(a,c) + M(b,d) < M(a,d) + M(b,c)
                    result = [a b c d];
                    return
                end
            end
        end
    end
end
I know there has to be a better way to do this. First, since it's symmetric, I can cut down half of the items checked since M(a,b) = M(b,a). I've been researching vectorization, found several functions I'd never heard of with MATLAB (since I'm relatively new), but I can't find anything that will particularly help me with this specific problem. I've thought of other ways to approach the problem, but nothing has been perfected, and I just don't know what to do at this point.
For example, I could possibly split this into two cases:
1) The right hand side is 1: then I have to check that both terms on the left side are 0.
2) The right hand side is 2: then I have to check that at least one term on the left hand side is 0.
But, again, I won't be able to avoid nesting.
I appreciate all the help you can offer. Thank you!
You're asking two questions here: (1) is there a more efficient algorithm to perform this search, and (2) how can I vectorize this in MATLAB. The first one is very interesting to think about, but may be a little beyond the scope of this forum. The second one is easier to answer.
As pointed out in the comments below your question, you can vectorize the for loop by enumerating all of the possibilities and checking them all together, and the answers from this question can help:
[a,b,c,d] = ndgrid(A,B,C,D); % Enumerate all combos
a=a(:); b=b(:); c=c(:); d=d(:); % Reshape from 4-D matrices to vectors
ac = sub2ind(size(M),a,c); % Convert subscript pairs to linear indices
bd = sub2ind(size(M),b,d);
ad = sub2ind(size(M),a,d);
bc = sub2ind(size(M),b,c);
mask = (M(ac) + M(bd) < M(ad) + M(bc)); % Test the inequality
results = [a(mask), b(mask), c(mask), d(mask)]; % Select the ones that pass
Again, this isn't an algorithmic change: it still has the same complexity as your nested for loop. The vectorization may cause it to run faster, but it also lacks early termination, so in certain cases it may be slower.
Since M is binary, we can think about this as a graph problem. i,j in {1..n} correspond to nodes, and M(i,j) indicates whether there is an undirected edge connecting them.
Since A,B,C,D are disjoint, that simplifies the problem a bit. We can approach the problem in stages:
Find all (c,d) for which there exists a such that M(a,c) < M(a,d). Let's call this set CD_lt_a, (the subset of C*D such that the "less than" inequality holds for some a).
Find all (c,d) for which there exists a such that M(a,c) <= M(a,d), and call this set CD_le_a.
Repeat for b, forming CD_lt_b for M(b,d) < M(b,c) and CD_le_b for M(b,d)<=M(b,c).
One way to satisfy the overall inequality is for M(a,c) < M(a,d) and M(b,d) <= M(b,c), so we can look at the intersection of CD_lt_a and CD_le_b.
The other way is if M(a,c) <= M(a,d) and M(b,d) < M(b,c), so look at the intersection of CD_le_a and CD_lt_b.
With (c,d) known, we can go back and find the (a,b).
And so my implementation is:
% 0. Some preliminaries
% Get the size of each set
mA = numel(A); mB = numel(B); mC = numel(C); mD = numel(D);
% 1. Find all (c,d) for which there exists a such that M(a,c) < M(a,d)
CA_linked = M(C,A);
AD_linked = M(A,D);
CA_not_linked = ~CA_linked;
% Multiplying these matrices tells us, for each (c,d), how many nodes
% in A satisfy this M(a,c)<M(a,d) inequality
% Ugh, we need to cast to double to use the matrix multiplication
CD_lt_a = (CA_not_linked * double(AD_linked)) > 0;
% 2. For M(a,c) <= M(a,d), check that the converse is false for some a
AD_not_linked = ~AD_linked;
CD_le_a = (CA_linked * double(AD_not_linked)) < mA;
% 3. Repeat for b
CB_linked = M(C,B);
BD_linked = M(B,D);
CD_lt_b = (CB_linked * double(~BD_linked)) > 0;
CD_le_b = (~CB_linked * double(BD_linked)) < mB;
% 4. Find the intersection of CD_lt_a and CD_le_b - this is one way
% to satisfy the inequality M(a,c)+M(b,d) < M(a,d)+M(b,c)
CD_satisfy_ineq_1 = CD_lt_a & CD_le_b;
% 5. The other way to satisfy the inequality is CD_le_a & CD_lt_b
CD_satisfy_ineq_2 = CD_le_a & CD_lt_b;
inequality_feasible = any(CD_satisfy_ineq_1(:) | CD_satisfy_ineq_2(:));
Note that you can stop here if feasibility is your only concern. The complexity is A*C*D + B*C*D, which is better than the worst-case A*B*C*D complexity of the for loop. However, early termination means your nested for loops may still be faster in certain cases.
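To sanity-check the staged feasibility test independently of MATLAB, here is a minimal Python sketch. The names feasible_brute and feasible_staged are made up for illustration, M is assumed to be a 0-indexed list of lists holding 0/1 values, and A, B, C, D are lists of indices. For each (c,d) pair it asks the same four "exists a" / "exists b" questions as steps 1 to 5 above, and it can be compared against the plain four-loop search.

```python
from itertools import product


def feasible_brute(M, A, B, C, D):
    """Reference check: test the inequality for every (a, b, c, d)."""
    return any(M[a][c] + M[b][d] < M[a][d] + M[b][c]
               for a, b, c, d in product(A, B, C, D))


def feasible_staged(M, A, B, C, D):
    """Same answer in roughly |A||C||D| + |B||C||D| steps (binary M only)."""
    for c, d in product(C, D):
        lt_a = any(M[a][c] < M[a][d] for a in A)   # CD_lt_a
        le_a = any(M[a][c] <= M[a][d] for a in A)  # CD_le_a
        lt_b = any(M[b][d] < M[b][c] for b in B)   # CD_lt_b
        le_b = any(M[b][d] <= M[b][c] for b in B)  # CD_le_b
        # One side strictly smaller, the other side not larger
        if (lt_a and le_b) or (le_a and lt_b):
            return True
    return False


# Tiny hand-made example: nodes 0..3, a single edge between 0 and 3
M = [[0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 0]]
print(feasible_staged(M, [0], [1], [2], [3]))
```

The split works because, for 0/1 entries, the inequality holds exactly when one of the two differences M(a,c)-M(a,d) and M(b,d)-M(b,c) is negative and the other is not positive, and a and b can be chosen independently once (c,d) is fixed.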
The next block of code enumerates all the a,b,c,d that satisfy the inequality. It's not very well optimized (it appends to a matrix from within a loop), so it can be pretty slow if there are many results.
% 6. With (c,d) known, find a and b
% We can define these functions to help us search
find_a_lt = @(c,d) find(CA_not_linked(c,:)' & AD_linked(:,d));
find_a_le = @(c,d) find(CA_not_linked(c,:)' | AD_linked(:,d));
find_b_lt = @(c,d) find(CB_linked(c,:)' & ~BD_linked(:,d));
find_b_le = @(c,d) find(CB_linked(c,:)' | ~BD_linked(:,d));
% I'm gonna assume there aren't too many results, so I will be appending
% to an array inside of a for loop. Bad for performance, but maybe a bit
% more readable for a StackOverflow answer.
results = zeros(0,4);
% Find those that satisfy it the first way
[c_list,d_list] = find(CD_satisfy_ineq_1);
for ii = 1:numel(c_list)
    c = c_list(ii); d = d_list(ii);
    a = find_a_lt(c,d);
    b = find_b_le(c,d);
    % a,b might be vectors, in which case all combos are valid
    % Many ways to find all combos, gonna use ndgrid()
    [a,b] = ndgrid(a,b);
    % Append these to the growing list of results
    abcd = [a(:), b(:), repmat([c d],[numel(a),1])];
    results = [results; abcd];
end
% Repeat for the second way
[c_list,d_list] = find(CD_satisfy_ineq_2);
for ii = 1:numel(c_list)
    c = c_list(ii); d = d_list(ii);
    a = find_a_le(c,d);
    b = find_b_lt(c,d);
    % a,b might be vectors, in which case all combos are valid
    % Many ways to find all combos, gonna use ndgrid()
    [a,b] = ndgrid(a,b);
    % Append these to the growing list of results
    abcd = [a(:), b(:), repmat([c d],[numel(a),1])];
    results = [results; abcd];
end
% Remove duplicates
results = unique(results, 'rows');
% And actually these a,b,c,d will be indices into A,B,C,D because they
% were obtained from calling find() on submatrices of M.
if ~isempty(results)
    results(:,1) = A(results(:,1));
    results(:,2) = B(results(:,2));
    results(:,3) = C(results(:,3));
    results(:,4) = D(results(:,4));
end
I tested this on the following test case:
m = 1000;
A = (1:m); B = A(end)+(1:m); C = B(end)+(1:m); D = C(end)+(1:m);
M = rand(D(end),D(end)) < 1e-6; M = M | M';
I like to think that first part (see if the inequality is feasible for any a,b,c,d) worked pretty well. The other vectorized answers (that use ndgrid or combvec to enumerate all combinations of a,b,c,d) would require 8 terabytes of memory for a problem of this size!
But I would not recommend running the second part (enumerating all of the results) when there are more than a few hundred c,d that satisfy the inequality, because it will be pretty damn slow.
P.S. I know I answered already, but that answer was about vectorizing such loops in general, and is less specific to your particular problem.
P.P.S. This kinda reminds me of the stable marriage problem. Perhaps some of those references would contain algorithms relevant to your problem as well. I suspect that a true graph-based algorithm could probably achieve the same worst-case complexity as this while additionally offering early termination. But I think it would be difficult to implement a graph-based algorithm efficiently in MATLAB.
P.P.P.S. If you only want one of the feasible solutions, you can simplify step 6 to only return a single value, e.g.
find_a_lt = @(c,d) find(CA_not_linked(c,:)' & AD_linked(:,d), 1, 'first');
find_a_le = @(c,d) find(CA_not_linked(c,:)' | AD_linked(:,d), 1, 'first');
find_b_lt = @(c,d) find(CB_linked(c,:)' & ~BD_linked(:,d), 1, 'first');
find_b_le = @(c,d) find(CB_linked(c,:)' | ~BD_linked(:,d), 1, 'first');
if any(CD_satisfy_ineq_1(:))
    [c,d] = find(CD_satisfy_ineq_1, 1, 'first');
    a = find_a_lt(c,d);
    b = find_b_le(c,d);
    result = [A(a), B(b), C(c), D(d)];
elseif any(CD_satisfy_ineq_2(:))
    [c,d] = find(CD_satisfy_ineq_2, 1, 'first');
    a = find_a_le(c,d);
    b = find_b_lt(c,d);
    result = [A(a), B(b), C(c), D(d)];
else
    result = zeros(0,4);
end
If you have access to the Neural Network Toolbox, combvec could be helpful here.
Running allCombs = combvec(A,B,C,D) will give you a (4 by m1*m2*m3*m4) matrix that looks like:
[...
a1, a1, a1, a1, a1 ... a1... a2... am1;
b1, b1, b1, b1, b1 ... b2... b1... bm2;
c1, c1, c1, c1, c2 ... c1... c1... cm3;
d1, d2, d3, d4, d1 ... d1... d1... dm4]
You can then use sub2ind and Matrix Indexing to setup the two values you need for your inequality:
indices = [sub2ind(size(M),allCombs(1,:),allCombs(3,:));
sub2ind(size(M),allCombs(2,:),allCombs(4,:));
sub2ind(size(M),allCombs(1,:),allCombs(4,:));
sub2ind(size(M),allCombs(2,:),allCombs(3,:))];
testValues = M(indices);
testValues(5,:) = (testValues(1,:) + testValues(2,:) < testValues(3,:) + testValues(4,:))
Your final a,b,c,d indices could be retrieved by saying
allCombs(:,find(testValues(5,:)))
This prints a matrix containing every column for which the inequality was true.
This article might be of some use.

find all indices of multiple value pairs in a matrix

Suppose I have a matrix A, containing possible value pairs and a matrix B, containing all value pairs:
A = [1,1;2,2;3,3];
B = [1,1;3,4;2,2;1,1];
I would like to create a matrix C that contains all pairs that are allowed by A (i.e. C = [1,1;2,2;1,1]).
Using C = ismember(A,B,'rows') will only show the first occurrence of 1,1, but I need both.
Currently I use a for-loop to create C, which looks like:
TFtot = false(size(B,1),1);
for i = 1:size(A,1)
    TF1 = A(i,1) == B(:,1) & A(i,2) == B(:,2);
    TFtot = TF1 | TFtot;
end
C = B(TFtot,:);
I would like to create a faster approach, because this loop currently greatly slows down the algorithm.
You're pretty close. You just need to swap B and A, then use this output to index into B:
L = ismember(B, A, 'rows');
C = B(L,:);
In this particular case, ismember outputs a logical vector with the same number of rows as B, where the ith value tells you whether the ith row of B was found somewhere in A (logical 1) or not (logical 0).
You want to select out those entries in B that are seen in A, and so you simply use the output of ismember to slice into B to extract out the affected rows, and grab all of the columns.
We get for C:
>> C
C =
1 1
2 2
1 1
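For readers more at home outside MATLAB, the same row-membership filter can be sketched in Python. The helper name rows_in is made up for illustration, and rows are assumed to be short lists of hashable values:

```python
def rows_in(B, A):
    """Return the rows of B that also occur as rows of A, keeping B's order and duplicates."""
    allowed = {tuple(row) for row in A}  # set lookup instead of a loop over A
    return [row for row in B if tuple(row) in allowed]


A = [[1, 1], [2, 2], [3, 3]]
B = [[1, 1], [3, 4], [2, 2], [1, 1]]
print(rows_in(B, A))  # [[1, 1], [2, 2], [1, 1]]
```

As with ismember(B, A, 'rows'), the test is applied to each row of B, which is why both copies of 1,1 survive.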
Here's an alternative using bsxfun:
C = B(all(any(bsxfun(@eq, B, permute(A, [3 2 1])),3),2),:);
Or you could use pdist2 (Statistics Toolbox):
B(any(~pdist2(A,B),1),:);
Using matrix-multiplication based euclidean distance calculations -
Bt = B.'; %//'
[m,n] = size(A);
dists = [A.^2 ones(size(A)) -2*A ]*[ones(size(Bt)) ; Bt.^2 ; Bt];
C = B(any(dists==0,1),:);

The Movie Scheduling _Problem_

Currently I'm reading "The Algorithm Design Manual" by Skiena (well, beginning to read)
He asks a problem he calls the "Movie Scheduling Problem":
Problem: Movie Scheduling Problem
Input: A set I of n intervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can
be selected from I?
Example: (Each dashed line is a movie, you want to find a set with the highest quantity of movies)
----a---
-----b---- -----c--- ---d---
-----e--- -------f---
--g-- --h--
The algorithm I thought of to solve it was like this:
I could throw out the "worst offender" (intersects with the most other movies) until there are no worst offenders (zero intersections). The only problem I see is that if there is a tie (say two different movies each intersect with 3 other movies) could it matter which one I throw out?
Basically I'm wondering how I go about turning the idea into "math" and how to prove it correct/incorrect.
The algorithm is incorrect. Let's consider the following example:
Counterexample
         |----F----|      |-----G------|
      |-------D-------|  |--------E--------|
|-----A------| |------B------| |------C-------|
You can see that there is a solution of size at least 3 because you can pick A, B and C.
Firstly, let's count, for each interval the number of intersections:
A = 2 [F, D]
B = 4 [D, F, E, G]
C = 2 [E, G]
D = 3 [A, B, F]
E = 3 [B, C, G]
F = 3 [A, B, D]
G = 3 [B, C, E]
Now consider a run of your algorithm. In the first step we delete B because it intersects with the largest number of intervals, and we get:
         |----F----|      |-----G------|
      |-------D-------|  |--------E--------|
|-----A------|                 |------C-------|
It's easy to see that now from {A, D, F} you can choose only one, because each pair intersects. The same case with {G, E, C}, so after deleting B, you can choose at most one from {A, D, F} and at most one from {G, E, C}, to get the total of 2, which is smaller than the size of {A, B, C}.
The conclusion is that after deleting B, which intersects with the largest number of intervals, you can no longer get the maximum number of non-intersecting movies.
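The run above can be replayed with a short Python sketch. The coordinates are hypothetical, chosen only so that the overlap counts match the list above, and the function names are made up for illustration:

```python
def overlaps(p, q):
    """Closed intervals (start, end) overlap unless one ends before the other starts."""
    return p[0] <= q[1] and q[0] <= p[1]


def worst_offender(intervals):
    """Repeatedly drop the interval with the most overlaps; return the survivors."""
    left = dict(intervals)
    while True:
        counts = {
            name: sum(overlaps(iv, other) for o, other in left.items() if o != name)
            for name, iv in left.items()
        }
        worst = max(counts, key=counts.get, default=None)
        if worst is None or counts[worst] == 0:
            return sorted(left)
        del left[worst]


# Hypothetical coordinates reproducing the counts A=2, B=4, C=2, D=E=F=G=3
movies = {
    "A": (0, 13), "B": (15, 29), "C": (31, 46),
    "D": (6, 22), "E": (25, 43), "F": (9, 19), "G": (26, 39),
}
print(len(worst_offender(movies)))  # 2
```

Which two movies survive depends on how ties are broken after B is removed, but only two remain in every case, one fewer than the {A, B, C} schedule.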
Correct solution
The problem is very well known and one solution is to pick the interval which ends first, delete all intervals intersecting with it and continue until there are no intervals to examine. This is an example of a greedy method and you can find or develop a proof that it's correct.
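A minimal sketch of that earliest-finish greedy in Python, using hypothetical coordinates for the seven intervals of the counterexample (the function name and the numbers are assumptions for illustration). Sorting by end time and skipping anything that starts before the last chosen movie ends has the same effect as deleting the intersecting intervals:

```python
def max_non_overlapping(intervals):
    """Greedy: always take the compatible interval that ends first."""
    chosen = []
    last_end = None
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][1]):
        # Closed intervals: compatible only if it starts strictly after the last end
        if last_end is None or start > last_end:
            chosen.append(name)
            last_end = end
    return chosen


# Hypothetical coordinates matching the overlap structure of the counterexample
movies = {
    "A": (0, 13), "B": (15, 29), "C": (31, 46),
    "D": (6, 22), "E": (25, 43), "F": (9, 19), "G": (26, 39),
}
print(max_non_overlapping(movies))  # ['A', 'B', 'C']
```

After the sort, which costs O(n log n), a single pass over the intervals is enough.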
This looks like a dynamic programming problem to me:
Define the following functions:
sched(t) = best schedule starting at time t
next(t) = set of movies that start next after time t
len(m) = length of movie m
next returns a set because there may be more than one movie that starts at the same time.
then sched should be defined as follows:
sched(t) = max { 1 + sched(t + len(m)), sched(t+1) } where m in next(t)
This recursive function selects a movie m from next(t) and compares the largest possible sets that either include or don't include m.
Invoke sched with the time of your first movie and you will get the size of the optimal set. Getting the optimal set itself just requires a little extra logic to remember which movies you select at each invocation.
I think this recursive (as opposed to iterative) algorithm runs in O(n^2) if you use memoization, where n is the number of movies.
I believe it is correct, although I'd have to consult my algorithms textbook to give you an explicit proof; hopefully it makes intuitive sense why it works.
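Here is a memoized sketch of the include-or-skip recurrence in Python. It is a variation rather than a literal transcription: instead of indexing by time t it indexes movies by their position after sorting by start time, and the function name is made up for illustration.

```python
from bisect import bisect_right
from functools import lru_cache


def best_schedule_size(intervals):
    """Size of the largest set of mutually non-overlapping closed intervals."""
    movies = sorted(intervals)          # (start, end) pairs ordered by start time
    starts = [s for s, _ in movies]

    @lru_cache(maxsize=None)
    def sched(i):
        if i == len(movies):
            return 0
        _, end = movies[i]
        # Index of the first movie that starts strictly after movie i ends
        nxt = bisect_right(starts, end)
        # Either watch movie i and jump to nxt, or skip movie i
        return max(1 + sched(nxt), sched(i + 1))

    return sched(0)


# Hypothetical coordinates for the seven intervals of the counterexample above
print(best_schedule_size([(0, 13), (15, 29), (31, 46), (6, 22), (25, 43), (9, 19), (26, 39)]))  # 3
```

With memoization each index is solved once and the lookup of the next compatible movie is a binary search, so this version takes O(n log n) time. It recurses once per movie, so very large inputs would need an iterative rewrite.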
# go through the database and create a 2-D matrix indexed a..h by a..h. Set each
# element of the matrix to 1 if the row index movie overlaps the column index movie.
mtx = []
for i in range(8):
    column = []
    for j in range(8):
        column.append(0)
    mtx.append(column)
# b <> e
mtx[1][4] = 1
mtx[4][1] = 1
# e <> g
mtx[4][6] = 1
mtx[6][4] = 1
# e <> c
mtx[4][2] = 1
mtx[2][4] = 1
# c <> a
mtx[2][0] = 1
mtx[0][2] = 1
# c <> f
mtx[2][5] = 1
mtx[5][2] = 1
# c <> g
mtx[2][6] = 1
mtx[6][2] = 1
# c <> h
mtx[2][7] = 1
mtx[7][2] = 1
# d <> f
mtx[3][5] = 1
mtx[5][3] = 1
# a <> f
mtx[0][5] = 1
mtx[5][0] = 1
# a <> d
mtx[0][3] = 1
mtx[3][0] = 1
# a <> h
mtx[0][7] = 1
mtx[7][0] = 1
# e <> h
mtx[4][7] = 1
mtx[7][4] = 1
# print out constraints
for line in mtx:
    print(line)
# keep track of which movies are still allowed
allowed = set(range(8))
# loop through in greedy fashion, picking movie that throws out the least
# number of other movies at each step
while allowed:
    best_col = None
    best_lost = set()
    best = 8 # more than any single movie can eliminate
    # each step, only try movies still allowed
    for col in allowed:
        lost = set()
        for row in range(8):
            # keep track of other movies eliminated by this selection
            if mtx[row][col] == 1:
                lost.add(row)
        # this was the best of all the allowed choices so far
        if len(lost) < best:
            best_col = col
            best_lost = lost
            best = len(lost)
    # there was a valid selection, process
    if best_col is not None:
        print('watch movie:', chr(best_col + ord('a')))
        for row in best_lost:
            # now eliminate the other movies you can't now watch
            if row in allowed:
                print('throwing out:', chr(row + ord('a')))
                allowed.remove(row)
        # also throw out this movie from the allowed list (can't watch twice)
        allowed.remove(best_col)
# this is just a greedy algorithm, not guaranteed optimal!
# you could also iterate through all possible combinations of movies
# and simply eliminate all illegal possibilities (brute force search)

Generating a wanted number by bitwise OR

Given N integer intervals [lo_i,hi_i].
From each interval, choose a number such that the bitwise OR of the chosen numbers equals a given number X. (It doesn't matter if the result has more 1 bits than X; i.e. if the generated number is Y, (X&Y)==X should hold.)
I guess this problem is NP complete, though I haven't found an NP hard problem easily reducible to this.
But for those sets that contain 2^(mostSignificantDigit) - 1, I would do as a heuristic: Firstly, try the number 1...1 (mostSignificantDigit-1 ones), secondly a number with the most significant bit and as many other bits as possible set. This heuristic is only bad in the case that you would have required a number from the set with the most significant bit set and a few different less significant bits.
With this heuristic, you can also pick amongst those sets the largest number 1....1 as a further heuristic.
Let's generalize the problem a little. I'm going to write bitwise operators like OR and AND and SR (shift right).
Given a natural number X, intervals [lo_1, hi_1], ..., [lo_N, hi_N] consisting of natural numbers, and a bit b in {0, 1}, determine whether there exist natural numbers y_1 in [lo_1, hi_1], ..., y_N in [lo_N, hi_N] such that, letting Y = y_1 OR ... OR y_N, it holds that (X AND Y) = X and that there exists i such that y_i <= hi_i - b.
The base case for my recursive algorithm is when lo_1 = hi_1 = lo_2 = ... = hi_N = 0. There exists a solution if and only if X = 0 and b = 0.
Inductively, prepare a subproblem by letting X' = X SR 1 and lo_i' = lo_i SR 1 and hi_i' = hi_i SR 1. Let Odd(i) be true if and only if hi_i AND 1 = 1. Let Odd+(i) be true if and only if Odd(i) and lo_i < hi_i. If X AND 1 = 0:
If there exists i such that Odd+(i), then let b' = 0. Otherwise, let b' = b.
If X AND 1 = 1:
If there exist distinct i and j such that Odd+(i) and Odd(j), then let b' = 0. If there exists no j such that Odd(j), then let b' = 1. Otherwise, let b' = b.
Return the answer for the subproblem.
