Finding certain arrangements of all 2-combinatons for a given list - algorithm

Given a list L of an even number (2k) of elements, I'm looking for an algorithm to produce a list of 2k-1 sublists with the following properties:
each sublist includes exactly k 2-combinations (pairs where the order does not matter) of elements from L,
each sublist includes every elements from L exactly once, and
the union of all elements from all sublists is exactly the set of all possible 2-combinations of the elements from L.
For example, if the input list is L = [a, b, c, d], we have k = 2 with 3 sublists, each including 2 pairs. A possible solution would look like [[ab, cd], [ac, bd], [ad, bc]]. If we ignore the ordering for all elements in the lists (think of all lists as sets), it turns out that this is also the only solution for k = 2.
My aim now is not only to find a single solution but all possible solutions. As the number of involved combinations grows pretty quickly, it would be nice to have all results be constructed in a clever way instead of generating a huge list of candidates and removing the elements from it that don't satisfy the given properties. Such a naïve algorithm could look like the following:
Find the set C of all 2-combinations for L.
Find the set D of all k-combinations for C.
Choose all sets from D that union equals L, call the new set D'.
Find the set E of all (2k-1)-combinations for D'.
Choose all sets from E that union is the set C, and let the new set be the final output.
This algorithm is easy to implement but it's incredibly slow for bigger input lists. So is there a way to construct the result list more efficently?
Edit: Here is the result for L = [a,b,c,d,e,f] with k = 3, calculated by the above algorithm:
[[[ab,cd,ef],[ac,be,df],[ad,bf,ce],[ae,bd,cf],[af,bc,de]],
[[ab,cd,ef],[ac,bf,de],[ad,be,cf],[ae,bc,df],[af,bd,ce]],
[[ab,ce,df],[ac,bd,ef],[ad,be,cf],[ae,bf,cd],[af,bc,de]],
[[ab,ce,df],[ac,bf,de],[ad,bc,ef],[ae,bd,cf],[af,be,cd]],
[[ab,cf,de],[ac,bd,ef],[ad,bf,ce],[ae,bc,df],[af,be,cd]],
[[ab,cf,de],[ac,be,df],[ad,bc,ef],[ae,bf,cd],[af,bd,ce]]]
All properties are satisfied:
each sublist has k = 3 2-combinations,
each sublist only includes each element once, and
the union of all 2k-1 = 5 sublists for one solution is exactly the set of all possible 2-combinations for L.
Edit 2: Based on user58697's answer, I improved the calculation algorithm by using the round-robin tournament scheduling:
Let S be the result set, starting with an empty set, and P be the set of all permutations of L.
Repeat the following until P is empty:
Select an arbitrary permutation from P
Perform full RRT scheduling for this permutation. In each round, the arrangement of elements from L forms a permutation of L. Remove all these 2k permutations from P.
Add the resulting schedule to S.
Remove all lists from S if the union of their sublists has duplicate elements (i.e. doesn't add up to all 2-combinations of L).
This algorithm is much more performant than the first one. I was able to calculate the number of results for k = 4 as 960 and k = 5 as 67200. The fact that there doesn't seem to be an OEIS result for this sequence makes me wonder if the numbers are actually correct, though, i.e. if the algorithm is producing the complete solution set.

It is a round-robin tournament scheduling:
A pair is a match,
A list is a round (each team plays with some other team)
A set of list is an entire tournament (each team plays each other team exactly once).
Take a look here.

This was an interesting question. In the process of answering it (basically after writing the program included below, and looking up the sequence on OEIS), I learned that the problem has a name and rich theory: what you want is to generate all 1-factorizations of the complete graph K2k.
Let's first restate the problem in that language:
You are given a number k, and a list (set) L of size 2k. We can view L as the vertex set of a complete graph K2k.
For example, with k=3, L could be {a, b, c, d, e, f}
A 1-factor (aka perfect matching) is a partition of L into unordered pairs (sets of size 2). That is, it is a set of k pairs, whose disjoint union is L.
For example, ab-cd-ef is a 1-factor of L = {a, b, c, d, e, f}. This means that a is matched with b, c is matched with d, and e is matched with f. This way, L has been partitioned into three sets {a, b}, {c, d}, and {e, f}, whose union is L.
Let S (called C in the question) denote the set of all pairs of elements of L. (In terms of the complete graph, if L is its vertex set, S is its edge set.) Note that S contains (2k choose 2) = k(2k-1) pairs. So for k = 0, 1, 2, 3, 4, 5, 6…, S has size 0, 1, 6, 15, 28, 45, 66….
For example, S = {ab, ac, ad, ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef} for our L above (k = 3, so |S| = k(2k-1) = 15).
A 1-factorization is a partition of S into sets, each of which is itself a 1-factor (perfect matching). Note that as each of these matchings has k pairs, and S has size k(2k-1), the partition has size 2k-1 (i.e., is made of 2k-1 matchings).
For example, this is a 1-factorization: {ab-cd-ef, ac-be-df, ad-bf-ce, ae-bd-cf, af-bc-de}
In other words, every element of S (every pair) occurs in exactly one element of the 1-factorization, and every element of L occurs exactly once in each element of the 1-factorization.
The problem asks to generate all 1-factorizations.
Let M denote the set of all 1-factors (all perfect matchings) of L. It is easy to prove that M contains (2k)!/(k!2^k) = 1×3×5×…×(2k-1) matchings. For k = 0, 1, 2, 3, 4, 5, 6…, the size of M is 1, 1, 3, 15, 105, 945, 10395….
For example, for our L above, M = {ab-cd-ef, ab-ce-df, ab-cf-de, ac-bd-ef, ac-be-df, ac-bf-de, ad-bc-ef, ad-be-cf, ad-bf-ce, ae-bc-df, ae-bd-cf, ae-bf-cd, af-bc-de, af-bd-ce, af-be-cd} (For k=3 this number 15 is the same as the number of pairs, but this is just a coincidence as you can from the other numbers: this number grows much faster than the number of pairs.)
M is easy to generate:
def perfect_matchings(l):
if len(l) == 0:
yield []
for i in range(1, len(l)):
first_pair = l[0] + l[i]
for matching in perfect_matchings(l[1:i] + l[i+1:]):
yield [first_pair] + matching
For example, calling perfect_matchings('abcdef') yields the 15 elements ['ab', 'cd', 'ef'], ['ab', 'ce', 'df'], ['ab', 'cf', 'de'], ['ac', 'bd', 'ef'], ['ac', 'be', 'df'], ['ac', 'bf', 'de'], ['ad', 'bc', 'ef'], ['ad', 'be', 'cf'], ['ad', 'bf', 'ce'], ['ae', 'bc', 'df'], ['ae', 'bd', 'cf'], ['ae', 'bf', 'cd'], ['af', 'bc', 'de'], ['af', 'bd', 'ce'], ['af', 'be', 'cd'] as expected.
By definition, a 1-factorization is a partition of S into elements from M. Or equivalently, any (2k-1) disjoint elements of M form a 1-factorization. This lends itself to a straightforward backtracking algorithm:
start with an empty list (partial factorization)
for each matching from the list of perfect matchings, try adding it to the current partial factorization, i.e. check whether it's disjoint (it should not contain any pair already used)
if fine, add it to the partial factorization, and try extending
In code:
matching_list = []
pair_used = defaultdict(lambda: False)
known_matchings = [] # Populate this list using perfect_matchings()
def extend_matching_list(r, need):
"""Finds ways of extending the matching list by `need`, using matchings r onwards."""
if need == 0:
use_result(matching_list)
return
for i in range(r, len(known_matchings)):
matching = known_matchings[i]
conflict = any(pair_used[pair] for pair in matching)
if conflict:
continue # Can't use this matching. Some of its pairs have already appeared.
# Else, use this matching in the current matching list.
for pair in matching:
pair_used[pair] = True
matching_list.append(matching)
extend_matching_list(i + 1, need - 1)
matching_list.pop()
for pair in matching:
pair_used[pair] = False
If you call it with extend_matching_list(0, len(l) - 1) (after populating known_matchings), it generates all 1-factorizations. I've put the full program that does this here. With k=4 (specifically, the list 'abcdefgh'), it outputs 6240 1-factorizations; the full output is here.
It was at this point that I fed the sequence 1, 6, 6240 into OEIS, and discovered OEIS A000438, sequence 1, 1, 6, 6240, 1225566720, 252282619805368320,…. It shows that for k=6, the number of solutions ≈2.5×1017 means that we can give up hope of generating all solutions. Even for k=5, the ≈1 billion solutions (recall that we're trying to find 2k-1=9 disjoint sets out of the |M|=945 matchings) will require some carefully optimized programs.
The first optimization (which, embarrassingly, I only realized later by looking closely at trace output for k=4) is that (under natural lexicographic numbering) the index of the first matching chosen in the partition cannot be greater than the number of matchings for k-1. This is because the lexicographically first element of S (like "ab") occurs only in those matchings, and if we start later than this one we'll never find it again in any other matching.
The second optimization comes from the fact that the bottleneck of a backtracking program is usually the testing for whether a current candidate is admissible. We need to test disjointness efficiently: whether a given matching (in our partial factorization) is disjoint with the union of all previous matchings. (Whether any of its k pairs is one of the pairs already covered by earlier matchings.) For k=5, it turns out that the size of S, which is (2k choose 2) = 45, is less than 64, so we can compactly represent a matching (which is after all a subset of S) in a 64-bit integer type: if we number the pairs as 0 to 44, then any matching can be represented by an integer having 1s in the positions corresponding to elements it contains. Then testing for disjointness is a simple bitwise operation on integers: we just check whether the bitwise-AND of the current candidate matching and the cumulative union (bitwise-OR) of previous matchings in our partial factorization is zero.
A C++ program that does this is here, and just the backtracking part (specialized for k=5) does not need any C++ features so it's extracted out as a C program here. It runs in about 4–5 hours on my laptop, and finds all 1225566720 1-factorizations.
Another way to look at this problem is to say that two elements of M have an edge between them if they intersect (have a pair (element of S) in common), and that we're looking for all maximum independent set in M. Again, the simplest way to solve that problem would still probably be backtracking (we'd write the same program).
Our programs can be made quite a lot more efficient by exploiting the symmetry in our problem: for example we could pick any matching as our first 1-factor in the 1-factorization (and then generate the rest by relabelling, being careful not to avoid duplicates). This is how the number of 1-factorizations for K12 (the current record) was calculated.
A note on the wisdom of generating all solutions
In The Art of Computer Programming Volume 4A, at the end of section 7.2.1.2 Generating All Permutations, Knuth has this important piece of advice:
Think twice before you permute. We have seen several attractive algorithms for permutation generation in this section, but many algorithms are known by which permutations that are optimum for particular purposes can be found without running through all possibilities. For example, […] the best way to arrange records on a sequential storage […] takes only O(n log n) steps. […] the assignment problem, which asks how to permute the columns of a square matrix so that the sum of the diagonal elements is maximized […] can be solved in at most O(n3) operations, so it would be foolish to use a method of order n! unless n is extremely small. Even in cases like the traveling salesrep problem, when no efficient algorithm is known, we can usually find a much better approach than to examine every possible solution. Permutation generation is best used when there is good reason to look at each permutation individually.
This is what seems to have happened here (from the comments below the question):
I wanted to calculate all solutions to run different attribute metrics on these and find an optional match […]. As the number of results seems to grow quicker than expected, this is impractical.
Generally, if you're trying to "generate all solutions" and you don't have a very good reason for looking at each one (and one almost never does), there are many other approaches that are preferable, ranging from directly trying to solve an optimization problem, to generating random solutions and looking at them, or generating solutions from some subset (which is what you seem to have done).
Further reading
Following up references from OEIS led to a rich history and theory.
On 1-factorizations of the complete graph and the relationship to round robin schedules, Gelling (M. A. Thesis), 1973
On the number of 1-factorizations of the complete graph, Charles C Lindner, Eric Mendelsohn, Alexander Rosa (1974?) -- this shows that the number of nonisomorphic 1-factorizations on K2n goes to infinity as n goes to infinity.
E. Mendelsohn and A. Rosa. On some properties of 1-factorizations of complete graphs. Congr. Numer, 24 (1979): 739–752
E. Mendelsohn and A. Rosa. One factorizations of the complete graph: A survey. Journal of Graph Theory, 9 (1985): 43–65 (As long ago as 1985, this exact question was studied well-enough to need a survey!)
Via papers of Dinitiz:
D. K. Garnick and J. H. Dinitz, On the number of one-factorizations of the complete graph on 12 points, Congressus Numerantium, 94 (1993), pp. 159-168. They announced they were computing the number of nonisomorphic 1-factorizations of K12. Their algorithm was basically backtracking.
Jeffrey H. Dinitz, David K. Garnick, Brendan D. McKay: There are 526,915,620 nonisomorphic one-factorizations of K12 (also here), Journal of Combinatorial Designs 2 (1994), pp. 273 - 285: They completed the computation, and reported the numbers they found for K12 (526,915,620 nonisomorphic, 252,282,619,805,368,320 total).
Various One-Factorizations of Complete Graphs by Gopal, Kothapalli, Venkaiah, Subramanian (2007). A paper that is relevant to this question, and has many useful references.
W. D. Wallis, Introduction to Combinatorial Designs, Second Edition (2007). Chapter 10 is "One-Factorizations", Chapter 11 is "Applications of One-Factorizations". Both are very relevant and have many useful references.
Charles J. Colbourn and Jeffrey H. Dinitz, Handbook of Combinatorial Designs, Second Edition (2007). A goldmine. See chapters VI.3 Balanced Tournament Designs, VI.51 Scheduling a Tournament, VII.5 Factorizations of Graphs (including its sections 5.4 Enumeration and Tables, 5.5 Some 1-Factorizations of Complete Graphs), VII.6 Computational Methods in Design Theory (6.2 Exhaustive Search). This last chapter references:
[715] How K12 was calculated ("orderly algorithm"), a backtracking -- the Dinitz-Garnick-McKay paper mentioned above
[725] “Contains, among many other subjects related to factorization, a fast algorithm for finding 1-factorizations of K2n.” ("Room squares and related designs", J. H. Dinitz and S. R. Stinson)
[1270] (P. Kaski and P. R. J. Östergård, One-factorizations of regular graphs of order 12, Electron. J. Comb. 12, Research Paper 2, 25 pp. (2005))
[1271] “Contains the 1-factorizations of complete graphs up to order 10 in electronic form.” (P. Kaski and P. R. J. Östergård, Classification Algorithms for Codes and Designs, Springer, Berlin, 2006.)
[1860] “A survey on perfect 1-factorizations of K2n” (E. S. Seah, Perfect one-factorizations of the complete graph—A survey, Bull. Inst. Combin. Appl. 1 (1991) 59–70)
[2107] “A survey of 1-factorizations of complete graphs including most of the material of this chapter.” W. D. Wallis, One-factorizations of complete graphs, in Dinitz and Stinson (ed), Contemporary Design Theory, 1992
[2108] “A book on 1-factorizations of graphs.” W. D. Wallis, "One-Factorizations", Kluwer, Dordrecht, 1997
Some other stuff:
*Factors and Factorizations of Graphs by Jin Akiyama and Mikio Kano (2007). This looks like a great book. “Frank Harary predicted that graph theory will grow so much that each chapter of his book Graph Theory will eventually expand to become a book on its own. He was right. This book is an expansion of his Chapter 9, Factorization.” There's not much about this particular topic (1-factorizations of complete graphs), but there is a proof in Chapter 4 (Theorem 4.1.1) that K2n always has a 1-factorization.
Papers on special types of 1-factorizations:
[Symmetry Groups Of] Some Perfect 1-Factorizations Of Complete Graphs, B. A. Anderson, 1977 (1973). Considers 1-factorizations that are in fact "perfect", having the property that the union of any two 1-factors (matchings) is a Hamiltonian cycle. (There's one up to isomorphism for K2k k ≤ 5, and two for K12.)
On 4-semiregular 1-factorizations of complete graphs and complete bipartite graphs.
Low Density MDS Codes and Factors of Complete Graphs -- also about perfect 1-factorizations
Self-invariant 1-Factorizations of Complete Graphs and Finite Bol Loops of Exponent 2
See also OEIS index entry for [sequences related to tournaments].
AMS feature column: Mathematics and Sports (April 2010) -- despite the overly broad name, is quite related.

Related

Distinct sub sequences summing to given number in an array

During my current preparation for interview, I encountered a question for which I am having some difficulty to get optimal solution,
We are given an array A and an integer Sum, we need to find all distinct sub sequences of A whose sum equals Sum.
For eg. A={1,2,3,5,6} Sum=6 then answer should be
{1,2,3}
{1,5}
{6}
Presently I can think of two ways of doing this,
Use Recursion ( which I suppose should be last thing to consider for an interview question)
Use Integer Partitioning to partition Sum and check whether the elements of partition are present in A
Please guide my thoughts.
I agree with Jason. This solution comes to mind:
(complexity is O(sum*|A|) if you represent the map as an array)
Call the input set A and the target sum sum
Have a map of elements B, with each element being x:y, where x (the map key) is the sum and y (the map value) is the number of ways to get to it.
Starting of, add 0:1 to the map - there is 1 way to get to 0 (obviously by using no elements)
For each element a in A, consider each element x:y in B.
If x+a > sum, don't do anything.
If an element with the key x+a already exists in B, say that element is x+a:z, modify it to x+a:y+z.
If an element with the key doesn't exist, simply add x+a:y to the set.
Look up the element with key sum, thus sum:x - x is our desired value.
If B is sorted (or an array), you can simply skip the rest of the elements in B during the "don't do anything" step.
Tracing it back:
The above just gives the count, this will modify it to give the actual subsequences.
At each element in B, instead of the sum, store all the source elements and the elements used to get there (so have a list of pairs at each element in B).
For 0:1 there is no source elements.
For x+a:y, the source element is x and the element to get there is a.
During the above process, if an element with the key already exists, enqueue the pair x/a to the element x+a (enqueue is an O(1) operation).
If an element with the key doesn't exist, simply create a list with one pair x/a at the element x+a.
To reconstruct, simply start at sum and recursively trace your way back.
We have to be careful of duplicate sequences (do we?) and sequences with duplicate elements here.
Example - not tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:1
Consider 1
Add 0+1
B = 0:1, 1:1
Consider 2
Add 0+2:1, 1+2:1
B = 0:1, 1:1, 2:1, 3:1
Consider 3
Add 0+3:1 (already exists -> add 1 to it), 1+3:1, 2+1:1, 3+1:1
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:1, 6:1
Consider 5
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:2
Generated sums thrown away = 7:1, 8:2, 9:1, 10:1, 11:1
Consider 6
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:3
Generated sums thrown away = 7:1, 8:1, 9:2, 10:1, 11:2, 12:2
Then, from 6:3, we know we have 3 ways to get to 6.
Example - tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:{}
Consider 1
B = 0:{}, 1:{0/1}
Consider 2
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2}
Consider 3
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3}, 6:{3/3}
Consider 5
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5}
Generated sums thrown away = 7, 8, 9, 10, 11
Consider 6
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5,0/6}
Generated sums thrown away = 7, 8, 9, 10, 11, 12
Then, tracing back from 6: (not in {} means an actual element, in {} means a map entry)
{6}
{3}+3
{1}+2+3
{0}+1+2+3
1+2+3
Output {1,2,3}
{0}+3+3
3+3
Invalid - 3 is duplicate
{1}+5
{0}+1+5
1+5
Output {1,5}
{0}+6
6
Output {6}
This is a variant of the subset-sum problem. The subset-sum problem asks if there is a subset that sums to given a value. You are asking for all of the subsets that sum to a given value.
The subset-sum problem is hard (more precisely, it's NP-Complete) which means that your variant is hard too (it's not NP-Complete, because it's not a decision problem, but it is NP-Hard).
The classic approach to the subset-sum problem is either recursion or dynamic programming. It's obvious how to modify the recursive solution to the subset-sum problem to answer your variant. I suggest that you also take a look at the dynamic programming solution to subset-sum and see if you can modify it for your variant (tbc: I do not know if this is actually possible). That would certainly be a very valuable learning exercise whether or not is possible as it would certainly enhance your understanding of dynamic programming either way.
It would surprise me though, if the expected answer to your question is anything but the recursive solution. It's easy to come up with, and an acceptable approach to the problem. Asking for the dynamic programming solution on-the-fly is a bit much to ask.
You did, however, neglect to mention a very naïve approach to this problem: generate all subsets, and for each subset check if it sums to the given value or not. Obviously that's exponential, but it does solve the problem.
I assumed that given array contains distinct numbers.
Let's define function f(i, s) - which means we used some numbers in range [1, i] and the sum of used numbers is s.
Let's store all values in 2 dimensional matrix i.e. in cell (i, j) we will have value for f(i, j). Now if have already calculated values for cells which are located upper or lefter the cell (i, s) we can calculate value for f(i, s) i.e. f(i, s) = f(i - 1, s);(not to take i indexed number) and if(s >= a[i]) f(i, s) += f(i - 1, s - a[i]). And we can use bottom-up approach to fill all the matrix, setting [f(0, 0) = 1; f(0, i) = 0; 1 <= i <= s], [f(i, 0) = 1;1<=i<=n;]. If we calculated all the matrix then we have answer in cell f(n,S); Thus we have total time complexity O(n*s) and memory complexity O(n*s);
We can improve memory complexity if we note that in every iteration we need only information from previous row, it means that we can store matrix of size 2xS not nxS. We reduced memory complexity up to linear to S. This problem is NP complete thus we don't have polynomial algorithm for this and this approach is the best thing.

Algorithm to separate items of the same type

I have a list of elements, each one identified with a type, I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit about the quantity of items per type or quantity of types, the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurences
B has at least 4 items between occurences
C has 6 items between occurences
D has 6 items between occurences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with less distance.
secondary goal: maximize the minimum distance between elements on every type. IE: if a combination increases the minimum distance of a certain type without decreasing other, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none is a solution for both goals, specially the second one which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation, and not always leads to the best result according to the second goal.
Evgeny Kluev: Provides a concrete implementation to the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach, it doesn't always lead to the best result, but it's a good approximation and cost effective.
I tried a combination of Evgeny Kluev, backtracking, and tobias_k formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k to be the most adequate algorithm, for its simplicity and good results in a timely fashion. Probably, it could be improved using Simulated annealing.
First, you don't have a well-defined optimization problem yet. If you want to maximized the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ... , Ak where n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k to 1, distribute items of type Ak as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B
This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
def optimize(items, quality_function, stop=1000):
no_improvement = 0
best = 0
while no_improvement < stop:
i = random.randint(0, len(items)-1)
j = random.randint(0, len(items)-1)
copy = items[::]
copy[i], copy[j] = copy[j], copy[i]
q = quality_function(copy)
if q > best:
items, best = copy, q
no_improvement = 0
else:
no_improvement += 1
return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some trying I came up with one that almost always yields optimal results. Thank to pmoleri, for pointing out how to make this a whole lot more efficient.
def quality_maxmindist(items):
s = 0
for item in set(items):
indcs = [i for i in range(len(items)) if items[i] == item]
if len(indcs) > 1:
s += sum(1./(indcs[i+1] - indcs[i]) for i in range(len(indcs)-1))
return 1./s
And here some random result:
>>> print optimize(items, quality_maxmindist)
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.
Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort element sets by number of elements of each type in descending order. Actually only largest sets (A & B) should be placed to the head of the list as well as those element sets that have one element less (C & D & E). Other sets may be unsorted.
Reserve R last positions in the array for one element from each of the largest sets, divide the remaining array evenly between the S-1 remaining elements of the largest sets. This gives optimal distance: K = (N - R) / (S - 1). Represent target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If matrix has S - 1 full rows, fill first R columns of this matrix with elements of the largest sets (A & B), otherwise sequentially fill all columns, starting from last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Sequentially fill all columns, starting from last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to linear array we have:
ABFEDCABFEDCABFEDCABGEDCABG
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.
I believe you could see your problem like a bunch of particles that physically repel eachother. You could iterate to a 'stable' situation.
Basic pseudo-code:
force( x, y ) = 0 if x.type==y.type
1/distance(x,y) otherwise
nextposition( x, force ) = coined?(x) => same
else => x + force
notconverged(row,newrow) = // simplistically
row!=newrow
row=[a,b,a,b,b,b,a,e];
newrow=nextposition(row);
while( notconverged(row,newrow) )
newrow=nextposition(row);
I don't know if it converges, but it's an idea :)
I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you found one that works, try again, under the assumption that MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.
Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write down items from left to right, keeping track of the number of items left of each type. At each point put down an item with the largest number left that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it will preserve this property - after putting down k items we have k less items and we have put down at least one of each type that started with at ceil(N / k) items of that type.
Given a clutch of mixed items you could work out the largest k you can support and then lay out the items to solve for this k.

Seating people in a movie theater

This is based on an article I read about puzzles and interview questions asked by large software companies, but it has a twist...
General question:
What is an algorithm to seat people in a movie theater so that they sit directly beside their friends but not beside their enemies.
Technical question:
Given an N by M grid, fill the grid with N * M - 1 items. Each item has an association Boolean value for each of the other N * M - 2 items. In each row of N, items directly beside other items should have a positive association value for the other. Columns however do not matter, i.e. an item can be "enemies" with the item in front of it. Note: If item A has a positive association value for B, then that means B also has a positive association value for A. It works the same for negative association values. An item is guarenteed to have a positive association with atleast one other item. Also, you have access to all of the items and their association values before you start placing them in the grid.
Comments:
I have been researching this problem and thinking about it since yesterday, from what I have found it reminds me of the bin packing problem with some added requirements. In some free time I attempted to implement it, but large groups of "enemies" were sitting next to each other. I am sure that most situations will have to have atleast one pair of enemies sitting next to each other, but my solution was far from optimal. It actually looked as if I had just randomized it.
As far as my implementation went, I made N = 10, M = 10, the number of items = 99, and had an array of size 99 for EACH item that had a randomized Boolean value that referred to the friendship of the corresponding item number. This means that each item had a friendship value that corresponded with their self as well, I just ignored that value.
I plan on trying to reimplement this again later and I will post the code. Can anyone figure out a "good" way to do this to minimize seating clashes between enemies?
This problem is NP-Hard.
Define L={(G,n,m)|there is a legal seating for G in m×m matrix,(u,v) in E if u is friend of v} L is a formal definition of this problem as a language.
Proof:
We will show Hamiltonian-Problem ≤ (p) 2-path ≤ (p) This-problem in 2 steps [Hamiltonian and 2-path defined below], and thus we conclude this problem is NP-Hard.
(1) We will show that finding two paths covering all vertices without using any vertex twice is NP-Hard [let's call such a path: 2-path and this problem as 2-path problem]
A reduction from Hamiltonian Path problem:
input: a graph G=(V,E)
Output: a graph G'=(V',E) where V' = V U {u₀}.
Correctness:
if G has Hamiltonian Path: v₁→v₂→...→vn, then G' has 2-path:
v₁→v₂→...→vn,u₀
if G' has 2-path, since u₀ is isolated from the rest vertices, there is a
path: v₁→...→vn, which is Hamiltonian in G.
Thus: G has Hamiltonian path 1 ⇔ G' has 2-path, and thus: 2-path problem is NP-Hard.
(2)We will now show that our problem [L] is also NP-Hard:
We will show a reduction from the 2-path problem, defined above.
input: a graph G=(V,E)
output: (G,|V|+1,1) [a long row with |V|+1 sits].
Correctness:
If G has 2-path, then we can seat the people, and use the 1 sit gap to
use as a 'buffer' between the two paths, this will be a legal perfect seating
since if v₁ is sitting next to v₂, then v₁ v₁→v₂ is in the path, and thus
(v₁,v₂) is in E, so v₁,v₂ are friends.
If (G,|V|+1,1) is legal seat:[v₁,...,vk,buffer,vk+1,...,vn] , there is a 2-path in G,
v₁→...→vk, vk+1→...→vn
Conclusion: This problem is NP-Hard, so there is not known polynomial solution for it.
Exponential solution:
You might want to use backtracking solution: which is basically: create all subsets of E with size |V|-2 or less, check which is best.
static best <- infinity
least_enemies(G,used):
if |used| <= |V|-2:
val <- evaluate(used)
best <- min(best,val)
if |used| == |V|-2:
return
for each edge e in E-used: //E without used
least_enemies(G,used + e)
in here we assume evaluate(used) gives the 'score' for this solution. if this solution is completely illegal [i.e. a vertex appear twice], evaluate(used)=infinity. an optimization can of course be made, trimming these cases. to get the actual sitting we can store the currently best solution.
(*)There are probably better solutions, this is just a simple possible solution to start with, the main aim in this answer is proving this problem is NP-Hard.
EDIT: simpler solution:
Create a graph G'=(V U { u₀ } ,E U {(u₀,v),(v,u₀) | for each v in V}) [u₀ is a junk vertex for the buffer] and a weight function for edges:
w((u,v)) = 1 u is friend of v
w((u,v)) = 2 u is an enemy v
w((u0,v)) = w ((v,u0)) = 0
Now you got yourself a classic TSP, which can be solved in O(|V|^2 * 2^|V|) using dynamic programming.
Note that this solution [using TSP] is for one lined theatre, but it might be a good lead to find a solution for the general case.
One algorithm used for large "search spaces" such as this is simulated annealing

Algorithm/Data Structure for finding combinations of minimum values easily

I have a symmetric matrix like shown in the image attached below.
I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).
As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).
My goal is to find the minimum values for ALL combinations of letters (not repeating) for one row at a time e.g for this example I need to find min values with respect to row A which are given by the calculations:
A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2
I realize that certain calculations can be reused which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.
Can point me in the right direction to finding an efficient algorithm/data structure I can use for this problem?
You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)
First of all, note that if you have n letters, that this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets and those that don't include "A" saves you n + 1 sets and a factor of two, respectively, which is allowed for big Omega.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but for example computing data only for a binary tree contained in the lattice would not necessarily help you anything beyond precomputing nothing at all.
With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset S, when we're computing f(S) we will already have computed all f(T) for T strictly contained in S. There are several ways that you can make use of this, but I think the easiest might be to use two such subset S: let t1 and t2 be two different elements of T that you may select however you like; let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or it is fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, then look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.
If you never select "A" for t1 or t2, then indeed you can compute everything you're interested in while not computing f for any sets T that don't contain "A". (This is possible because the steps outlined above are only interesting whenever T contains at least three elements.) Good! This leaves just one question - how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1) bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 2, 4, and 5 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H" - note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 element, in which case you'll want to do nothing, or 2 elements, in which case you just copy the value from the matrix.)
Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:
let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix
first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)
now you must compute min (CS), i.e. finding the minimum value of the matrix elements corresponding to the combinations in CS; again you can easily find the minimum value in O(n)
done.
Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)
If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example
when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key
later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string
if you find a prefix of P' in you map then you already know the solution for that problem, so you just have to find the solution for the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, and so you just have to solve the problem for
X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')
if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)
This technique is essentially a form of memoization.

Algorithm for solving set problem

If I have a set of values (which I'll call x), and a number of subsets of x:
What is the best way to work out all possible combinations of subsets whose union is equal to x, but none of whom intersect with each other.
An example might be:
if x is the set of the numbers 1 to 100, and I have four subsets:
a = 0-49
b = 50-100
c = 50-75
d = 76-100
then the possible combinations would be:
a + b
a + c + d
What you describe is called the Exact cover problem. The general solution is Knuth's Algorithm X, with the Dancing Links algorithm being a concrete implementation.
Given a well-order on the elements of x (make one up if necessary, this is always possible for finite or countable sets):
Let "sets chosen so far" be empty. Consider the smallest element of x. Find all sets which contain x and which do not intersect with any of the sets chosen so far. For each such set in turn recurse, adding the chosen set to "sets chosen so far", and looking at the smallest element of x not in any chosen set. If you reach a point where there is no element of x left, then you've found a solution. If you reach a point where there is no unchosen set containing the element you're looking for, and which does not intersect with any of the sets that you already have selected, then you've failed to find a solution, so backtrack.
This uses stack proportional to the number of non-intersecting subsets, so watch out for that. It also uses a lot of time - you can be far more efficient if, as in your example, the subsets are all contiguous ranges.
here's a bad way (recursive, does a lot of redundant work). But at least its actual code and is probably halfway to the "efficient" solution.
def unique_sets(sets, target):
if not sets and not target:
yield []
for i, s in enumerate(sets):
intersect = s.intersection(target) and not s.difference(target)
sets_without_s = sets[:i] + sets[i+1:]
if intersect:
for us in unique_sets(sets_without_s, target.difference(s)):
yield us + [s]
else:
for us in unique_sets(sets_without_s, target):
yield us
class named_set(set):
def __init__(self, items, name):
set.__init__(self, items)
self.name = name
def __repr__(self):
return self.name
a = named_set(range(0, 50), name='a')
b = named_set(range(50, 100), name='b')
c = named_set(range(50, 75), name='c')
d = named_set(range(75, 100), name='d')
for s in unique_sets([a,b,c,d], set(range(0, 100))):
print s
A way (may not be the best way) is:
Create a set of all the pairs of subsets which overlap.
For every combination of the original subsets, say "false" if the combination contains one or more of the pairs listed in Step 1, else say "true" if the union of the subsets equals x (e.g. if the total number of elements in the subsets is x)
The actual algorithm seems largely dependent on the choice of subsets, product operation, and equate operation. For addition (+), it seems like you could find a summation to suit your needs (the sum of 1 to 100 is similar to your a + b example). If you can do this, your algorithm is obviously O(1).
If you have a tougher product or equate operator (let's say taking a product of two terms means summing the strings and finding the SHA-1 hash), you may be stuck doing nested loops, which would be O(n^x) where x is the number of terms/variables.
Depending on the subsets you have to work with, it might be advantageous to use a more naive algorithm. One where you don't have to compare the entire subset, but only upper and lower bounds.
If you are talking random subsets, not necesserily a range, then Nick Johnson's suggestion will probably be the best choice.

Resources