Algorithm to separate items of the same type - algorithm

I have a list of elements, each identified by a type. I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit about the quantity of items per type or quantity of types, the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurrences
B has at least 4 items between occurrences
C has 6 items between occurences
D has 6 items between occurences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with the smallest distance.
secondary goal: maximize the minimum distance between elements of every type, i.e. if a combination increases the minimum distance of a certain type without decreasing that of any other, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none solves both goals, especially the second one, which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation and does not always lead to the best result according to the second goal.
Evgeny Kluev: Provides a concrete implementation to the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach, it doesn't always lead to the best result, but it's a good approximation and cost effective.
I tried a combination of Evgeny Kluev's approach, backtracking, and tobias_k's formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k's to be the most adequate algorithm, for its simplicity and good results in a timely fashion. It could probably be improved using simulated annealing.

First, you don't have a well-defined optimization problem yet. If you want to maximize the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
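As a minimal sketch of that measure (the function name min_same_type_distance is mine, not from the question), the following computes the smallest index difference between two equal items. Note that under this convention "2 items between occurrences" corresponds to a distance of 3.

```python
def min_same_type_distance(seq):
    """Smallest index difference between two equal items, or None if no item repeats."""
    last_seen = {}   # item -> index of its most recent occurrence
    best = None
    for pos, item in enumerate(seq):
        if item in last_seen:
            gap = pos - last_seen[item]
            best = gap if best is None else min(best, gap)
        last_seen[item] = pos
    return best
```

For the arrangement proposed in the question, min_same_type_distance(list("ABCADFBAECADBA")) gives 3, because the closest pair of A's sits at indices 0 and 3.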
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ... , Ak where n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k down to 1, distribute the items of type Aj as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B

This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
import random

def optimize(items, quality_function, stop=1000):
    no_improvement = 0
    best = 0
    while no_improvement < stop:
        # swap two randomly chosen positions in a copy of the list
        i = random.randint(0, len(items)-1)
        j = random.randint(0, len(items)-1)
        copy = items[::]
        copy[i], copy[j] = copy[j], copy[i]
        q = quality_function(copy)
        if q > best:
            # keep the swap only if it improves the quality
            items, best = copy, q
            no_improvement = 0
        else:
            no_improvement += 1
    return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some trying I came up with one that almost always yields optimal results. Thanks to pmoleri for pointing out how to make this a whole lot more efficient.
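def quality_maxmindist(items):
    s = 0
    for item in set(items):
        indcs = [i for i in range(len(items)) if items[i] == item]
        if len(indcs) > 1:
            # sum of reciprocal gaps between consecutive occurrences
            s += sum(1./(indcs[i+1] - indcs[i]) for i in range(len(indcs)-1))
    # s is 0 when no type occurs twice; avoid dividing by zero in that case
    return 1./s if s else float('inf')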
And here is one random result:
>>> print(optimize(items, quality_maxmindist))
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.

Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort element sets by number of elements of each type in descending order. Actually only largest sets (A & B) should be placed to the head of the list as well as those element sets that have one element less (C & D & E). Other sets may be unsorted.
Reserve R last positions in the array for one element from each of the largest sets, divide the remaining array evenly between the S-1 remaining elements of the largest sets. This gives optimal distance: K = (N - R) / (S - 1). Represent target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If matrix has S - 1 full rows, fill first R columns of this matrix with elements of the largest sets (A & B), otherwise sequentially fill all columns, starting from last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Sequentially fill all columns, starting from last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to linear array we have:
ABFEDCABFEDCABFEDCABGEDCABG
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.

I believe you could see your problem as a bunch of particles that physically repel each other. You could iterate to a 'stable' situation.
Basic pseudo-code:
force( x, y ) = 0 if x.type==y.type
1/distance(x,y) otherwise
nextposition( x, force ) = coined?(x) => same
else => x + force
notconverged(row,newrow) = // simplistically
row!=newrow
row=[a,b,a,b,b,b,a,e];
newrow=nextposition(row);
while( notconverged(row,newrow) )
newrow=nextposition(row);
I don't know if it converges, but it's an idea :)

I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you found one that works, try again, under the assumption that MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.
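A minimal sketch of this search in Python, assuming "distance" means the index difference between two equal items. The names find_ordering and best_ordering and the specific pruning rule are my own choices for illustration; the answer does not spell out which conditions to prune on.

```python
from collections import Counter

def find_ordering(items, d):
    """Return an ordering with equal types at least d positions apart, or None."""
    counts = Counter(items)
    n = len(items)
    prefix = []

    def feasible(pos):
        # Pruning rule (an assumption of this sketch): a type with c items
        # left needs at least (c - 1) * d + 1 of the remaining n - pos slots.
        return all((c - 1) * d + 1 <= n - pos for c in counts.values() if c > 0)

    def extend():
        pos = len(prefix)
        if pos == n:
            return True
        if not feasible(pos):
            return False
        recent = prefix[max(0, pos - (d - 1)):]  # the last d - 1 placed items
        for t in sorted(counts):
            if counts[t] > 0 and t not in recent:
                counts[t] -= 1
                prefix.append(t)
                if extend():
                    return True
                prefix.pop()          # undo and try the next type
                counts[t] += 1
        return False

    return list(prefix) if extend() else None

def best_ordering(items):
    """Try d from high to low, as the update suggests; return the first (d, ordering) found."""
    for d in range(len(items), 0, -1):
        ordering = find_ordering(items, d)
        if ordering is not None:
            return d, ordering
    return 0, []  # empty input
```

For example, best_ordering(list("AAABBC")) returns (2, ['A', 'B', 'A', 'B', 'A', 'C']). The search is exhaustive in the worst case, so it is only meant for small lists like the 10 to 30 items in the question.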

Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write down items from left to right, keeping track of the number of items left of each type. At each point put down an item with the largest number left that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it will preserve this property: after putting down k items we have k fewer items, and we have put down at least one of each type that started with ceil(N / k) items of that type.
Given a clutch of mixed items you could work out the largest k you can support and then lay out the items to solve for this k.
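A rough Python sketch of this greedy, under two assumptions of mine: "at least k places" is read as an index difference of at least k, and ties between types with equally many items left are broken by type name. Instead of working out the largest supportable k analytically, the sketch simply tries k from high to low. The function names are hypothetical.

```python
from collections import Counter

def greedy_layout(items, k):
    """Lay items out left to right; return the list, or None if the greedy gets stuck."""
    remaining = Counter(items)
    last_pos = {}              # type -> index where it was last placed
    out = []
    for pos in range(len(items)):
        legal = [t for t, c in remaining.items()
                 if c > 0 and (t not in last_pos or pos - last_pos[t] >= k)]
        if not legal:
            return None
        # pick the legal type with the most items left (ties: smallest type name)
        t = min(legal, key=lambda u: (-remaining[u], u))
        out.append(t)
        remaining[t] -= 1
        last_pos[t] = pos
    return out

def largest_k_layout(items):
    """Return (k, layout) for the largest k at which the greedy succeeds."""
    for k in range(len(items), 0, -1):
        layout = greedy_layout(items, k)
        if layout is not None:
            return k, layout
    return 0, []  # empty input
```

With the counts from the original question (5 A, 3 B, 2 C, 2 D, 1 E, 1 F), largest_k_layout finds k = 3; k = 4 is impossible there because five A's spaced 4 apart would need 17 positions.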

Related

How do I randomly equalize unequal values?

Say I have multiple unequal values a, b, c, d, e. Is it possible to turn these unequal values into equal values just by using random number generation?
Example: a=100, b=140, c=200, d=2, e=1000. I want the algorithm to randomly target these sets such that the largest value is targeted most often and the smallest value is left alone for the most part.
Areas where I've run into problems: if I just use non-unique random number generation, then value e will end up going under the other values. If I use unique number generation, then the ratio between the values doesn't change even if their absolute values do. I've tried using sets where a certain range of numbers has to be hit a certain number of times before the value changes. I haven't tried using a mix of unique/non-unique random numbers yet.
I want the ratio between the values to gradually approach 1 as the algorithm runs.
Another way to think about the problem: say these values a, b, c, d, e, are all equal. If we randomly choose one, each is as likely to be chosen as any other. After we choose one, we add 1 to that value. Then we run this process again. This time, the value that was picked last time is 1-larger than any other value so it's more likely to be picked than any one other value. This creates a snowball effect where the value picked first is likely to keep getting picked and achieve runaway growth. I'm looking for the opposite of this algorithm where we start after these originally-equal values have diverged and we bring them back to the originally-equal state.
I think this process is impossible because of entropy and the inherent one-way nature of existence.
Well, there is a technique called Inverse Weights, where you sample items with probability inversely proportional to their previous appearances. Each time we sample a, b, c, d or e, we update its appearance count and recalculate the probabilities. Here is some simple Python code: I sample the numbers [0...4] as a, b, c, d, e and start with the values you listed as appearance counts. After 100,000 samples they look equidistributed.
import numpy as np
n = np.array([100, 140, 200, 2, 1000])
for k in range(1, 100000):
    p = (1.0 / n) # make probabilities inverse to weights
    p /= np.sum(p) # normalization
    a = np.random.choice(5, p = p) # sampling numbers in the range [0...5)
    n[a] += 1 # update weights
print(n)
Output
[20260 20194 20290 20305 20392]

Distinct sub sequences summing to given number in an array

During my current preparation for interviews, I encountered a question for which I am having some difficulty finding an optimal solution.
We are given an array A and an integer Sum; we need to find all distinct subsequences of A whose sum equals Sum.
For example, if A={1,2,3,5,6} and Sum=6, then the answer should be
{1,2,3}
{1,5}
{6}
Presently I can think of two ways of doing this,
Use Recursion ( which I suppose should be last thing to consider for an interview question)
Use Integer Partitioning to partition Sum and check whether the elements of partition are present in A
Please guide my thoughts.
I agree with Jason. This solution comes to mind:
(complexity is O(sum*|A|) if you represent the map as an array)
Call the input set A and the target sum sum
Have a map of elements B, with each element being x:y, where x (the map key) is the sum and y (the map value) is the number of ways to get to it.
Starting off, add 0:1 to the map - there is 1 way to get to 0 (obviously by using no elements)
For each element a in A, consider each element x:y in B.
If x+a > sum, don't do anything.
If an element with the key x+a already exists in B, say that element is x+a:z, modify it to x+a:y+z.
If an element with the key doesn't exist, simply add x+a:y to the map.
Look up the element with key sum, thus sum:x - x is our desired value.
If B is sorted (or an array), you can simply skip the rest of the elements in B during the "don't do anything" step.
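The counting procedure above can be sketched in Python with a dict playing the role of the map B. The function name count_subsets is an assumption of mine, and the elements are assumed to be positive integers.

```python
def count_subsets(a, target):
    """Count the subsets of a whose elements sum to target."""
    ways = {0: 1}                      # sum -> number of ways to reach it
    for x in a:
        # iterate over a snapshot so that x is used at most once per subset
        for s, w in list(ways.items()):
            if s + x <= target:        # sums above the target are thrown away
                ways[s + x] = ways.get(s + x, 0) + w
    return ways.get(target, 0)
```

count_subsets([1, 2, 3, 5, 6], 6) returns 3, matching the 6:3 entry in the worked example.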
Tracing it back:
The above just gives the count, this will modify it to give the actual subsequences.
At each element in B, instead of the sum, store all the source elements and the elements used to get there (so have a list of pairs at each element in B).
For 0:1 there is no source elements.
For x+a:y, the source element is x and the element to get there is a.
During the above process, if an element with the key already exists, enqueue the pair x/a to the element x+a (enqueue is an O(1) operation).
If an element with the key doesn't exist, simply create a list with one pair x/a at the element x+a.
To reconstruct, simply start at sum and recursively trace your way back.
We have to be careful of duplicate sequences (do we?) and sequences with duplicate elements here.
Example - not tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:1
Consider 1
Add 0+1
B = 0:1, 1:1
Consider 2
Add 0+2:1, 1+2:1
B = 0:1, 1:1, 2:1, 3:1
Consider 3
Add 0+3:1 (already exists -> add 1 to it), 1+3:1, 2+3:1, 3+3:1
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:1, 6:1
Consider 5
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:2
Generated sums thrown away = 7:1, 8:2, 9:1, 10:1, 11:1
Consider 6
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:3
Generated sums thrown away = 7:1, 8:1, 9:2, 10:1, 11:2, 12:2
Then, from 6:3, we know we have 3 ways to get to 6.
Example - tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:{}
Consider 1
B = 0:{}, 1:{0/1}
Consider 2
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2}
Consider 3
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3}, 6:{3/3}
Consider 5
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5}
Generated sums thrown away = 7, 8, 9, 10, 11
Consider 6
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5,0/6}
Generated sums thrown away = 7, 8, 9, 10, 11, 12
Then, tracing back from 6: (not in {} means an actual element, in {} means a map entry)
{6}
{3}+3
{1}+2+3
{0}+1+2+3
1+2+3
Output {1,2,3}
{0}+3+3
3+3
Invalid - 3 is duplicate
{1}+5
{0}+1+5
1+5
Output {1,5}
{0}+6
6
Output {6}
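A possible Python sketch of the trace-back, assuming positive integers. One deviation from the description, chosen here to deal with duplicates such as 3+3: each stored pair holds the index of the element that was added rather than its value, and the trace only follows pairs whose index is smaller than the one used in the previous step. The names are hypothetical.

```python
def subsets_with_sum(a, target):
    """Return every subset of a (as a list, in input order) that sums to target."""
    preds = {0: []}        # sum -> list of (source sum, index of element added)
    for i, x in enumerate(a):
        for s in list(preds):          # snapshot: sums reachable before using a[i]
            t = s + x
            if t <= target:
                preds.setdefault(t, []).append((s, i))

    results = []

    def trace(s, limit, suffix):
        if s == 0:
            results.append(suffix)
            return
        for src, i in preds.get(s, []):
            if i < limit:              # rejects reuse of an element, e.g. 3+3
                trace(src, i, [a[i]] + suffix)

    trace(target, len(a), [])
    return results
```

subsets_with_sum([1, 2, 3, 5, 6], 6) returns [[1, 2, 3], [1, 5], [6]], the same three outputs as the trace above.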
This is a variant of the subset-sum problem. The subset-sum problem asks if there is a subset that sums to a given value. You are asking for all of the subsets that sum to a given value.
The subset-sum problem is hard (more precisely, it's NP-Complete) which means that your variant is hard too (it's not NP-Complete, because it's not a decision problem, but it is NP-Hard).
The classic approach to the subset-sum problem is either recursion or dynamic programming. It's obvious how to modify the recursive solution to the subset-sum problem to answer your variant. I suggest that you also take a look at the dynamic programming solution to subset-sum and see if you can modify it for your variant (tbc: I do not know if this is actually possible). That would certainly be a very valuable learning exercise whether or not is possible as it would certainly enhance your understanding of dynamic programming either way.
It would surprise me though, if the expected answer to your question is anything but the recursive solution. It's easy to come up with, and an acceptable approach to the problem. Asking for the dynamic programming solution on-the-fly is a bit much to ask.
You did, however, neglect to mention a very naïve approach to this problem: generate all subsets, and for each subset check if it sums to the given value or not. Obviously that's exponential, but it does solve the problem.
I assumed that given array contains distinct numbers.
Let's define function f(i, s) - which means we used some numbers in range [1, i] and the sum of used numbers is s.
Let's store all values in a two-dimensional matrix, i.e. cell (i, j) holds the value of f(i, j). Once we have calculated the cells located above or to the left of cell (i, s), we can calculate f(i, s): f(i, s) = f(i - 1, s) (not taking the number at index i), and if s >= a[i] then f(i, s) += f(i - 1, s - a[i]). We can use a bottom-up approach to fill the whole matrix, setting [f(0, 0) = 1; f(0, i) = 0; 1 <= i <= s] and [f(i, 0) = 1; 1 <= i <= n]. Once the whole matrix is calculated, the answer is in cell f(n, S). Thus the total time complexity is O(n*s) and the memory complexity is O(n*s).
We can improve the memory complexity by noting that every iteration needs only the information from the previous row, so we can store a matrix of size 2xS instead of nxS. This reduces the memory complexity to linear in S. The problem is NP-complete, so no polynomial-time algorithm is known for it; the O(n*s) bound is pseudo-polynomial, because it depends on the value of S rather than on the size of the input.
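A short Python sketch of the two-row version, assuming distinct positive integers as stated above. The function name is mine; prev and cur play the roles of rows i - 1 and i of the matrix.

```python
def count_subsets_two_rows(a, S):
    """Number of subsets of a that sum to S, keeping only two rows of f."""
    prev = [1] + [0] * S               # row 0: f(0, 0) = 1, f(0, s) = 0 for s >= 1
    for x in a:
        cur = prev[:]                  # not taking x: f(i, s) = f(i - 1, s)
        for s in range(x, S + 1):
            cur[s] += prev[s - x]      # taking x: f(i - 1, s - x)
        prev = cur
    return prev[S]
```

For the array from the question, count_subsets_two_rows([1, 2, 3, 5, 6], 6) gives 3.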

Algorithm/Data Structure for finding combinations of minimum values easily

I have a symmetric matrix like shown in the image attached below.
I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).
As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).
My goal is to find the minimum values for ALL combinations of letters (not repeating) for one row at a time e.g for this example I need to find min values with respect to row A which are given by the calculations:
A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2
I realize that certain calculations can be reused which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.
Can anyone point me in the right direction toward an efficient algorithm/data structure I can use for this problem?
You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)
First of all, note that if you have n letters, that this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets and those that don't include "A" saves you n + 1 sets and a factor of two, respectively, which is allowed for big Omega.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but for example computing data only for a binary tree contained in the lattice would not necessarily help you anything beyond precomputing nothing at all.
With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset T, when we're computing f(T) we will already have computed f for every subset strictly contained in T. There are several ways that you can make use of this, but I think the easiest might be to use two such subsets: let t1 and t2 be two different elements of T that you may select however you like; let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or it is fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, then look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.
If you never select "A" for t1 or t2, then indeed you can compute everything you're interested in while not computing f for any sets T that don't contain "A". (This is possible because the steps outlined above are only interesting whenever T contains at least three elements.) Good! This leaves just one question - how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1) bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 1, 2, and 4 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H" - note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 element, in which case you'll want to do nothing, or 2 elements, in which case you just copy the value from the matrix.)
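Here is a small Python sketch of this scheme. It simplifies the storage slightly: instead of an (n-1)-bit index it uses full n-bit masks in which bit 0 stands for "A", stored in a dict. The function name is hypothetical, and the matrix in the usage note is reconstructed from the values listed in the question, since the image itself is not available.

```python
def min_over_subsets_with_first(m):
    """Map each bitmask containing letter 0 (and at least one other letter) to its minimum.

    m is a symmetric matrix; bit i of a mask means letter i is in the subset.
    """
    n = len(m)
    f = {}
    for mask in range(1, 1 << n, 2):   # odd masks are exactly those containing letter 0
        others = [i for i in range(1, n) if (mask >> i) & 1]
        if not others:
            continue                   # the subset is just {"A"}: nothing to do
        if len(others) == 1:
            f[mask] = m[0][others[0]]  # two-element subset: copy from the matrix
        else:
            t1, t2 = others[-2], others[-1]   # never pick letter 0 as t1 or t2
            s1 = mask & ~(1 << t2)     # S plus t1
            s2 = mask & ~(1 << t1)     # S plus t2
            f[mask] = min(f[s1], f[s2], m[t1][t2])
    return f
```

With m = [[0, 6, 8, 4], [6, 0, 6, 4], [8, 6, 0, 2], [4, 4, 2, 0]] for the letters A to D, the result has f[0b0111] == 6 (A.B.C), f[0b1011] == 4 (A.B.D), f[0b1101] == 2 (A.C.D) and f[0b1111] == 2 (A.B.C.D). Because removing a letter always gives a numerically smaller mask, iterating in numerical order guarantees that both lookups are already filled in.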
Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:
let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix
first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)
now you must compute min (CS), i.e. finding the minimum value of the matrix elements corresponding to the combinations in CS; again you can easily find the minimum value in O(n)
done.
Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)
If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example
when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key
later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string
if you find a prefix of P' in your map then you already know the solution for that problem, so you just have to find the solution for the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, and so you just have to solve the problem for
X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')
if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)
This technique is essentially a form of memoization.

Generating Balls in Boxes

Given two sorted vectors a and b, find all vectors which are sums of a and some permutation of b, and which are unique once sorted.
You can create one of the sought vectors in the following way:
Take vector a and a permutation of vector b.
Sum them together so c[i]=a[i]+b[i].
Sort c.
I'm interested in finding the set of b-permutations that yield the entire set of unique c vectors.
Example 0: a='ccdd' and b='xxyy'
Gives the summed vectors: 'cycydxdx', 'cxcxdydy', 'cxcydxdy'.
Notice that the permutations of b: 'xyxy' and 'yxyx' are equal, because in both cases the "box c" and the "box d" both get exactly one 'x' and one 'y'.
I guess this is similar to putting M balls in M boxes (one in each) with some groups of balls and boxes being identical.
Update: Given a string a='aabbbcdddd' and b='xxyyzzttqq' your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pair wise indistinguishable.
Example 1: Given strings are a='xyy' and b='kkd'.
Possible solution: 'kkd', 'dkk'.
Reason: We see that all unique permutations of b are 'kkd', 'kdk' and 'dkk'. However, with our constraints, the first two permutations are considered equal, as the indices at which they differ map to the same char 'y' in string a.
Example 2: Given strings are a='xyy' and b='khd'.
Possible solution: 'khd', 'dkh', 'hkd'.
Example 3: Given strings are a='xxxx' and b='khhd'.
Possible solution: 'khhd'.
I can solve the problem of generating unique candidate b permutations using Narayana Pandita's algorithm as described on Wikipedia/Permutation.
The second part seems harder. My best shot is to join the two strings pairwise into a list, sort it and use it as a key in a lookup set. ('xx'+'hd' join→'xh','xd' sort→'xd','xh').
As my M is often very big, and as similarities in the strings are common, I currently generate way more b permutations than actually go through the set filter. I would love to have an algorithm generating the correct ones directly. Any improvement is welcome.
To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).
For a recursive solution you could try the following:
Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.
Then you need to find all possible ordered pairs (y1 y2 ... ym) such that
0 <= yi <= xi
and Sum yi = k.
Here yi is the number of times character i appears.
The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.
Pseudocode:
List Generate (int [] x /* array index starting at 1 */,
               int k /* size of set */) {
    list = List.Empty;
    if (Sum(x) < k) return list;
    for (int i = 0; i <= x[1]; i++) {
        // Remove first element and generate subsets of size k-i.
        remaining = x.Remove(1);
        list_i = Generate(remaining, k-i);
        if (list_i.NotEmpty()) {
            list = list + list_i;
        } else {
            return list;
        }
    }
    return list;
}
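As a runnable illustration of the same recursion, here is a Python sketch that yields the count vectors (y1, ..., ym) directly. It is my own rendering rather than a line-by-line translation: the name count_vectors is hypothetical, it adds an explicit base case for an empty count list, and it keeps trying larger values of the first count instead of returning early.

```python
def count_vectors(x, k):
    """Yield every tuple y with 0 <= y[i] <= x[i] for all i and sum(y) == k."""
    if not x:
        if k == 0:
            yield ()                   # exactly one way to pick nothing from nothing
        return
    if sum(x) < k:
        return                         # not enough items left to reach k
    for first in range(min(x[0], k) + 1):
        # fix how often character 1 appears, then recurse on the rest
        for rest in count_vectors(x[1:], k - first):
            yield (first,) + rest
```

For the multiset 'xxyy' (counts [2, 2]) and k = 2, list(count_vectors([2, 2], 2)) is [(0, 2), (1, 1), (2, 0)], i.e. 'yy', 'xy' and 'xx'.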
PRIOR TO EDITS:
If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.
I remember posting C# code for that here: How to find permutation of k in a given length?
I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.
Eg in case of a=xyy, b=add
distinguishing point is x
So you select permutations of 'add' of length 1. That gives you a and d.
Thus add and dad (or dda) are the ones you need.
For a=xyyz b=good
distinguishing points are x and z
So you generate permutations of b of length 2 giving
go
og
oo
od
do
gd
dg
giving you 7 unique permutations.
Does that help? Is my understanding correct?
Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.
We need two functions, combinations(s,k) and runvector(v). combinations(s,k) generates the unique combinations of length k of a multiset. For s='xxyy' these would be ['xx','xy','yy']. runvector(v) transforms a multiset represented as a sorted vector into a simpler structure, the runvector. runvector('cddeee')=[1,2,3].
To solve the problem, we will use recursive generators. We run through all the combinations that fit in box 1 and then recurse on the rest of the boxes, banning the values we already chose. To accomplish the banning, combinations will maintain a bit array across calls.
In Python the approach looks like this:
def fillrest(banned,out,rv,b,i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None
def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    # the recursive generator defined above is named fillrest
    for _ in fillrest(banned,out,rv,b,0):
        yield out[:]
>>> print(list(balls('abbccc','xyyzzz')))
[['x', 'yy', 'zzz'],
['x', 'yz', 'yzz'],
['x', 'zz', 'yyz'],
['y', 'xy', 'zzz'],
['y', 'xz', 'yzz'],
['y', 'yz', 'xzz'],
['y', 'zz', 'xyz'],
['z', 'xy', 'yzz'],
['z', 'xz', 'yyz'],
['z', 'yy', 'xzz'],
['z', 'yz', 'xyz'],
['z', 'zz', 'xyy']]
The output is in 'box' format, but can easily be merged back into simple strings: 'xyyzzz', 'xyzyzz'...
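Since the helpers combinations and runvector are not shown, here is a self-contained Python sketch of the same idea. It is not the original implementation: instead of a banned bit array it tracks the remaining balls in a collections.Counter, and the helper multiset_combinations is a hypothetical stand-in for combinations.

```python
from collections import Counter
from itertools import groupby

def runvector(v):
    """Lengths of the runs of equal symbols in a sorted sequence."""
    return [len(list(group)) for _, group in groupby(v)]

def multiset_combinations(counts, k):
    """Yield each distinct k-combination of the multiset counts as a sorted string."""
    symbols = sorted(counts)
    def rec(i, k):
        if k == 0:
            yield ''
            return
        if i == len(symbols):
            return
        sym = symbols[i]
        for take in range(min(counts[sym], k), -1, -1):
            for rest in rec(i + 1, k - take):
                yield sym * take + rest
    return rec(0, k)

def balls(a, b):
    """Yield every distinct way to distribute the balls of b over the boxes of a."""
    sizes = runvector(a)
    remaining = Counter(b)
    out = []
    def fill(i):
        if i == len(sizes):
            yield list(out)
            return
        # materialise the combinations before mutating `remaining`
        for comb in list(multiset_combinations(remaining, sizes[i])):
            remaining.subtract(comb)   # these balls are now taken
            out.append(comb)
            yield from fill(i + 1)
            out.pop()
            remaining.update(comb)     # put them back for the next choice
    return fill(0)
```

list(balls('abbccc', 'xyyzzz')) produces the same 12 distributions as listed above, starting with ['x', 'yy', 'zzz'] and ending with ['z', 'zz', 'xyy'].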

Algorithm to merge two lists lacking comparison between them

I am looking for an algorithm to merge two sorted lists,
but they lack a comparison operator between elements of one list and elements of the other.
The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do.
More precisely:
Given:
Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
A precedence operator < defined among elements of each list such that
a_i < a_{i+1}, and b_j < b_{j+1} for 1 <= i < m and 1 <= j < n.
The precedence operator is undefined between elements of A and B:
a_i < b_j is not defined for any valid i and j.
An equality operator = defined among all elements of either A or B
(it is defined between an element from A and an element from B).
No two elements from list A are equal, and the same holds for list B.
Produce:
A list C = {c_1, ..., c_r} such that:
C = union(A, B); the elements of C are the union of elements from A and B.
If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q. (The order of elements
of the sublists of C corresponding to sets A and B should be preserved.)
There exist no i and j such that c_i = c_j.
(all duplicated elements between A and B are removed).
I hope this question makes sense and that I'm not asking something either terribly obvious,
or something for which there is no solution.
Context:
A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions).
A representation of a constructible number must therefore "know" the field it is represented in.
Lists A and B represent successive quadratic extensions of the rational numbers.
Elements of A and B themselves are constructible numbers, which are defined in the context
of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers,
the quadratically extended fields must first be merged so that the binary arithmetic
operations can be performed; the resulting list C is the quadratically extended field which
can represent numbers representable by both fields A and B.
(If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)
Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.
In practice, lists A and B are linked lists (stored in reverse order, actually).
I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail.
The algorithm I seek is not the merge operation in mergesort,
because the precedence operator is not defined between elements of the two lists being merged.
Everything will eventually be implemented in C++ (I just want the operator overloading).
This is not homework, and will eventually be open sourced, FWIW.
I don't think you can do it better than O(N*M), although I'd be happy to be wrong.
That being the case, I'd do this:
1. Take the first (remaining) element of A.
2. Look for it in (what's left of) B.
3. If you don't find it in B, move it to the output.
4. If you do find it in B, move everything from B up to and including the match to the output, and drop the copy from A.
5. Repeat the above until A is empty.
6. Move anything left in B to the output.
If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".
The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).
So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.
For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.
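The steps above can be sketched in Python as follows. This is only an illustration: the name merge_by_equality and the eq parameter are made up here, and the comparison counter exists just to check the arithmetic in the example.

```python
def merge_by_equality(a, b, eq=lambda x, y: x == y):
    """Merge two consistently ordered lists using only an equality test.

    Returns (merged, tests), where tests is the number of equality tests made.
    Hypothetical helper; assumes the two orderings are compatible.
    """
    out = []
    tests = 0
    b_start = 0  # index of the first element of b not yet moved to the output
    for x in a:
        match = None
        # Look for x in what's left of b.
        for j in range(b_start, len(b)):
            tests += 1
            if eq(x, b[j]):
                match = j
                break
        if match is None:
            # x occurs only in a: move it to the output.
            out.append(x)
        else:
            # Move everything from b up to and including the match,
            # and drop the copy from a.
            out.extend(b[b_start:match + 1])
            b_start = match + 1
    # Move anything left in b to the output.
    out.extend(b[b_start:])
    return out, tests


A = [1, 2, 'c', 4, 5, 'f']
B = ['a', 'b', 'c', 'd', 'e', 'f']
print(merge_by_equality(A, B))  # ([1, 2, 'a', 'b', 'c', 4, 5, 'd', 'e', 'f'], 24)
print(merge_by_equality(B, A))  # (['a', 'b', 1, 2, 'c', 'd', 'e', 4, 5, 'f'], 24)
```

Passing a custom eq lets the same sketch work for element types whose equality is not Python's ==.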
As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list whose order is inconsistent with that of merge(B,A). Hence merge(merge(A,B), merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.
This sounds like it would use a degenerate form of topological sorting.
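To make the topological-sorting idea concrete, here is a rough sketch using Kahn's algorithm. It assumes something the question does not guarantee: that the elements are hashable and distinct within each list. The function name merge_by_toposort is hypothetical. Each list contributes an edge from every element to its successor, and a cycle means the two orderings contradict each other.

```python
from collections import deque


def merge_by_toposort(*lists):
    """Merge ordered lists of hashable, distinct items via Kahn's algorithm.

    Hypothetical sketch: any valid topological order is returned, so the
    interleaving of unrelated elements is arbitrary.
    """
    successors = {}  # item -> list of items that must come after it
    indegree = {}    # item -> number of incoming edges
    for lst in lists:
        for item in lst:
            successors.setdefault(item, [])
            indegree.setdefault(item, 0)
        # Consecutive elements of one list give the ordering constraints.
        for before, after in zip(lst, lst[1:]):
            if after not in successors[before]:
                successors[before].append(after)
                indegree[after] += 1

    ready = deque(item for item, d in indegree.items() if d == 0)
    out = []
    while ready:
        item = ready.popleft()
        out.append(item)
        for nxt in successors[item]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(out) != len(indegree):
        # Some items never reached in-degree zero: the graph has a cycle.
        raise ValueError("the input orderings are inconsistent")
    return out


print(merge_by_toposort([1, 2, 'c', 4, 5, 'f'], ['a', 'b', 'c', 'd', 'e', 'f']))
```

It is "degenerate" in the sense that the graph is just two chains glued together at the shared elements, so every node has at most two successors.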
EDIT 2:
Now with a combined routine:
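import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
ibase = 0    # index of the first element of list2 not yet copied to result
result = []
for i1 in list1:
    # search the unconsumed tail of list2 for a match
    for n2, i2 in enumerate(itertools.islice(list2, ibase, None)):
        if i1 == i2:
            # copy the skipped elements of list2, then the shared element
            result.extend(itertools.islice(list2, ibase, ibase + n2))
            result.append(i2)
            ibase = n2 + ibase + 1
            break
    else:
        # no match found: i1 occurs only in list1
        result.append(i1)
# whatever is left of list2 goes at the end
result.extend(itertools.islice(list2, ibase, None))
print(result)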
Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.
Then it's just a matter of removing duplicates.
EDIT: Alright, after the comment discussion (and given the additional condition that a_i=b_i & a_j=b_j & a_i<a_j => b_i<b_j), here's a reasonable solution:
1. Identify entries common to both lists. This is O(n^2) for the naive algorithm - you might be able to improve it.
2. (optional) Verify that the common entries are in the same order in both lists.
3. Construct the result list: all elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.
Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.
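As a small illustration of that pathology, the following sketch checks whether two lists order their shared elements the same way, using only equality tests. The function names are hypothetical, and it assumes each list has no repeated elements.

```python
def common_in_order(xs, ys):
    """Elements of xs that also occur in ys, kept in xs order (equality only)."""
    return [x for x in xs if any(x == y for y in ys)]


def orders_are_consistent(a, b):
    """True if the shared elements appear in the same order in a and in b."""
    return common_in_order(a, b) == common_in_order(b, a)


# Shared elements 'c' and 'f' appear in the same order in both lists.
print(orders_are_consistent([1, 2, 'c', 4, 5, 'f'],
                            ['a', 'b', 'c', 'd', 'e', 'f']))  # True

# 'x' precedes 'y' in one list but follows it in the other: no valid merge.
print(orders_are_consistent(['x', 'y'], ['y', 'x']))          # False
```

This costs O(N*M) equality tests, the same as the merge itself, so it is cheap to run as a precondition.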
Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)
List<A> alist = ...
List<B> blist = ...
List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            found = true;
            mergePos = i + 1;  // later elements of B must go after this match
            break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}
This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)
The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.
If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.
def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i
    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C

A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(merge(A, B))
(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)
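Since the question mentions linked lists, here is one way the same idea might look without slicing or indexing. It is a sketch under the same hashability assumption; merge_sequential is a hypothetical name, A must be iterable twice (once to build the table, once to merge), and B is walked once.

```python
def merge_sequential(A, B):
    """Hash-based merge using only forward iteration over A and B.

    Hypothetical variant of the routine above; raises ValueError when the
    shared elements appear in different orders in the two inputs.
    """
    index_in_a = {a: i for i, a in enumerate(A)}
    a_iter = iter(A)
    consumed = 0     # number of elements of A already read from a_iter
    pending_b = []   # non-shared elements of B waiting for the next match
    out = []
    for b in B:
        if b not in index_in_a:
            pending_b.append(b)
            continue
        if index_in_a[b] < consumed:
            raise ValueError("inconsistent input orderings")
        # Emit the non-shared elements of A that precede the shared one.
        for a in a_iter:
            consumed += 1
            if a == b:
                break
            out.append(a)
        out.extend(pending_b)  # then the non-shared elements of B
        pending_b = []
        out.append(b)          # then the shared element itself
    out.extend(a_iter)         # remaining non-shared elements of A
    out.extend(pending_b)      # and of B
    return out


print(merge_sequential([1, 2, 'c', 4, 5, 'f', 7],
                       ['a', 'b', 'c', 'd', 'e', 'f', 'g']))
# [1, 2, 'a', 'b', 'c', 4, 5, 'd', 'e', 'f', 7, 'g']
```

The consistency check is a single integer comparison per shared element, so it does not change the O(N) bound.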

Resources