Finding all possible combinations of rows in a matrix where the sum of columns equals a specific row vector - algorithm

I need to find all possible combinations of rows in a matrix whose column sums equal a specific row vector.
Example:
Consider the following matrix
| 0 0 2 |
| 1 1 0 |
| 0 1 2 |
| 1 1 2 |
| 0 1 0 |
| 2 1 2 |
I need to obtain the following row vector as the sum of the columns of the selected rows:
| 2 2 2 |
The possible combinations are:
1.
| 1 1 0 |
| 1 1 2 |
2.
| 0 1 0 |
| 2 1 2 |
What is the best way to find them?

ALGORITHM
One option is to turn this into the subset sum problem by choosing a base b and treating each row as a number in base b.
For example, with a base of 10 your initial problem turns into:
Consider the list of numbers
002
110
012
112
010
212
Find all subsets that sum to 222
This problem is well known and is solvable via dynamic programming (see the wikipedia page).
If all your entries are nonnegative, you can use David Pisinger's linear time algorithm, which has complexity O(nC), where C is the target number and n is the length of your list.
CHOICE OF BASE
The complexity of the algorithm is determined by the choice of the base b.
For the algorithm to be correct you need to choose a base larger than the largest column sum, i.e. larger than the sum of all the digits in any single column. (This prevents spurious solutions caused by a carry overflowing from one digit into the next.)
However, note that if you choose a smaller base you will still get all the correct solutions, plus some incorrect solutions. It may be worth considering using a smaller base (which will make the subset sum algorithm work much faster), followed by a postprocessing stage that checks all the solutions found and discards any incorrect ones.
Too small a base will produce an exponential number of incorrect solutions to discard, so the best size of base will depend on the details of your problem.
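As a rough illustration of the rule above, the smallest base that rules out carries can be computed from the column sums. This is only a sketch; the helper names smallest_safe_base and encode are made up for this example and are not part of the answer's code.

```python
def smallest_safe_base(rows):
    """Smallest base in which no column sum can carry into the next digit."""
    return max(sum(column) for column in zip(*rows)) + 1

def encode(row, base):
    """Read a row as the digits of a single number in the given base."""
    value = 0
    for digit in row:
        value = value * base + digit
    return value

A = [[0, 0, 2], [1, 1, 0], [0, 1, 2], [1, 1, 2], [0, 1, 0], [2, 1, 2]]
base = smallest_safe_base(A)          # column sums are 4, 5 and 8
numbers = [encode(row, base) for row in A]
print(base, numbers, encode([2, 2, 2], base))
```

Any base at or above this value gives only correct solutions; a smaller base trades correctness of the raw output for a smaller target number, as described above.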
EXAMPLE CODE
Python code to implement this algorithm.
from collections import defaultdict

A = [[0, 0, 2],
     [1, 1, 0],
     [0, 1, 2],
     [1, 1, 2],
     [0, 1, 0],
     [2, 1, 2]]
target = [2, 2, 2]
b = 10

def convert2num(a):
    # Treat the row as the digits of a number in base b
    t = 0
    for d in a:
        t = b*t + d
    return t

B = [convert2num(a) for a in A]
goal_value = convert2num(target)

# First build the DP array
# Map from reachable sum to the set of row indices that can be the last row added
DP = defaultdict(set)
DP[0] = set()
for i, v in enumerate(B):
    # Iterate over a snapshot of the keys so that each row is used at most once
    for old_value in list(DP.keys()):
        new_value = old_value + v
        if new_value <= goal_value:
            DP[new_value].add(i)

# Then search for solutions
def go(goal, sol, limit):
    if goal == 0:
        # Double check
        assert [sum(col) for col in zip(*sol)] == target
        print(sol)
        return
    for i in sorted(DP.get(goal, ())):
        # Only step back to rows with a smaller index, so no row is reused
        if i < limit:
            sol.append(A[i])
            go(goal - B[i], sol, i)
            sol.pop()

go(goal_value, [], len(A))
This code assumes that b has been chosen large enough to avoid overflow.

Related

Fast calculation of probability distribution in board game Da Vinci Code

I'm interested in efficiently calculating the probability distribution over possible secret numbers given what one can observe of the opponents' hand (and your own hand) in the board game Da Vinci Code. A link to the game here: https://boardgamegeek.com/boardgame/8946/da-vinci-code
I have abstracted the problem into the following:
You are given an array A of length N and a finite set of numbers Si for each index i of the array. Now,
we are to place a number from Si at each index i to fill the entire array A;
while ensuring that the number is unique across the entire array A;
and for 3 disjoint subarrays A1, A2, A3 of A such that concat(A1, A2, A3) = A, the numbers in each subarray must follow a strictly increasing order;
given all the possible arrays A that satisfy the above constraints, what is the probability distribution over each number at each index?
Here I provide an example below:
Assuming we have the following array of length 5 with each column representing Si at the index of the column
| 6 6 | 6 6 | 6  |
|   5 |   5 |    |
| 4 4 |     | 4  |
|     | 3 3 |    |
| 2   | 2 2 |    |
| 1 1 |     |    |
| ___ | ___ | __ |
| A1  | A2  | A3 |
The set of all possible arrays are:
14236
14256
14356
15234
15236
15264
15364
16234
16254
16354
24356
25364
26354
45236
Therefore the probability distribution over each number [1-6] at each index is:
6 0 4/14 0 3/14 6/14
5 0 6/14 0 6/14 0
4 1/14 4/14 0 0 8/14
3 0 0 6/14 5/14 0
2 3/14 0 8/14 0 0
1 10/14 0 0 0 0
___________ __________ ______
A1 A2 A3
Brute forcing this problem is obviously doable but I have a gut feeling that there must be some more efficient algorithms for this.
The reason I think so is that one can derive the probability distribution from the set of all possibilities but not the other way around, so the distribution itself must contain less information than the set of all possibilities does. Therefore, I believe that we do not need to generate all possibilities just to obtain the probability distribution.
Hence, I am wondering if there is any smart matrix operation we could use for this problem or even fixed-point iteration/density evolution to approximate the end probability distribution? Some other potentially more efficient approaches to this problem are also appreciated.
Edit: By brute-force, I mean specifically enumerating all possibilities with constraint propagation like in sudoku. My hope is to obtain an accurate solution, or an approximate solution that approximates well (better than plain Monte Carlo), that works better than CP in terms of running time.
Edit2: The better solution I desire should have the characteristic that it does not need to generate all possibilities to obtain or approximate the probability distribution.
Did you consider Constraint Propagation?
When you assign a number to a position, that number cannot appear in any other position, so exclude that number from the remaining positions
When you assign a number in the first column of a subarray, the second column must contain a larger value, so exclude all values that are lower or equal
With a BF approach in your example the code would generate and check 4 * 4 * 3 * 4 * 2 = 384 possibilities; with the CP approach we only generate 65 possibilities.
Here is a sample Python implementation:
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DaVinci:
    grid : List[List[int]]
    top : int
    lastcol : int = 0
    solved : List = field(default_factory=list)
    count : int = 0
    distrib : List[Dict[int,int]] = field(init=False)

    def __post_init__(self):
        self.lastcol = len(self.grid)-1
        self.distrib = [{x:0 for x in range(1,self.top+1)} for y in range(len(self.grid))]
        self.solve_next(current = 0, even = True, blocked = [], minval = 0, solving = [])
        self.count = len(self.solved)

    def solve_next(self, current, even, blocked, minval, solving):
        for n in self.grid[current]:
            # Skip numbers already used and numbers that break the increasing order
            if n not in blocked and n > minval:
                if current != self.lastcol:
                    # n * even is n when the next column continues the current pair and 0
                    # when it starts a new subarray (this assumes subarrays of length 2,
                    # as in the example)
                    self.solve_next(current + 1, not even, blocked + [n], n * even, solving + [n])
                else:
                    # A complete array was found: update the per-column counts
                    for col in range(self.lastcol):
                        self.distrib[col][solving[col]] += 1
                    self.distrib[self.lastcol][n] += 1
                    self.solved.append(solving + [n])

    def show_solved(self):
        for sol in self.solved:
            print(''.join(map(str,sol)))

    def show_distrib(self):
        for i in range(1, self.top+1):
            print(i, end = ' ')
            for col in range(len(self.grid)):
                print(f'{self.distrib[col][i]:2d}/{self.count}', end = ' ')
            print()

dv = DaVinci([[1,2,4,6],[1,4,5,6],[2,3,6],[2,3,5,6],[4,6]], 6)
dv.show_solved()
14236
14256
14356
15234
15236
15264
15364
16234
16254
16354
24356
25364
26354
45236
dv.show_distrib()
1 10/14 0/14 0/14 0/14 0/14
2 3/14 0/14 8/14 0/14 0/14
3 0/14 0/14 6/14 5/14 0/14
4 1/14 4/14 0/14 0/14 8/14
5 0/14 6/14 0/14 6/14 0/14
6 0/14 4/14 0/14 3/14 6/14
A simple idea to get an approximation for the distribution is to use a Monte Carlo approach.
Set a variable total := 0 and a matrix M[N][Q] with all entries initially set to zero (Q is the number of distinct values allowed).
Fix a positive integer K. Perform K iterations. At each iteration, for each i in [1..N], take a random element from Si and fill the array A. When the array A is all filled, verify in O(N) if it satisfies your conditions. If so, increment by one the variable total and iterate through the array, incrementing the matrix entries M[i][A[i]] by one, for i in [1..N].
In the end, iterate through all the elements of the matrix M in O(N Q) and divide its elements by total to get an approximation for the distribution.
Total time complexity is O(N (K + Q)).
You can also precalculate stuff to make the approximation more precise. For example, you can precalculate all increasing sequences in the groups A1, A2 and A3. Put them in arrays I1, I2, I3. Then, at each iteration, instead of taking random elements from each Si, you take random sequences from I1, I2 and I3 and verify if the concatenation has no repeated elements (in O(N)). If so, proceed as before. The total time complexity (apart from the expensive precalculation) remains O(N (K + Q)).
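The basic sampling loop (without the precalculation) might look roughly like the sketch below. The function name monte_carlo, the lengths argument describing the sizes of A1, A2, A3, and the fixed seed are assumptions made for this illustration only.

```python
import random

def monte_carlo(candidate_sets, lengths, iterations, seed=0):
    """Estimate per-index value counts by rejection sampling."""
    rng = random.Random(seed)
    n = len(candidate_sets)
    # Positions where a new subarray starts (no ordering constraint with the previous slot)
    starts, pos = set(), 0
    for length in lengths:
        starts.add(pos)
        pos += length
    counts = [dict() for _ in range(n)]
    total = 0
    for _ in range(iterations):
        a = [rng.choice(s) for s in candidate_sets]
        if len(set(a)) != n:
            continue  # a number is repeated
        if any(i not in starts and a[i - 1] >= a[i] for i in range(1, n)):
            continue  # a subarray is not strictly increasing
        total += 1
        for i, v in enumerate(a):
            counts[i][v] = counts[i].get(v, 0) + 1
    return total, counts

S = [[1, 2, 4, 6], [1, 4, 5, 6], [2, 3, 6], [2, 3, 5, 6], [4, 6]]
total, counts = monte_carlo(S, [2, 2, 1], 100000)
estimate = {v: c / total for v, c in sorted(counts[0].items())}
print(total, estimate)
```

Only accepted samples contribute, so the quality of the estimate depends on the acceptance rate; in the example about 14 out of 384 random fillings are valid.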
Start by converting all legal subarray selections into bitvectors.
E.g., for A2 we have [2,3], [2,5], [2,6], [3,5], [3,6]
[2,3] as a bitvector is 000110
[3,5] is 010100
Next, arrange your three subarrays by the number of bitvectors they have.
Next, put these in a hash for each subarray/member combination except the smallest subarray. Use the smallest set bit as the key.
E.g. For [2,3] in A2, we'd have {2 => 000110}
Note that the values of the map need to be arrays, since there will be multiple bitvectors for each index/element combo.
Finally,
For every bitvec of subarray_small:
For every non-set bit of that bitvec
Find the list that has that bit as a key in subarray_medium
For every bitvec in this list
Check if the inverse of (bitvec_small | bitvec_medium) is in the hash for subarray_large.
If it is, we have a valid arrangement; update your frequency counts.
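A simplified sketch of the bitvector idea follows. It is not the exact lookup scheme described above: instead of hashing by smallest set bit and looking up the complement, it simply combines one bitvector per subarray and keeps the combinations whose bitvectors are pairwise disjoint. All function names are invented for this example.

```python
from collections import Counter
from itertools import product

def increasing_selections(candidate_sets):
    """All strictly increasing picks, one value per position of a subarray."""
    return [pick for pick in product(*candidate_sets)
            if all(a < b for a, b in zip(pick, pick[1:]))]

def to_mask(pick):
    """Bitvector with bit v set for every value v in the pick."""
    mask = 0
    for v in pick:
        mask |= 1 << v
    return mask

def count_arrangements(subarrays):
    options = [[(to_mask(p), p) for p in increasing_selections(s)] for s in subarrays]
    counts = [Counter() for s in subarrays for _ in s]
    total = 0
    for combo in product(*options):
        used = 0
        for mask, _ in combo:
            if used & mask:      # a value is shared between two subarrays
                break
            used |= mask
        else:
            total += 1
            flat = [v for _, pick in combo for v in pick]
            for index, v in enumerate(flat):
                counts[index][v] += 1
    return total, counts

subarrays = [[[1, 2, 4, 6], [1, 4, 5, 6]], [[2, 3, 6], [2, 3, 5, 6]], [[4, 6]]]
total, counts = count_arrangements(subarrays)
print(total, counts[0])
```

The bitwise AND replaces the O(N) duplicate check, and only per-subarray selections that are already increasing are ever combined.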

Practical algorithms for permuting external memory

On a spinning disk, I have N records that I want to permute. In RAM, I have an array of N indices that contain the desired permutation. I also have enough RAM to hold n records at a time. What algorithm can I use to execute the permutation on disk as quickly as possible, taking into account the fact that sequential disk access is a lot faster?
I have plenty of excess disk to use for intermediate files, if desired.
This is a known problem. Find the cycles in your permutation order. For instance, given five records to permute [1, 0, 3, 4, 2], you have cycles (0, 1) and (2, 3, 4). You do this by picking an unused starting position; follow the index pointers until you return to your starting point. The sequence of pointers describes a cycle.
You then permute the records with an internal temporary variable, one record long.
temp = disk[0]
disk[0] = disk[1]
disk[1] = temp
temp = disk[2]
disk[2] = disk[3]
disk[3] = disk[4]
disk[4] = temp
Note that you can also perform the permutation as you traverse the pointers. You will also need some method to recall which positions have already been permuted, such as clearing the permutation index (set it to -1).
Can you see how to generalize that?
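One possible generalization of the hand-written swaps above, sketched with a Python list standing in for the disk. It assumes the convention of the example, where position i receives the record that was at perm[i], and it marks finished positions with -1 as suggested. The function name is hypothetical.

```python
def permute_in_place(disk, perm):
    """Apply perm by following its cycles, using one temporary record."""
    perm = list(perm)  # work on a copy so the caller's index array survives
    for start in range(len(perm)):
        if perm[start] < 0:
            continue  # already handled as part of an earlier cycle
        temp = disk[start]
        i = start
        while perm[i] != start:
            nxt = perm[i]
            disk[i] = disk[nxt]   # pull the next record of the cycle into place
            perm[i] = -1
            i = nxt
        disk[i] = temp            # close the cycle with the saved record
        perm[i] = -1
    return disk

records = ["a", "b", "c", "d", "e"]
print(permute_in_place(records, [1, 0, 3, 4, 2]))
```

This version still jumps around the disk record by record; it does not exploit the n records of RAM or sequential access, which is what the interval scheduling discussion that follows is about.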
This is a problem of interval coordination. I'll simplify the notation slightly by changing the memory available to M records -- having upper- and lower-case N is a little confusing.
First, we re-cast the permutations as a series of intervals, the rotational span during which a record needs to reside in RAM. If a record needs to be written to a lower-numbered position, we increase the endpoint by the list size, to indicate the wraparound -- have to wait for the next disk rotation. For instance, using my earlier example, we expand the list:
[1, 0, 3, 4, 2]
0 -> 1
1 -> 0+5
2 -> 3
3 -> 4
4 -> 2+5
Now, we apply standard greedy scheduling resolution. First, sort by endpoint:
[0, 1]
[2, 3]
[3, 4]
[1, 5]
[4, 7]
Now, apply the algorithm for M-1 "lanes"; the extra one is needed for swap space. We fill each lane, appending the interval with the earliest endpoint, whose start-point doesn't overlap:
[0, 1] [2, 3] [3, 4] [4, 7]
[1, 5]
We can do this in a total of 7 "ticks" if M >= 3. If M=2, we defer the second lane by 2 rotations to [11, 15].
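The interval construction and the greedy lane assignment can be sketched as follows. This covers only the case where enough lanes are available; deferring lanes by whole rotations when M is too small is not implemented. The function names are assumptions for this illustration.

```python
def build_intervals(perm):
    """Interval (i, perm[i]), with the endpoint pushed one rotation later on wraparound."""
    n = len(perm)
    return [(i, p if p >= i else p + n) for i, p in enumerate(perm)]

def assign_lanes(intervals):
    """Greedy scheduling: sort by endpoint, append to the first lane that is free."""
    lanes = []
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        for lane in lanes:
            if lane[-1][1] <= start:   # lane is free by the time this interval starts
                lane.append((start, end))
                break
        else:
            lanes.append([(start, end)])
    return lanes

lanes = assign_lanes(build_intervals([1, 0, 3, 4, 2]))
for lane in lanes:
    print(lane)
```

The number of lanes returned, plus one for swap space, is the amount of RAM (in records) needed to finish without deferring anything.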
Sneftal's nice example gives us more troubles, with deeper overlap:
[0, 4]
[1, 5]
[2, 6]
[3, 7]
[4, 0+8]
[5, 1+8]
[6, 2+8]
[7, 3+8]
This requires 4 "lanes" if available, deferring lanes as needed if M < 5.
The pathological case is where every record in the permutation needs to be copied back one position, such as [3, 0, 1, 2], with M=2.
[0, 3]
[1, 4]
[2, 5]
[3, 6]
In this case, we walk through the deferral cycle multiple times. At the end of every rotation, we have to defer all remaining intervals by one rotation, resulting in
[0, 3] [3, 6] [2+4, 5+4] [1+4+4, 4+4+4]
Does that get you moving, or do you need more detail?
I have an idea, which might need further improvement. But here it goes:
suppose the hdd has the following structure:
5 4 1 2 3
And we want to write out this permutation:
2 3 5 1 4
Since hdd is a circular buffer, and assuming it can only rotate in one direction, we can write the above permutation using shifts as such:
5 >> 2
4 >> 3
1 >> 1
2 >> 2
3 >> 2
So let's put that in an array, and since we know it is a circular array, lets put its mirrors side by side:
| 2 3 1 2 2 | 2 3 1 2 2| 2 3 1 2 2 | 2 3 1 2 2 |... Inf
Since we want to favor sequential reads, (or writes) we can put a cost function to the above series. Let the cost function be linear, i. e:
0 1 2 3 4 5 6 7 8 9 10 ... Inf
Now, let us add the cost function to the above series, but how to select the starting point?
The idea is to select the starting point such that you get the maximum congruent monotonically increasing sequence.
For example, if you select the 0 point to be on "3", you'll get
(1) | - 3 2 4 5 | 6 8 7 9 10 | ...
If you select the 0 point to be on "2", the one just right of "1", you'll get:
(2) | - - - 2 3 | 4 6 5 7 8 | ...
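The shift amounts and the two series above can be reproduced with a few lines. This is only a sketch that assumes all records are distinct; shifts and cost_series are names made up for this example.

```python
def shifts(current, wanted):
    """How far (in one rotation direction) each record must move to reach its target slot."""
    n = len(current)
    target_pos = {value: i for i, value in enumerate(wanted)}
    return [(target_pos[value] - i) % n for i, value in enumerate(current)]

def cost_series(shift_list, start, length):
    """Linear cost 0, 1, 2, ... added to the circular shift series, beginning at index start."""
    n = len(shift_list)
    return [k + shift_list[(start + k) % n] for k in range(length)]

s = shifts([5, 4, 1, 2, 3], [2, 3, 5, 1, 4])
print(s)                      # the shift array
print(cost_series(s, 1, 9))   # series (1), starting on the "3"
print(cost_series(s, 3, 7))   # series (2), starting on the "2" right of the "1"
```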
Since we are trying to favor consecutive reads, lets define our read-write function to work as such:
f():
At any currently pointed hdd location, function will read the currently pointed hdd file, into available RAM. (namely, total space - 1, because we want to save 1 for swap)
If no available space is left on RAM for read, the function will assert and program will halt.
At any current hdd location, if ram holds the value that we want to be written in that hdd location, function reads the current file into swap space, writes the wanted value from the ram to hdd, and destroys the value in ram.
If a value is placed into hdd, function will check if the sequence is completed. If it is, program will return with success.
Now, we should note that if the following holds:
shift amount <= n - 1 (n : available memory we can hold)
We can traverse the hard disk in one pass using the above function. For example:
current: 4 5 6 7 0 1 2 3
we want: 0 1 2 3 4 5 6 7
n : 5
We can start anywhere we want, say from the initial "4". We read 4 items sequentially, (n has 4 items now) and we start placing from 0 1 2 3, (we can because n = 5 total, and 4 is used. 1 is used for swap). So the total operations is 4 consecutive reads, and then r-w operations for 8 times.
Using that analogy, it becomes clear that if we subtract "n-1" from equations (1) and (2), the positions which have value "<= 0" will be better suited as the initial position, because the ones higher than zero will definitely require another pass.
So we select eq. (2) and subtract, for let's say "n = 3", we subtract 2 from eq. (2):
(2) | - - - 0 1 | 2 4 3 5 6 | ...
Now it is clear that, using f(), and starting from 0, assuming n = 3, we will have a starting operation as such: r, r, r-w, r-w, ...
So, how do we do the rest and find minimum cost? We will place an array with initial minimum cost, just below equation (2). The positions in that array will signify where we want f() to be executed.
| - - - 0 1 | 2 4 3 5 6 | ...
| - - - 1 1 | 1 1 1 1 1 | ...
The second array, the ones with 1's and 0's tell the program where to execute f(). Note that, if we assumed those locations wrong, f() will assert.
Before we actually start placing files onto the hdd, we of course want to see if the f() positions are correct. We check whether there are assertions, and we try to minimize the cost whilst removing all assertions. So, e.g.:
(1) 1111000000000000001111
(2) 1111111000000000000000
(1) obviously has a higher cost than (2). So the question simplifies to finding the 1-0 array.
Some ideas on finding the best array:
Simplest solution is to write out all 1's and turn assertions into 0's. (essentially it's a skip). This method is guaranteed to work.
Brute force: write an array as shown in (2) and start shifting 1's to the right, in an order that tries out every available permutation:
1111111100000000
1111111010000000
1111110110000000
...
Full random approach: Plug in mt19937 and start permuting. Whenever you see a sharp drop in cost, stop executing and implement hdd copy-paste. You won't find the global minimum, but you'll get a nice trade-off.
Genetic algorithms: For permutations where "shift count is much lower than n - 1", the methodology provided in this answer should (?) provide a global minimum and smooth gradients. This allows one to use genetic algorithms without relying on mutations too much.
One advantage I find in this approach is that, since OP mentioned that this is a real life problem, the method provides an easy(ier?) way to change cost functions. It is easier to detect the effect of, say, having lots of contiguous small files to be copied vs. having a single huge file. Or perhaps rrwwrrww is better than rrrrwwww?
Does any of this even make sense? We will have to try out ...

hash for particular array

I have a very particular problem that I want to solve efficiently.
A geometry is defined by V volumes, numbered from 0 to V-1.
Each volume is bounded by different surfaces, numbered from 0 to N-1.
                  Volume | Surfaces
                  --------------------
Geometry A (V=3, N=7): 0 | [0 3 5 6 2]
                       1 | [5 4 2 1]
                       2 | [4 0 1 3 6]
Note that a surface will only appear once in a volume.
Also, a surface is at most in 2 volumes of a geometry.
Here is the problem:
I have two different descriptions of the same underlying geometry and I want to find which volume in Geometry A correspond to which volume in Geometry B. In other words, I have the same N surfaces, but the V volumes are defined differently.
Here is a Geometry B that could correspond to Geometry A above:
                  Volume | Surfaces
                  --------------------
Geometry B (V=3, N=7): 0 | [1 5 4 2]
                       1 | [3 6 5 0 2]
                       2 | [0 1 3 6 4]
Given Geometry A and B, I want to be able to bind each volume of Geometry A to its corresponding volume in Geometry B, as efficiently as possible.
A 0 1 2
B 1 0 2
Draft of solution:
Sort each array of surfaces in ascending or descending order, then sort the volumes by the lexicographic order of their surfaces. The problem is easily and robustly solved this way.
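A minimal sketch of this draft solution is shown below. Note that it looks the sorted surface lists up in a dictionary rather than sorting the volumes themselves, which gives the same correspondence; match_volumes is a name chosen for this example.

```python
def match_volumes(geometry_a, geometry_b):
    """For each volume of A, return the index of the volume of B with the same surfaces."""
    # The sorted tuple of surfaces is a canonical, order-independent key for a volume
    index_in_b = {tuple(sorted(surfaces)): volume
                  for volume, surfaces in enumerate(geometry_b)}
    return [index_in_b[tuple(sorted(surfaces))] for surfaces in geometry_a]

geometry_a = [[0, 3, 5, 6, 2], [5, 4, 2, 1], [4, 0, 1, 3, 6]]
geometry_b = [[1, 5, 4, 2], [3, 6, 5, 0, 2], [0, 1, 3, 6, 4]]
print(match_volumes(geometry_a, geometry_b))
```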
Better solution:
Compute a quick, unique hash for each array, then sort the volumes by this hash. The hash should not depend on the order of surfaces in the array.
Why do I think a hash can be a good solution ?
Take hash(Volume) = min([Surfaces])
This hash already has at most 1 collision, because a surface can only appear in 2 volumes !
Now, if I take hash(Volume) = min([Surfaces]) + max([Surfaces])*N, I still have at most 1 collision, but the probability becomes very small when there are many volumes and surfaces.
As mentioned, your solution is a good approximation for what you want. However, if you seek a perfect hash function, you can use the following method:
suppose p_i is the i-th prime number such that p_0 = 2, p_1 = 3, p_2 = 5, p_3 = 7, p_4 = 11, p_5 = 13, p_6 = 17, p_7 = 19 .... We can define a hash function on x_0, x_1, ..., x_k from an array such that h(x_0, ..., x_k) = p_{x_0} p_{x_1} ... p_{x_k}. Also, for the repeated numbers, we can apply the number of repetition as a power of the p_{x_i}. It means, for example, if x_i is repeated 3 times, the power of p_{x_i} in h would be p_{x_i}^3. if number of repetition of x_i is a_i we will have h(x_0, ..., x_k) = p_{x_0}^{a_0} p_{x_1}^{a_1} ... p_{x_k}^{a_k}.
Hence, for geometry A we have:
Volume | Surfaces | Hash
----------------------------------
geometry A 0 | [0, 3, 5, 6, 2] | 2 * 7 * 13 * 17 * 5 = 15470
1 | [5, 4, 2, 1] | 13 * 11 * 5 * 3 = 2145
2 | [4, 0, 1, 3, 6] | 11 * 2 * 3 * 7 * 17 = 7854
The same is done for geometry B. As this function returns a unique value for each array (regardless of the order), you can match the volumes using the corresponding hash values. If the value of N is not big, you can use a precomputed list of primes.
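The prime-product hash from this answer can be sketched as below, assuming no surface is repeated within a volume. The helper first_primes uses plain trial division and is only meant for small N; both function names are invented here.

```python
def first_primes(count):
    """First `count` primes by trial division (fine for small N)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def prime_hash(surfaces, primes):
    """Product of the primes indexed by the surface numbers; independent of order."""
    h = 1
    for s in surfaces:
        h *= primes[s]
    return h

primes = first_primes(7)   # N = 7 surfaces
for surfaces in ([0, 3, 5, 6, 2], [5, 4, 2, 1], [4, 0, 1, 3, 6]):
    print(surfaces, prime_hash(surfaces, primes))
```

Python integers do not overflow, so the product stays exact; in a language with fixed-width integers the product can exceed 64 bits for large volumes.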
I found a pretty good hash function, that should almost never have collisions:
V: [S_0 S_1 S_2 S_3...S_N-1]
u64 hash(V) = 0;
for i in {0..N-1} :
hash(V) = hash(V) ^ (1<<(S_i & 63))
end
This gives a unique 64 bit number, and all numbers are possible (unlike Omg's solution, where most numbers are impossible to get given that there is no repetition in the list of surface)
In the extreme case where there is a collision (which I will see after sorting), I will compare the arrays lexicographically in a stupid manner.
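For reference, the bit hash above translates to Python roughly as follows (a sketch; bit_hash is a name chosen here). It also shows where collisions can come from: surface numbers that differ by a multiple of 64 set the same bit.

```python
def bit_hash(surfaces):
    """XOR together one bit per surface; the result fits in 64 bits."""
    h = 0
    for s in surfaces:
        h ^= 1 << (s & 63)
    return h

print(bit_hash([0, 3, 5, 6, 2]), bit_hash([3, 6, 5, 0, 2]))  # same volume, different order
print(bit_hash([1]) == bit_hash([65]))                       # 1 and 65 share a bit
```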

Getting unique numbers efficiently?

The problem in question is to check an equation, example:
a * b / c + d = x
Where a - d are unique numbers from 1 - 4.
Now obviously you could run through all the possibilities, then skip when it has duplicates, but this would be O(n^n), when you should be able to solve this with O(n!) but I can't figure out how.
Is there an algorithm for this?
Based on your wording ("a - d are unique numbers from 1 - 4") it seems that what you want is an algorithm for generating all possible permutations of a set of numbers - in your case the set {1, 2, 3, 4}.
Given the size of the set, the algorithm is best implemented recursively: for every element in the set (from left to right) generate all the permutations of the remaining elements. Note that once you get to the last element, there's obviously only one possible order.
This approach basically reduces the problem of finding the permutations of N items to finding the permutations of N-1 items.
Here's what it would look like on the set {1, 2, 3}, which we expect to have six permutations:
{1, 2, 3}
1 | {2, 3}
1 | 2 | {3}
1 | 2 | 3
1 | 3 | {2}
1 | 3 | 2
2 | {1, 3}
2 | 1 | {3}
2 | 1 | 3
2 | 3 | {1}
2 | 3 | 1
3 | {1, 2}
3 | 1 | {2}
3 | 1 | 2
3 | 2 | {1}
3 | 2 | 1
So this algorithm gives us the following six permutations:
123, 132, 213, 231, 312, 321
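The recursion traced above could be written as the following sketch. The solutions helper, which plugs each permutation into the equation from the question, is an illustrative assumption; it compares a * b with (x - d) * c to avoid floating-point division.

```python
def permutations(items):
    """Yield every ordering of items: pick a first element, then permute the rest."""
    if len(items) <= 1:
        yield list(items)
        return
    for i, first in enumerate(items):
        rest = items[:i] + items[i + 1:]
        for tail in permutations(rest):
            yield [first] + tail

def solutions(x):
    """All (a, b, c, d) drawn from 1..4 without repetition with a * b / c + d == x."""
    return [(a, b, c, d) for a, b, c, d in permutations([1, 2, 3, 4])
            if a * b == (x - d) * c]

print(["".join(map(str, p)) for p in permutations([1, 2, 3])])
print(solutions(10))
```

This visits exactly n! candidates instead of n^n; in practice the standard library's itertools.permutations does the same job.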
You can find a lot of information on the Wikipedia article on Permutations, including permutation generation algorithms.
I don't see any algorithm other than a factorial formula that solves this efficiently. You also did not mention whether repetition should be avoided for B and C too.
Anyway, I guess you won't have to consider D with the same values as A, since, unavoidably, there are similar results between them.
I believe the best way to solve this without a factorial formula is to use a LOOKUP TABLE (try a BYTE ARRAY for speed), where you look up each repeated value in order to discard it.
Again: I'm not considering repetitions between B and C.

How can I maximally partition a set?

I'm trying to solve one of the Project Euler problems. As a consequence, I need an algorithm that will help me find all possible partitions of a set, in any order.
For instance, given the set 2 3 3 5:
2 | 3 3 5
2 | 3 | 3 5
2 | 3 3 | 5
2 | 3 | 3 | 5
2 5 | 3 3
and so on. Pretty much every possible combination of the members of the set. I've searched the net of course, but haven't found much that's directly useful to me, since I speak programmer-ese not advanced-math-ese.
Can anyone help me out with this? I can read pretty much any programming language, from BASIC to Haskell, so post in whatever language you wish.
Have you considered a search tree? Each node would represent a choice of where to put an element and the leaf nodes are answers. I won't give you code because that's part of the fun of Project Euler ;)
Take a look at:
The Art of Computer Programming, Volume 4, Fascicle 3: Generating All Combinations and Partitions
7.2.1.5. Generating all set partitions
In general I would look at the structure of the recursion used to compute the number of configurations, and build a similar recursion for enumerating them. Best is to compute a one-to-one mapping between integers and configurations. This works well for permutations, combinations, etc. and ensures that each configuration is enumerated only once.
Now even the recursion for the number of partitions of some identical items is rather complicated.
For partitions of multisets the counting amounts to solving the generalization of Project Euler problem 181 to arbitrary multisets.
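For enumerating (rather than counting) the partitions of a small multiset, a search-tree style sketch is given below: each element is placed either into an existing block or into a new one, and duplicates caused by repeated elements are filtered through a canonical form. This is a plain illustration, not Knuth's algorithm, and multiset_partitions is a name chosen here.

```python
def multiset_partitions(items):
    """Yield each distinct partition of a multiset as a sorted tuple of sorted blocks."""
    items = sorted(items)
    seen = set()

    def place(i, blocks):
        if i == len(items):
            key = tuple(sorted(tuple(block) for block in blocks))
            if key not in seen:       # repeated elements produce the same partition twice
                seen.add(key)
                yield key
            return
        for block in blocks:          # put items[i] into an existing block ...
            block.append(items[i])
            yield from place(i + 1, blocks)
            block.pop()
        blocks.append([items[i]])     # ... or open a new block for it
        yield from place(i + 1, blocks)
        blocks.pop()

    yield from place(0, [])

for partition in multiset_partitions([2, 3, 3, 5]):
    print(" | ".join(" ".join(map(str, block)) for block in partition))
```

The duplicate filter keeps every partition in memory, so this is only reasonable for small inputs.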
Well, the problem has two aspects.
Firstly, the items can be arranged in any order. So for N items, there are N! permutations (assuming the items are treated as unique).
Secondly, you can envision the grouping as a bit flag between each item indicating a divide. There would be N-1 of these flags, so for a given permutation there would be 2^(N-1) possible groupings.
This means that for N items, there would be a total of N!*(2^(N-1)) groupings/permutations, which gets big very very fast.
In your example, the top four items are groupings of one permutation. The last item is a grouping of another permutation. Your items can be viewed as :
2 on 3 off 3 off 5
2 on 3 on 3 off 5
2 on 3 off 3 on 5
2 on 3 on 3 on 5
2 off 5 on 3 off 3
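The divider flags for a single, fixed ordering can be enumerated by counting through all (N-1)-bit numbers, as in the sketch below. It assumes a non-empty list, and groupings is a hypothetical name.

```python
def groupings(items):
    """Yield all 2^(N-1) ways to cut a non-empty sequence into consecutive groups."""
    n = len(items)
    for flags in range(1 << (n - 1)):
        groups, current = [], [items[0]]
        for i in range(1, n):
            if (flags >> (i - 1)) & 1:   # flag "on": put a divider before items[i]
                groups.append(current)
                current = []
            current.append(items[i])
        groups.append(current)
        yield groups

for g in groupings([2, 3, 3, 5]):
    print(" | ".join(" ".join(map(str, group)) for group in g))
```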
The permutations (the order of display) can be derived by looking at them like a tree, as mentioned by the other two. This would almost certainly involve recursion, such as here.
The grouping is independent of them in many ways. Once you have all the permutations, you can link them with the groupings if needed.
Here is the code you need for this part of your problem:
def memoize(f):
    # Cache results so that each value of the series is computed only once
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]
    return helper

@memoize
def A000041(n):
    # Number of partitions of n, via the pentagonal number recurrence
    if n == 0: return 1
    S = 0
    J = n-1
    k = 2
    while 0 <= J:
        T = A000041(J)
        S = S+T if k//2%2!=0 else S-T
        J -= k if k%2!=0 else k//2
        k += 1
    return S

print(A000041(100)) #the 100th number in this series, as an example
I quickly whipped up some code to do this. However, I left out separating every possible combination of the given list, because I wasn't sure it was actually needed, but it should be easy to add, if necessary.
Anyway, the code runs quite well for small amounts, but, as CodeByMoonlight already mentioned, the amount of possibilities gets really high really fast, so the runtime increases accordingly.
Anyway, here's the python code:
import time

def separate(toseparate):
    "Find every possible way to separate a given list."
    #The list of every possibility
    possibilities = []
    n = len(toseparate)
    #We can distribute up to n-1 separators in the given list, so iterate from 0 to n-1
    for i in range(n):
        #Create a copy of the list to avoid modifying the already existing list
        copy = list(toseparate)
        #A boolean list indicating where a separator is put. 'True' indicates a separator
        #and 'False', of course, no separator.
        #The list will contain i separators, the rest is filled with 'False'
        separators = [True]*i + [False]*(n-i-1)
        for j in range(len(separators)):
            #We insert the separators into our given list. The separators have to
            #be between two elements. The index between two elements is always
            #2*[index of the left element]+1.
            copy.insert(2*j+1, separators[j])
        #The first possibility is, of course, the one we just created
        possibilities.append(list(copy))
        #The following is a modification of the QuickPerm algorithm, which finds
        #all possible permutations of a given list. It was modified to only permute
        #the spaces between two elements, so it finds every possibility to insert i
        #separators in the given list.
        m = len(separators)
        hi, lo = 1, 0
        p = [0]*m
        while hi < m:
            if p[hi] < hi:
                lo = (hi%2)*p[hi]
                copy[2*lo+1], copy[2*hi+1] = copy[2*hi+1], copy[2*lo+1]
                #Since the items are non-unique, some possibilities will show up more than once, so we
                #avoid this by checking first.
                if copy not in possibilities:
                    possibilities.append(list(copy))
                p[hi] += 1
                hi = 1
            else:
                p[hi] = 0
                hi += 1
    return possibilities

t1 = time.time()
separations = separate([2, 3, 3, 5])
print(time.time()-t1)
sepmap = {True:"|", False:""}
for a in separations:
    for b in a:
        #Test the type rather than dictionary membership, because 1 == True in Python
        if isinstance(b, bool):
            print(sepmap[b], end=" ")
        else:
            print(b, end=" ")
    print()
It's based on the QuickPerm algorithm, which you can read more about here: QuickPerm
Basically, my code generates a list containing n separations, inserts them into the given list and then finds all possible permutations of the separations in the list.
So, if we use your example we would get:
2 3 3 5
2 | 3 3 5
2 3 | 3 5
2 3 3 | 5
2 | 3 | 3 5
2 3 | 3 | 5
2 | 3 3 | 5
2 | 3 | 3 | 5
In 0.000154972076416 seconds.
However, I read through the description of the problem you are working on, and I see how you are trying to solve it, but given how quickly the runtime increases I don't think it would work as fast as you would expect. Remember that Project Euler's problems should solve in around a minute.

Resources