Python: Combinations - algorithm

I am looking to solve a problem that involves different permutations of an array. I would like a function that checks if the array under scrutiny matches a condition, but if it does not, generates a new permutation to check, and so on, so forth. I believe this involves a while statement, so my question lies more in how to create such an algorithm to generate a unique (but not random so as to avoid duplicates) permutation upon each iteration. There exists a restriction: the array will contain at least 2 but no more than 10 elements. Additionally, if the condition matches no permutations, the return should be False I have no code thus far, as I cannot come up with the algorithm I would like to persue yet. Any thoughts would be helpful.

Why do you need to reinvent the wheel? Since you've tagged python, you should know there are a ton of libraries that help you do useful things like this. One such library is itertools, more specifically the itertools.permutations function:
>>> from itertools import permutations
>>> x = [1, 2, 3, 4, 5, 6]
>>> for p in permutations(x):
... print(p)
...
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)
If you must write an algorithm yourself, then you should learn about the Johnson-Trotter Algorithm for generating permutations. It is quite intuitive, and generates permutations in O(n!) time.

Related

Most common subpath in a collection of paths

There is numerous literature on the Web for the longest common subsequence problem but I have a slightly different problem and was wondering if anyone knows of a fast algorithm.
Say, you have a collection of paths:
[1,2,3,4,5,6,7], [2,3,4,9,10], [3,4,6,7], ...
We see that subpath [3,4] is the most common.
Know of a neat algorithm to find this? For my case there are tens of thousands of paths!
Assuming that a "path" has to encompass at least two elements, then the most common path will obviously have two elements (although there could also be a path with more than two elements that's equally common -- more on this later). So you can just iterate all the lists and count how often each pair of consecutive numbers appears in the different lists and remember those pairs that appear most often. This requires iterating each list once, which is the minimum amount you'd have to do in any case.
If you are interested in the longest most common path, then you can start the same way, finding the most common 2-segment-paths, but additionally to the counts, also record the position of each of those segments (e.g. {(3,4): [2, 1, 0], ...} in your example, the numbers in the list indicating the position of the segment in the different paths). Now, you can take all the most-common length-2-paths and see if for any of those, the next element is also the same for all the occurrences of that path. In this case you have a most-common length-3-path that is equally common as the prior length-2 path (it can not be more common, obviously). You can repeat this for length-4, length-5, etc. until it can no longer be expanded without making the path "less common". This part requires extra work of n*k for each expansion, with n being the number of candidates left and k how often those appear.
(This assumes that frequency beats length, i.e. if there is a length-2 path appearing three times, you prefer this over a length-3 path appearing twice. The same apprach can also be used for a different starting length, e.g. requiring at least length-3 paths, without changing the basic algorithm or the complexity.)
Here's a simple example implementation in Python to demonstrate the algorithm. This only goes up to length-3, but could easily be extended to length-4 and beyond with a loop. Also, it does not check any edge-cases (array-out-of-bounds etc.)
# example data
data = [[1,2, 4,5,6,7, 9],
[1,2,3,4,5,6, 8,9],
[1,2, 4,5,6,7,8 ]]
# step one: count how often and where each pair appears
from collections import defaultdict
pairs = defaultdict(list)
for i, lst in enumerate(data):
for k, pair in enumerate(zip(lst, lst[1:])):
pairs[pair].append((i,k))
# step two: find most common pair and filter
most = max([len(lst) for lst in pairs.values()])
pairs = {k: v for k, v in pairs.items() if len(v) == most}
print(pairs)
# {(1, 2): [(0, 0), (1, 0), (2, 0)], (4, 5): [(0, 2), (1, 3), (2, 2)], (5, 6): [(0, 3), (1, 4), (2, 3)]}
# step three: expand pairs to triplets, triplets to quadruples, etc.
triples = [k + (data[v[0][0]][v[0][1]+2],)
for k, v in pairs.items()
if len(set(data[i][k+2] for (i,k) in v)) == 1]
print(triples)
# [(4, 5, 6)]

Find all sets doesn't contain any sub sets

I have a problem and I'm researching the fastest algorithm to find a set that is subset of original set (S) and doesn't contain any subsets (S1, ... Sn) of S. The set I want to find can contain some elements of Si but doesn't contain the whole.
For example, original set: S = (1, 2, 3, 4, 5), S1 = (1, 2), S2 = (1, 3)
=> longest set: (2, 3, 4, 5); other sets: (1, 4, 5), (2, 4, 5), (3, 4, 5), (1, 4),...
Anybody can give me a suggestion? Thanks!
Bad news
Consider the problem of choosing which elements to NOT include.
If we choose to NOT include element 1, we satisfy the constraints for S1 and S2.
If we choose to NOT include element 2, we satisfy the constraints for S1.
If we choose to NOT include element 3, we satisfy the constraints for S1 and S3.
So 1 gives {S1,S2}, 2 gives {S1}, 3 gives {S3}.
Your problem can be expressed as finding the minimum number of elements to NOT include such that the union of the satisfied sets (e.g. {S1,S2}) covers all of the given sets.
This is exactly the set cover problem which is NP complete.
Good news
In practice, you will probably do quite well by simply choosing the elements to NOT include based on whichever ends up covering the most sets.
This is an easy to implement greedy algorithm (although it will not always give the optimal answer).

SWI-Prolog list manipulation

As part of a program I'm writing I need to make sure a variable does not equal any number that is the result of multiplying 2 numbers in a given list. For example: I've got a list Primes = [2, 3, 5, 7, 11] and I need to make sure that X does not equal any two of those numbers multiplied together such as 6 (2*3) or 55 (5*11) etc...
The code I have is as follows:
list(Numbers):-
Numbers = [X, Y, Sum],
between(3,6,Y),
between(3,6,X),
Primes = [2, 3, 5, 7, 11],
Sum is X+Y,
(Code i need help with)
The above code wiill type out results of [3,3,6], [4,3,7], [5,3,8] and so on. Now what I want is to be able to identify when sum is equal to a prime * prime and exclude that from the results. Something like Sum \= prime * prime. However, I don't know how to loop through the elements in Prime in order to multiply two elements together and then do that for all element in the list.
Hope this makes sense; im not great at explaining things.
Thanks in advance.
This is inefficient, but easy to code:
...
forall((nth(I,Primes,X),nth(J,Primes,Y),J>I), Sum =\= X*Y).
I think you could use that loop to initialize a list of precomputed factors, then use memberchk/2.
In SWI-Prolog use nth1/3 instead of nth/3

Sum of cells with several possibilities

I'm programming a Killer Sudoku Solver in Ruby and I try to take human strategies and put them into code. I have implemented about 10 strategies but I have a problem on this one.
In killer sudoku, we have "zones" of cells and we know the sum of these cells and we know possibilities for each cell.
Example :
Cell 1 can be 1, 3, 4 or 9
Cell 2 can be 2, 4 or 5
Cell 3 can be 3, 4 or 9
The sum of all cells must be 12
I want my program to try all possibilities to eliminate possibilities. For instance, here, cell 1 can't be 9 because you can't make 3 by adding two numbers possible in cells 2 and 3.
So I want that for any number of cells, it removes the ones that are impossible by trying them and seeing it doesn't work.
How can I get this working ?
There's multiple ways to approach the general problem of game solving, and emulating human strategies is not always the best way. That said, here's how you can solve your question:
1st way, brute-forcy
Basically, we want to try all possibilities of the combinations of the cells, and pick the ones that have the correct sum.
cell_1 = [1,3,4,9]
cell_2 = [2,4,5]
cell_3 = [3,4,9]
all_valid_combinations = cell_1.product(cell_2,cell_3).select {|combo| combo.sum == 12}
# => [[1, 2, 9], [3, 5, 4], [4, 4, 4], [4, 5, 3]]
#.sum isn't a built-in function, it's just used here for convenience
to pare this down to individual cells, you could do:
cell_1 = all_valid_combinations.map {|combo| combo[0]}.uniq
# => [1, 3, 4]
cell_2 = all_valid_combinations.map {|combo| combo[1]}.uniq
# => [2, 5, 4]
. . .
if you don't have a huge large set of cells, this way is easier to code. it can get a bit inefficienct though. For small problems, this is the way I'd use.
2nd way, backtracking search
Another well known technique takes the problem from the other approach. Basically, for each cell, ask "Can this cell be this number, given the other cells?"
so, starting with cell 1, can the number be 1? to check, we see if cells 2 and 3 can sum to 11. (12-1)
* can cell 2 have the value 2? to check, can cell 3 sum to 9 (11-1)
and so on. In very large cases, where you could have many many valid combinations, this will be slightly faster, as you can return 'true' on the first time you find a valid number for a cell. Some people find recursive algorithms a bit harder to grok, though, so your mileage may vary.

How to determine best combinations from 2 lists

I'm looking for a way to make the best possible combination of people in groups. Let me sketch the situation.
Say we have persons A, B, C and D. Furthermore we have groups 1, 2, 3, 4 and 5. Both are examples and can be less or more. Each person gives a rating to each other person. So for example A rates B a 3, C a 2, and so on. Each person also rates each group. (Say ratings are 0-5). Now I need some sort of algorithm to distribute these people evenly over the groups while keeping them as happy as possible (as in: They should be in a highrated group, with highrated people). Now I know it's not possible for the people to be in the best group (the one they rated a 5) but I need them to be in the best possible solution for the entire group.
I think this is a difficult question, and I would be happy if someone could direct me to some more information about this types of problems, or help me with the algo I'm looking for.
Thanks!
EDIT:
I see a lot of great answers but this problem is too great for me too solve correctly. However, the answers posted so far give me a great starting point too look further into the subject. Thanks a lot already!
after establishing this is NP-Hard problem, I would suggest as a heuristical solution: Artificial Intelligence tools.
A possible approach is steepest ascent hill climbing [SAHC]
first, we will define our utility function (let it be u). It can be the sum of total happiness in all groups.
next,we define our 'world': S is the group of all possible partitions.
for each legal partition s of S, we define:
next(s)={all possibilities moving one person to a different group}
all we have to do now is run SAHC with random restarts:
1. best<- -INFINITY
2. while there is more time
3. choose a random partition as starting point, denote it as s.
4. NEXT <- next(s)
5. if max{ U(NEXT) } < u(s): //s is the top of the hill
5.1. if u(s) > best: best <- u(s) //if s is better then the previous result - store it.
5.2. go to 2. //restart the hill climbing from a different random point.
6. else:
6.1. s <- max{ NEXT }
6.2. goto 4.
7. return best //when out of time, return the best solution found so far.
It is anytime algorithm, meaning it will get a better result as you give it more time to run, and eventually [at time infinity] it will find the optimal result.
The problem is NP-hard: you can reduce from Maximum Triangle Packing, that is, finding at least k vertex-disjoint triangles in a graph, to the version where there are k groups of size 3, no one cares about which group he is in, and likes everyone for 0 or for 1. So even this very special case is hard.
To solve it, I would try using an ILP: have binary variables g_ik indicating that person i is in group k, with constraints to ensure a person is only in one group and a group has an appropriate size. Further, binary variables t_ijk that indicate that persons i and j are together in group k (ensured by t_ijk <= 0.5 g_ik + 0.5 g_jk) and binary variables t_ij that indicate that i and j are together in any group (ensured by t_ij <= sum_k t_ijk). You can then maximize the happiness function under these constraints.
This ILP has very many variables, but modern solvers are pretty good and this approach is very easy to implement.
This is an example of an optimization problem. It is a very well
studied type of problems with very good methods to solve them. Read
Programming Collective Intelligence which explains it much better
than me.
Basically, there are three parts to any kind of optimization problem.
The input to the problem solving function.
The solution outputted by the problem solving function.
A scoring function that evaluates how optimal the solution is by
scoring it.
Now the problem can be stated as finding the solution that produces
the highest score. To do that, you first need to come up with a format
to represent a possible solution that the scoring function can then
score. Assuming 6 persons (0-5) and 3 groups (0-2), this python data structure
would work and would be a possible solution:
output = [
[0, 1],
[2, 3],
[4, 5]
]
Person 0 and 1 is put in group 0, person 2 and 3 in group 1 and so
on. To score this solution, we need to know the input and the rules for
calculating the output. The input could be represented by this data
structure:
input = [
[0, 4, 1, 3, 4, 1, 3, 1, 3],
[5, 0, 1, 2, 1, 5, 5, 2, 4],
[4, 1, 0, 1, 3, 2, 1, 1, 1],
[2, 4, 1, 0, 5, 4, 2, 3, 4],
[5, 5, 5, 5, 0, 5, 5, 5, 5],
[1, 2, 1, 4, 3, 0, 4, 5, 1]
]
Each list in the list represents the rating the person gave. For
example, in the first row, the person 0 gave rating 0 to person 0 (you
can't rate yourself), 4 to person 1, 1 to person 2, 3 to 3, 4 to 4 and
1 to person 5. Then he or she rated the groups 0-2 3, 1 and 3
respectively.
So above is an example of a valid solution to the given input. How do
we score it? That's not specified in the question, only that the
"best" combination is desired therefore I'll arbitrarily decide that
the score for a solution is the sum of each persons happiness. Each
persons happiness is determined by adding his or her rating of the
group with the average of the rating for each person in the group,
excluding the person itself.
Here is the scoring function:
N_GROUPS = 3
N_PERSONS = 6
def score_solution(input, output):
tot_score = 0
for person, ratings in enumerate(input):
# Check what group the person is a member of.
for group, members in enumerate(output):
if person in members:
# Check what rating person gave the group.
group_rating = ratings[N_PERSONS + group]
# Check what rating the person gave the others.
others = list(members)
others.remove(person)
if not others:
# protect against zero division
person_rating = 0
else:
person_ratings = [ratings[o] for o in others]
person_rating = sum(person_ratings) / float(len(person_ratings))
tot_score += group_rating + person_rating
return tot_score
It should return a score of 37.0 for the given solution. Now what
we'll do is to generate valid outputs while keeping track of which one
is best until we are satisfied:
from random import choice
def gen_solution():
groups = [[] for x in range(N_GROUPS)]
for person in range(N_PERSONS):
choice(groups).append(person)
return groups
# Generate 10000 solutions
solutions = [gen_solution() for x in range(10000)]
# Score them
solutions = [(score_solution(input, sol), sol) for sol in solutions]
# Sort by score, take the best.
best_score, best_solution = sorted(solutions)[-1]
print 'The best solution is %s with score %.2f' % (best_solution, best_score)
Running this on my computer produces:
The best solution is [[0, 1], [3, 5], [2, 4]] with score 47.00
Obviously, you may think it is a really stupid idea to randomly just
generate solutions to throw at the problem, and it is. There are much
more sophisticated methods to generate solutions such as simulated
annealing or genetic optimization. But they all build upon the same
framework as given above.

Resources