Generating Binary Permutations With Constraints - algorithm

I am working on a physician scheduling application; we are using linear programming and solvers like CPLEX/LINDO to solve our model. Due to some modelling limitations we need to generate binary patterns for just the night shifts.
Typically we generate a one-month schedule, so let's say we need to generate patterns for 30 days of night shifts.
Night shifts have constraints such as: if a physician works consecutive night shifts, they cannot work for the next five days. Below are some examples of constraints:
111000001111100000111110000011 Valid
111000001100000000111110000011 Valid
111010001111101000111110000011 Invalid
There are also other constraints, e.g. the number of ones in a pattern must be less than some defined value, the number of consecutive ones must be less than some defined value, and so on.
First I tried a simple algorithm that starts from 0, uses bitwise operations to add one and get the next pattern, and checks each candidate against all constraints, discarding the invalid patterns. Since a pattern is 30 bits long (2^30 = 1,073,741,824), the number of patterns to check is huge for my simple algorithm; I guess it would take more than 24 hours to find all valid patterns.
Now my questions are:
Which algorithm should I use for this problem to find all patterns, with the constraints applied, in a time-efficient way?
Is this an exact cover problem? Can I apply algorithms like Dancing Links to the problem I am facing?
Could you provide some links to read about the solution you propose?

I have found a very good solution in "The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1" by Donald Knuth, section 7.2.1.2, Algorithm G (general permutation generator), where the author describes the technique of bypassing unwanted blocks. I am implementing the algorithm by incrementally generating a tree of the feasible region and bypassing any infeasible path. It works like this: start with a root node of value 0; every node has two children, 0 and 1. On every addition of a new child node, we check the resulting string against our constraint set; if it fails a constraint, we do not add that child. For example, if the algorithm is about to add a node at level 5 and the resulting string would be 11101, and the trailing "101" violates the night-shift rule, then the 1-valued node at level 5 is not added. We keep adding child nodes until we reach the leaves. Eventually the tree covers only the feasible region, because we have bypassed the unwanted blocks; the infeasible region is never touched.
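The bypassing idea above can be sketched as a recursive depth-first generator. The exact rest rule (a run of two or more nights forces five days off) and the parameter values below are my reading of the post, not its specification:

```python
def generate_patterns(n_days=30, rest_days=5, max_ones=15, max_run=5):
    """Depth-first generation of feasible night-shift bit strings,
    pruning infeasible branches as soon as a prefix breaks a rule."""
    results = []

    def extend(prefix, ones, run, rest_needed):
        if len(prefix) == n_days:
            results.append(prefix)
            return
        # Child 0 is always allowed; ending a run of >= 2 nights
        # starts the mandatory rest countdown (this 0 is day one).
        extend(prefix + "0", ones, 0,
               rest_days - 1 if run >= 2 else max(rest_needed - 1, 0))
        # Child 1 is bypassed (never generated) if it would violate
        # any constraint -- the infeasible subtree is never touched.
        if ones < max_ones and run < max_run and rest_needed == 0:
            extend(prefix + "1", ones + 1, run + 1, 0)

    extend("", 0, 0, 0)
    return results
```

For 30 days the feasible set can still be large, so turning `results.append` into a `yield` avoids holding everything in memory at once.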

Related

How can this bipartite matching solution be improved?

I'm working through codefights and am attempting the busyHolidays challenge from the Instacart company challenges.
The challenge provides three arrays. Shoppers contains strings representing the start and end times of their shifts. Orders contains strings representing the start and end times of the orders, and leadTime contains integers representing the number of minutes it takes to complete the job.
The goal is to determine if the orders can be matched to shoppers such that each shopper has only one order and each order has a shopper. An order may only be matched to a shopper if the shopper can both begin and complete it within the order time.
I have a solution that passes 19/20 tests, but since I can't see the last test I have no idea what's going wrong. I originally spent a couple of days trying to learn algorithms like Edmonds's algorithm and the Hungarian algorithm, but my lack of CS background and weakness in math kind of bit me in the ass and I couldn't wrap my head around how to actually implement them. So I came up with a solution that weights each node on each side of the graph according to its number of possible connections. I would appreciate it if anyone could take a look at my solution and either point out where it might be messing up, or suggest a more standard solution in a way that might be easier for someone without formal training in algorithms to understand. Thanks in advance.
I'll put the code in a gist since it's fairly lengthy.
Code: https://gist.github.com/JakeTompkins/7e1afc4722fb828f26f8f6a964774a25
Well, I don't see any reason to think that the algorithm you're writing is actually going to work, so the question of how you might be messing it up doesn't seem relevant.
You have correctly identified this as an instance of the assignment problem. More specifically, this is the "maximum bipartite matching" problem, and the Edmonds-Karp algorithm is the simplest way to solve it (https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm).
However, this is an algorithm for finding the maximum flow in a network, which is a larger problem than simple bipartite matching, and the explanations of this algorithm are really a lot more complicated than you need. It's understandable that you had some trouble implementing this from the literature, but actually when the problem is reduced to simple (unweighted) bipartite matching, the algorithm is easy to understand:
Make an initial assignment
Try to find an improvement
Repeat until no more improvements can be found.
For bipartite matching, an "improvement" always has the same form, which is what makes this problem easy to solve. To find an improvement, you have to find a path that connects an unassigned shopper to an unassigned order, following these rules:
The path can go from any shopper to any order he/she could fulfill but currently does not.
The path can go from any order only to the shopper that is fulfilling it in the current assignment.
You use breadth-first search to find the shortest such path, which corresponds to the improvement that changes the smallest number of existing assignments.
The path you find will necessarily have an odd number of edges, and the even-numbered edges will be existing assignments. To implement the improvement, you remove those assignments and replace them with the odd-numbered edges. There is one more odd-numbered edge than even-numbered ones, which is what makes it an improvement. It looks like this:
PREVIOUS        PATH FOUND      IMPROVED ASSIGNMENT
1               1               1
               /               /
A               A               A
 \               \
2               2               2
               /               /
B               B               B
 \               \
3               3               3
               /               /
C               C               C
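The improvement loop above can be sketched as follows; `bipartite_match` and `can_fulfill` are hypothetical names, with `can_fulfill` mapping each shopper to the set of orders they could take:

```python
from collections import deque

def bipartite_match(can_fulfill):
    """can_fulfill[s] = set of orders shopper s could take.
    Returns a dict order -> shopper for a maximum matching."""
    assigned = {}  # order -> shopper currently fulfilling it
    for start in can_fulfill:
        # BFS for the shortest alternating path from the unassigned
        # shopper `start` to some unassigned order.
        parent = {("s", start): None}
        queue = deque([start])
        end = None
        while queue and end is None:
            s = queue.popleft()
            for o in can_fulfill[s]:
                if ("o", o) in parent:
                    continue
                parent[("o", o)] = ("s", s)
                if o not in assigned:      # augmenting path found
                    end = o
                    break
                s2 = assigned[o]           # follow the assignment back
                if ("s", s2) not in parent:
                    parent[("s", s2)] = ("o", o)
                    queue.append(s2)
        if end is None:
            continue                       # no improvement for `start`
        # Flip the path: odd-numbered edges become the new assignments.
        node = ("o", end)
        while node is not None:
            _, o = node
            _, s = parent[node]
            assigned[o] = s
            node = parent[("s", s)]
    return assigned
```

A perfect matching exists exactly when every shopper ends up assigned, i.e. the returned dict has one entry per shopper.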

Generating minimal/irreducible Sudokus

A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.
I have a basic algorithm to generate minimal Sudokus:
Generate a completed puzzle.
Visit each cell in a random order. For each visited cell:
Tentatively remove its digit
Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.
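The uniqueness test can equivalently be done with a single count-limited solver that stops as soon as a second solution is found; this sketch swaps the two opposite-order solvers described above for that approach:

```python
def count_solutions(grid, limit=2):
    """Count solutions of a 9x9 grid (0 = empty cell) by recursive
    backtracking, stopping as soon as `limit` solutions are found."""
    def legal(r, c, d):
        if any(grid[r][j] == d for j in range(9)):
            return False
        if any(grid[i][c] == d for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[br + i][bc + j] != d
                   for i in range(3) for j in range(3))

    def solve():
        for r in range(9):
            for c in range(9):
                if grid[r][c] != 0:
                    continue
                found = 0
                for d in range(1, 10):
                    if legal(r, c, d):
                        grid[r][c] = d
                        found += solve()
                        grid[r][c] = 0   # undo placement
                        if found >= limit:
                            return found
                return found
        return 1  # no empty cell left: exactly one completed grid
    return solve()
```

A puzzle then has a unique solution iff `count_solutions(grid) == 1`.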
This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:
Adding digits instead of removing them. The advantage of this is that since minimal puzzles require at least 17 filled digits, the first 17 digits are guaranteed to not have a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.
I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.
How can I optimize my minimal Sudoku generator?
I have an idea: the 2nd option you suggested should work better, provided you add 3 extra checks for the first 17 numbers:
pick a list of 17 random numbers between 1 and 9
add each number at a random location, provided the new number doesn't fail the 3 basic criteria of Sudoku:
no repeated number in the same row
no repeated number in the same column
no repeated number in the same 3x3 box
if the placement fails, move to the next column or row and check the 3 basic criteria again
if there is no next row (i.e. at the 9th column or 9th row), wrap around to the 1st column
once the 17 cells are filled, run your solver logic on the result and look for a unique solution.
Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:
Using bitmasks to keep track of which constraints (row, column, box) are satisfied in each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor in generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to keep track of the three types of constraints as distinct bits. A small further optimization is to save the masks for each digit and each constraint in arrays. 40%
Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
mbeckish and user295691's suggestions (see the comments to the original post). 25%
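A minimal sketch of the bitmask bookkeeping described in the first item, with separate row, column and box masks so that digits can be removed as well as placed (all names are illustrative):

```python
# One 9-bit-ish mask per row, column, and 3x3 box; bit d is set when
# digit d is already used by that constraint.
rows = [0] * 9
cols = [0] * 9
boxes = [0] * 9

def box(r, c):
    """Index of the 3x3 box containing cell (r, c)."""
    return (r // 3) * 3 + c // 3

def legal(r, c, d):
    """Legal iff no row, column, or box constraint already has d."""
    bit = 1 << d
    return not ((rows[r] | cols[c] | boxes[box(r, c)]) & bit)

def place(r, c, d):
    bit = 1 << d
    rows[r] |= bit
    cols[c] |= bit
    boxes[box(r, c)] |= bit

def remove(r, c, d):
    # Keeping the three constraint types as distinct bits is what
    # makes removal possible without rescanning the grid.
    bit = 1 << d
    rows[r] &= ~bit
    cols[c] &= ~bit
    boxes[box(r, c)] &= ~bit
```

The legality check is a couple of OR/AND operations instead of a scan over 21 cells, which is where the speedup comes from.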

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to get an algorithm for this. I don't think it can be done with a greedy algorithm, more thinking along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set si iff pi is an element of si. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source, add an edge linking it to all people with capacity 1 (representing that each person may only be used once), then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once) and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to indicate that each person may be used more than once in this case.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.
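Putting the pieces together, a sketch of the decision procedure plus the binary search might look like this (a plain Edmonds-Karp max flow on an adjacency matrix; all names are mine, and sets are given as sets of person indices):

```python
from collections import deque

def feasible(sets, num_people, k):
    """Can one person be picked from each set with nobody picked more
    than k times?  Max flow: source -> person (cap k) -> set (cap 1)
    -> sink (cap 1); feasible iff the flow saturates every set."""
    S = len(sets)
    src, snk = 0, 1 + num_people + S
    n = snk + 1
    cap = [[0] * n for _ in range(n)]
    for p in range(num_people):
        cap[src][1 + p] = k
    for i, s in enumerate(sets):
        for p in s:
            cap[1 + p][1 + num_people + i] = 1
        cap[1 + num_people + i][snk] = 1
    flow = 0
    while True:  # Edmonds-Karp: repeatedly augment along BFS paths
        parent = [-1] * n
        parent[src] = src
        q = deque([src])
        while q and parent[snk] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[snk] == -1:
            break
        v, bottleneck = snk, float("inf")
        while v != src:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = snk
        while v != src:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck  # residual edge
            v = parent[v]
        flow += bottleneck
    return flow == S

def min_max_selections(sets, num_people):
    """Binary search for the smallest feasible k (monotonic)."""
    lo, hi = 1, len(sets)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(sets, num_people, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With the sets from the question ({p1,p2,p3}, {p2,p3}, {p1}, {p1}, encoded as indices 0-2), this returns 2, matching the stated answer.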

Algorithm for filling a matrix of item, item pairs

Hey guys, I have a sort of speed dating type application (not used for dating, just a similar concept) that compares users and matches them in a round based event.
Currently I am storing each user-to-user comparison (using cosine similarity) and then finding a round in which both users are available. My current setup works fine at smaller scale, but I seem to be missing a few matchings in larger data sets.
For example with a setup like so (assuming 6 users, 3 from each group)
Round (User1, User2)
----------------------------
1 (x1,y1) (x2,y2) (x3,y3)
2 (x1,y2) (x2,y3) (x3,y1)
3 (x1,y3) (x2,y1) (x3,y2)
My approach works well right now to ensure I have each user meeting the appropriate user without having overlaps so a user is left out, just not with larger data sets.
My current algorithm
I store a comparison of each user from x to each user from y like so
Round, user1, user2, similarity
And to build the event schedule I simply sort the comparisons by similarity and then iterate over the results, finding an open round for both users, like so:
event.user_maps.all(:order => 'similarity desc').each do |map|
  (1..event.rounds).each do |round|
    if user_free_in_round?(map.user1, round) and user_free_in_round?(map.user2, round)
      # creates the pairing and breaks from the loop
    end
  end
end
This isn't exact code but the general algorithm to build the schedule. Does anyone know a better way of filling in a matrix of item pairings where no one item can be in more than one place in the same slot?
EDIT
For some clarification, the issue I am having is that in larger sets my algorithm of placing highest similarity matches first can sometimes result in collisions. What I mean by that is that the users are paired in such a way that they have no other user to meet with.
Like so:
Round (User1, User2)
----------------------------
1 (x1,y1) (x2,y2) (x3,y3)
2 (x1,y3) (x2,nil) (x3,y1)
3 (x1,y2) (x2,y1) (x3,y2)
I want to be able to prevent this from happening while preserving the need for higher similar users given higher priority in scheduling.
In real scenarios there are far more matches than available rounds, and an uneven number of x users to y users; in my test cases, instead of every round being full, only about 90% of the slots get filled, while collisions like the above cause problems.
I think the question still needs clarification even after edit, but I could be missing something.
As far as I can tell, what you want is for each new round to use the best possible matching (defined as the sum of the cosine similarities of all matched pairs). After a pair (x_i, y_j) has been matched in a round, it is not eligible for subsequent rounds.
You could do this by building a bipartite graph where your Xs are nodes in one side and Ys are nodes in another side, and the edge weight is cosine similarity. Then you find the max weighted match in this graph. For the next rounds, you eliminate the edges that have already been used in previous round and run the matching algorithm again. For details on how to code max weight matching in bipartite graph, see here.
BTW, this solution is not optimum since we are proceeding from one round to next in a greedy fashion. I have a feeling that getting the optimum solution would be NP hard, but I don't have a proof so can't be sure.
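A sketch of this round-by-round greedy scheme, using brute-force enumeration of permutations in place of a real max-weight matching algorithm (fine for small, equal-sized groups; for larger inputs a Hungarian-algorithm implementation would replace the inner loop). All names here are illustrative:

```python
from itertools import permutations

def schedule(similarity, rounds):
    """similarity[i][j] is the score for pairing x_i with y_j.
    Each round greedily takes the max-weight perfect matching among
    pairs not used in earlier rounds."""
    n = len(similarity)
    used = set()
    plan = []
    for _ in range(rounds):
        best_total, best_perm = float("-inf"), None
        for perm in permutations(range(n)):
            # skip matchings that reuse a pair from a previous round
            if any((i, perm[i]) in used for i in range(n)):
                continue
            total = sum(similarity[i][perm[i]] for i in range(n))
            if total > best_total:
                best_total, best_perm = total, perm
        if best_perm is None:
            break  # no round possible without repeating a pair
        pairs = [(i, best_perm[i]) for i in range(n)]
        used.update(pairs)
        plan.append(pairs)
    return plan
```

Because every round is a perfect matching, no user is ever left without a partner while others are paired, which is exactly the collision the question describes.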
I agree that the question still needs clarification. As Amit expressed, I have a gut feeling that this is an NP hard problem, so I am assuming that you are looking for an approximate solution.
That said, I would need more information on the tradeoffs you would be willing to make (and perhaps I'm just missing something in your question). What are the explicit goals of the algorithm?
Is there a lower threshold for similarity below which you don't want a pairing to happen? I'm still a bit confused as to why there would be individuals which could not be paired up at all during a given round...
Essentially, you are performing a search over the space of possible pairings, correct? Maybe you could use backtracking or some form of constraint-based algorithm to make sure that you can obtain a complete solution for a given round...?

Looking for a multidimensional optimization algorithm

Problem description
There are different categories which contain an arbitrary amount of elements.
There are three different attributes A, B and C. Each element has a different distribution of these attributes, expressed as positive integer values. For example, element 1 has the attributes A: 42, B: 1337, C: 18. The sum of these attributes is not consistent across elements; some elements have more than others.
Now the problem:
We want to choose exactly one element from each category so that
We hit a certain threshold on attributes A and B (going over it is also possible, but not necessary)
while getting a maximum amount of C.
Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.
I've thought about this problem and cannot imagine an efficient solution. The sample sizes are about 15 categories from which each contains up to ~30 elements, so bruteforcing doesn't seem to be very effective since there are potentially 30^15 possibilities.
My model is that I think of it as a tree with depth number of categories. Each depth level represents a category and gives us the choice of choosing an element out of this category. When passing over a node, we add the attributes of the represented element to our sum which we want to optimize.
If we hit the same attribute combination multiple times on the same level, we merge them so that we can strip away repeated computation of already computed values. If we reach a level where one path has less value in all three attributes than another, we don't follow it any further.
However, in the worst case this tree still has ~30^15 nodes in it.
Can anybody think of an algorithm that may help me solve this problem? Or could you explain why you think no such algorithm exists?
This question is very similar to a variation of the knapsack problem. I would start by looking at solutions for this problem and see how well you can apply it to your stated problem.
My first inclination is to try branch-and-bound. You can do it breadth-first or depth-first, and I prefer depth-first because I think it's cleaner.
To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it just has a 5-level nested loop). It is augmented with two things:
At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)
The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.
Then you have an outer loop:
for (budget = 0; budget < ... ; budget++){
    walk(budget);
    // if walk finds a solution within the budget, halt
}
The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.
Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.
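As a concrete sketch, here is a depth-first variant of the walk for the stated problem (one element per category, thresholds on A and B, maximize C); it prunes with suffix upper bounds instead of the increasing-budget loop, and all names are illustrative:

```python
def choose(categories, need_a, need_b):
    """Pick one (a, b, c) element from each category so that the
    summed a >= need_a and b >= need_b, maximizing summed c.
    Depth-first branch and bound with upper-bound pruning."""
    n = len(categories)
    # Suffix maxima: the best possible remaining a, b, c from level i on.
    max_a = [0] * (n + 1)
    max_b = [0] * (n + 1)
    max_c = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        max_a[i] = max_a[i + 1] + max(e[0] for e in categories[i])
        max_b[i] = max_b[i + 1] + max(e[1] for e in categories[i])
        max_c[i] = max_c[i + 1] + max(e[2] for e in categories[i])

    best = [float("-inf"), None]  # (best C total, chosen elements)

    def walk(i, a, b, c, picked):
        # Prune: thresholds unreachable, or C can no longer win.
        if a + max_a[i] < need_a or b + max_b[i] < need_b:
            return
        if c + max_c[i] <= best[0]:
            return
        if i == n:
            best[0], best[1] = c, picked
            return
        # Try the highest-C elements first to tighten the bound early.
        for e in sorted(categories[i], key=lambda e: -e[2]):
            walk(i + 1, a + e[0], b + e[1], c + e[2], picked + [e])

    walk(0, 0, 0, 0, [])
    return best[1]  # None if no selection meets the thresholds
```

The branch ordering heuristic (highest C first) is what the last sentence above suggests: a good early incumbent makes the bound cut off far more of the ~30^15 tree.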
If that doesn't work, you can fall back on basic heuristic programming. That is, do some cases by hand, and pay attention to how you did it. Then program it the same way.
I hope that helps.
