Generating minimal/irreducible Sudokus - algorithm

A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.
I have a basic algorithm to generate minimal Sudokus:
Generate a completed puzzle.
Visit each cell in a random order. For each visited cell:
Tentatively remove its digit
Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.
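For reference, here is a minimal sketch (in Python) of the two-solver uniqueness test described in the last two steps. The grid is assumed to be a 9x9 list of lists with 0 for empty cells, and the function names are illustrative:

def candidates(grid, r, c):
    # digits that can legally be placed at (r, c)
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid, digit_order):
    # plain backtracking solver that tries digits in the given order;
    # returns a solved copy of the grid, or None if there is no solution
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in sorted(candidates(grid, r, c), key=digit_order.index):
                    grid[r][c] = d
                    result = solve(grid, digit_order)
                    if result is not None:
                        return result
                    grid[r][c] = 0
                return None
    return [row[:] for row in grid]

def has_unique_solution(grid):
    # solve once with digits ascending and once descending; the two results
    # coincide if and only if the solution is unique
    forward = solve([row[:] for row in grid], list(range(1, 10)))
    backward = solve([row[:] for row in grid], list(range(9, 0, -1)))
    return forward is not None and forward == backward

The forward solver finds the "leftmost" solution in the search tree and the backward solver the "rightmost", so they agree exactly when only one solution exists.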
This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:
Adding digits instead of removing them. The advantage of this is that since puzzles with a unique solution require at least 17 filled cells, the first 16 digits added are guaranteed not to produce a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.
I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.
How can I optimize my minimal Sudoku generator?

I have an idea: the 2nd option you suggested could work well, provided you add three extra checks for the first 17 numbers:
Generate a list of 17 random numbers between 1 and 9.
Place each number at a random location, provided the newly placed number does not violate the three basic Sudoku criteria:
the same number does not appear twice in the same row,
the same number does not appear twice in the same column,
the same number does not appear twice in the same 3x3 box.
If the placement fails, move to the next column or row and check the three criteria again.
If there is no next row or column (i.e. you are at the 9th column or 9th row), wrap around to the 1st column.
Once the 17 numbers are placed, run your solver logic on the result and look for your unique solution.

Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:
Using bitmasks to keep track of which constraints (row, column, box) are already satisfied for each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor when generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to track the three types of constraints as distinct bits (see the sketch after this list). A small further optimization is to store the masks for each digit and each constraint in precomputed arrays. 40%
Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
mbeckish and user295691's suggestions (see the comments to the original post). 25%
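As a rough illustration of the first item, here is a small Python sketch of that bookkeeping. The names are illustrative, and the masks are kept per row, column and box precisely so that a digit can be un-placed during generation:

row_mask = [0] * 9   # bit d-1 is set if digit d already appears in row r
col_mask = [0] * 9
box_mask = [0] * 9
DIGIT_BIT = [0] + [1 << (d - 1) for d in range(1, 10)]  # precomputed per-digit masks

def box_index(r, c):
    return (r // 3) * 3 + (c // 3)

def is_legal(r, c, d):
    # a placement is legal if digit d is absent from the row, column and box
    bit = DIGIT_BIT[d]
    return not (bit & (row_mask[r] | col_mask[c] | box_mask[box_index(r, c)]))

def place(r, c, d):
    bit = DIGIT_BIT[d]
    row_mask[r] |= bit
    col_mask[c] |= bit
    box_mask[box_index(r, c)] |= bit

def unplace(r, c, d):
    # because the three constraint types are tracked as distinct bits,
    # a previously placed digit can be removed again
    bit = DIGIT_BIT[d]
    row_mask[r] &= ~bit
    col_mask[c] &= ~bit
    box_mask[box_index(r, c)] &= ~bit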

Related

Hungarian assignment alternative with large and unbalanced profit matrix

I need help with solving the assignment problem in particular cases. In one scenario the profit matrix has dimensions 2000 by 23000 (2000 items and 23000 bins, where each bin can contain only one item, and there is no negative profit). If the Hungarian algorithm is applied, it first creates a square matrix of 23000 by 23000, which causes an OutOfMemory exception.
All I want to know is the maximum profit the optimal assignment scheme can produce. There is no need to output the actual optimal assignment; only the optimal value is needed, and an approximation would be acceptable. I wonder if there is an alternative approach that saves memory and computation cost.
Thanks in advance.
You can actually simulate all of the dummy columns with just one column.
I don't know what specific instructions you're following, but one of the steps in the algorithm is to cover every zero in the matrix using the fewest horizontal and vertical lines possible. At the start, when all of the dummy columns are still zero, the most efficient way to cover them is column-wise. Some rows may be covered as well, and entries covered by two lines are incremented by the same adjustment amount (the minimum uncovered value). More importantly, the same row in every dummy column will be incremented by the same amount.
Continuing on now that some of the rows of the dummy columns are not zero, we reach this step again. Since all of the dummy columns are still identical, it follows that if it is efficient for one dummy column to be covered, they will all be covered. So even though the values may change, every dummy column will always be identical to every other dummy column, so you can represent them all using only one array.
You may still run into problems if you have a lot of real data, but this should help in situations like the one you mentioned.
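To illustrate only the storage idea (not the full algorithm), a Python sketch of a padded cost matrix that backs all dummy columns with a single shared vector could look like this. The class and method names are made up, and the Hungarian steps themselves would still need to be adapted to update the shared vector in place of the individual dummy columns:

class PaddedCostMatrix:
    # real: n_rows x n_real_cols list of lists; the n_dummy extra columns are
    # all represented by the single shared_dummy vector, since they stay
    # identical to one another throughout the algorithm
    def __init__(self, real, n_dummy):
        self.real = real
        self.n_real_cols = len(real[0])
        self.n_dummy = n_dummy
        self.shared_dummy = [0.0] * len(real)

    def get(self, i, j):
        if j < self.n_real_cols:
            return self.real[i][j]
        return self.shared_dummy[i]          # same value for every dummy column

    def add_to_row(self, i, delta):
        # row adjustments touch the shared dummy entry exactly once,
        # instead of once per dummy column
        for j in range(self.n_real_cols):
            self.real[i][j] += delta
        self.shared_dummy[i] += delta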

How to efficiently diversify Dijkstra's algorithm (while preserving shortest path(s))?

Please look at the images and their descriptions below.
P.S. ignore the gray circular boundary (that's just max radius for debug testing).
Figure 1: No shuffling of branches. Branches are in order: Top, Left, Down, Right
Figure 2: Has branch shuffling: every time a node branches to its 4 potential children, the order is randomized.
So, as you can see, the four images show paths of the same length. The lower 3 are more diverse, and are preferred. Shuffling the order of the array at every branch seems a bit inefficient. Are there any ways to improve it?
My idea is that I could create a list of all the possible shuffles (since there are 4 elements, that should be 4! = 24 permutations, right?) and generate a random number to use as an index into the list.
Are there any alternatives? Or perhaps I should look into a different algorithm altogether?
P.S. this is for game development purposes, so the diversity for paths is highly preferred.
Every time you calculate the path length to a node, before comparing against its previous best length, add a small random number so that the calculated length is between real_length and real_length+0.5. This will randomize the choices between paths of equal length.
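A minimal Python sketch of this tie-breaking idea, assuming integer edge lengths (as on a grid); the graph representation and names are illustrative. The true distance is tracked separately so the jitter never accumulates across edges, which keeps genuinely shorter paths strictly preferred:

import heapq
import random

def randomized_dijkstra(graph, source):
    # graph maps node -> list of (neighbor, edge_length) pairs
    real = {source: 0}              # true shortest distances found so far
    keyed = {source: 0.0}           # jittered keys used only for tie-breaking
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        key, node = heapq.heappop(heap)
        if key > keyed.get(node, float("inf")):
            continue                # stale heap entry
        for neighbor, length in graph.get(node, []):
            candidate = real[node] + length
            jittered = candidate + random.uniform(0.0, 0.5)
            # a strictly shorter path always wins; equal-length paths win at
            # random, which diversifies the resulting shortest-path tree
            if jittered < keyed.get(neighbor, float("inf")):
                real[neighbor] = candidate
                keyed[neighbor] = jittered
                parent[neighbor] = node
                heapq.heappush(heap, (jittered, neighbor))
    return parent, real

Reconstruct a path by walking the parent pointers back from the target; re-running the function gives a differently shaped, but equally short, path each time.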

A good randomizer for puzzle-15

I have implemented a 15-puzzle for people to compete online. My current randomizer works by starting from the solved configuration and moving tiles around for 100 moves (an arbitrary number).
Everything is fine; however, every once in a while the shuffle turns out too easy and it takes only a few moves to solve the puzzle, so the game is really unfair: some people reach better scores at much higher speed.
What would be a good way to randomize the initial configuration so it is not "too easy"?
You can generate a completely random configuration (that is solvable) and then use some solver to determine the optimal sequence of moves. If the sequence is long enough for you, good, otherwise generate a new configuration and repeat.
Update & details
There is an article on Wikipedia about the 15-puzzle and when it is (and isn't) solvable. In short, if the empty square is in the lower-right corner, then the puzzle is solvable if and only if the number of inversions (an inversion is a pair of tiles that appear in the opposite order relative to the goal permutation, not necessarily adjacent tiles) is even.
You can then easily generate a solvable start state by performing an even number of swaps of tile pairs (each swap flips the parity of the inversion count), which can reach a not-so-easy-to-solve state far more quickly than doing regular moves, and the state is guaranteed to remain solvable.
In fact, you don't need a full search algorithm as I mentioned above; an admissible heuristic is enough. An admissible heuristic never overestimates the number of moves needed to solve the puzzle, i.e. you are guaranteed that solving will take at least as many moves as the heuristic tells you.
A good heuristic is the sum of Manhattan distances of each tile from its goal position.
Summary
In short, a possible (very simple) algorithm for generating starting positions might look like this:
1: current_state <- goal_state
2: swap two arbitrary (randomly selected) pieces
3: swap two arbitrary (randomly selected) pieces again (to ensure solvability)
4: h <- heuristic(current_state)
5: if h > desired threshold
6: return current_state
7: else
8: go to 2.
To be absolutely certain about how difficult a state is, you need to find the optimal solution using some solver. Heuristics will give you only an estimate.
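A small Python sketch of that generator for the 4x4 puzzle, with the blank kept in the lower-right corner; the threshold value is an illustrative placeholder:

import random

GOAL = list(range(1, 16)) + [0]   # row-major goal state, 0 = blank
THRESHOLD = 30                    # illustrative minimum difficulty estimate

def manhattan(state):
    # sum of Manhattan distances of the numbered tiles to their goal cells
    # (an admissible estimate of the number of moves needed)
    total = 0
    for index, tile in enumerate(state):
        if tile != 0:
            goal_index = tile - 1
            total += abs(index // 4 - goal_index // 4) + abs(index % 4 - goal_index % 4)
    return total

def swap_random_pair(state):
    # swap two randomly chosen numbered tiles (never the blank in the corner)
    i, j = random.sample(range(15), 2)
    state[i], state[j] = state[j], state[i]

def generate_start_state():
    state = GOAL[:]
    while manhattan(state) <= THRESHOLD:
        # two swaps keep the inversion parity even, so the state stays solvable
        swap_random_pair(state)
        swap_random_pair(state)
    return state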
I would do this
start from solution (just like you did)
make a valid move in a random direction
So you must keep track of where the gap is, generate a random direction (N, E, S, W), and make the move. I think you have done this part too.
compute the randomness of your placements
So compute some coefficient that depends on the order of the array, so that ordered (solved) configurations have low values and random ones have high values. The right equation for the coefficient, however, is a matter of trial and error. Here are some ideas of what to use:
correlation coefficient
sum of average difference of value and its neighbors
1 2 4
3 6 5
9 8 7
coeff(6)= (|6-3|+|6-5|+|6-2|+|6-8|)/4
coeff=coeff(1)+coeff(2)+...coeff(15)
abs distance from ordered array
You can combine several approaches together. You can also split the board into separate rows and columns and then combine the sub-coefficients.
loop #2 until the coefficient from #3 is high enough (threshold)
The threshold can also be used to change the difficulty.
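A short Python sketch of the neighbour-difference coefficient from step #3, for an n x n board stored row-major (the exact formula is, as said above, a matter of trial and error):

def neighbour_coefficient(board, n):
    # average absolute difference between each tile and its orthogonal
    # neighbours, summed over all tiles; ordered boards score low,
    # shuffled boards score high
    total = 0.0
    for r in range(n):
        for c in range(n):
            value = board[r * n + c]
            diffs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    diffs.append(abs(value - board[rr * n + cc]))
            total += sum(diffs) / len(diffs)
    return total

Step #4 then becomes: keep making random valid moves until neighbour_coefficient(board, 4) exceeds the chosen threshold.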

Generating Binary Permutations With Constraints

I am working on a physician scheduling application; we are using linear programming and solvers like cplex/lindo to solve our model. Due to some modelling limitations we need to generate binary patterns for just the night shifts.
Typically we generate a one-month schedule, so let's say we need to generate night-shift patterns for 30 days.
Night shifts have constraints such as: if a person works consecutive night shifts, the physician cannot work for the next five days. Below are some example patterns.
111000001111100000111110000011 Valid
111000001100000000111110000011 Valid
111010001111101000111110000011 Invalid
There are also other constraints, e.g. the number of ones in the pattern must be less than some defined value, the number of consecutive ones must be less than some defined value, etc.
First I tried a simple algorithm that starts from 0, repeatedly adds one (using bitwise operations) to get the next pattern, and checks each pattern against all the constraints, discarding the invalid ones. Since the pattern is 30 bits long (2^30 = 1073741824), the number of patterns to check is far too large for my simple algorithm; I guess it would take more than 24 hours to find all the valid patterns.
Now my questions are:
Which algorithm should I use for this problem to find all patterns, with the constraints applied, in a time-efficient way?
Is this an exact cover problem? Can I apply algorithms like Dancing Links to the problem I am facing?
Kindly provide some links to read about the solution you propose for this problem.
I have found a very good solution in "The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1" by Donald Knuth, section 7.2.1.2, Algorithm G (general permutation generator). In it the author describes the technique of bypassing unwanted blocks. I am implementing the algorithm by incrementally generating a tree of the feasible region and bypassing any infeasible path. The idea is to start with a root node of value 0, where every node has two children, 0 and 1. On every addition of a new child node we check the resulting string against our constraint set; if it fails a constraint, we do not add the child. For example, if the algorithm is about to add a node at level 5 and the resulting string at level 5 would be 11101, the trailing "101" violates the night-shift rule, so the level-5 node of value 1 is not added. We keep adding child nodes until patterns of full length are reached. Eventually we are left with only the feasible region, because the unwanted blocks have been bypassed; in this way I never touch the infeasible region.
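A Python sketch of that incremental, prune-as-you-go generation; the numeric limits and the exact formulation of the rest rule below are illustrative assumptions and should be replaced by your real constraints:

N_DAYS = 30
MAX_ONES = 12          # illustrative: max night shifts per month
MAX_CONSECUTIVE = 5    # illustrative: max consecutive night shifts
REST_DAYS = 5          # days off required after a block of 2+ consecutive shifts

def can_append_one(prefix):
    # check whether appending a '1' keeps the prefix feasible
    if prefix.count(1) + 1 > MAX_ONES:
        return False
    # length of the trailing run of zeros, and of the ones-block before it
    trailing_zeros = 0
    i = len(prefix) - 1
    while i >= 0 and prefix[i] == 0:
        trailing_zeros += 1
        i -= 1
    block = 0
    while i >= 0 and prefix[i] == 1:
        block += 1
        i -= 1
    if trailing_zeros == 0:
        return block + 1 <= MAX_CONSECUTIVE          # extending the current block
    # starting a new block: enforce the rest rule after blocks of 2+ shifts
    return block < 2 or trailing_zeros >= REST_DAYS

def feasible_patterns(prefix=()):
    # yield every feasible pattern of length N_DAYS as a tuple of 0/1,
    # never descending into a subtree whose prefix is already infeasible
    if len(prefix) == N_DAYS:
        yield prefix
        return
    yield from feasible_patterns(prefix + (0,))      # a day off is always allowed here
    if can_append_one(prefix):
        yield from feasible_patterns(prefix + (1,))

For example, the patterns can be counted without materializing them all: count = sum(1 for _ in feasible_patterns()).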

Is my heuristic algorithm correct? (Sudoku solver)

First off, yes, this IS homework, but it's primarily a theoretical question rather than a practical one. I am simply asking for confirmation that I am thinking correctly, or for hints if I am not.
I have been asked to write a simple Sudoku solver (in Prolog, but that is not so important right now), with the only limitation being that it must use a heuristic function with a best-first search algorithm. The only heuristic function I have been able to come up with is explained below:
1. Select an empty cell.
1a. If there are no empty cells and there is a solution return solution.
Else return No.
2. Find all possible values it can hold. %% It can't take values currently assigned to cells on the same line/column/box.
3. Assign each of those values a heuristic number, starting from 1.
4. Pick the value whose heuristic number is the lowest && you haven't checked yet.
4a. If there are no more values return no.
5. If a solution is not found: GoTo 1.
Else Return Solution.
// I am sorry for errors in this "pseudo code." If you want any clarification let me know.
So, am I doing this right, or is there some other way and my approach is wrong?
Thanks in advance.
The heuristic I would use is this:
Repeatedly find any empty spaces where there is only one possible number you can insert. Fill them with the number 1-9 that fits.
If every empty space has two or more possibilities, push the game state onto a stack, then pick a random square to fill in with a random value.
Go to step 1.
If you manage to fill every square, you've found a valid solution.
If you get to a point where there are no valid options, pop the last game state off the stack (i.e. backtrack to the last time you made a random choice.) Make a different choice and try again.
As an interesting sidenote, you've been told to do this using a greedy heuristic approach, but Sudoku can actually be reduced to a boolean satisfiability problem (SAT problem) and solved using a general-purpose SAT solver. This is very elegant and can actually be faster than a heuristic approach.
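A Python sketch of the approach described above (fill forced singles, guess only when necessary, backtrack on contradiction); the grid is a 9x9 list of lists with 0 for empty cells, and recursion stands in for the explicit stack of saved game states:

import random

def candidates(grid, r, c):
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    # 1. fill every cell that has exactly one candidate
    progress = True
    filled = []                       # remember forced moves so they can be undone
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    options = candidates(grid, r, c)
                    if not options:
                        for rr, cc in filled:
                            grid[rr][cc] = 0
                        return False          # contradiction: undo and backtrack
                    if len(options) == 1:
                        grid[r][c] = options[0]
                        filled.append((r, c))
                        progress = True
    # 2. no forced moves left: pick a random empty cell and guess
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not empties:
        return True                           # every square filled: solved
    r, c = random.choice(empties)
    options = candidates(grid, r, c)
    random.shuffle(options)
    for d in options:
        grid[r][c] = d
        if solve(grid):
            return True
        grid[r][c] = 0
    # undo the forced moves made at this level before backtracking further
    for rr, cc in filled:
        grid[rr][cc] = 0
    return False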
When I wrote a sudoku solver myself in Prolog, the algorithm I used was the following:
filter out cells already solved (ie the given values at the start)
for each cell, build a list containing all its neighbours (that's 20 cells).
for each cell, build a list containing all the possible values it can take (easy to do once the above is done)
in the list containing all the cells to solve, put one with the minimum number of values available on top
if the list is empty, you have a solution; if the top cell has 0 remaining possibilities, go to 7; otherwise, go to 6.
for the cell at the top of the list: pick a random number from its possible values. Remove this value from the possible values of all its neighbours. Go to 5.
backtrack (ie, fail in Prolog)
This algorithm always sorts the "most solved" cell first and detects failure early enough. It reduces solving time quite a lot compared to an algorithm that solves a random cell.
What you have described is the Most Constrained Variable heuristic. It picks the cell with the fewest remaining possibilities and then branches recursively in depth starting from that cell. This heuristic is extremely fast in depth-first search algorithms because it detects collisions early, near the root, while the search tree is still small.
Here is the implementation of Most Constrained Variable heuristic in C#: Exercise #2: Sudoku Solver
That text also contains an analysis of the total number of visits to Sudoku cells made by this algorithm; it is surprisingly small. It almost looks like the heuristic solves the Sudoku on the first try.
