For example, let's say we have a bounded 2D grid which we want to cover with square tiles of equal size. We have an unlimited number of tiles that fall into a defined number of types. Each type of tile specifies the letters printed on that tile. Letters are printed next to each edge, and only tiles with matching letters on their adjacent edges can be placed next to one another on the grid. Tiles may be rotated.
Given the size of the grid and tile type definitions, what is the fastest method of arranging the tiles such that the above constraint is met and the entire/majority of the grid is covered? Note that my use case is for large grids (~20 in each dimension) and medium-large number of solutions (unlike Eternity II).
So far, I've tried DFS starting in the center, picking the locations around the filled area that allow the fewest possibilities, and backtracking when no progress can be made. This only works for simple scenarios with one or two tile types. Any more and too much backtracking ensues.
Here's a trivial example, showing input and the final output:
This is a hard puzzle.
The Eternity 2 was a puzzle of this form with a 16 by 16 square grid.
Despite a 2 million dollar prize, no one found the solution in several years.
The paper "Jigsaw Puzzles, Edge Matching, and Polyomino Packing: Connections and Complexity" by Erik D. Demaine, Martin L. Demaine shows that this type of puzzle is NP-complete.
Given a problem of this sort with a square grid I would try a brute force list of all possible columns, and then a dynamic programming solution across all of the rows. This will find the number of solutions, and with a little extra work can be used to generate all of the solutions.
However, if your rows are n long and there are m letters with k tiles, then the brute force list of all possible columns has up to m^n possible right edges, with up to m^(4k) combinations/rotations of tiles needed to generate it. Then the transition from the right edge of one column to the right edge of the next potentially has up to m^(2n) possibilities in it. Those numbers are usually not worst case scenarios, but the size of those data structures will be the upper bound on the feasibility of that technique.
Of course if those data structures get too large to be feasible, then the recursive algorithm that you are describing will be too slow to enumerate all solutions. But if there are enough, it still might run acceptably fast even if this approach is infeasible.
Of course if you have more rows than columns, you'll want to reverse the role of rows and columns in this technique.
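A minimal sketch of the column-by-column dynamic programming described above may help. It assumes tiles are given as hypothetical (north, east, south, west) letter tuples, that letters on the outer border of the grid are unconstrained, and that only the number of full coverings is wanted; the names `rotations` and `count_arrangements` are made up for this illustration.

```python
from collections import Counter

def rotations(tile):
    """All distinct orientations of a tile given as (north, east, south, west)."""
    n, e, s, w = tile
    return {(n, e, s, w), (w, n, e, s), (s, w, n, e), (e, s, w, n)}

def count_arrangements(tiles, rows, cols):
    """Count full coverings of a rows x cols grid, one column at a time."""
    oriented = sorted(set().union(*(rotations(t) for t in tiles)))

    def fill_column(left):
        """Map each reachable right-edge tuple to the number of columns producing it."""
        result = Counter()

        def place(r, south_above, right):
            if r == rows:
                result[tuple(right)] += 1
                return
            for n, e, s, w in oriented:
                if south_above is not None and n != south_above:
                    continue  # must match the tile above
                if left is not None and w != left[r]:
                    continue  # must match the previous column
                place(r + 1, s, right + [e])

        place(0, None, [])
        return result

    # DP state: right edge of the last filled column -> number of ways to reach it.
    states = fill_column(None)
    for _ in range(cols - 1):
        nxt = Counter()
        for left, ways in states.items():
            for right, k in fill_column(left).items():
                nxt[right] += ways * k
        states = nxt
    return sum(states.values())
```

The size of `states` is exactly the "number of possible right edges" that bounds the feasibility of this approach; caching `fill_column` per left edge would be an obvious next step.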
Related
I'm looking for an approach to this problem where you have to fill a n*m (n, m <=8) piece matrix with L-shaped three piece tiles. The tiles can't be placed on top of each other in any way.
I'm not necessarily looking for the whole answer, just a hint on how to approach it.
Source: https://cses.fi/dt/task/336
I solved this graph problem using a recursive backtracking algorithm plus memoization. My solution is not particularly fast and takes a minute or so to solve a 9x12 grid, but it should be sufficient for the 8x8 grid in your question (it takes about a second on a 9x9). There are no solutions for 7x7 and 8x8 grids because their cell counts (49 and 64) are not divisible by the triomino size, 3.
The strategy is to start in a corner of the grid and move through it cell by cell, trying to place each block whenever it is legal to do so and thereby exploring the solution space methodically.
If placement of a block is legal but creates an unfillable air pocket in the grid, remove the block; we know ahead of time there will be no solutions to this state and can abandon exploring its children. For example, on a 3x6 grid,
abb.c.
aabcc.
......
is hopelessly unsolvable.
Once a state is reached where all cells have been filled, we can report a count of 1 solution to its parent state. Here's an example of a solved 3x6 grid:
aaccee
abcdef
bbddff
If every possible block has been placed at a position, backtrack, reporting the solution count to parent states along the way and exploring any states that are as yet unexplored.
In terms of memoization, call any two grid states equivalent if there is some arrangement of tiles such that they cover the exact same coordinates. For example:
aacc..
abdc..
bbdd..
and
aacc..
bacd..
bbdd..
are considered to be equivalent even though the two states were reached through different tile placements. Both states have the same substructure, so counting the number of solutions to one state is enough; add this to the memo, and if we reach the state again, we can simply report the number of solutions from the memo rather than re-computing everything.
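As a rough sketch of the idea (not the program described in this answer), the cell-by-cell placement and the memo on covered coordinates could look like the following. It assumes the grid is stored as a bitmask of filled cells and always fills the first empty cell in row-major order; `SHAPES` and `count_tilings` are names invented for this example.

```python
from functools import lru_cache

# The four L-triomino orientations, as (row, col) offsets from the piece's
# first cell in row-major order.
SHAPES = [
    ((0, 0), (0, 1), (1, 0)),
    ((0, 0), (0, 1), (1, 1)),
    ((0, 0), (1, 0), (1, 1)),
    ((0, 0), (1, -1), (1, 0)),
]

def count_tilings(rows, cols):
    """Count tilings of a rows x cols grid with L-shaped triominoes."""
    if (rows * cols) % 3:
        return 0  # the cell count must be divisible by the piece size
    full = (1 << (rows * cols)) - 1

    @lru_cache(maxsize=None)
    def solve(filled):
        # `filled` is the memo key: two states covering the same cells are equivalent.
        if filled == full:
            return 1
        idx = 0
        while filled >> idx & 1:  # find the first empty cell
            idx += 1
        r, c = divmod(idx, cols)
        total = 0
        for shape in SHAPES:
            mask = 0
            for dr, dc in shape:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols):
                    break
                bit = 1 << (rr * cols + cc)
                if filled & bit:
                    break
                mask |= bit
            else:  # every cell of the piece was free and in bounds
                total += solve(filled | mask)
        return total

    return solve(0)
```

Because the first empty cell must be covered by the next piece, each tiling is generated in exactly one order, and a state with an unfillable pocket at that cell simply returns 0.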
My program reports 8 solutions on a 3x6 grid:
As I mentioned, my Python solution isn't fast or optimized. It's possible to solve a 9x12 grid in less than a second. Large optimizations aside, there are basic things I neglected in my implementation. For example, I copied the entire grid for each tile placement; adding/removing tiles on a single grid would have been an easy improvement. I also did not check for unsolvable gaps in the grid, which can be seen in the animation.
After you solve the problem, be sure to hunt around for some of the mind-blowing solutions people have come up with. I don't want to give away much more than this!
There's a trick that's applicable to a lot of recursive enumeration problems. In whichever way you like, define a deterministic procedure for removing one piece from a nonempty partial solution. Then the recursive enumeration works in the opposite direction, building the possible solutions from the empty solution, but each time it places a piece, that same piece has to be the one that would be removed by the deterministic procedure.
If you verify that the board size is divisible by three before beginning the enumeration, you shouldn't have any problem with the time limit.
Suppose I know an algorithm, that partitions a boolean matrix into a minimal set of disjoint rectangles that cover all "ones" ("trues").
The task is to find a permutation of rows and columns of the matrix, such that a matrix built by shuffling the columns and rows according to the permutations can be partitioned into a minimal set of rectangles.
For illustration, one can think about the problem this way:
Suppose I have a set of objects and a set of properties. Each object can have any number of (distinct) properties. The task is to summarize (report) this mapping using the least amount of sentences. Each sentence has a form "<list of objects> have properties <list of properties>".
I know I can brute-force the solution by applying the permutations and running the algorithm on each try. But the time complexity explodes exponentially, making this approach impractical for matrices bigger than 15×15.
I know I can simplify the matrices before running the algorithm by removing duplicated rows and columns.
This problem feels like it is NP-hard, and there might be no fast (polynomial in time) solutions. If that is so, I'd be interested to learn about some approximate solutions.
This is isomorphic to reducing logic circuits, given the full set of inputs (features) and the required truth table (which rows have which feature). You can solve the problem with classic Boolean algebra. The process is called logic optimization.
When I was in school, we drew Karnaugh maps on the board and drew colored boundaries to form our rectangles. However, it sounds as if you have something larger than one would handle on the board; try the QM algorithm and the cited heuristics for a "good enough" solution for many applications.
My solution so far:
First let us acknowledge, that the problem is symmetric with respect to swapping rows with columns (features with objects).
Let us represent the problem with the binary matrix, where rows are objects and columns are features and ones in the matrix represent matched pairs (object, feature).
My idea so far is to run two steps in sequence until there are no 1s left in the matrix:
1. Heuristically find a good unshuffling permutation of rows and columns on which I can run 2D maximal rectangle.
2. Find the maximal rectangle, save it to the answer list and zero all 1s belonging to it.
Maximal rectangle problem
It can be simply any of the implementations of the maximal rectangle problem found on the net, for instance https://www.geeksforgeeks.org/maximum-size-rectangle-binary-sub-matrix-1s/
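For reference, one common way to do this step is the row-by-row histogram approach with a stack. The sketch below is not taken from the linked page or from my R package; `maximal_rectangle` and its return format are assumptions made for illustration.

```python
def maximal_rectangle(matrix):
    """Return (area, top, left, height, width) of a largest all-ones rectangle."""
    if not matrix or not matrix[0]:
        return (0, 0, 0, 0, 0)
    n_cols = len(matrix[0])
    heights = [0] * n_cols  # run of consecutive 1s ending at the current row
    best = (0, 0, 0, 0, 0)
    for r, row in enumerate(matrix):
        for c in range(n_cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack = []  # column indices whose heights are strictly increasing
        for c in range(n_cols + 1):
            h = heights[c] if c < n_cols else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                height = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if height * width > best[0]:
                    best = (height * width, r - height + 1, left, height, width)
            stack.append(c)
    return best
```

After a rectangle is found, the cells `matrix[top:top+height]` by `[left:left+width]` would be zeroed and the two steps repeated.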
Unshuffling the rows (and columns)
Unshuffling rows is independent of unshuffling columns, and both tasks can be run separately (concurrently). Let us assume I am looking for the unshuffling permutation of columns.
Also, it is worth noting, that unshuffling a matrix should yield the same results if we swap ones with zeroes.
Build a distance matrix of columns. A distance between two columns is defined as Manhattan distance between the two columns represented numerically (i.e. 0 - the absence of a relationship between object and feature, 1 - presence)
Run hierarchical clustering using the distance matrix. The complexity is O(n^2), as I believe single linkage should be good enough.
The order of objects returned from the hierarchical clustering is the unshuffling permutation.
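A small pure-Python sketch of these three steps, for illustration only: it uses a naive single-linkage merge (slower than the O(n^2) algorithms a clustering library would provide), and the function names are hypothetical rather than those of the R package.

```python
from itertools import combinations

def column_distance_matrix(matrix):
    """Manhattan distance between every pair of columns of a 0/1 matrix."""
    cols = list(zip(*matrix))
    return [[sum(abs(a - b) for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def single_linkage_order(dist):
    """Leaf order obtained by repeatedly merging the two closest clusters."""
    if not dist:
        return []
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # single linkage: distance between the closest members
            d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]
```

For a matrix whose columns 0 and 2 are identical and columns 1 and 3 are identical, the returned order places each identical pair side by side, which is what lets the maximal-rectangle step pick them up together.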
The algorithm works well enough for my use cases. The implementation in R can be found at https://github.com/adamryczkowski/rectpartitions
I have implemented a puzzle 15 for people to compete online. My current randomizer works by starting from the good configuration and moving tiles around for 100 moves (arbitrary number)
Everything is fine, however, once in a little while the tiles are shuffled too easy and it takes only a few moves to solve the puzzle, therefore the game is really unfair for some people reaching better scores in a much higher speed.
What would be a good way to randomize the initial configuration so it is not "too easy"?
You can generate a completely random configuration (that is solvable) and then use some solver to determine the optimal sequence of moves. If the sequence is long enough for you, good, otherwise generate a new configuration and repeat.
Update & details
There is an article on Wikipedia about the 15-puzzle and when it is (and isn't) solvable. In short, if the empty square is in the lower-right corner, then the puzzle is solvable if and only if the number of inversions (an inversion is a swap of two elements in the sequence, not necessarily adjacent elements) with respect to the goal permutation is even.
You can then easily generate a solvable start state by doing an even number of inversions, which may lead to a not-so-easy-to-solve state far quicker than by doing regular moves, and it is guaranteed that it will remain solvable.
In fact, you don't need to use a search algorithm as I mentioned above; an admissible heuristic is enough. Such a heuristic never overestimates the number of moves needed to solve the puzzle, i.e. you are guaranteed that it will not take fewer moves than the heuristic tells you.
A good heuristic is the sum of Manhattan distances of each number to its goal position.
Summary
In short, a possible (very simple) algorithm for generating starting positions might look like this:
1: current_state <- goal_state
2: swap two arbitrary (randomly selected) pieces, leaving the empty square in place
3: swap two arbitrary (randomly selected) pieces again, also leaving the empty square in place (to ensure solvability)
4: h <- heuristic(current_state)
5: if h > desired threshold
6: return current_state
7: else
8: go to 2.
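The steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the board is a flat tuple with 0 for the empty square, the empty square stays in the lower-right corner, and the threshold is low enough to be reachable (random states average well above 30). All names are made up for the example.

```python
import random

SIZE = 4
GOAL = tuple(range(1, SIZE * SIZE)) + (0,)  # 0 marks the empty square

def manhattan(state):
    """Sum of Manhattan distances of each tile to its goal position."""
    total = 0
    for pos, tile in enumerate(state):
        if tile == 0:
            continue
        goal = tile - 1
        total += abs(pos // SIZE - goal // SIZE) + abs(pos % SIZE - goal % SIZE)
    return total

def is_solvable(state):
    """Parity check, valid only while the empty square is in the last position."""
    tiles = [t for t in state if t != 0]
    inversions = sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )
    return state[-1] == 0 and inversions % 2 == 0

def generate_start(threshold, rng=random):
    """Apply pairs of tile swaps until the heuristic exceeds the threshold."""
    state = list(GOAL)
    while manhattan(state) <= threshold:
        for _ in range(2):  # two swaps keep the permutation parity even
            i, j = rng.sample(range(SIZE * SIZE - 1), 2)  # never the empty square
            state[i], state[j] = state[j], state[i]
    return tuple(state)
```

For example, `generate_start(30, random.Random(0))` returns a solvable position whose Manhattan estimate is above 30.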
To be absolutely certain about how difficult a state is, you need to find the optimal solution using some solver. Heuristics will give you only an estimate.
I would do this
start from solution (just like you did)
make valid turn in random direction
so you must keep track where the gap is and generate random direction (N,E,S,W) and do the move. I think this part you have done too.
compute the randomness of your placements
So compute some coefficient dependent on the order of the array, so that ordered (solved) positions have low values and random ones have high values. The equation for the coefficient, however, is a matter of trial and error. Here are some ideas for what to use:
correlation coefficient
sum of average difference of value and its neighbors
1 2 4
3 6 5
9 8 7
coeff(6)= (|6-3|+|6-5|+|6-2|+|6-8|)/4
coeff=coeff(1)+coeff(2)+...coeff(15)
abs distance from ordered array
You can combine more approaches together. You can divide this to separated rows and columns and then combine the sub coefficients together.
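As an illustration of the neighbor-difference idea, here is a small sketch. It assumes the board is a list of rows, that each cell is averaged over the neighbors it actually has (so edge and corner cells divide by 3 or 2), and that the board has at least two cells; how to treat the gap is left open. The names `cell_coeff` and `disorder` are hypothetical.

```python
def cell_coeff(grid, r, c):
    """Average absolute difference between a cell and its orthogonal neighbors."""
    neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    diffs = [
        abs(grid[r][c] - grid[rr][cc])
        for rr, cc in neighbors
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0])
    ]
    return sum(diffs) / len(diffs)

def disorder(grid):
    """Sum of the per-cell coefficients over the whole board."""
    return sum(
        cell_coeff(grid, r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
    )

example = [[1, 2, 4], [3, 6, 5], [9, 8, 7]]
print(cell_coeff(example, 1, 1))  # (3 + 1 + 4 + 2) / 4 = 2.5
```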
loop #2 until the coefficient from #3 is high enough (threshold)
The threshold can also be used to change the difficulty.
A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.
I have a basic algorithm to generate minimal Sudokus:
Generate a completed puzzle.
Visit each cell in a random order. For each visited cell:
Tentatively remove its digit
Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.
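To make the procedure concrete, here is a deliberately unoptimized sketch of it. It assumes a 9x9 list-of-lists grid with 0 for empty cells; `legal`, `solve`, `has_unique_solution` and `minimize` are names invented for this illustration, not my actual code.

```python
import random

def legal(grid, r, c, d):
    """True if digit d can be placed at (r, c) without breaking a row, column or box."""
    if any(grid[r][k] == d for k in range(9)):
        return False
    if any(grid[k][c] == d for k in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != d for i in range(3) for j in range(3))

def solve(grid, digits):
    """First solution found when trying `digits` in the given order, or None."""
    grid = [row[:] for row in grid]

    def rec():
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for d in digits:
                        if legal(grid, r, c, d):
                            grid[r][c] = d
                            if rec():
                                return True
                            grid[r][c] = 0
                    return False  # no digit fits the first empty cell
        return True  # no empty cell left

    return grid if rec() else None

def has_unique_solution(grid):
    """Forward and reverse solvers agree iff the solution is unique."""
    forward = solve(grid, range(1, 10))
    backward = solve(grid, range(9, 0, -1))
    return forward is not None and forward == backward

def minimize(solution, rng=random):
    """Remove digits in random order, keeping only those needed for uniqueness."""
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(9) for c in range(9)]
    rng.shuffle(cells)
    for r, c in cells:
        saved = puzzle[r][c]
        puzzle[r][c] = 0
        if not has_unique_solution(puzzle):
            puzzle[r][c] = saved  # the digit was necessary, put it back
    return puzzle
```

The forward solver returns the first solution in cell-by-cell digit order and the reverse solver the last one, so comparing the two is enough; no count of all solutions is needed.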
This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:
Adding digits instead of removing them. The advantage of this is that since minimal puzzles require at least 17 filled digits, the first 17 digits are guaranteed to not have a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.
I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.
How can I optimize my minimal Sudoku generator?
I think the 2nd option you suggested would work better, provided you add 3 extra checks for the first 17 numbers:
find a list of 17 random numbers between 1-9
add each number at a random location, provided
the newly added number doesn't fail the 3 basic criteria of Sudoku:
there is no same number in same row
there is no same number in same column
there is no same number in same 3x3 matrix
If condition 1 fails, move to the next column or row and check the 3 basic criteria again.
If there is no next row (i.e. at the 9th column or 9th row), add to the 1st column.
Once the 17 numbers are filled in, run your solver logic on this and look for your unique solution.
Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:
Using bitmasks to keep track of which constraints (row, column, box) are satisfied in each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor in generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to keep track of the three types of constraints as distinct bits. A small further optimization is to save the masks for each digit and each constraint in arrays. 40%
Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
mbeckish and user295691's suggestions (see the comments to the original post). 25%
Ok this is an abstract algorithmic challenge and it will remain abstract since it is a top secret where I am going to use it.
Suppose we have a set of objects O = {o_1, ..., o_N} and a symmetric similarity matrix S where s_ij is the pairwise correlation of objects o_i and o_j.
Assume also that we have an one-dimensional space with discrete positions where objects may be put (like having N boxes in a row or chairs for people).
Having a certain placement, we may measure the cost of moving from the position of one object to that of another object as the number of boxes we need to pass by until we reach our target multiplied with their pairwise object similarity. Moving from a position to the box right after or before that position has zero cost.
Imagine an example where for three objects we have the following similarity matrix:
1.0 0.5 0.8
S = 0.5 1.0 0.1
0.8 0.1 1.0
Then, the best ordering of objects in the three boxes is obviously:
[o_3] [o_1] [o_2]
The cost of this ordering is the sum of costs (counting boxes) for moving from one object to all others. So here we have a cost only for the distance between o_2 and o_3, equal to 1box * 0.1sim = 0.1, the same as for the mirrored ordering:
[o_2] [o_1] [o_3]
On the other hand:
[o_1] [o_2] [o_3]
would have cost = cost(o_1-->o_3) = 1box * 0.8sim = 0.8.
The target is to determine a placement of the N objects in the available positions in a way that we minimize the above mentioned overall cost for all possible pairs of objects!
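To pin down the objective, here is a small sketch of the cost as I read it from the examples: each pair contributes (number of boxes between the two objects) times their similarity, so adjacent objects cost nothing. The brute-force search is only meant for tiny N, and the function names are made up for illustration.

```python
from itertools import combinations, permutations

def placement_cost(order, sim):
    """Total cost of a placement; order[p] is the object sitting in box p."""
    position = {obj: p for p, obj in enumerate(order)}
    return sum(
        (abs(position[i] - position[j]) - 1) * sim[i][j]
        for i, j in combinations(range(len(order)), 2)
    )

def best_placement(sim):
    """Exhaustive search over all N! placements (feasible only for small N)."""
    return min(permutations(range(len(sim))), key=lambda o: placement_cost(o, sim))

S = [
    [1.0, 0.5, 0.8],
    [0.5, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
print(placement_cost((2, 0, 1), S))  # [o_3] [o_1] [o_2] -> 0.1
print(placement_cost((0, 1, 2), S))  # [o_1] [o_2] [o_3] -> 0.8
```

Objects are 0-indexed here, so `(2, 0, 1)` stands for [o_3] [o_1] [o_2].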
An analogy is to imagine that we have a table and chairs side by side in one row only (like the boxes) and you need to seat N people on the chairs. Now those people have some relations, let's say how probable it is that one of them wants to speak to another. To do so, a person has to stand up, pass by a number of chairs and speak to the guy there. When two people sit on successive chairs, they don't need to move in order to talk to each other.
So how can we seat those people so that the total distance-cost over all pairs of people is minimized? This means that during the night the overall distance walked by the guests is close to the minimum.
Greedy search is... ok forget it!
I am interested in hearing if there is a standard formulation of such problem for which I could find some literature, and also different searching approaches (e.g. dynamic programming, tabu search, simulated annealing etc from combinatorial optimization field).
Looking forward to hear your ideas.
PS. My question has something in common with this thread Algorithm for ordering a list of Objects, but I think here it is better posed as problem and probably slightly different.
That sounds like an instance of the Quadratic Assignment Problem. The speciality is due to the fact that the locations are placed on one line only, but I don't think this will make it easier to solve. The QAP in general is NP-hard. Unless I misinterpreted your problem, you can't find an optimal algorithm that solves the problem in polynomial time without proving P=NP at the same time.
If the instances are small you can use exact methods such as branch and bound. You can also use tabu search or other metaheuristics if the problem is more difficult. We have an implementation of the QAP and some metaheuristics in HeuristicLab. You can configure the problem in the GUI, just paste the similarity and the distance matrix into the appropriate parameters. Try starting with the robust Taboo Search. It's an older, but still quite well working algorithm. Taillard also has the C code for it on his website if you want to implement it for yourself. Our implementation is based on that code.
There have been a lot of publications on the QAP. More modern algorithms combine genetic search abilities with local search heuristics (e.g. Genetic Local Search from Stützle IIRC).
Here's a variation of the already posted method. I don't think this one is optimal, but it may be a start.
Create a list of all the pairs in descending cost order.
While list not empty:
Pop the head item from the list.
If neither element is in an existing group, create a new group containing
the pair.
If one element is in an existing group, add the other element to whichever
end puts it closer to the group member.
If both elements are in existing groups, combine them so as to minimize
the distance between the pair.
Group combining may require reversal of order in a group, and the data structure should
be designed to support that.
Let me help the thread (of my own) with a simplistic ordering approach.
1. Order the upper half of the similarity matrix.
2. Start with the pair of objects having the highest similarity weight and place them in the center positions.
3. The next object may be put on the left or the right side of them. So each time you may select the object that, when put to the left or right, has the highest cost to the pre-placed objects. Goto Step 2.
The selection in Step 3 is made because if you leave this object and place it later, its cost will again be the greatest of the remaining ones, and even greater (farther from the pre-placed objects). So the costly placements should be done as early as possible.
This is too simple and of course does not discover a good solution.
Another approach is to
1. start with a complete ordering generated somehow (random or from another algorithm)
2. try to improve it using "swaps" of object pairs.
I believe local minima would be a huge deterrent.
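A minimal sketch of the swap-based improvement, assuming the cost is (boxes between two objects) times their similarity summed over all pairs, and accepting a swap only when it strictly lowers the cost. It stops at the first local minimum, so it illustrates the concern above rather than solving it; the names are hypothetical.

```python
from itertools import combinations

def placement_cost(order, sim):
    """Total cost of a placement; order[p] is the object sitting in box p."""
    position = {obj: p for p, obj in enumerate(order)}
    return sum(
        (abs(position[i] - position[j]) - 1) * sim[i][j]
        for i, j in combinations(range(len(order)), 2)
    )

def swap_local_search(order, sim):
    """Hill climbing over pairwise swaps until no swap improves the cost."""
    order = list(order)
    best = placement_cost(order, sim)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(order)), 2):
            order[a], order[b] = order[b], order[a]
            cost = placement_cost(order, sim)
            if cost < best - 1e-12:
                best = cost
                improved = True
            else:
                order[a], order[b] = order[b], order[a]  # undo the swap
    return order, best
```

Restarting from several random orderings, or accepting some worsening swaps as in simulated annealing or tabu search, would be the usual ways to escape such minima.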