The “pattern-filling with tiles” puzzle

The “pattern-filling with tiles” puzzle - algorithm

I've encountered an interesting problem while programming a random level generator for a tile-based game. I've implemented a brute-force solver for it but it is exponentially slow and definitely unfit for my use case. I'm not necessarily looking for a perfect solution, I'll be satisfied with a “good enough” solution that performs well.
Problem Statement:
Say you have all or a subset of the following tiles available (this is the combination of all possible 4-bit patterns mapped to the right, up, left and down directions):
alt text http://img189.imageshack.us/img189/3713/basetileset.png
You are provided a grid where some cells are marked (true) and others not (false). This could be generated by a perlin noise algorithm, for example. The goal is to fill this space with tiles so that there are as many complex tiles as possible. Ideally, all tiles should be connected. There might be no solution for some input values (available tiles + pattern). There is always at least one solution if the top-left, unconnected tile is available (that is, all pattern cells can be filled with that tile).
Example:
Images left to right: tile availability (green tiles can be used, red cannot), pattern to fill and a solution
alt text http://img806.imageshack.us/img806/2391/sampletileset.png + alt text http://img841.imageshack.us/img841/7/samplepattern.png = alt text http://img690.imageshack.us/img690/2585/samplesolution.png
What I tried:
My brute-force implementation attempts every possible tile everywhere and keeps track of the solutions that were found. Finally, it chooses the solution that maximizes the total number of connections outgoing from each of the tiles. The time it takes is exponential with regard to the number of tiles in the pattern. A pattern of 12 tiles takes a few seconds to solve.
Notes:
As I said, performance is more important than perfection. However, the final solution must be properly connected (no tile pointing to a tile which doesn't point to the original tile). To give an idea of scope, I'd like to handle a pattern of 100 tiles under about 2 seconds.

For 100-tile instances, I believe that a dynamic program based on a carving decomposition of the input graph could fit the bill.
Carving decomposition
In graph theory, a carving decomposition of a graph is a recursive binary partition of its vertices. For example, here's a graph
1--2--3
| |
| |
4--5
and one of its carving decompositions
{1,2,3,4,5}
/ \
{1,4} {2,3,5}
/ \ / \
{1} {4} {2,5} {3}
/ \
{2} {5}.
The width of a carving decomposition is the maximum number of edges leaving one of its partitions. In this case, {2,5} has outgoing edges 2--1, 2--3, and 5--4, so the width is 3. The width of a kd-tree-style partition of a 10 x 10 grid is 13.
The carving-width of a graph is the minimum width of a carving decomposition. It is known that planar graphs (in particular, subgraphs of grid graphs) with n vertices have carving-width O(√n), and the big-O constant is relatively small.
Dynamic program
Given an n-vertex input graph and a carving decomposition of width w, there is an O(2w n)-time algorithm to compute the optimal tile choice. This running time grows rapidly in w, so you should try decomposing some sample inputs by hand to get an idea of what kind of performance to expect.
The algorithm works on the decomposition tree from the bottom up. Let X be a partition, and let F be the set of edges that leave X. We make a table mapping each of 2|F| possibilities for the presence or absence of edges in F to the optimal sum on X under the specified constraints (-Infinity if there is no solution). For example, with the partition {1,4}, we have entries
{} -> ??
{1--2} -> ??
{4--5} -> ??
{1--2,4--5} -> ??
For the leaf partitions with only one vertex, the subset of F completely determines the tile, so it's easy to fill in the number of connections (if the tile is valid) or -Infinity otherwise. For the other partitions, when computing an entry of the table, try all different connectivity patterns for the edges that go between the two children.
For example, suppose we have pieces
|
. .- .- -. .
|
The table for {1} is
{} -> 0
{1--2} -> 1
{1--4} -> -Infinity
{1--2,1--4} -> 2
The table for {4} is
{} -> 0
{1--4} -> 1
{4--5} -> 1
{1--4,4--5} -> -Infinity
Now let's compute the table for {1,4}. For {}, without the edge 1--4 we have score 0 for {1} (entry {}) plus score 0 for {4} (entry {}). With edge 1--4 we have score -Infinity + 1 = -Infinity (entries {1--4}).
{} -> 0
For {1--2}, the scores are 1 + 0 = 1 without 1--4 and 2 + 1 = 3 with.
{1--2} -> 3
Continuing.
{4--5} -> 0 + 1 = 1 (> -Infinity = -Infinity + (-Infinity))
{1--2,4--5} -> 1 + 1 = 2 (> -Infinity = 2 + (-Infinity))
At the end we can use the tables to determine an optimal solution.
Finding a carving decomposition
There are sophisticated algorithms for finding good carving decompositions, but you might not need them. Try a simple binary space partitioning scheme.

As a base, take a look at an earlier answer I gave on searching. Hill-climbing search programs are a tool every programmer should have in their arsenal as they work much better than plain brute-force solvers.
Here, even a relatively bad search algorithm has as an advantage the fact that it won't generate illegal boards, greatly reducing the expected run time.

I think I may have a better idea. I didn't test it, but I'm pretty sure it will be faster than a purely brute-force-ish solution for large zones.
First, create an empty set (a "set" being a collection that only contains unique objects) of nodes. This collection will be used to identify which tiles have broken connections that need to be fixed.
Fill the data structures to represent the board with the pieces that are available, using the ones you see the most fit based on your personal criteria with no regard to the correctness of the solution. This will almost certainly lead you to an invalid state, but it's okay for now. Iterate through the board, and find all tiles that have connections leading to nowhere. Add them to the set of broken tiles.
Now, iterate through the set. Change the tiles it refers to by reducing their number of connections (otherwise you could get into an infinite loop) so they have no broken connection, respecting the currently available pieces. Check their neighbors again, and if you broke connections to other tiles, add these to the set of broken ones too.
Once the set of broken connections will be empty, you should have a fine-looking pattern. Note however that it has an important caveat: it might to tend to oversimplify patterns, since the "fixing" phase will always attempt to reduce the number of connections. You may have to be lucky to get interesting patterns since this could be greatly affected by first piece you put on each tile.

Related

Find Minimum Time to Occupy Grid

Problem:
Consider a patient suffering from skin infection and germs are spreading all over rapidly. Assume that skin surface is scaled as a rectangular grid of size MxN and cells are marked by 0 and 1 where 0 represents non affected region on skin and 1 represents affected region on skin. Germs can move from one cell of grid to another in 4 possible directions (right, left, up, down) but can move to only one cell at a time in one direction and affect that cell in 1 sec. Doctor currently who is treating the patient see's status and wants to know the time left for him to save him before the germs spread all over the skin and patient dies. Can you help to estimate the minimum time taken for the germs to completely occupy skin surface?
Input: : Current status of skin. (A matrix of size MxN with 1's and 0's which represents affected and non affected area)
Output: : Min time in sec to cover all over the grid.
Example:
Input:
[1 1 0 0 1]
[0 1 1 0 0]
[0 0 0 0 1]
[0 1 0 0 0]
Output: 2 seconds
Explanation:
After 1 sec from input, matrix could be as below
[1 1 1 0 1]
[1 1 1 0 1]
[0 1 1 0 1]
[0 1 1 0 1]
In next sec, matrix is completely filled by 1's

I will not present a detailed solution here, but some thoughts that hopefully may help you to write your own program.
First step is to determine the kind of algorithm to implement. The optimal way would be to find a simple and fast ad hoc solution for this problem. In the absence of such a solution, for this kind of problems, classical candidates are DFS, BFS, A* ...
As the goal is to find the shortest solution, it seems natural to consider BFS first, as once BFS finds a solution, we know that it is the shortest ones and we can stop the search. However, then, we have to consider avoiding inflation of the nodes, as it would lead not only to a huge calculation time, but also a huge memory.
First idea to avoid node inflation is to consider that some 1 cells can only be expended in one another cell. In the posted diagram, for example the cell (0, 0) (top left) can only be expended to cell (1, 0). Then, after this expansion, cell (1, 1) can only move to cell (2, 1). Therefore, we know it would be suboptimal to move cell (1,1) to cell (1,0). Therefore: move such cells first.
In a similar way, once an infected cell is surrounded by other infected cells only, it is no longer necessary to consider it for next moves.
At the end, it would be convenient to have a list of infected cells, together with the number of non-infected cells that each such cell can move to.
Another idea to limit the number of nodes is to detect duplicates, as it is likely here that many of them will exist. For that, we have to define a kind of hashing. The used hash function does not need to be 100% efficient, but need to be calculated rapidly, and if possible in a recursive manner. If we obtain B diagram from A diagram by adding a 1-cell at position (i, j), then I propose something like
H(B) = H(A)^f(i, j)
f(i, j) = a*(1024*i+j)%b
Here, I used the fact that N and M are less than 1000.
Each time a new diagram is consider, we have to calculate the corresponding H value and check if it exists already in the set of past diagrams.

I'm not sure how far I would get with this in an interview situation. After some thought, rather than considering solutions that store more than one full board state, I would rather consider a greedy priority queue since a strong heuristic for the next zero-cell candidates to fill seems to be:
(1) healthy cells that have the least neighbouring infected cells (but at least one, of course),
e.g., choose A over B
1 1 B 0 1
0 1 1 0 0
0 0 A 0 1
0 1 0 0 0
and (2) break ties by choosing first the healthy cells that when infected will block the least infected cells.
e.g., choose A over B
1 1 1 0 1
1 B 1 0 A
0 0 0 0 1
0 1 0 0 0
An interesting observation is that any healthy cell destination can technically be reached in time Manhattan-distance from the nearest infected cell, where the cell leading such a "crawl" continually chooses the single move that brings us closer to the destination. We know that at the same time, though, this same infected-cell "snake" produces new "crawlers" that could reach any equally far or closer neighbours. This makes me wonder if there may be a more efficient way to determine the lower-bound, based on counts of the farthest healthy cells.

This is a variant of the multi-agent pathfinding problem (MAPF). There is a ton of recent work on this topic, but earlier modern work is a good starting point for finding optimal solutions to this problem - for instance the operator decomposition approach.
To do this you would order the agents (germs) 1..k. Then, you would start a search where you generate all possible first moves for germ 1, followed by all possible first moves for germ 2, and so on, where moves for an agent are to stay in place, or to spread to an adjacent unoccupied location. With 4 possible actions for each germ, there are up to 4^k possible actions between complete states. (Partial states occur when you haven't yet assigned actions to all k agents.) The number of actions is exponential, meaning you may run up against resource constraints (time or space) fairly quickly. But, there are only 2^(MxN) states possible. (Since agents don't go away, it's actually 2^(MxN-i) where i is the number of initial germs.)
Every time all (k) germs have considered a possible action, you have a new complete state. (And k then increases for the next iteration.) The minimum time left comes from the shallowest complete state which has the grid filled. A bit of brute-force computation will find the shortest solution. (Quite a bit in the case of large grids.)
You could use a BFS to find the first state that is completely filled. But, A* might do much better. As a heuristic, you could consider that all adjacent locations of all cells were filled in each step, and then compute the number of steps required to fill the grid under that model. That gives a lower bound on the time required to fill the full grid.
But, there are many more optimizations. The reason to do operator decomposition is that you could order the moves to take the best moves first and not consider the weaker possibilities (eg all germs don't spread). You could also use a partial-expansion approach (EPEA*) to avoid generating a lot of clearly suboptimal policies for the germs.
If I was asking this as an interview questions I might be looking to see someone formulate the problem (what are actions, what are states), come up with the lower bound on the solution (every germ expands to every adjacent cell), come up with an algorithm, and perhaps analyze how hard the problem is, in order of increasing difficulty.

What to use for flow free-like game random level creation?

I need some advice. I'm developing a game similar to Flow Free wherein the gameboard is composed of a grid and colored dots, and the user has to connect the same colored dots together without overlapping other lines, and using up ALL the free spaces in the board.
My question is about level-creation. I wish to make the levels generated randomly (and should at least be able to solve itself so that it can give players hints) and I am in a stump as to what algorithm to use. Any suggestions?
Note: image shows the objective of Flow Free, and it is the same objective of what I am developing.
Thanks for your help. :)

Consider solving your problem with a pair of simpler, more manageable algorithms: one algorithm that reliably creates simple, pre-solved boards and another that rearranges flows to make simple boards more complex.
The first part, building a simple pre-solved board, is trivial (if you want it to be) if you're using n flows on an nxn grid:
For each flow...
Place the head dot at the top of the first open column.
Place the tail dot at the bottom of that column.
Alternatively, you could provide your own hand-made starter boards to pass to the second part. The only goal of this stage is to get a valid board built, even if it's just trivial or predetermined, so it's worth keeping it simple.
The second part, rearranging the flows, involves looping over each flow, seeing which one can work with its neighboring flow to grow and shrink:
For some number of iterations...
Choose a random flow f.
If f is at the minimum length (say 3 squares long), skip to the next iteration because we can't shrink f right now.
If the head dot of f is next to a dot from another flow g (if more than one g to choose from, pick one at random)...
Move f's head dot one square along its flow (i.e., walk it one square towards the tail). f is now one square shorter and there's an empty square. (The puzzle is now unsolved.)
Move the neighboring dot from g into the empty square vacated by f. Now there's an empty square where g's dot moved from.
Fill in that empty spot with flow from g. Now g is one square longer than it was at the beginning of this iteration. (The puzzle is back to being solved as well.)
Repeat the previous step for f's tail dot.
The approach as it stands is limited (dots will always be neighbors) but it's easy to expand upon:
Add a step to loop through the body of flow f, looking for trickier ways to swap space with other flows...
Add a step that prevents a dot from moving to an old location...
Add any other ideas that you come up with.
The overall solution here is probably less than the ideal one that you're aiming for, but now you have two simple algorithms that you can flesh out further to serve the role of one large, all-encompassing algorithm. In the end, I think this approach is manageable, not cryptic, and easy to tweek, and, if nothing else, a good place to start.
Update: I coded a proof-of-concept based on the steps above. Starting with the first 5x5 grid below, the process produced the subsequent 5 different boards. Some are interesting, some are not, but they're always valid with one known solution.
Starting Point
5 Random Results (sorry for the misaligned screenshots)
And a random 8x8 for good measure. The starting point was the same simple columns approach as above.

Updated answer: I implemented a new generator using the idea of "dual puzzles". This allows much sparser and higher quality puzzles than any previous method I know of. The code is on github. I'll try to write more details about how it works, but here is an example puzzle:
Old answer:
I have implemented the following algorithm in my numberlink solver and generator. In enforces the rule, that a path can never touch itself, which is normal in most 'hardcore' numberlink apps and puzzles
First the board is tiled with 2x1 dominos in a simple, deterministic way.
If this is not possible (on an odd area paper), the bottom right corner is
left as a singleton.
Then the dominos are randomly shuffled by rotating random pairs of neighbours.
This is is not done in the case of width or height equal to 1.
Now, in the case of an odd area paper, the bottom right corner is attached to
one of its neighbour dominos. This will always be possible.
Finally, we can start finding random paths through the dominos, combining them
as we pass through. Special care is taken not to connect 'neighbour flows'
which would create puzzles that 'double back on themselves'.
Before the puzzle is printed we 'compact' the range of colours used, as much as possible.
The puzzle is printed by replacing all positions that aren't flow-heads with a .
My numberlink format uses ascii characters instead of numbers. Here is an example:
$ bin/numberlink --generate=35x20
Warning: Including non-standard characters in puzzle
35 20
....bcd.......efg...i......i......j
.kka........l....hm.n....n.o.......
.b...q..q...l..r.....h.....t..uvvu.
....w.....d.e..xx....m.yy..t.......
..z.w.A....A....r.s....BB.....p....
.D.........E.F..F.G...H.........IC.
.z.D...JKL.......g....G..N.j.......
P...a....L.QQ.RR...N....s.....S.T..
U........K......V...............T..
WW...X.......Z0..M.................
1....X...23..Z0..........M....44...
5.......Y..Y....6.........C.......p
5...P...2..3..6..VH.......O.S..99.I
........E.!!......o...."....O..$$.%
.U..&&..J.\\.(.)......8...*.......+
..1.......,..-...(/:.."...;;.%+....
..c<<.==........)./..8>>.*.?......#
.[..[....]........:..........?..^..
..._.._.f...,......-.`..`.7.^......
{{......].....|....|....7.......#..
And here I run it through my solver (same seed):
$ bin/numberlink --generate=35x20 | bin/numberlink --tubes
Found a solution!
┌──┐bcd───┐┌──efg┌─┐i──────i┌─────j
│kka│└───┐││l┌─┘│hm│n────n┌o│┌────┐
│b──┘q──q│││l│┌r└┐│└─h┌──┐│t││uvvu│
└──┐w┌───┘d└e││xx│└──m│yy││t││└──┘│
┌─z│w│A────A┌┘└─r│s───┘BB││┌┘└p┌─┐│
│D┐└┐│┌────E│F──F│G──┐H┐┌┘││┌──┘IC│
└z└D│││JKL┌─┘┌──┐g┌─┐└G││N│j│┌─┐└┐│
P──┐a││││L│QQ│RR└┐│N└──┘s││┌┘│S│T││
U─┐│┌┘││└K└─┐└─┐V││└─────┘││┌┘││T││
WW│││X││┌──┐│Z0││M│┌──────┘││┌┘└┐││
1┐│││X│││23││Z0│└┐││┌────M┌┘││44│││
5│││└┐││Y││Y│┌─┘6││││┌───┐C┌┘│┌─┘│p
5││└P│││2┘└3││6─┘VH│││┌─┐│O┘S┘│99└I
┌┘│┌─┘││E┐!!│└───┐o┘│││"│└─┐O─┘$$┌%
│U┘│&&│└J│\\│(┐)┐└──┘│8││┌*└┐┌───┘+
└─1└─┐└──┘,┐│-└┐│(/:┌┘"┘││;;│%+───┘
┌─c<<│==┌─┐││└┐│)│/││8>>│*┌?│┌───┐#
│[──[└─┐│]││└┐│└─┘:┘│└──┘┌┘┌┘?┌─^││
└─┐_──_│f││└,│└────-│`──`│7┘^─┘┌─┘│
{{└────┘]┘└──┘|────|└───7└─────┘#─┘
I've tested replacing step (4) with a function that iteratively, randomly merges two neighboring paths. However it game much denser puzzles, and I already think the above is nearly too dense to be difficult.
Here is a list of problems I've generated of different size: https://github.com/thomasahle/numberlink/blob/master/puzzles/inputs3

The most straightforward way to create such a level is to find a way to solve it. This way, you can basically generate any random starting configuration and determine if it is a valid level by trying to have it solved. This will generate the most diverse levels.
And even if you find a way to generate the levels some other way, you'll still want to apply this solving algorithm to prove that the generated level is any good ;)
Brute-force enumerating
If the board has a size of NxN cells, and there are also N colours available, brute-force enumerating all possible configurations (regardless of wether they form actual paths between start and end nodes) would take:
N^2 cells total
2N cells already occupied with start and end nodes
N^2 - 2N cells for which the color has yet to be determined
N colours available.
N^(N^2 - 2N) possible combinations.
So,
For N=5, this means 5^15 = 30517578125 combinations.
For N=6, this means 6^24 = 4738381338321616896 combinations.
In other words, the number of possible combinations is pretty high to start with, but also grows ridiculously fast once you start making the board larger.
Constraining the number of cells per color
Obviously, we should try to reduce the number of configurations as much as possible. One way of doing that is to consider the minimum distance ("dMin") between each color's start and end cell - we know that there should at least be this much cells with that color. Calculating the minimum distance can be done with a simple flood fill or Dijkstra's algorithm.
(N.B. Note that this entire next section only discusses the number of cells, but does not say anything about their locations)
In your example, this means (not counting the start and end cells)
dMin(orange) = 1
dMin(red) = 1
dMin(green) = 5
dMin(yellow) = 3
dMin(blue) = 5
This means that, of the 15 cells for which the color has yet to be determined, there have to be at least 1 orange, 1 red, 5 green, 3 yellow and 5 blue cells, also making a total of 15 cells.
For this particular example this means that connecting each color's start and end cell by (one of) the shortest paths fills the entire board - i.e. after filling the board with the shortest paths no uncoloured cells remain. (This should be considered "luck", not every starting configuration of the board will cause this to happen).
Usually, after this step, we have a number of cells that can be freely coloured, let's call this number U. For N=5,
U = 15 - (dMin(orange) + dMin(red) + dMin(green) + dMin(yellow) + dMin(blue))
Because these cells can take any colour, we can also determine the maximum number of cells that can have a particular colour:
dMax(orange) = dMin(orange) + U
dMax(red) = dMin(red) + U
dMax(green) = dMin(green) + U
dMax(yellow) = dMin(yellow) + U
dMax(blue) = dMin(blue) + U
(In this particular example, U=0, so the minimum number of cells per colour is also the maximum).
Path-finding using the distance constraints
If we were to brute force enumerate all possible combinations using these color constraints, we would have a lot less combinations to worry about. More specifically, in this particular example we would have:
15! / (1! * 1! * 5! * 3! * 5!)
= 1307674368000 / 86400
= 15135120 combinations left, about a factor 2000 less.
However, this still doesn't give us the actual paths. so a better idea would be to a backtracking search, where we process each colour in turn and attempt to find all paths that:
doesn't cross an already coloured cell
Is not shorter than dMin(colour) and not longer than dMax(colour).
The second criteria will reduce the number of paths reported per colour, which causes the total number of paths to be tried to be greatly reduced (due to the combinatorial effect).
In pseudo-code:
function SolveLevel(initialBoard of size NxN)
{
foreach(colour on initialBoard)
{
Find startCell(colour) and endCell(colour)
minDistance(colour) = Length(ShortestPath(initialBoard, startCell(colour), endCell(colour)))
}
//Determine the number of uncoloured cells remaining after all shortest paths have been applied.
U = N^(N^2 - 2N) - (Sum of all minDistances)
firstColour = GetFirstColour(initialBoard)
ExplorePathsForColour(
initialBoard,
firstColour,
startCell(firstColour),
endCell(firstColour),
minDistance(firstColour),
U)
}
}
function ExplorePathsForColour(board, colour, startCell, endCell, minDistance, nrOfUncolouredCells)
{
maxDistance = minDistance + nrOfUncolouredCells
paths = FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance)
foreach(path in paths)
{
//Render all cells in 'path' on a copy of the board
boardCopy = Copy(board)
boardCopy = ApplyPath(boardCopy, path)
uRemaining = nrOfUncolouredCells - (Length(path) - minDistance)
//Recursively explore all paths for the next colour.
nextColour = NextColour(board, colour)
if(nextColour exists)
{
ExplorePathsForColour(
boardCopy,
nextColour,
startCell(nextColour),
endCell(nextColour),
minDistance(nextColour),
uRemaining)
}
else
{
//No more colours remaining to draw
if(uRemaining == 0)
{
//No more uncoloured cells remaining
Report boardCopy as a result
}
}
}
}
FindAllPaths
This only leaves FindAllPaths(board, colour, startCell, endCell, minDistance, maxDistance) to be implemented. The tricky thing here is that we're not searching for the shortest paths, but for any paths that fall in the range determined by minDistance and maxDistance. Hence, we can't just use Dijkstra's or A*, because they will only record the shortest path to each cell, not any possible detours.
One way of finding these paths would be to use a multi-dimensional array for the board, where
each cell is capable of storing multiple waypoints, and a waypoint is defined as the pair (previous waypoint, distance to origin). The previous waypoint is needed to be able to reconstruct the entire path once we've reached the destination, and the distance to origin
prevents us from exceeding the maxDistance.
Finding all paths can then be done by using a flood-fill like exploration from the startCell outwards, where for a given cell, each uncoloured or same-as-the-current-color-coloured neigbour is recursively explored (except the ones that form our current path to the origin) until we reach either the endCell or exceed the maxDistance.
An improvement on this strategy is that we don't explore from the startCell outwards to the endCell, but that we explore from both the startCell and endCell outwards in parallel, using Floor(maxDistance / 2) and Ceil(maxDistance / 2) as the respective maximum distances. For large values of maxDistance, this should reduce the number of explored cells from 2 * maxDistance^2 to maxDistance^2.

I think you'll want to do this in two steps. Step 1) find a set of non-intersecting paths that connect all your points, then 2) Grow/shift those paths to fill the entire board
My thoughts on Step 1 are to essentially perform Dijkstra like algorithm on all points simultaneously, growing together the paths. Similar to Dijkstra, I think you'll want to flood-fill out from each of your points, chosing which node to search next using some heuristic (My hunch says chosing points with the least degrees of freedom first, then by distance, might be a good one). Very differently from Dijkstra though I think we might be stuck with having to backtrack when we have multiple paths attempting to grow into the same node. (This could of course be fairly problematic on bigger maps, but might not be a big deal on small maps like the one you have above.)
You may also solve for some of the easier paths before you start the above algorithm, mainly to cut down on the number of backtracks needed. In specific, if you can make a trace between points along the edge of the board, you can guarantee that connecting those two points in that fashion would never interfere with other paths, so you can simply fill those in and take those guys out of the equation. You could then further iterate on this until all of these "quick and easy" paths are found by tracing along the borders of the board, or borders of existing paths. That algorithm would actually completely solve the above example board, but would undoubtedly fail elsewhere .. still, it would be very cheap to perform and would reduce your search time for the previous algorithm.
Alternatively
You could simply do a real Dijkstra's algorithm between each set of points, pathing out the closest points first (or trying them in some random orders a few times). This would probably work for a fair number of cases, and when it fails simply throw out the map and generate a new one.
Once you have Step 1 solved, Step 2 should be easier, though not necessarily trivial. To grow your paths, I think you'll want to grow your paths outward (so paths closest to walls first, growing towards the walls, then other inner paths outwards, etc.). To grow, I think you'll have two basic operations, flipping corners, and expanding into into adjacent pairs of empty squares.. that is to say, if you have a line like
.v<<.
v<...
v....
v....
First you'll want to flip the corners to fill in your edge spaces
v<<<.
v....
v....
v....
Then you'll want to expand into neighboring pairs of open space
v<<v.
v.^<.
v....
v....
v<<v.
>v^<.
v<...
v....
etc..
Note that what I've outlined wont guarantee a solution if one exists, but I think you should be able to find one most of the time if one exists, and then in the cases where the map has no solution, or the algorithm fails to find one, just throw out the map and try a different one :)

You have two choices:
Write a custom solver
Brute force it.
I used option (2) to generate Boggle type boards and it is VERY successful. If you go with Option (2), this is how you do it:
Tools needed:
Write a A* solver.
Write a random board creator
To solve:
Generate a random board consisting of only endpoints
while board is not solved:
get two endpoints closest to each other that are not yet solved
run A* to generate path
update board so next A* knows new board layout with new path marked as un-traversable.
At exit of loop, check success/fail (is whole board used/etc) and run again if needed
The A* on a 10x10 should run in hundredths of a second. You can probably solve 1k+ boards/second. So a 10 second run should get you several 'usable' boards.
Bonus points:
When generating levels for a IAP (in app purchase) level pack, remember to check for mirrors/rotations/reflections/etc so you don't have one board a copy of another (which is just lame).
Come up with a metric that will figure out if two boards are 'similar' and if so, ditch one of them.

Combinations of binary features (vectors)

The source data for the subject is an m-by-n binary matrix (only 0s and 1s are allowed).
m Rows represent observations, n columns - features. Some observations are marked as targets which need to be separated from the rest.
While it looks like a typical NN, SVM, etc problem, I don't need generalization. What I need is an efficient algorithm to find as many as possible combinations of columns (features) that completely separate targets from other observations, classify, that is.
For example:
f1 f2 f3
o1 1 1 0
t1 1 0 1
o2 0 1 1
Here {f1, f3} is an acceptable combo which separates target t1 from the rest (o1, o2) (btw, {f2} is NOT as by task definition a feature MUST be present in a target). In other words,
t1(f1) & t1(f3) = 1 and o1(f1) & o1(f3) = 0, o2(f1) & o2(f3) = 0
where '&' represents logical conjunction (AND).
The m is about 100,000, n is 1,000. Currently the data is packed into 128bit words along m and the search is optimized with sse4 and whatnot. Yet it takes way too long to obtain those feature combos.
After 2 billion calls to the tree descent routine it has covered about 15% of root nodes. And found about 8,000 combos which is a decent result for my particular application.
I use some empirical criteria to cut off less probable descent paths, not without limited success, but is there something radically better? Im pretty sure there gotta be?.. Any help, in whatever form, reference or suggestion, would be appreciated.

I believe the problem you describe is NP-Hard so you shouldn't expect to find the optimum solution in a reasonable time. I do not understand your current algorithm, but here are some suggestions on the top of my head:
1) Construct a decision tree. Label targets as A and non-targets as B and let the decision tree learn the categorization. At each node select the feature such that a function of P(target | feature) and P(target' | feature') is maximum. (i.e. as many targets as possible fall to positive side and as many non-targets as possible fall to negative side)
2) Use a greedy algorithm. Start from the empty set and at each time step add the feauture that kills the most non-target rows.
3) Use a randomized algorithm. Start from a small subset of positive features of some target, use the set as the seed for the greedy algorithm. Repeat many times. Pick the best solution. Greedy algorithm will be fast so it will be ok.
4) Use a genetic algorithm. Generate random seeds for the greedy algorithm as in 3 to generate good solutions and cross-product them (bitwise-and probably) to generate new candidates seeds. Remember the best solution. Keep good solutions as the current population. Repeat for many generations.
You will need to find the answer "how many of the given rows have the given feature f" fast so probably you'll need specialized data structures, perhaps using a BitArray for each feature.

Algorithm design to assign nodes to graphs

I have a graph-theoretic (which is also related to combinatorics) problem that is illustrated below, and wonder what is the best approach to design an algorithm to solve it.
Given 4 different graphs of 6 nodes (by different, I mean different structures, e.g. STAR, LINE, COMPLETE, etc), and 24 unique objects, design an algorithm to assign these objects to these 4 graphs 4 times, so that the number of repeating neighbors on the graphs over the 4 assignments is minimized. For example, if object A and B are neighbors on 1 of the 4 graphs in one assignment, then in the best case, A and B will not be neighbors again in the other 3 assignments.
Obviously, the degree to which such minimization can go is dependent on the specific graph structures given. But I am more interested in a general solution here so that given any 4 graph structures, such minimization is guaranteed as the result of the algorithm.
Any suggestion/idea of solving this problem is welcome, and some pseudo-code may well be sufficient to illustrate the design. Thank you.

Representation:
You have 24 elements, I will name this elements from A to X (24 first letters).
Each of these elements will have a place in one of the 4 graphs. I will assign a number to the 24 nodes of the 4 graphs from 1 to 24.
I will identify the position of A by a 24-uple =(xA1,xA2...,xA24), and if I want to assign A to the node number 8 for exemple, I will write (xa1,Xa2..xa24) = (0,0,0,0,0,0,0,1,0,0...0), where 1 is on position 8.
We can say that A =(xa1,...xa24)
e1...e24 are the unit vectors (1,0...0) to (0,0...1)
note about the operator '.':
A.e1=xa1
...
X.e24=Xx24
There are some constraints on A,...X with these notations :
Xii is in {0,1}
and
Sum(Xai)=1 ... Sum(Xxi)=1
Sum(Xa1,xb1,...Xx1)=1 ... Sum(Xa24,Xb24,... Xx24)=1
Since one element can be assign to only one node.
I will define a graph by defining the neighbors relation of each node, lets say node 8 has neighbors node 7 and node 10
to check that A and B are neighbors on node 8 for exemple I nedd:
A.e8=1 and B.e7 or B.e10 =1 then I just need A.e8*(B.e7+B.e10)==1
in the function isNeighborInGraphs(A,B) I test that for every nodes and I get one or zero depending on the neighborhood.
Notations:
4 graphs of 6 nodes, the position of each element is defined by an integer from 1 to 24.
(1 to 6 for first graph, etc...)
e1... e24 are the unit vectors (1,0,0...0) to (0,0...1)
Let A, B ...X be the N elements.
A=(0,0...,1,...,0)=(xa1,xa2...xa24)
B=...
...
X=(0,0...,1,...,0)
Graph descriptions:
IsNeigborInGraphs(A,B)=A.e1*B.e2+...
//if 1 and 2 are neigbors in one graph
for exemple
State of the system:
L(A)=[B,B,C,E,G...] // list of
neigbors of A (can repeat)
actualise(L(A)):
for element in [B,X]
if IsNeigbotInGraphs(A,Element)
L(A).append(Element)
endIf
endfor
Objective functions
N(A)=len(L(A))+Sum(IsneigborInGraph(A,i),i in L(A))
...
N(X)= ...
Description of the algorithm
start with an initial position
A=e1... X=e24
Actualize L(A),L(B)... L(X)
Solve this (with a solveur, ampl for
exemple will work I guess since it's
a nonlinear optimization
problem):
Objective function
min(Sum(N(Z),Z=A to X)
Constraints:
Sum(Xai)=1 ... Sum(Xxi)=1
Sum(Xa1,xb1,...Xx1)=1 ...
Sum(Xa24,Xb24,... Xx24)=1
You get the best solution
4.Repeat step 2 and 3, 3 more times.

If all four graphs are K_6, then the best you can do is choose 4 set partitions of your 24 objects into 4 sets each of cardinality 6 so that the pairwise intersection of any two sets has cardinality at most 2. You can do this by choosing set partitions that are maximally far apart in the Hasse diagram of set partitions with partial order given by refinement. The general case is much harder, but perhaps you can still begin with this crude approximation of a solution and then be clever with which vertex is assigned which object in the four assignments.

Assuming you don't want to cycle all combinations and calculate the sum every time and choose the lowest, you can implement a minimum problem (solved depending on your constraints using either a linear programming solver i.e. symplex algorithm engines or a non-linear solver, much harder talking in terms of time) with constraints on your variables (24) depending on the shape of your path. You can also use free software like LINGO/LINDO to create rapidly a decision theory model and test its correctness (you need decision theory notions though)

If this has anything to do with the real world, then it's unlikely that you absolutely must have a solution that is the true minimum. Close to the minimum should be good enough, right? If so, you could repeatedly randomly make the 4 assignments and check the results until you either run out of time or have a good-enough solution or appear to have stopped improving your best solution.

Eliminating symmetry from graphs

I have an algorithmic problem in which I have derived a transfer matrix between a lot of states. The next step is to exponentiate it, but it is very large, so I need to do some reductions on it. Specifically it contains a lot of symmetry. Below are some examples on how many nodes can be eliminated by simple observations.
My question is whether there is an algorithm to efficiently eliminate symmetry in digraphs, similarly to the way I've done it manually below.
In all cases the initial vector has the same value for all nodes.
In the first example we see that b, c, d and e all receive values from a and one of each other. Hence they will always contain an identical value, and we can merge them.
In this example we quickly spot, that the graph is identical from the point of view of a, b, c and d. Also for their respective sidenodes, it doesn't matter to which inner node it is attached. Hence we can reduce the graph down to only two states.
Update: Some people were reasonable enough not quite sure what was meant by "State transfer matrix". The idea here is, that you can split a combinatorial problem up into a number of state types for each n in your recurrence. The matrix then tell you how to get from n-1 to n.
Usually you are only interested about the value of one of your states, but you need to calculate the others as well, so you can always get to the next level. In some cases however, multiple states are symmetrical, meaning they will always have the same value. Obviously it's quite a waste to calculate all of these, so we want to reduce the graph until all nodes are "unique".
Below is an example of the transfer matrix for the reduced graph in example 1.
[S_a(n)] [1 1 1] [S_a(n-1)]
[S_f(n)] = [1 0 0]*[S_f(n-1)]
[S_B(n)] [4 0 1] [S_B(n-1)]
Any suggestions or references to papers are appreciated.

Brendan McKay's nauty ( http://cs.anu.edu.au/~bdm/nauty/) is the best tool I know of for computing automorphisms of graphs. It may be too expensive to compute the whole automorphism group of your graph, but you might be able to reuse some of the algorithms described in McKay's paper "Practical Graph Isomorphism" (linked from the nauty page).

I'll just add an extra answer building on what userOVER9000 suggested, if anybody else are interested.
The below is an example of using nauty on Example 2, through the dreadnaut tool.
$ ./dreadnaut
Dreadnaut version 2.4 (64 bits).
> n=8 d g -- Starting a new 8-node digraph
0 : 1 3 4; -- Entering edge data
1 : 0 2 5;
2 : 3 1 6;
3 : 0 2 7;
4 : 0;
5 : 1;
6 : 2;
7 : 3;
> cx -- Calling nauty
(1 3)(5 7)
level 2: 6 orbits; 5 fixed; index 2
(0 1)(2 3)(4 5)(6 7)
level 1: 2 orbits; 4 fixed; index 4
2 orbits; grpsize=8; 2 gens; 6 nodes; maxlev=3
tctotal=8; canupdates=1; cpu time = 0.00 seconds
> o -- Output "orbits"
0:3; 4:7;
Notice it suggests joining nodes 0:3 which are a:d in Example 2 and 4:7 which are e:h.
The nauty algorithm is not well documented, but the authors describe it as exponential worst case, n^2 average.

Computing symmetries seems to be a bit of a second order problem. Taking just a,b,c and d in your second graph, the symmetry would have to be expressed
a(b,c,d) = b(a,d,c)
and all its permutations, or some such. Consider a second subgraph a', b', c', d' added to it. Again, we have the symmetries, but parameterised differently.
For computing people (rather than math people), could we express the problem like so?
Each graph node contains a set of letters. At each iteration, all of the letters in each node are copied to its neighbours by the arrows (some arrows take more than one iteration and can be treated as a pipe of anonymous nodes).
We are trying to find efficient ways of determining things such as
* what letters each set/node contains after N iterations.
* for each node the N after which its set no longer changes.
* what sets of nodes wind up containing the same sets of letters (equivalence class)
?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio