I like playing the puzzle game Flood-It, which can be played online at:
https://www.lemoda.net/javascript/flood-it/game.html
It's also available as an iGoogle gadget. The aim is to fill the whole board with the least number of successive flood-fills.
I'm trying to write a program which can solve this puzzle optimally. What's the best way to approach this problem? Ideally I want to use the A* algorithm, but I have no idea what should be the function estimating the number of steps left. I did write a program which conducted a depth-4 brute force search to maximize the filled area. It worked reasonably well and beat me in solving the puzzle, but I'm not completely satisfied with that algorithm.
Any suggestions? Thanks in advance.
As a heuristic, you could construct a graph where each node represents a set of contiguous, same-colour squares, and each node is connected to those it touches (each edge weighted as 1). You could then use a path-finding algorithm to calculate the "distance" from the top left to all other nodes. Then, by looking at the results of flood-filling using each of the other 5 colours, determine which one minimizes the distance to the "furthest" node, since that will likely be your bottleneck.
Add the result of that calculation to the number of fills done so far, and use that as your A* heuristic.
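The distance part of this heuristic can be sketched without building the region graph explicitly. The following is a minimal sketch, assuming the board is a list of rows of colour values and the flooded region starts at the top-left cell; the function name is made up for illustration. It runs a 0-1 breadth-first search over the cells, where stepping to a same-colour neighbour costs 0 and crossing into a different colour costs 1, which gives the same distances as the region graph with unit edge weights.

```python
from collections import deque

def farthest_region_distance(board):
    """Return the largest number of colour boundaries that must be crossed
    to reach any cell from the top-left cell (0-1 BFS over the grid)."""
    rows, cols = len(board), len(board[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[0][0] = 0
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            # Same colour means same region (cost 0); otherwise we cross
            # into a neighbouring region (cost 1).
            step = 0 if board[nr][nc] == board[r][c] else 1
            candidate = dist[r][c] + step
            if dist[nr][nc] is None or candidate < dist[nr][nc]:
                dist[nr][nc] = candidate
                if step == 0:
                    queue.appendleft((nr, nc))
                else:
                    queue.append((nr, nc))
    return max(max(row) for row in dist)
```

Since one fill can only absorb regions that touch the flooded area, this distance can shrink by at most 1 per move, so the value never overestimates the number of fills still needed.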
A naive 'greedy' algorithm is to pick the next step that maximizes the overall perimeter of the main region.
(A couple of smart friends of mine were thinking about this the other day and decided the optimum may be NP-hard (i.e. you must brute force it). I do not know if they're correct; I wasn't around to hear the reasoning and haven't thought it through myself.)
Note that for computing steps, I presume the union-find algorithm is your friend, it makes computing 'one step' very fast (see e.g. this blog post).
After playing the game a few times, I noticed that a good strategy is to always go "deep", to go for the colour which goes farthest into the unflooded territory.
A* is just a prioritized graph search. Each node is a game state; you rank nodes based on some heuristic, and always expand the lowest-expected-final-cost node. As long as your heuristic never overestimates the remaining cost (that is, it is admissible), the first solution you find is guaranteed to be optimal.
After playing the games a few times, I found that trying to drill to the opposite corner then all corners tended to result in a win. So a good starting cost estimate would be (cost so far) + a sufficient number of fills to reach the opposite corner [note: not minimum, just sufficient. Just greedily fill towards the corner to compute the heuristic].
I have been working on this, and after I got my solver working I took a look at the approaches others had taken.
Most of the solvers out there are heuristic and do not guarantee optimality. Heuristics look at the number of squares and distribution of colors left unchosen, or the distance to the "farthest away" square. Combining a good heuristic with bounded DFS (or BFS with lookahead) results in solutions that are quite fast for the standard 14x14 grid.
I took a slightly different approach because I was interested in finding the provably optimal path, not just a 'good' one. I observed that the search space actually grows much slower than the branching factor of the search tree, because there are quite a lot of duplicate positions. (With a depth-first strategy it is therefore important to maintain a history to avoid redundant work.) The effective branching factor seems closer to 3 than to 5.
The search strategy I took is to perform BFS up to a "midpoint" depth where the number of states would become infeasible, somewhere between 11 and 13 moves works best. Then, I examine each state at the midpoint depth and perform a new BFS starting with that as the root. Both of these BFS searches can be pruned by eliminating states found in previous depths, and the latter search can be bounded by the depth of the best-known solution. (A heuristic applied to the order of the subtrees examined in the second step would probably help some, as well.)
The other pruning technique which proved to be key to a fast solver is simply checking whether there are more than N colors left, if you are N or fewer steps away from the current best solution.
Once we know which midpoint state is on the path to an optimal solution, the program can perform DFS using that midpoint state as a goal (and pruning any path that selects a square not in the midpoint.) Or, it might be feasible to just build up the paths in the BFS steps, at the cost of some additional memory.
My solver is not super-fast but it can find a guaranteed optimal solution in no more than a couple minutes. (See http://markgritter.livejournal.com/673948.html, or the code at http://pastebin.com/ZcrS286b.)
Smashery's answer can be slightly tweaked. For the total number of moves estimate, if there are 'k' colors at maximum distance, add 'k-1' to the number of moves estimate.
More generally, for each color, consider the maximum distance at which the color can be cleared. This gives us a dictionary mapping some maximum distances to a non-zero number of colors that can be cleared at that distance. Sum value-1 across the keys and add this to the maximum distance to get a number of moves estimate.
Also, there are certain free cases. If at any point we can clear a color in one move, we can take that move without considering the other moves.
Here's an idea for implementing the graph to support Smashery's heuristic.
Represent each group of contiguous, same-colour squares in a disjoint set, and a list of adjacent groups of squares. A flood fill merges a set to all its adjacent sets, and merges the adjacency lists. This implicit graph structure will let you find the distance from the upper left corner to the farthest node.
I think you could consider the number of squares that match or don't match the current color. So, your heuristic measure of "distance" would be the number of squares on the board that are -not- the same color as your chosen color, rather than the number of steps.
A naive heuristic could be to use the number of colours left (minus 1) - this is admissible because it will take at least that many clicks to clear off the board.
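To show how such an admissible heuristic plugs into A*, here is a minimal sketch for small boards. It assumes the board is a list of rows of colour values with the flood starting at the top-left cell; the function names (`flood`, `colours_left`, `solve`) are invented for this example. A single fill can remove at most one colour from the board, which is why "colours left minus 1" never overestimates.

```python
import heapq
from itertools import count

def flood(board, colour):
    """Return a new board with the top-left region recoloured to `colour`."""
    old = board[0][0]
    if colour == old:
        return board
    rows, cols = len(board), len(board[0])
    grid = [list(row) for row in board]
    grid[0][0] = colour
    stack = [(0, 0)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == old:
                grid[nr][nc] = colour
                stack.append((nr, nc))
    return tuple(map(tuple, grid))

def colours_left(board):
    return len({cell for row in board for cell in row})

def solve(board):
    """A* search; returns a shortest list of colours to fill with."""
    start = tuple(map(tuple, board))
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(colours_left(start) - 1, next(tie), 0, start, [])]
    best = {start: 0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if colours_left(state) == 1:
            return path
        if cost > best[state]:
            continue  # stale queue entry; a cheaper route was found later
        # Only colours still on the board are worth trying.
        options = {cell for row in state for cell in row} - {state[0][0]}
        for colour in options:
            nxt = flood(state, colour)
            if cost + 1 < best.get(nxt, float("inf")):
                best[nxt] = cost + 1
                estimate = cost + 1 + colours_left(nxt) - 1
                heapq.heappush(
                    frontier, (estimate, next(tie), cost + 1, nxt, path + [colour])
                )
    return None
```

For example, `solve([[0, 1, 2]])` returns `[1, 2]`. On a full 14x14 board this weak heuristic will probably explore far too many states, so it is best seen as a baseline to combine with a stronger estimate.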
I'm not certain, but I'm fairly sure that this could be solved greedily. You're trying to reduce the number of color fields to 1, so reducing more color fields earlier shouldn't be any less efficient than reducing fewer earlier.
1) Define a collection of existing like-colored groups.
2) For each collection, count the number of neighboring collections by color. The largest count of neighboring collections with a single color is the weight of this collection.
3) Take the collection with the highest count of neighbors with a single color, and fill it to that color. Merge the collections, and update the sort for all the collections affected by the merge (all the new neighbors of the merged collection).
Overall, I think this should actually compute in O(n log n) time, where n is the number of pixels and the log(n) only comes from maintaining the sorted list of weights.
I'm not sure if there needs to be a tie-breaker for when multiple fields have the same weight though. Maybe the tie-breaker goes to the color that's common to the most groups on the map.
Anyway, note that the goal of the game is to reduce the number of distinct color fields and not to maximize the perimeter, as different color schemes can occasionally make a larger field a sub-optimal choice. Consider the field:
3 3 3 3 3
1 1 1 1 1
1 1 1 1 1
2 2 2 2 2
1 2 2 2 2
The color 1 has the largest perimeter by any measure, but the color 2 is the optimal choice.
EDIT>
Scratch that. The example:
3 1 3 1 3
1 1 1 1 1
1 1 1 1 1
2 2 2 2 2
1 2 2 2 2
Invalidates my own greedy algorithm. But I'm not convinced that this is a simple graph traversal, since changing to a color shared by 2 neighbors visits 2 nodes, and not 1.
Color elimination should probably play some role in the heuristic.
1) It is never correct to fill with a color that is not already on the graph.
2) If there is one color field with a unique color, at least one fill will be required for it. It cannot be bundled with any other fills. I think this means that it's safe to fill it sooner rather than later.
3) The greedy algorithm for neighbor field count makes sense for a 2 color map.
Related
I'm writing a program that needs to quickly check whether a contiguous region of space is fillable by tetrominoes (any type, any orientation). My first attempt was to simply check if the number of squares was divisible by 4. However, situations like this can still come up:
As you can see, even though these regions have 8 squares each, they are impossible to tile with tetrominoes.
I've been thinking for a bit and I'm not sure how to proceed. It seems to me that the "hub" squares, or squares that lead to more than two "tunnels", are the key to this. It's easy in the above examples, since you can quickly count the spaces in each such tunnel — 3, 1, and 3 in the first example, and 3, 1, 1, and 2 in the second — and determine that it's impossible to proceed due to the fact that each tunnel needs to connect to the hub square to fit a tetromino, which can't happen for all of them. However, you can have more complicated examples like this:
...where a simple counting technique just doesn't work. (At least, as far as I can tell.) And that's to say nothing of more open spaces with a very small number of hub squares. Plus, I don't have any proof that hub squares are the only trick here. For all I know, there may be tons of other impossible cases.
Is some sort of search algorithm (A*?) the best option for solving this? I'm very concerned about performance with hundreds, or even thousands, of squares. The algorithm needs to be very efficient, since it'll be used for real-time tiling (more or less), and in a browser at that.
Perfect matching on a perfect matching
[EDIT 28/10/2014: As noticed by pix, this approach never tries to use T-tetrominoes, so it is even more likely to give an incorrect "No" answer than I thought...]
This will not guarantee a solution on an arbitrary shape, but it will work quickly and well most of the time.
Imagine a graph in which there is a vertex for each white square, and an edge between two vertices if and only if their corresponding white squares are adjacent. (Each vertex can therefore touch at most 4 edges.) A perfect matching in this graph is a subset of edges such that every vertex touches exactly one edge in the subset. In other words, it is a way of pairing up adjacent vertices -- or in yet other words, a domino tiling of the white squares. Later I'll explain how to find a nicely random-looking perfect matching; for now, let's just assume that it can be done.
Then, starting from this domino tiling, we can just repeat the matching process, gluing dominos together into tetrominos! The only differences the second time around are that instead of having a vertex per white square, we have a vertex per domino; and because we must add an edge whenever two dominos are adjacent, a vertex can now have as many as 6 edges.
The first step (domino tiling) cannot fail: if a domino tiling for the given shape exists, then one will be found. However, it is possible for the second step (gluing dominos together into tetrominos) to fail, because it has to work with the already-decided domino tiling, and this may limit its options. Here is an example showing how different domino tilings of the same shape can enable or spoil the tetromino tiling:
AABCDD --> XXXYYY Success :)
BC XY
AABBDD --> Failure.
CC
Solving the maximum matching problems
In order to generate a random pattern of dominos, the edges in the graph can be given random weights, and the maximum weighted matching problem can be solved. The weights should be in the range [1, V/(V-2)), to guarantee that it is never possible to achieve a higher score by leaving some vertices unpaired. The graph is in fact bipartite as it contains no odd-length cycles, meaning that the faster O(V^2*E) algorithm for the maximum weighted bipartite matching problem can be used for this step. (This is not true for the second matching problem: one domino can touch two other dominos that touch each other.)
If the second step fails to find a complete set of tetrominos, then either no solution is possible*, or a solution is possible using a different set of dominos. You can try randomly reweighting the graph used to find the domino tiling, and then rerunning the first step. Alternatively, instead of completely reweighting it from scratch, you could just increase the weights for the problematic dominos, and try again.
* For a plain square with even side lengths, we know that a solution is always possible: just fill it with 2x2 square tetrominos.
Here is the video of the game I am interested in.
http://www.youtube.com/watch?v=UhWeLmSf6pA
I would like to know the algorithm that is used to make a pattern for the challenge as shown in the video.
Can anyone tell me which algorithm I should use to make a clone of this game on Windows?
Thanks
This is basically a Hamiltonian path problem. If someone can find a general solution better than brute force, I would be interested.
The board can be translated to a graph. Every piece is a node, the connections are edges. There are many graph packages and algorithms to find a Hamiltonian path. You can easily create your own model and recursive or iterative solution.
An upper bound on the number of paths the search has to examine can be estimated as follows. We have 4 nodes with 2 edges, which leaves only 1 choice each and a factor of 1^4. 8 nodes have 3 edges, which leaves 2 choices each and a factor of 2^8. And 4 nodes have 4 edges, which leaves 3 choices each and a factor of 3^4. The starting point has one option more, which gives a factor of 2. Finally, we have 16 different starting points. In total, the upper bound would be 1^4*2^8*3^4*2*16 = 663,552. The set of solutions would be smaller, because we will have dead ends.
For this specific problem, we can even reduce it a little bit more, because we need only three starting points, (0,0), (0,1) and (1,1). If we have all solutions for these three points, we can use a function to generate mirrored and rotated solutions. That makes an upper bound of 124,416.
After we have all solutions, we can place the 2 and 3 points somewhere in between each solution. We can even create a function to guess the "hardness" of a solution by counting all possibilities to have the 2 and 3 at the same node and same place.
If we just want to create different puzzles, a backtracking with random directions would be totally fine. Easy to implement and fast running time should be expected.
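A minimal sketch of that backtracking with random directions, assuming a rectangular grid where moves go to the four orthogonal neighbours (which matches the 2-, 3- and 4-edge node counts above). The function name and parameters are chosen for this example only.

```python
import random

def random_hamiltonian_path(rows, cols, start=(0, 0), rng=None):
    """Return a random path visiting every cell exactly once, or None."""
    rng = rng or random.Random()
    total = rows * cols
    path = [start]
    visited = {start}

    def extend():
        if len(path) == total:
            return True
        r, c = path[-1]
        moves = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
        rng.shuffle(moves)  # random direction order gives varied puzzles
        for cell in moves:
            nr, nc = cell
            if 0 <= nr < rows and 0 <= nc < cols and cell not in visited:
                visited.add(cell)
                path.append(cell)
                if extend():
                    return True
                # Dead end: undo the move and try the next direction.
                visited.remove(cell)
                path.pop()
        return False

    return path if extend() else None
```

Calling `random_hamiltonian_path(4, 4)` repeatedly should give different 16-cell paths. Some start cells on boards with an odd number of cells have no solution at all, in which case the function returns `None`.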
How do I select a subset of points at a regular density? More formally,
Given
a set A of irregularly spaced points,
a metric of distance dist (e.g., Euclidean distance),
and a target density d,
how can I select a smallest subset B that satisfies the condition below?
for every point x in A,
there exists a point y in B
which satisfies dist(x,y) <= d
My current best shot is to
start with A itself
pick out the closest (or just particularly close) couple of points
randomly exclude one of them
repeat as long as the condition holds
and repeat the whole procedure for best luck. But are there better ways?
I'm trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.
bottom-up method
select a random point
select among unselected y for which min(d(x,y) for x in selected) is largest
keep going!
I'll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?
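The bottom-up method could look roughly like the sketch below. It assumes Euclidean distance via `math.dist` (Python 3.8+) and stops as soon as every point is within d of a selected point; the function name is made up, and it returns indices into the input list.

```python
import math
import random

def farthest_point_sampling(points, d, rng=None):
    """Greedily select indices until every point is within d of a selection."""
    rng = rng or random.Random()
    first = rng.randrange(len(points))
    selected = [first]
    # nearest[i] is the distance from point i to its closest selected point.
    nearest = [math.dist(p, points[first]) for p in points]
    while max(nearest) > d:
        # Pick the point that is currently worst covered.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [
            min(old, math.dist(p, points[idx])) for old, p in zip(nearest, points)
        ]
    return selected
```

Each round costs one pass over all points, so selecting k points takes about k * len(points) distance evaluations, which is why this is cheap when the sampling is sparse.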
performance measure
If guarantee of optimality is not required, I think these two indicators could be useful:
radius of coverage: max {y in unselected} min(d(x,y) for x in selected)
radius of economy: min {y in selected != x} min(d(x,y) for x in selected)
RC is the minimum allowed d, and there is no fixed inequality between these two. But RC <= RE is more desirable.
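The two indicators can be computed directly from their definitions. This is a small sketch assuming Euclidean distance and that `selected` holds indices into `points`; the function names are invented here.

```python
import math
from itertools import combinations

def radius_of_coverage(points, selected):
    """Largest distance from an unselected point to its nearest selected point."""
    chosen = set(selected)
    return max(
        (
            min(math.dist(points[i], points[j]) for j in chosen)
            for i in range(len(points))
            if i not in chosen
        ),
        default=0.0,  # every point is selected
    )

def radius_of_economy(points, selected):
    """Smallest distance between two different selected points."""
    return min(
        (math.dist(points[a], points[b]) for a, b in combinations(selected, 2)),
        default=math.inf,  # fewer than two points selected
    )
```

For the points (0,0), (1,0), (10,0), (11,0) with the first and last selected, RC is 1 and RE is 11, so RC <= RE holds.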
my little methods
For a little demonstration of that "performance measure," I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:
RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it's much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
You can model your problem as a graph: treat the points as nodes, and connect two nodes with an edge if their distance is smaller than d. You then need the minimum number of vertices such that they, together with their connected vertices, cover all nodes of the graph. Strictly speaking this is the minimum dominating set problem rather than minimum vertex cover (both are NP-hard in general), but any vertex cover of a graph without isolated nodes is also a dominating set, so the fast 2-approximation for vertex cover still gives a valid, though not necessarily near-minimal, selection: repeatedly take both endpoints of an edge into the cover, then remove them from the graph.
P.S.: Nodes which are fully disconnected from the graph must always be selected. After removing these nodes (that is, selecting them), you can apply the vertex cover heuristic to the rest.
A genetic algorithm would probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
1) start with B empty
2) select a random point x from A and place it in B
3) remove from A every point y such that dist(x, y) < d
4) while A is not empty go to 2
A kd-tree can be used to perform the look ups in step 3 relatively fast.
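A minimal sketch of random-selection follows. Two things differ from the description and are assumptions of this example: it uses a plain linear scan instead of a kd-tree, and it removes points at distance <= d (rather than < d) so that the result matches the condition stated in the question. The function name is invented.

```python
import math
import random

def random_selection(points, d, rng=None):
    """Return indices of a subset B such that every point is within d of B."""
    rng = rng or random.Random()
    remaining = list(range(len(points)))
    chosen = []
    while remaining:
        x = rng.choice(remaining)
        chosen.append(x)
        # Drop everything already covered by x (including x itself).
        remaining = [
            y for y in remaining if math.dist(points[x], points[y]) > d
        ]
    return chosen
```

Replacing the list comprehension with a radius query on a kd-tree would be the natural next step for large inputs.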
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, given a chromosome representing a subset B, I randomly choose a hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here).
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.
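Here is a proposal which assumes the Chebyshev (maximum-coordinate) distance metric. (With the Manhattan metric, two points in the same grid cell of side d can be up to n*d apart, so the cells would need side d/n.)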
1) Divide up the entire space into a grid of granularity d. Formally: partition A so that points (x1,...,xn) and (y1,...,yn) are in the same partition exactly when (floor(x1/d),...,floor(xn/d))=(floor(y1/d),...,floor(yn/d)).
2) Pick one point (arbitrarily) from each grid space -- that is, choose a representative from each set in the partition created in step 1. Don't worry if some grid spaces are empty! Simply don't choose a representative for this space.
Actually, the implementation won't have to do any real work to do step one, and step two can be done in one pass through the points, using a hash of the partition identifier (the (floor(x1/d),...,floor(xn/d))) to check whether we have already chosen a representative for a particular grid space, so this can be very, very fast.
Some other distance metrics may be able to use an adapted approach. For example, the Euclidean metric could use d/sqrt(n)-size grids. In this case, you might want to add a post-processing step that tries to reduce the cover a bit (since the grids described above are no longer exactly radius-d balls -- the balls overlap neighboring grids a bit), but I'm not sure how that part would look.
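The one-pass hashing idea can be sketched like this. The cell size is left as a parameter (an assumption of this example) so that the caller can pass d / sqrt(n) for the Euclidean metric as suggested above; the function name is made up.

```python
import math

def grid_representatives(points, cell):
    """Keep the first point seen in each grid cell of side `cell`."""
    representatives = {}
    for p in points:
        # The tuple of floored coordinates identifies the grid cell.
        key = tuple(math.floor(x / cell) for x in p)
        representatives.setdefault(key, p)
    return list(representatives.values())
```

For 2-D points and a Euclidean target of d = 0.2, one would call `grid_representatives(points, 0.2 / math.sqrt(2))`, since two points in the same cell are then less than 0.2 apart.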
To be lazy, this can be cast as a set cover problem, which can be handled by mixed-integer problem solvers/optimizers. Here is a GNU MathProg model for the GLPK LP/MIP solver. Here C denotes which points can "satisfy" each point.
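# number of points
param N, integer, > 0;
# C[i]: the points that can "satisfy" (cover) point i
set C{1..N};
# x[i] = 1 if point i is selected
var x{i in 1..N}, binary;
# every point must be covered by at least one selected point
s.t. cover{i in 1..N}: sum{j in C[i]} x[j] >= 1;
# select as few points as possible
minimize goal: sum{i in 1..N} x[i];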
With 1000 normally distributed points, it didn't find the optimum subset in 4 minutes, but it said it knew the true minimum and it selected only one more point.
I have a number of points on a relatively small 2-dimensional grid, which wraps around in both dimensions. The coordinates can only be integers. I need to divide them into sets of at most N points that are close together, where N will be quite a small cut-off, I suspect 10 at most.
I'm designing an AI for a game, and I'm 99% certain using minimax on all the game pieces will give me a usable lookahead of about 1 move, if that. However distant game pieces should be unable to affect each other until we're looking ahead by a large number of moves, so I want to partition the game into a number of sub-games of N pieces at a time. However, I need to ensure I select a reasonable N pieces at a time, i.e. ones that are close together.
I don't care whether outliers are left on their own or lumped in with their least-distant cluster. Breaking up natural clusters larger than N is inevitable, and only needs to be sort-of reasonable. Because this is used in a game AI with limited response time, I'm looking for as fast an algorithm as possible, and willing to trade off accuracy for performance.
Does anyone have any suggestions for algorithms to look at adapting? K-means and relatives don't seem appropriate, as I don't know how many clusters I want to find but I have a bound on how large clusters I want. I've seen some evidence that approximating a solution by snapping points to a grid can help some clustering algorithms, so I'm hoping the integer coordinates makes the problem easier. Hierarchical distance-based clustering will be easy to adapt to the wrap-around coordinates, as I just plug in a different distance function, and also relatively easy to cap the size of the clusters. Are there any other ideas I should be looking at?
I'm more interested in algorithms than libraries, though libraries with good documentation of how they work would be welcome.
EDIT: I originally asked this question when I was working on an entry for the Fall 2011 AI Challenge, which I sadly never got finished. The page I linked to has a reasonably short reasonably high-level description of the game.
The two key points are:
Each player has a potentially large number of ants
Every ant is given orders every turn, moving 1 square either north, south, east or west; this means the branching factor of the game is O(4^ants).
In the contest there were also strict time constraints on each bot's turn. I had thought to approach the game by using minimax (the turns are really simultaneous, but as a heuristic I thought it would be okay), but I feared there wouldn't be time to look ahead very many moves if I considered the whole game at once. But as each ant moves only one square each turn, two ants N spaces apart by the shortest route cannot possibly interfere with one another until we're looking ahead N/2 moves.
So the solution I was searching for was a good way to pick smaller groups of ants at a time and minimax each group separately. I had hoped this would allow me to search deeper into the move-tree without losing much accuracy. But obviously there's no point using a very expensive clustering algorithm as a time-saving heuristic!
I'm still interested in the answer to this question, though more in what I can learn from the techniques than for this particular contest, since it's over! Thanks for all the answers so far.
The median-cut algorithm is very simple to implement in 2D and would work well here. Your outliers would end up as groups of 1 which you could discard or whatever.
Further explanation requested:
Median cut is a quantization algorithm but all quantization algorithms are special case clustering algorithms. In this case the algorithm is extremely simple: find the smallest bounding box containing all points, split the box along its longest side (and shrink it to fit the points), repeat until the target amount of boxes is achieved.
A more detailed description and coded example
Wiki on color quantization has some good visuals and links
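A minimal sketch of median cut for 2-D points. Two assumptions of this example: it keeps splitting until every box holds at most `max_size` points (to match the size cap in the question) rather than until a target number of boxes is reached, and it ignores the wrap-around of the grid. The function name is made up.

```python
def median_cut(points, max_size):
    """Recursively split points at the median of the longer bounding-box side."""
    if not points:
        return []
    if len(points) <= max_size:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis where the (shrunk-to-fit) bounding box is longest.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return median_cut(ordered[:mid], max_size) + median_cut(ordered[mid:], max_size)
```

Because each call recomputes the bounding box from the points it receives, the "shrink it to fit" step happens implicitly.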
Since you are writing a game where (I assume) only a constant number of pieces move between each clustering, you can take advantage of an online algorithm to get constant update times.
The property of not locking yourself to a number of clusters is called Nonstationary, I believe.
This paper seems to have a good algorithm with both of the above properties: Improving the Robustness of 'Online Agglomerative Clustering Method' Based on Kernel-Induce Distance Measures (You might be able to find it elsewhere as well).
Here is a nice video showing the algorithm at work:
Construct a graph G=(V, E) over your grid, and partition it.
Since you are interested in algorithms rather than libraries, here is a recent paper:
Daniel Delling, Andrew V. Goldberg, Ilya Razenshteyn, and Renato F. Werneck. Graph Partitioning with Natural Cuts. In 25th International Parallel and Distributed Processing Symposium (IPDPS’11). IEEE Computer Society, 2011. [PDF]
From the text:
The goal of the graph partitioning problem is to find a minimum-cost partition P such that the size of each cell is bounded by U.
So you will set U=10.
You can calculate a minimum spanning tree and remove the longest edges. Then you can calculate the k-means. Remove another long edge and calculate the k-means. Rinse and repeat until you have N=10. I believe this algorithm is named single-link k-means and the clusters are similar to Voronoi diagrams:
"The single-link k-clustering algorithm ... is precisely Kruskal's algorithm ... equivalent to finding an MST and deleting the k-1 most expensive edges."
See for example here: https://stats.stackexchange.com/questions/1475/visualization-software-for-clustering
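The quoted equivalence suggests a short sketch: run Kruskal's algorithm and stop once k components remain, which leaves the same clusters as building the full MST and deleting its k-1 most expensive edges. This example assumes Euclidean distance, considers all point pairs (fine for small inputs only), and does not include the k-means step or a size cap; the function name is invented.

```python
import math
from itertools import combinations

def single_link_clusters(points, k):
    """Kruskal's algorithm stopped early, leaving k single-link clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    components = len(points)
    for a, b in edges:
        if components <= k:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

To respect wrap-around coordinates, `math.dist` could be swapped for a toroidal distance function without changing the rest.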
Consider the case where you only want two clusters. If you run k-means, then you will get two points, and the division between the two clusters is a plane orthogonal to the line between the centres of the two clusters. You can find out which cluster a point is in by projecting it down to the line and then comparing its position on the line with a threshold (e.g. take the dot product between the line and a vector from either of the two cluster centres and the point).
For two clusters, this means that you can adjust the sizes of the clusters by moving the threshold. You can sort the points on their distance along the line connecting the two cluster centres and then move the threshold along the line quite easily, trading off the inequality of the split with how neat the clusters are.
You probably don't have k=2, but you can run this hierarchically, by dividing into two clusters, and then sub-dividing the clusters.
(After comment)
I'm not good with pictures, but here is some relevant algebra.
With k-means we divide points according to their distance from cluster centres, so for a point Xi and two centres Ai and Bi we might be interested in
SUM_i (Xi - Ai)^2 - SUM_i(Xi - Bi)^2
This is SUM_i Ai^2 - SUM_i Bi^2 + 2 SUM_i (Bi - Ai)Xi
So a point gets assigned to either cluster depending on the sign of K + 2(B - A).X - a constant plus the dot product between the vector to the point and the vector joining the two cluster centres. In two dimensions, the dividing line between the points on the plane that end up in one cluster and the points on the plane that end up in the other cluster is a line perpendicular to the line between the two cluster centres. What I am suggesting is that, in order to control the number of points after your division, you compute (B - A).X for each point X and then choose a threshold that divides all points in one cluster from all points in the other cluster. This amounts to sliding the dividing line up or down the line between the two cluster centres, while keeping it perpendicular to the line between them.
Once you have dot products Yi, where Yi = SUM_j (Bj - Aj) Xij, a measure of how closely grouped a cluster is is SUM_i (Yi - Ym)^2, where Ym is the mean of the Yi in the cluster. I am suggesting that you use the sum of these values for the two clusters to tell how good a split you have. To move a point into or out of a cluster and get the new sum of squares without recomputing everything from scratch, note that SUM_i (Si + T)^2 = SUM_i Si^2 + 2T SUM_i Si + n T^2, where n is the number of terms in the sum, so if you keep track of sums and sums of squares you can work out what happens to a sum of squares when you add or subtract a value to every component, as the mean of the cluster changes when you add or remove a point to it.
I'm in the process of creating a game where the user will be presented with 2 sets of colored tiles. In order to ensure that the puzzle is solvable, I start with one set, copy it to a second set, then swap tiles from one set to another. Currently, (and this is where my issue lies) the number of swaps is determined by the level the user is playing - 1 swap for level 1, 2 swaps for level 2, etc. This same number of swaps is used as a goal in the game. The user must complete the puzzle by swapping a tile from one set to the other to make the 2 sets match (by color). The order of the tiles in the (user) solved puzzle doesn't matter as long as the 2 sets match.
The problem I have is that as the number of swaps I used to generate the puzzle approaches the number of tiles in each set, the puzzle becomes easier to solve. Basically, you can just drag from one set in whatever order you need for the second set and solve the puzzle with plenty of moves left. What I am looking to do is after I finish building the puzzle, calculate the minimum number of moves required to solve the puzzle. Again, this is almost always less than the number of swaps used to create the puzzle, especially as the number of swaps approaches the number of tiles in each set.
My goal is to calculate the best case scenario and then give the user a "fudge factor" (i.e. 1.2 times the minimum number of moves). Solving the puzzle in under this number of moves will result in passing the level.
A little background as to how I currently have the game configured:
Levels 1 to 10: 9 tiles in each set. 5 different color tiles.
Levels 11 to 20: 12 tiles in each set. 7 different color tiles.
Levels 21 to 25: 15 tiles in each set. 10 different color tiles.
Swapping within a set is not allowed.
For each level, there will be at least 2 tiles of a given color (one for each set in the solved puzzle).
Is there any type of algorithm anyone could recommend to calculate the minimum number of moves to solve a given puzzle?
The minimum moves to solve a puzzle is essentially the shortest path from that unsolved state to a solved state. Your game implicitly defines a graph where the vertices are legal states, and there's an edge between two states if there's a legal move that enables that transition.
Depending on the size of your search space, a simple breadth-first search would be feasible, and would give you the minimum number of steps to reach any given state. In fact, you can generate the problems this way too: instead of making random moves to arrive at a state and checking its "distance" from the initial state, simply explore the search space in breadth-first/level-order, and pick a state at a given "distance" for your puzzle.
Related questions
Rush Hour - Solving the Game
BFS is used to solve Rush Hour, with source code in Java
Alternative
If the search space is too huge for BFS (and I'm not yet convinced that it is), you can use iterative deepening depth-first search instead. It's space-efficient like DFS, but (cumulatively) level-order like BFS. Even though nodes would be visited many times, it is still asymptotically identical to BFS, while requiring much less space.
I didn't quite understand the puzzle from your description, but two general ideas often useful in solving that kind of puzzle are backtracking and branch and bound.
The A* search algorithm. The idea is that you have some measure of how close a position is to the solution. A* is then a "best first" search in the sense that at each step it considers moves from the best position found so far. It's up to you to come up with some kind of measure of how close you are to a solution. (It doesn't have to be accurate, it's just a heuristic to guide the search.) In practice it often performs much better than a pure breadth first search because it's always guided by your closeness scoring function. But without understanding your problem description, it's hard to say. (A rule of thumb is that if there's a sense of "making progress" while doing a puzzle, rather than it all suddenly coming together at the end, then A* is a good choice.)